AI Governance & Safety April 11, 2026 · 12 min read

Privacy-Preserving Computation: Encrypted Processing, Federated Learning, and the Explainability Paradox

Part 6 showed Meridian rejecting federated learning (single-tenant architecture) and deferring homomorphic encryption (47x latency). This article explains the mechanics behind those decisions, introduces secure multi-party computation, and reveals the tension between GDPR's explainability mandate and privacy protection. Concludes with a capstone PET decision framework spanning Parts 7 through 9.

By Vikas Pratap Singh
#data-governance #data-privacy #data-protection #privacy-engineering #federated-learning #homomorphic-encryption #ai-governance

Data Privacy Guide: Overview | Part 1 | Part 2 | Part 3 | Part 4 | Part 5 | Part 6 | Part 7 | Part 8 | Part 9 | Part 10

This is Part 9 of a three-part advanced series on Privacy-Enhancing Technologies. Part 7 covered operational techniques (masking, tokenization, k-anonymity). Part 8 covered mathematical privacy guarantees (differential privacy, synthetic data). Part 9 covers privacy-preserving computation and the explainability-privacy tension, then closes with a capstone PET decision framework.

This is the most technically demanding part of the series. It covers four distinct domains: homomorphic encryption, federated learning, secure multi-party computation, and the explainability-privacy paradox. The practical takeaway for each can be understood without the mathematical details. The capstone decision table at the end synthesizes all three PET articles into a single framework.

The Limitation of Modifying Data

Part 7 masked, tokenized, and generalized data. Part 8 added noise and generated synthetic replacements. Both approaches share a fundamental limitation: they modify or degrade the original data. Masking destroys information. Differential privacy adds noise that reduces accuracy. Synthetic data approximates reality but never fully replaces it.

Part 9 introduces a different paradigm. Instead of modifying data to preserve privacy, these techniques change where and how computation happens. The data stays in its original form. The computation adapts.

Homomorphic Encryption: Computing on What You Cannot See

Homomorphic encryption (HE) allows computation on encrypted data without decrypting it. Encrypt your data, send the ciphertext to a server, let the server compute on the ciphertext, get back an encrypted result, decrypt it, and the answer is correct. The server never saw the plaintext.

The intuition: Imagine putting your tax forms into a locked box. An accountant reaches in through special gloves, performs all the calculations, and writes the result on a slip of paper that comes out of a slot. The accountant never sees your actual numbers, but the tax return is correct.

Three levels of HE exist, each with different computational power:

| Type | Operations | Practical Use |
| --- | --- | --- |
| Partially Homomorphic (PHE) | One type only: addition OR multiplication | Encrypted voting, simple aggregations |
| Somewhat Homomorphic (SHE) | Both, but limited depth before noise corrupts results | Password checking, caller ID lookup |
| Fully Homomorphic (FHE) | Arbitrary computation of unbounded depth | Encrypted ML inference, complex analytics |

The “noise problem” is the core engineering challenge. Every HE ciphertext contains noise that grows with each operation, especially multiplication. If noise exceeds a threshold, decryption produces garbage. FHE schemes solve this with “bootstrapping”: periodically cleaning the accumulated noise by re-encrypting the result. Think of it as photocopying a photocopy. Each copy degrades. Bootstrapping is like going back to scan the original, except the scheme does this without ever decrypting, which is why it is computationally expensive. Current FHE computations are 10^3 to 10^6 times slower than plaintext equivalents. In practical terms: a database query that returns in 10 milliseconds on unencrypted data takes 10 seconds under a basic HE scheme, and could take minutes to hours under full FHE for complex operations.
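The partially homomorphic case from the table above can be shown in a few lines. Textbook (unpadded) RSA is multiplicatively homomorphic: the product of two ciphertexts decrypts to the product of the plaintexts. This is a toy sketch with deliberately tiny, insecure parameters, for intuition only:

```python
# Toy demonstration of a partially homomorphic scheme:
# textbook (unpadded) RSA is multiplicatively homomorphic.
# Parameters are deliberately tiny and insecure -- illustration only.

p, q = 61, 53                # small primes (never use in practice)
n = p * q                    # RSA modulus, 3233
phi = (p - 1) * (q - 1)      # Euler's totient, 3120
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse)

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

a, b = 7, 6
# Multiply the CIPHERTEXTS -- the "server" never sees 7 or 6.
c_product = (encrypt(a) * encrypt(b)) % n
print(decrypt(c_product))    # 42 == 7 * 6, recovered from encrypted inputs
```

Production PHE systems use purpose-built schemes (Paillier for addition, for example) rather than raw RSA; RSA is shown here only because the arithmetic fits on one screen.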

Who Ships HE Today

Apple integrated the BFV (Brakerski-Fan-Vercauteren) homomorphic encryption scheme, optimized for integer arithmetic, into iOS 18+ via a Swift package. Four production use cases: Live Caller ID Lookup (checking phone numbers against a spam database without revealing which number you are calling), Enhanced Visual Search (identifying landmarks in photos without sending the photo content to Apple’s servers), Mail app business logo retrieval, and parental communication safety (classifying URLs on-device without Apple seeing the content).

Microsoft uses HE in Edge Password Monitor: your browser checks whether your passwords have appeared in known breaches without sending the actual passwords to Microsoft’s servers or revealing which breach entries were checked.

CryptoLab partnered with Macrogen (South Korea) to provide encrypted genomic analysis. Patients submit encrypted DNA data. The analysis runs on the ciphertext. Results are returned encrypted. The compute provider never sees the genomic sequence.

For practitioners: HE is production-ready for specific, well-defined computations (lookups, aggregations, simple ML inference). It is not yet practical for arbitrary complex analytics. Apple’s approach is instructive: use HE for the specific operation that needs privacy (the lookup), not for the entire pipeline.

Federated Learning: Training Without Centralizing

Federated learning (FL) trains ML models across distributed data sources without moving the data to a central location. The model travels to the data, not the other way around.

How it works: A central server sends the current model to participating devices or institutions. Each participant trains on their local data and sends back only the model updates (gradients or weight differences). The server aggregates updates into a new global model. Repeat.
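The loop above can be sketched in a handful of lines. This toy federated-averaging round fits a one-parameter model y ≈ w·x by gradient descent on each client; the client data, round counts, and learning rate are all illustrative, not a real framework:

```python
import random

# Toy federated averaging: each client holds (x, y) pairs generated
# from y = 3x plus noise; the server never sees the raw data, only
# per-client weight updates. Illustrative sketch, not a framework.
random.seed(0)
clients = [[(x, 3.0 * x + random.gauss(0, 0.1)) for x in range(1, 6)]
           for _ in range(4)]

def local_update(w, data, lr=0.01, steps=20):
    """Client-side training: plain gradient descent on squared error."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

w_global = 0.0
for round_ in range(5):                       # federated rounds
    local_ws = [local_update(w_global, d) for d in clients]
    w_global = sum(local_ws) / len(local_ws)  # server aggregates updates

print(w_global)   # close to the true slope of 3.0
```

The privacy-relevant detail is what crosses the wire: only `local_ws`, never the `(x, y)` pairs.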

Google’s Gboard is the largest known FL deployment. Google trains 30+ on-device language models for next-word prediction across 7+ languages and 15+ countries, all with formal differential privacy guarantees. The strongest models achieve epsilon = 0.994, the first time a sub-1 epsilon guarantee was achieved for models trained directly on user data. The key innovation: DP-FTRL (differentially private follow-the-regularized-leader), a training method that achieves differential privacy guarantees without requiring the server to randomly select which devices participate in each round, a constraint that was impractical at Google’s scale.

Healthcare: NVIDIA’s EXAM study (Nature Medicine, 2021) trained a federated model across 20 hospitals on five continents to predict oxygen needs of COVID-19 patients. The federated model outperformed models trained at any single hospital. The Cancer AI Alliance, launched October 2024 with over $40M in funding, uses FL across four NCI-designated cancer centers.

FL is not automatically private. Model updates can leak information about individual training examples through gradient inversion attacks: by analyzing the gradients a participant sends back, an attacker can reconstruct approximations of the actual training data, including images and text. The standard defense is combining FL with differential privacy: clip each participant’s gradient contribution and add Gaussian noise before aggregation. Google’s Gboard deployment does exactly this.
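The clip-and-noise defense can be sketched directly. Assuming each client update is a plain list of floats, the server clips every update to an L2 bound and adds Gaussian noise scaled to that bound before averaging. The constants here are illustrative, not calibrated to any particular epsilon:

```python
import math, random

def clip(update, c):
    """Scale the update down so its L2 norm is at most c."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [u * scale for u in update]

def dp_aggregate(updates, c=1.0, noise_multiplier=1.1, seed=0):
    """Clip each client's update, sum, add Gaussian noise, average."""
    rng = random.Random(seed)
    clipped = [clip(u, c) for u in updates]
    dim = len(updates[0])
    total = [sum(u[i] for u in clipped) for i in range(dim)]
    sigma = noise_multiplier * c          # noise scaled to the clip bound
    noisy = [t + rng.gauss(0, sigma) for t in total]
    return [v / len(updates) for v in noisy]

client_updates = [[0.5, -0.2], [3.0, 4.0], [0.1, 0.1]]  # second one is outsized
print(dp_aggregate(client_updates))       # bounded, noised average
```

Clipping bounds any single client’s influence; the noise hides which client contributed what. A real deployment would calibrate `noise_multiplier` to a target privacy budget with a DP accountant.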

Secure Multi-Party Computation: Joint Analysis Without Data Sharing

Secure multi-party computation (SMPC) lets multiple organizations jointly compute a function over their combined data while each party sees only the final result. Nobody shares raw data. Nobody trusts anybody. The math guarantees that no party learns anything beyond what can be inferred from the output.

Two main approaches exist.

Garbled circuits (two parties): imagine Alice has a salary and Bob has a salary, and they want to know who earns more without revealing their numbers. Alice encodes the comparison as a scrambled logic puzzle, Bob runs his encrypted input through it, and only the answer comes out. Neither learns the other’s salary.

Secret sharing (multiple parties): each party splits their value into random shares distributed to all parties, who compute on shares locally and recombine for the result. Shamir’s Secret Sharing (1979) guarantees that fewer than a threshold number of shares reveal zero information, even to computationally unbounded adversaries. The intuition: imagine splitting a secret number into 5 puzzle pieces, distributed to 5 parties. Any 3 pieces can reconstruct the secret, but 2 pieces reveal absolutely nothing, not even a hint. The parties compute on their pieces independently and combine results at the end.
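The secret-sharing idea can be sketched with its simplest additive variant (not Shamir’s threshold scheme): each party splits its value into random shares that sum to the value modulo a large prime, so a secure sum emerges without any raw value ever being visible. The salaries and party count below are made up:

```python
import random

P = 2**61 - 1        # a large prime modulus; all arithmetic is mod P
rng = random.Random(42)

def share(value, n_parties):
    """Split value into n_parties random shares summing to value mod P."""
    shares = [rng.randrange(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

# Three employers with secret salaries; none reveals its own number.
salaries = [95_000, 120_000, 78_000]
all_shares = [share(s, 3) for s in salaries]

# Compute party i receives the i-th share from every employer
# and sums those shares locally.
partial_sums = [sum(row[i] for row in all_shares) % P for i in range(3)]

# Combining only the partial sums reveals the total -- nothing else.
total = sum(partial_sums) % P
print(total)   # 293000, with no individual salary ever disclosed
```

Any proper subset of one secret’s shares is uniformly random, which is exactly the “fewer than a threshold reveals zero information” property described above, in its all-or-nothing form.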

The Boston Wage Gap Study

The Boston Women’s Workforce Council has used SMPC six times (2015-2023) to measure the city-wide gender and racial wage gap. Employers submit encrypted payroll data through MPC-backed software (built by Boston University’s Hariri Institute). The system aggregates salaries by gender, race, job category, and tenure. No individual employer or employee salary is ever visible to the Council or to other employers.

The result: the gender wage gap in Greater Boston declined by 30% over the measurement period. SMPC made this analysis possible because employers would never have shared raw salary data with competitors.

Other Production Deployments

Dutch cross-bank AML (MPC4AML): TNO and banks ABN AMRO and Rabobank are developing SMPC-based anti-money laundering detection across transaction networks. Each bank’s data stays within its own perimeter. The protocol computes risk scores across the combined network.

Partisia biometric matching (Japan, 2025): A privacy-preserving student ID system at the Okinawa Institute of Science and Technology. Facial biometrics are stored encrypted and matched without ever being decrypted. “Neither Partisia nor the company that runs the structure has the full biometric information.”

For practitioners: SMPC is a procurement decision, not a build decision. You will not implement garbled circuits or secret sharing protocols. You will evaluate whether a vendor (Partisia, Inpher) or academic partner can run your joint analysis. The question to ask: does the use case justify the coordination overhead of getting multiple parties to agree on protocols, data formats, and computation logic?

The Explainability-Privacy Paradox

This is where Data Privacy engineering gets genuinely difficult.

GDPR Article 22 requires “meaningful information about the logic involved” in automated decisions. The EU AI Act (Article 86, compliance deadline August 2, 2026) mandates “clear and meaningful explanations” for high-risk AI systems. The SCHUFA ruling (CJEU Case C-634/21) held that automated credit scores constitute automated decision-making where the score significantly shapes contractual outcomes, entitling individuals to explanation and the right to contest decisions.

Penalties under the EU AI Act (Article 99) scale with the violation: up to EUR 7.5 million or 1% of worldwide annual turnover for supplying incorrect information, up to EUR 15 million or 3% for breaches of obligations including transparency, and up to EUR 35 million or 7% for deploying prohibited AI practices (whichever amount is higher in each case).

What SHAP and LIME Actually Do

Before explaining the attack, it helps to understand what these tools produce.

SHAP and LIME are the two dominant methods for explaining individual model predictions.

SHAP borrows an idea from cooperative game theory called Shapley values. The analogy: imagine four coworkers contribute to a project that earns a bonus. To fairly divide the bonus, you would test every possible team combination and measure how much each person’s addition improved the outcome. SHAP does the same for model features: for each input (age, income, ZIP code), it measures how much that feature pushed the prediction up or down by testing every possible combination of features. The result is a score per feature that explains why the model made a specific decision.
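For a model with only a handful of features, the Shapley attribution can be computed exactly by enumerating feature subsets. This is a toy sketch with a hypothetical two-feature model, using baseline substitution as the value function (SHAP’s own implementation is far more optimized, but the arithmetic is this):

```python
from itertools import combinations
from math import factorial

def model(a, b):
    """Toy black-box model with an interaction term."""
    return 2 * a + 3 * b + a * b

x = {"a": 1.0, "b": 2.0}          # instance being explained
baseline = {"a": 0.0, "b": 0.0}   # reference input

def value(subset):
    """Model output with features in `subset` at their real values,
    everything else held at the baseline."""
    inp = {f: (x[f] if f in subset else baseline[f]) for f in x}
    return model(**inp)

def shapley(feature):
    """Exact Shapley value: weighted marginal contribution of `feature`
    over every subset of the remaining features."""
    others = [f for f in x if f != feature]
    n = len(x)
    total = 0.0
    for k in range(len(others) + 1):
        for s in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (value(set(s) | {feature}) - value(set(s)))
    return total

print(shapley("a"), shapley("b"))   # 3.0 7.0 -- sums to f(x) - f(baseline)
```

The attributions sum exactly to the gap between the prediction and the baseline prediction, which is the property that makes Shapley values attractive to regulators, and, as the next section shows, informative to attackers.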

LIME takes a different approach. It creates a simple, interpretable approximation of the model’s behavior around a single prediction. Feed the model slightly modified versions of one input, observe how the predictions change, and fit a simple linear model to those changes. The linear model’s coefficients become the explanation.

Both produce feature importance scores: “your loan was denied primarily because of your debt-to-income ratio (40% importance) and credit history length (25% importance).”
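LIME’s core loop fits in a few lines for a single feature: perturb around the instance, query the black box, weight samples by proximity, and fit a weighted linear model. Everything below is an illustrative sketch, not the LIME library’s API; the black box secretly computes x², so the correct local slope at x = 3 is 6:

```python
import math

def black_box(x):
    """Opaque model we want to explain locally (here, secretly x**2)."""
    return x * x

x0 = 3.0
# Symmetric grid of perturbations around the instance.
samples = [x0 + u for u in (-0.5, -0.25, 0.0, 0.25, 0.5)]
preds = [black_box(s) for s in samples]
# Proximity kernel: nearby samples count more.
weights = [math.exp(-((s - x0) ** 2)) for s in samples]

# Weighted least-squares slope (closed form for one feature).
wsum = sum(weights)
x_bar = sum(w * s for w, s in zip(weights, samples)) / wsum
y_bar = sum(w * p for w, p in zip(weights, preds)) / wsum
slope = (sum(w * (s - x_bar) * (p - y_bar)
             for w, s, p in zip(weights, samples, preds))
         / sum(w * (s - x_bar) ** 2 for w, s in zip(weights, samples)))

print(slope)   # 6.0 -- the local linear explanation of x**2 at x = 3
```

Note that the perturbed samples are synthetic points the model has likely never seen, which is the structural reason, discussed below, that LIME leaks somewhat less about the training set than SHAP.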

The Problem: Explanations Leak Training Data

Here is the core tension. The explanations that regulators require can be reverse-engineered to extract private information about the people whose data trained the model.

Research demonstrates three specific attack vectors:

Membership inference: Can an attacker determine whether YOUR data was used to train this model? The answer matters: if yes, and the model was trained on medical records, the attacker just learned you have a relationship with that medical provider. The mechanism: models behave differently on data they were trained on versus data they have never seen. SHAP values capture this difference. An attacker builds a copy of the model (a “shadow model”), compares SHAP patterns between known training data and known non-training data, and uses the difference to classify whether a target individual was in the original training set.
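The underlying principle, that models behave differently on data they have memorized, can be shown with an extreme toy case. A 1-nearest-neighbor “model” is maximally confident (distance zero) on its own training points, so thresholding its confidence signal perfectly separates members from non-members. Real attacks on SHAP outputs are far subtler, but they exploit the same kind of signal; all values here are invented:

```python
# Toy membership inference: a 1-NN "model" leaks membership because
# its confidence signal (distance to the nearest training point) is
# exactly zero for training members. Illustrative only.

train = [1.0, 4.0, 7.5, 9.0]          # private training set

def confidence(query):
    """Higher when the model has seen something very close to `query`."""
    return -min(abs(query - t) for t in train)

def is_member(query, threshold=-0.1):
    """Attacker's rule: unusually high confidence implies membership."""
    return confidence(query) > threshold

print(is_member(4.0))   # True  -- 4.0 was in the training set
print(is_member(5.5))   # False -- never seen, confidence is low
```

Shadow-model attacks generalize this: instead of a hard-coded threshold, the attacker learns what “seen before” looks like in SHAP space by training copies of the model on known data.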

Reconstruction attacks: Feature importance scores can be inverted to approximate original training data distributions.

The granularity paradox: More detailed, faithful explanations inherently leak more information. Detailed explanations, while more informative for the user, “offer direct inferences about individual data points.” The very act of complying with GDPR’s transparency mandate can violate its privacy mandate.

For practitioners: If you are deploying a model trained on Restricted-tier data AND providing SHAP or LIME explanations, you have an active privacy risk today. The immediate action: assess whether your explanation granularity could enable membership inference, especially for models trained on small, identifiable populations such as employee data, patient cohorts, or internal customer segments.

Based on their computational approaches, SHAP appears more susceptible than LIME to membership inference. LIME’s perturbation-based approach queries points outside the training distribution, which provides some inherent privacy protection. SHAP’s computation using the actual training data distribution creates a more direct leakage path.

The agent-era restatement: When regulation says “explain your model” and privacy law says “protect training data,” you have a genuine conflict. The resolution is not to pick one over the other. It is to use techniques that satisfy both at reduced fidelity: DP-SHAP (adding calibrated noise to Shapley values), SHAP entropy regularization (encouraging more uniform feature importance distributions), Federated SHAP (2026, computing explanations without sharing raw data), or inherently interpretable models (logistic regression, shallow decision trees) where the model structure is the explanation.

The Capstone: Classification Drives PET Selection

This is the mapping framework that connects all three parts. Classification is the input. PET selection is the output.

| Classification | Data at Rest | Analytics | External Sharing | ML Training | ML Explanation |
| --- | --- | --- | --- | --- | --- |
| Public | Standard encryption | No PET needed | Standard TLS | Standard training | Full SHAP/LIME |
| Internal | Standard encryption | Column masking (Part 7) | Access controls | Standard with access controls | Full SHAP/LIME |
| Confidential | Tokenization (Part 7) | K-anonymity (Part 7), DP (Part 8) | Synthetic data (Part 8) | DP-SGD (Part 8), FL for cross-org (Part 9) | SHAP/LIME with DP noise |
| Restricted | Tokenization + HSM (Part 7) | HE (Part 9), SMPC (Part 9) | SMPC or synthetic only (Part 8) | FL + DP (Part 9) | DP-SHAP or interpretable models only |

The LINDDUN privacy threat model provides a structured methodology for identifying which privacy threats (such as linking, identifying, and data disclosure) apply at each classification level. For each identified threat, the mapping table above points to the appropriate PET family.

NIST Privacy Framework 1.1 (April 2025 draft) now includes an AI-specific section addressing privacy risks from AI and personal data interaction. PET selection should flow from organizational governance decisions, not ad hoc technical choices.

What to Remember From This Series

Three insights from Parts 7 through 9 that change how practitioners think about Data Privacy:

From Part 7: 87% of the U.S. population can be uniquely identified from just three attributes: ZIP code, date of birth, and sex. Removing names is not anonymization. Classification tiers exist because data is sensitive in ways that are not obvious from looking at individual columns.

From Part 8: Epsilon is meaningless without context. A company can truthfully claim epsilon = 2 per use case while the total daily privacy loss is 16. When evaluating differential privacy claims, always ask: what is the unit of privacy, and what is the total budget across all use cases?

From Part 9: The regulation that demands model explanations (GDPR, EU AI Act) and the regulation that demands privacy (GDPR) are in direct tension. SHAP explanations can leak training data. Building systems that are simultaneously private, useful, and explainable is the defining challenge of the next generation of Data Privacy engineering.

| Priority | Action | Why It Matters |
| --- | --- | --- |
| Immediate | Map every Restricted-tier data asset to a specific PET from the capstone table | Classification without PET selection is a label without enforcement |
| This quarter | Evaluate SHAP/LIME usage on any model trained on Restricted data; assess membership inference risk | EU AI Act compliance deadline is August 2, 2026 |
| Next quarter | Pilot federated learning for one cross-organizational ML use case | FL + DP eliminates the data centralization bottleneck |
| Ongoing | Track the PET maturity spectrum: masking and tokenization are production-proven; HE and SMPC are entering production; privacy-preserving explanations are a research frontier | Invest in production-ready techniques now; budget for frontier techniques in 12-18 months |

This concludes the PET deep-dive series. Return to the Data Privacy Guide Overview for the full series index.

Sources & References

  1. Apple - Homomorphic Encryption in Swift
  2. Apple Swift HE Announcement
  3. Microsoft Edge Password Monitor (HE)
  4. NVIDIA EXAM Federated Learning Study - Nature Medicine
  5. Google - Federated Learning with Formal DP Guarantees
  6. Google Gboard FL with DP (Paper)
  7. Boston Women's Workforce Council - MPC Wage Gap Study
  8. Dutch Cross-Bank AML (MPC4AML)
  9. Partisia MPC Biometric Matching (Japan, 2025)
  10. SHAP - Lundberg and Lee (2017)
  11. LIME - Ribeiro, Singh, Guestrin (2016)
  12. Privacy Attacks on Model Explanations (Survey)
  13. SCHUFA CJEU Ruling (Case C-634/21)
  14. SHAP Entropy Regularization (2025)
  15. Federated SHAP (Machine Learning, Springer 2026)
  16. LINDDUN Privacy Threat Model
  17. NIST Privacy Framework 1.1 (April 2025 Draft)
  18. Cancer AI Alliance - Federated Learning ($40M+)
  19. Zama Concrete ML (TFHE)
  20. CryptoLab/Macrogen Genomic HE
