Privacy in Practice: Diagnosing the Gaps and Building the Foundation
A fictitious B2B SaaS company receives a DPIA request it cannot answer. This walkthrough applies the privacy framework from Part 3 to build Data Classification, retention schedules, consent architecture, and sub-processor transparency from scratch.
The DPIA Request That Started Everything
On a Tuesday in January, Meridian Analytics’ Head of Privacy, Sarah Chen, received a 47-question Data Protection Impact Assessment questionnaire from Allianz’s procurement team. Allianz was one of Meridian’s largest EU clients, representing roughly 8% of annual recurring revenue. They were also one of the first clients to start using Meridian Copilot, the AI assistant Meridian had shipped four months earlier.
Meridian is a B2B SaaS analytics platform. About 500 employees, Series C funded, 2,000 enterprise clients across the US and EU. Forty percent of its revenue comes from EU customers. The core product is business intelligence dashboards for enterprise clients. Meridian Copilot lets users type natural language questions (“Show me Q4 revenue by region for the insurance vertical”) and receive AI-generated answers drawn from their own data.
The Allianz DPIA was not unusual. Under GDPR Article 35, controllers must assess risks when processing is “likely to result in a high risk to the rights and freedoms of natural persons.” Large enterprises routinely require their vendors to complete DPIA questionnaires as part of procurement and renewal. Meridian had answered them before, for the core dashboard product. But this questionnaire was different. It was scoped specifically to Meridian Copilot.
Sarah started filling in answers. By question twelve, she stopped.
Here are five of the questions she could not answer with specifics:
Question 9: “Describe the data flows for the Meridian Copilot feature. Where does user query data travel from the point of input to the generation of a response? List all systems, services, and third parties involved.”
Sarah knew the query went to an LLM provider. She knew there was a vector database involved. She did not know whether Allianz’s query data was logged by the LLM provider, whether it was stored separately from other clients’ data, or whether the vector embeddings were retained after the response was generated. Engineering had built Copilot in a sprint. The architecture documentation covered functionality, not data flows.
Question 14: “What is the legal basis for processing personal data through the AI assistant feature? If legitimate interest, provide the balancing test documentation.”
Meridian’s Terms of Service included a clause granting Meridian the right to “use customer data to provide and improve the Service.” Sarah knew this was too vague to serve as a legal basis under GDPR. There was no separate consent mechanism for Copilot. There was no documented balancing test. The legal team had not updated the ToS when Copilot launched.
Question 23: “List all sub-processors that receive or process personal data in connection with the AI assistant. For each, state the data received, processing purpose, location of processing, and applicable transfer mechanism.”
Meridian’s privacy policy mentioned “third-party service providers” generically. Sarah could name AWS as the infrastructure provider. Beyond that, she needed to call engineering to find out which LLM provider they used, whether the vector database was self-hosted or a managed service, and whether any data flowed to the monitoring tools used for Copilot’s performance dashboards.
Question 31: “What is the retention period for user queries submitted to the AI assistant? For model training data, if applicable? For inference logs? Provide the specific policy and rationale for each.”
Meridian’s retention policy said: “We retain your data as long as necessary to fulfill the purposes for which it was collected.” This is the kind of vague retention language the Dutch DPA cited when it fined Netflix EUR 4.75 million for broader transparency failures. Sarah could not provide a specific retention period for any AI-related data category because none existed.
Question 38: “Has a Data Protection Impact Assessment been completed for the AI assistant feature? If so, provide the assessment. If not, explain why one was not deemed necessary.”
No DPIA had been completed. The product team had not flagged Copilot for privacy review before launch. The feature used customer data in a fundamentally new way, passing it through third-party AI infrastructure, but the existing product launch checklist did not include a privacy gate for AI features.
Sarah forwarded the questionnaire to Meridian’s CTO with a two-line message: “We cannot answer this DPIA. If Allianz escalates, we could lose the account, and every other EU client with a similar requirement will ask the same questions.”
The CTO’s response was immediate: “What do we need to fix this?”
The honest answer was: almost everything. Meridian had shipped an AI feature on top of a privacy infrastructure designed for a pre-AI product. The infrastructure had not caught up. Allianz’s DPIA exposed that gap in 47 specific, answerable-or-not questions.
Meridian is fictitious. The scenario is not. If your organization has shipped AI features without updating Data Classification, retention policies, consent architecture, and sub-processor documentation, Meridian’s situation is likely yours.
The 8-component privacy framework from Part 3 provides the blueprint for what to build. This article walks through how to build the first four components, using Meridian as the worked example.
The Diagnostic: Mapping Meridian Against the Framework
Before building anything, Sarah’s team needed to know exactly where they stood. They spent a week mapping Meridian’s current state against each of the eight framework components from Part 3. The diagnostic was uncomfortable.
| Component | Framework Requirement | Meridian’s Current State | Gap Severity |
|---|---|---|---|
| Data Classification | AI-specific categories: training data, inference data, model artifacts, synthetic data | Standard 4-tier classification only (Public, Internal, Confidential, Restricted). Copilot training data classified as “Internal.” No distinction between a customer database and an ML training set. | Critical |
| Retention | ML-specific schedules per data category, with specific periods and rationale | “Retained as long as necessary for business purposes.” No AI-specific retention periods. No model version lifecycle policy. | Critical |
| Consent | Three-tier layered architecture separating service, improvement, and AI training consent | Single Terms of Service checkbox at signup. No separate consent for Copilot data processing. No opt-out mechanism for AI features. | Critical |
| Sub-processor Transparency | Named registry with purpose, data received, legal basis, and transfer mechanism per sub-processor | Legal team mentions “third-party service providers” generically. No public registry. DataPulse (acquired startup) vendor list undocumented. | Critical |
| Cross-border Transfers | Per-system transfer mapping with specific legal mechanisms | Privacy policy states “Data may be transferred internationally.” No mapping of which data goes where, under which mechanism. | High |
| AI Regulatory Compliance | EU AI Act risk classification per AI system | Not started. No one at Meridian has assessed whether Copilot falls under limited or high-risk classification. | High |
| PETs | Privacy-enhancing technology assessment per AI use case | No assessment conducted. No evaluation of whether queries could be processed with differential privacy or on-premise inference. | Medium |
| Governance | Hub-and-spoke model with embedded privacy champions in product and engineering | Central legal team handles all privacy questions. No privacy champion in the Copilot engineering team. No AI governance function. | High |
Four critical gaps. Three high. One medium. Every critical gap mapped directly to a question Sarah could not answer in the Allianz DPIA.
Sarah presented the diagnostic to Meridian’s executive team with a simple framing: “We have eight components to build. We cannot do all eight simultaneously. The Allianz DPIA, and every enterprise DPIA that follows, requires four of them as a minimum: Data Classification, retention, consent, and sub-processor transparency. These are the Foundation and Control layers from the framework. If we build these four first, we can answer the DPIA. The remaining four (cross-border transfers, AI Act compliance, PETs, and governance) are the Operations and Compliance layers. They come next.”
The executive team approved a 90-day program to build the Foundation and Control layers. This article covers what they built.
Building the Foundation Layer
Component 1: Data Classification
Meridian’s existing Data Classification was a standard four-tier model: Public, Internal, Confidential, Restricted. It was adequate for the dashboard product. Customer financial data was Confidential. Aggregated analytics were Internal. Nothing controversial.
The problem was that Copilot introduced data types that did not fit any of these tiers. When a customer types “Show me Q4 revenue by region” into Copilot, the query itself is Confidential (it contains business context about the customer’s operations). But the query also becomes something else: an input to an ML inference pipeline. If Meridian logs that query-response pair for quality monitoring, it becomes operational data. If Meridian later uses aggregated query patterns to fine-tune the model, it becomes training data. The same piece of data changes classification as it moves through the pipeline.
The framework from Part 3 defines four AI-specific classification categories: Training Data, Model Artifacts, Inference Data, and Synthetic Data. Sarah’s team extended Meridian’s taxonomy to include all four.
Here is how Meridian’s Copilot data mapped to the extended taxonomy:
| Data Element | Traditional Classification | AI Classification | Examples at Meridian |
|---|---|---|---|
| Customer dashboard queries | Confidential | Inference Data | “Show me Q4 revenue by region” prompts sent to Copilot |
| Copilot model weights | Internal | Model Artifacts | Fine-tuned LLM weights trained on aggregated query patterns |
| Query-response pairs used for fine-tuning | Confidential | Training Data | Anonymized customer queries used to improve Copilot accuracy |
| Copilot-generated summaries | Internal | Inference Data | AI-generated text responses to customer queries |
| DataPulse customer interaction logs | Unknown (undocumented) | Training Data | Legacy interaction data from acquired company, never classified |
| Query vector embeddings | Internal | Inference Data | Numerical representations of customer queries stored in Pinecone |
| Synthetic test queries | Internal | Synthetic Data | Generated queries used for Copilot regression testing |
The DataPulse row was the hardest to resolve. Meridian had acquired DataPulse six months earlier. DataPulse had its own customer base, its own data collection practices, and its own (minimal) privacy documentation. Some DataPulse interaction logs had been fed into Copilot’s training pipeline before anyone at Meridian reviewed the original consent basis. Sarah flagged this as the single highest-risk item in the entire diagnostic: data of unknown provenance, with an undocumented legal basis, already embedded in model training.
The classification exercise took Sarah’s team three weeks. The output was not just a taxonomy document. It was a Data Classification label on every dataset, table, and pipeline in Meridian’s infrastructure. The engineering team integrated classification labels into their metadata catalog, so every dataset in the Copilot pipeline carried its AI classification alongside its traditional tier.
This is what California’s AB 2013 requires. Effective January 1, 2026, developers of generative AI systems must disclose whether training datasets include personal information, copyrighted material, or synthetic data. If your classification framework does not distinguish these categories, you cannot produce the required disclosure.
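As a sketch of how those labels can ride along in a metadata catalog, consider a record type that carries both classifications side by side. The dataset names, enum values, and catalog shape below are illustrative, not Meridian’s actual schema:

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"
    RESTRICTED = "restricted"

class AICategory(Enum):
    TRAINING_DATA = "training_data"
    MODEL_ARTIFACTS = "model_artifacts"
    INFERENCE_DATA = "inference_data"
    SYNTHETIC_DATA = "synthetic_data"

@dataclass(frozen=True)
class DatasetLabel:
    dataset: str
    tier: Tier                    # traditional 4-tier classification
    ai_category: AICategory       # AI-specific category from the extended taxonomy
    contains_personal_data: bool

# Illustrative labels for three Copilot datasets
catalog = [
    DatasetLabel("copilot.query_logs", Tier.CONFIDENTIAL, AICategory.INFERENCE_DATA, True),
    DatasetLabel("copilot.finetune_batches", Tier.CONFIDENTIAL, AICategory.TRAINING_DATA, True),
    DatasetLabel("copilot.synthetic_test_queries", Tier.INTERNAL, AICategory.SYNTHETIC_DATA, False),
]

# An AB 2013-style disclosure question: which training datasets contain personal data?
training_with_pd = [
    d.dataset for d in catalog
    if d.ai_category is AICategory.TRAINING_DATA and d.contains_personal_data
]
print(training_with_pd)  # ['copilot.finetune_batches']
```

The same labels that drive internal handling requirements then answer the disclosure question as a simple query over the catalog, rather than a manual audit.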
The diagram above shows how data flows through Meridian’s Copilot pipeline. Classification labels attach at each stage: when the customer submits a query (Inference Data, Confidential), when the query is embedded and stored (Inference Data, Internal), when the LLM generates a response (Inference Data, Confidential), and when anonymized query patterns are batched for model fine-tuning (Training Data, Confidential). The key insight: a single customer query carries different classifications at different points in the pipeline, and each classification triggers different handling requirements.
What this answered in the DPIA. After the classification exercise, Sarah could respond to Question 9 (“Describe the data flows for the Meridian Copilot feature”) with a specific, auditable answer. She could map every data element to a classification category, identify which categories contained personal data, and trace the flow from customer query to model response to (optional) training pipeline. The answer was five pages long. It was also accurate.
Component 2: Retention Schedules
Meridian’s retention policy was a single sentence: “We retain your data as long as necessary to fulfill the purposes for which it was collected.”
This language appears in more privacy policies than it should. The Dutch DPA cited Netflix’s use of similar phrasing as a violation of GDPR’s transparency requirements. The problem is not that the phrase is wrong. The problem is that it communicates nothing. “As long as necessary” means whatever the company decides it means, which means users and regulators cannot hold the company to a specific standard.
Meridian needed to replace that single sentence with specific periods, tied to specific data categories, with documented rationale for each.
The framework from Part 3 provides the structure: retention schedules must cover raw training data, aggregated training data, model weights (current and previous versions), inference logs, and synthetic data. Sarah’s team populated the schedule with Meridian-specific periods:
| Data Category | Current Retention | New Retention | Rationale |
|---|---|---|---|
| Customer dashboard data | “As long as account is active” | Active account + 12 months post-termination | Contractual obligation under the MSA, plus a reasonable wind-down period for data export. Aligns with GDPR Article 5(1)(e) storage limitation principle. |
| Copilot inference logs (financial services clients) | No policy | 90 days rolling | Financial services clients using Copilot for data analysis may trigger obligations around automated decision-making under GDPR Article 22. 90 days provides a window to respond to inquiries about AI-assisted decisions. |
| Copilot inference logs (all other clients) | No policy | 30 days rolling | Sufficient for quality monitoring and debugging. No regulatory obligation to retain longer. Shorter retention reduces the privacy surface area. |
| Copilot training data (containing personal elements) | No policy | Delete after model training + 30-day validation window | Minimize exposure per EDPB guidance. The 30-day window allows validation that the trained model meets quality thresholds before source data is purged. |
| Copilot training data (aggregated, anonymized) | No policy | 24 months | Lower risk. Anonymization validated per EDPB case-by-case standard. Retained for model reproducibility and audit purposes. |
| Previous model versions | No policy | 90 days post-replacement | Provides a rollback window if the new model version underperforms. After 90 days, delete to reduce the surface area for erasure obligations. |
| Current model weights | No policy | Retain while model is in production | Operational necessity. Document which training datasets contributed to the model for erasure traceability. |
| Query vector embeddings | No policy | Same as inference logs (30 or 90 days by client tier) | Embeddings are derived from customer queries and may be reversible to personal data. Treat with the same retention as the source query. |
| DataPulse legacy data | Unknown | Audit within 60 days, apply new schedule or delete | Cannot retain data with unknown classification indefinitely. The 60-day window allows the team to assess provenance and consent basis. Data that cannot be documented must be deleted. |
The DataPulse legacy data row required a hard decision. Sarah’s team discovered that approximately 340,000 interaction records from DataPulse’s original customer base had been ingested into Copilot’s training pipeline. The original consent basis was DataPulse’s Terms of Service, which granted broad data usage rights but did not mention AI training, model improvement, or transfer to an acquiring company. Under GDPR Article 6, the legal basis for processing must be established before processing begins. Using DataPulse data for Copilot training without re-establishing consent or conducting a legitimate interest assessment was a compliance risk.
Meridian’s privacy team made two decisions. First, they quarantined the DataPulse data, removing it from the active training pipeline. Second, they documented which model versions had been trained with DataPulse data and queued a machine unlearning assessment: a technical evaluation of whether those model versions could be retrained without the DataPulse data, or whether the influence of that data on the model weights was negligible. The EDPB’s December 2024 opinion is clear that AI models trained on personal data cannot automatically be considered anonymous. If the DataPulse data contained personal information, the model weights may carry that personal data forward even after the source data is deleted.
This was not a comfortable finding. It meant Meridian had to assess not just what data it held, but what data its models had already absorbed.
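The lineage documentation behind that assessment can be modeled as a mapping from model version to the training datasets it absorbed. The version and dataset names here are hypothetical:

```python
# Hypothetical model-version lineage: which datasets trained which version.
MODEL_LINEAGE = {
    "copilot-v1.0": ["meridian.query_pairs_2024q3"],
    "copilot-v1.1": ["meridian.query_pairs_2024q3", "datapulse.interaction_logs"],
    "copilot-v1.2": ["meridian.query_pairs_2024q4", "datapulse.interaction_logs"],
}

def versions_trained_on(dataset: str) -> list[str]:
    """Which model versions absorbed a given dataset (erasure traceability)."""
    return [v for v, datasets in MODEL_LINEAGE.items() if dataset in datasets]

# Scope the machine unlearning assessment to the affected versions only.
affected = versions_trained_on("datapulse.interaction_logs")
print(affected)  # ['copilot-v1.1', 'copilot-v1.2']
```

Without this record, "which models did the DataPulse data touch?" is unanswerable; with it, the unlearning assessment has a defined scope.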
What this answered in the DPIA. Question 31 (“What is the retention period for user queries submitted to the AI assistant?”) now had a specific answer: 90 days for financial services clients, 30 days for others, with documented rationale for each period. The answer also included the model version lifecycle policy and the DataPulse remediation plan. It was no longer “as long as necessary.” It was auditable.
Building the Control Layer
Component 3: Consent Architecture
Meridian’s consent model was a single checkbox at signup: “I agree to Meridian’s Terms of Service and Privacy Policy.” One click covered everything, from rendering dashboards to training AI models to sharing data with third-party infrastructure providers. This is the blanket consent model that the Italian Garante found insufficient when fining OpenAI EUR 15 million in December 2024. The Garante found that OpenAI had no adequate legal basis for using personal data to train ChatGPT, reinforcing the principle that AI model training requires its own documented legal justification, separate from the core service.
Sarah’s team designed a three-tier consent architecture following the model from the Part 3 framework. The key design principle: withdrawing consent from a higher tier never breaks a lower tier.
Tier 1: Service Processing. This covers all data processing necessary to deliver Meridian’s core product: rendering dashboards, running queries, storing customer data, generating Copilot responses. The legal basis is contractual necessity under GDPR Article 6(1)(b). No additional consent is needed. If you are a Meridian customer, processing your data to show you your dashboards is why the contract exists. Copilot inference, taking a customer’s query, generating a response, and returning it, falls here. The data is processed to deliver the service the customer purchased.
Tier 2: Product Improvement. This covers using aggregated, anonymized usage patterns to improve Meridian’s products. Examples: analyzing which dashboard visualizations customers interact with most, identifying common Copilot query patterns to improve the query parser, A/B testing UI layouts. The legal basis is legitimate interest under GDPR Article 6(1)(f), with a documented balancing test showing that the processing serves both Meridian’s business interest and the customer’s interest in a better product. Customers can opt out in their account settings. Opting out of Tier 2 does not affect Tier 1: the customer still gets the full product.
Sarah’s team documented the legitimate interest balancing test for Tier 2. The test weighed Meridian’s interest (improving product quality based on usage patterns) against the data subject’s rights (potential concern about behavioral analysis). The mitigating factors: data is aggregated before analysis, individual-level patterns are not extracted, and a genuine opt-out is available. The CNIL’s guidance on AI system development was the reference standard: legitimate interest requires a documented balancing test, not just a stated interest.
Tier 3: AI Model Training. This covers using customer query-response pairs to fine-tune Copilot’s underlying model. This is where data collected for one purpose (answering a customer’s question) gets repurposed for a different purpose (training a model that answers other customers’ questions). The legal basis is explicit consent under GDPR Article 6(1)(a). It must be freely given, specific, informed, and unambiguous. It cannot be pre-checked. It must be separate from Tier 1 and Tier 2 consent. A customer who declines Tier 3 consent still gets the full Copilot service; the model just will not learn from their interactions.
The consent settings page Sarah’s team designed had three clear sections, each with a toggle switch and a plain-language explanation:
Your Data, Your Choice
Dashboard & Copilot Service (always on): We process your data to run your dashboards and answer your Copilot questions. This is part of the service you purchased.
Product Improvement (on by default, opt-out available): We use aggregated, anonymized usage patterns to make Meridian’s products better for everyone. No individual data is extracted. You can opt out at any time. [Learn more]
AI Model Training (off by default, opt-in required): With your permission, we use anonymized versions of your Copilot interactions to improve Copilot’s accuracy for all customers. Your data is anonymized before training. You can withdraw permission at any time, and we will exclude your data from the next training cycle. [Learn more]
The default states mattered. Tier 2 defaulted to on because Meridian assessed, through the documented balancing test, that aggregated product improvement serves both parties and does not require explicit consent. Tier 3 defaulted to off because using query data for model training is a materially different purpose from delivering the service. The CNIL’s recommendations and the Italian Garante’s OpenAI decision both support this distinction.
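The tier defaults, and the invariant that withdrawing a higher tier never breaks a lower one, can be captured in a small consent record. This is a sketch under the three-tier design described above, not production code:

```python
from dataclasses import dataclass

@dataclass
class ConsentState:
    # Tier 1 is contractual necessity: always on, not user-toggleable.
    service: bool = True
    # Tier 2 defaults on (legitimate interest, with a genuine opt-out).
    product_improvement: bool = True
    # Tier 3 defaults off (explicit opt-in required for model training).
    ai_training: bool = False

    def withdraw(self, tier: str) -> None:
        if tier == "service":
            raise ValueError("Tier 1 is contractual necessity; end the contract to stop it.")
        setattr(self, tier, False)

state = ConsentState()                  # defaults: on / on / off
state.withdraw("ai_training")           # already off; withdrawal is always safe
state.withdraw("product_improvement")   # opt out of Tier 2

# Withdrawing higher tiers never touches Tier 1.
print(state.service, state.product_improvement, state.ai_training)  # True False False
```

Encoding Tier 1 as non-toggleable keeps the UI honest: the settings page can only offer choices that are legally meaningful.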
What happens when a customer opts out of Tier 3? Sarah’s team defined a four-step process:
1. The customer’s data is flagged for exclusion from the next model training run. In-flight training cycles complete with existing data; the exclusion takes effect on the next scheduled run.
2. Existing inference logs for that customer begin the standard retention countdown (30 or 90 days, depending on client tier). No new logs are created for training purposes.
3. Model versions that were trained with the customer’s data are documented. Meridian records which training datasets contributed to which model versions, so the lineage is auditable.
4. A machine unlearning assessment is queued. If the customer’s data contribution was significant (determined by volume thresholds defined by the ML team), Meridian evaluates whether retraining without that data is feasible and warranted.
Tier 1 and Tier 2 services continue completely unaffected. The customer sees no change in their dashboard experience or Copilot response quality. The model may still improve from other customers’ contributions; it just will not learn from this customer’s interactions.
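The first step of that process, excluding flagged customers when the next training batch is assembled, can be sketched as a filter over the candidate pairs. The record shapes and identifiers are hypothetical:

```python
# Query-response pairs queued for the next fine-tuning run (hypothetical shape).
candidate_pairs = [
    {"customer_id": "cust-001", "query": "...", "response": "..."},
    {"customer_id": "cust-002", "query": "...", "response": "..."},
    {"customer_id": "cust-003", "query": "...", "response": "..."},
]

# Customers with active Tier 3 (AI training) consent; an opt-out
# simply drops the customer from this set before the next run.
tier3_opted_in = {"cust-001", "cust-003"}

def training_batch(pairs: list[dict], opted_in: set[str]) -> list[dict]:
    """Keep only pairs from customers with active Tier 3 consent."""
    return [p for p in pairs if p["customer_id"] in opted_in]

batch = training_batch(candidate_pairs, tier3_opted_in)
print([p["customer_id"] for p in batch])  # ['cust-001', 'cust-003']
```

Filtering at batch-assembly time, rather than at collection time, is what makes the "in-flight cycles complete; exclusion takes effect on the next run" semantics enforceable.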
What this answered in the DPIA. Question 14 (“What is the legal basis for processing personal data through the AI assistant?”) now had a layered answer: contractual necessity for inference (Tier 1), legitimate interest with documented balancing test for product improvement (Tier 2), and explicit opt-in consent for model training (Tier 3). Each tier had a specific GDPR article reference and supporting documentation. Allianz’s data was processed under Tier 1 by default, with Tier 2 and Tier 3 subject to Allianz’s own consent preferences.
Component 4: Sub-Processor Registry
Meridian’s privacy policy said: “We may share your information with third-party service providers who perform services on our behalf.” This is the exact genre of language the Dutch DPA found insufficient in the Netflix case. Under GDPR Article 28, controllers must identify their processors and sub-processors with enough specificity that data subjects can understand who handles their data and for what purpose.
Sarah’s team conducted a sub-processor audit over two weeks. The process was more revealing than anyone anticipated.
Step 1: Audit every vendor contract. Sarah’s team pulled every active vendor contract from Meridian’s procurement system. For the core dashboard product, this was straightforward: AWS for infrastructure, Snowflake for data warehousing, a handful of monitoring and observability tools.
Step 2: Audit the Copilot architecture. This is where the gaps appeared. The engineering team had built Copilot using several third-party services that had never been reviewed by the privacy team. An LLM provider for natural language processing. A vector database service for semantic search over customer data. A logging platform for monitoring Copilot response quality. Each of these services received some form of customer data, and none had been added to any privacy documentation.
Step 3: Audit DataPulse’s vendor list. DataPulse, the acquired startup, had its own set of vendors. Its customer data sat on MongoDB Atlas. It used a separate analytics platform. Its vendor contracts were stored in a shared Google Drive folder that Sarah’s team had to request access to. Two of DataPulse’s vendors had been decommissioned since the acquisition, but the data had not been migrated or deleted.
The completed registry:
| Sub-Processor | Data Received | Purpose | Legal Basis | Transfer Mechanism | Location |
|---|---|---|---|---|---|
| AWS (us-east-1, eu-west-1) | All customer data (dashboard + Copilot) | Cloud infrastructure, compute, storage | Contractual necessity | EU data: processed in eu-west-1. US data: processed in us-east-1. EU-US transfers via DPF certification. | US, Ireland |
| Anthropic | Customer queries (anonymized at point of transmission) | LLM inference for Copilot responses | Contractual necessity (Tier 1 service delivery) | DPF certification | US |
| Pinecone | Query vector embeddings | Semantic search for Copilot context retrieval | Contractual necessity (Tier 1 service delivery) | DPF certification | US |
| Snowflake | Aggregated customer analytics, usage metrics | Data warehousing and business intelligence | Contractual necessity | SCCs + supplementary technical measures (encryption at rest and in transit) | US, Netherlands |
| Datadog | System telemetry, anonymized performance metrics | Application monitoring and alerting | Legitimate interest | DPF certification | US |
| MongoDB Atlas (DataPulse legacy) | Historical customer interaction logs from DataPulse | Pending migration to Meridian infrastructure | Under review (original legal basis was DataPulse ToS) | SCCs (pending DPF assessment) | US (Virginia) |
The MongoDB Atlas row was the one that kept Sarah awake. DataPulse’s customer data was sitting on infrastructure that Meridian had inherited but not yet governed. The original Terms of Service under which DataPulse collected the data did not contemplate transfer to an acquiring company for AI training purposes. Under GDPR Article 6, Meridian could not simply assume the legal basis transferred with the acquisition. The data needed to be either re-consented, justified under a new legal basis with a documented legitimate interest assessment, or deleted.
Sarah’s team set a 60-day deadline for resolving the DataPulse data. The options were:
- Re-consent. Contact DataPulse’s original customers, explain the acquisition and the intended data use, and obtain fresh consent under Tier 3. Feasible for active customers. Not feasible for churned customers whose contact information may be stale.
- Legitimate interest assessment. Document a legitimate interest basis for retaining the data for product improvement (Tier 2), with a genuine opt-out. Requires a balancing test and notification to data subjects.
- Delete. Purge the data entirely. Simplest from a compliance perspective. Loses potential training value, but eliminates the risk.
For churned DataPulse customers (roughly 40% of the dataset), Sarah’s team recommended deletion. For active customers who had been migrated to Meridian accounts, they recommended notification plus a Tier 2 legitimate interest assessment, with Tier 3 consent collected separately. The engineering team was given 60 days to execute the migration and deletion.
Making the registry public. Under GDPR Article 28, processors must make sub-processor information available to controllers. Best practice goes further: publish the registry so any customer can see it without requesting it. Meridian created a dedicated page at meridiananalytics.com/sub-processors (a common pattern; companies like Notion, Slack, and Snowflake maintain similar pages) that listed every sub-processor with the data they receive and their purpose. The page included a change log with dates, so customers could see when sub-processors were added or removed.
The registry also included an email notification mechanism: enterprise customers could subscribe to receive advance notice when Meridian intended to add a new sub-processor. This gave customers like Allianz a 30-day window to review the new sub-processor before data processing began, a contractual right increasingly common in enterprise SaaS agreements.
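A registry like this can be backed by a simple structure, with the 30-day review window enforced as a check before processing begins. The entries abbreviate the table above; the field names and dates are illustrative:

```python
from datetime import date, timedelta

# Registry entries mirroring the table above (abbreviated).
REGISTRY = [
    {"name": "Anthropic", "data": "Customer queries (anonymized)",
     "purpose": "LLM inference", "location": "US"},
    {"name": "Pinecone", "data": "Query vector embeddings",
     "purpose": "Semantic search", "location": "US"},
]

# Public change log: customers can see when sub-processors were added or removed.
CHANGE_LOG = [
    {"date": date(2025, 2, 1), "action": "added", "name": "Pinecone"},
]

NOTICE_PERIOD = timedelta(days=30)

def may_begin_processing(announced_on: date, today: date) -> bool:
    """A new sub-processor may process data only after the 30-day review window."""
    return today >= announced_on + NOTICE_PERIOD

print(may_begin_processing(date(2025, 3, 1), date(2025, 3, 15)))  # False: still in review
print(may_begin_processing(date(2025, 3, 1), date(2025, 4, 1)))   # True
```

Generating the public page and the change log from the same structure keeps the published registry from drifting out of sync with what procurement and engineering actually use.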
What this answered in the DPIA. Question 23 (“List all sub-processors that receive or process personal data in connection with the AI assistant”) now had a complete, auditable answer. Every sub-processor was named. Every data flow was documented. Every legal basis was stated. Every transfer mechanism was specified. Sarah could point Allianz to a public URL that would stay current as sub-processors changed.
What Meridian Could Now Answer
Six weeks after the Allianz DPIA arrived, Sarah submitted the completed questionnaire. Here is what had changed:
| DPIA Question | Before | After |
|---|---|---|
| Data flows for the AI assistant | “We could not provide specifics” | 5-page data flow map with classification labels at every stage |
| Legal basis for AI data processing | “Covered by Terms of Service” | Three-tier consent architecture with specific GDPR article references for each tier |
| Sub-processor list | “Third-party service providers” | Named registry with 6 sub-processors, each documented with data received, purpose, legal basis, and transfer mechanism |
| Retention periods | “As long as necessary” | Specific periods per data category: 30-90 day inference logs, training data retention tied to model lifecycle, 90-day model version rollback window |
| DPIA for the AI feature | “Not completed” | In progress, expected completion within 60 days (requires cross-border transfer mapping from the Operations layer) |
Sarah could not yet answer every question. The cross-border transfer documentation (Component 5 from the framework) was still being built. The EU AI Act risk classification for Copilot (Component 6) had not been completed. The privacy-enhancing technology assessment (Component 7) and governance operating model (Component 8) were planned for the next phase.
But the four components Meridian built (Data Classification, retention schedules, consent architecture, and sub-processor registry) answered the questions that had been unanswerable six weeks earlier. That was enough to satisfy Allianz’s procurement team and keep the renewal on track.
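The difference between “as long as necessary” and the After column above is that specific periods can be enforced in code. A minimal sketch of an enforceable retention schedule, assuming illustrative category names and using the upper bounds of the ranges given in the table:

```python
from datetime import datetime, timedelta

# Illustrative retention schedule keyed by data category. The periods echo
# the ranges in the table above; the category names and exact values are
# assumptions for demonstration, not a normative schedule.
RETENTION = {
    "inference_logs": timedelta(days=90),  # upper bound of the 30-90 day range
    "model_rollback": timedelta(days=90),  # prior model versions kept 90 days
    # Training data is deliberately absent: its retention is tied to the
    # model lifecycle, not a fixed period, so it needs a different control.
}

def is_expired(category: str, created: datetime, now: datetime) -> bool:
    """Return True when a record has outlived its retention period."""
    period = RETENTION.get(category)
    if period is None:
        raise ValueError(f"{category!r} uses lifecycle-based retention, not a fixed period")
    return now - created > period

# A deletion job can then sweep each store against the schedule:
print(is_expired("inference_logs", datetime(2025, 1, 1), datetime(2025, 6, 1)))  # True
```

The point of the explicit `ValueError` is that lifecycle-tied data (training sets) should fail loudly rather than silently fall through a time-based sweep.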
What to Do Next
| Priority | Action | Why It Matters |
|---|---|---|
| This week | Run the 8-component diagnostic against your own organization. Score each component using the table in Section 2. | You cannot fix what you have not measured. Meridian’s “we are mostly fine” assumption collapsed under one DPIA request from a single client. |
| This week | Search your privacy policy for “as long as necessary” and “third-party service providers.” Count the instances. | These are the exact phrases that cost Netflix EUR 4.75 million. If they appear in your policy, your retention and transparency controls are likely underspecified. |
| This month | Extend your Data Classification taxonomy with the four AI categories from Part 3: Training Data, Model Artifacts, Inference Data, Synthetic Data. Classify every dataset in your ML pipeline. | If your classification does not distinguish between a customer database and an ML training dataset, you cannot comply with EU AI Act Article 10 or California AB 2013. |
| This month | Build a sub-processor registry. Start with AI pipeline vendors: model providers, vector databases, annotation services, monitoring tools. | Generic “third-party service providers” language is an enforcement target. Name your sub-processors, document the data they receive, and publish the registry. |
| This quarter | Design and deploy a three-tier consent architecture. Separate AI training consent from service consent. Default AI training consent to off. | The Italian Garante’s OpenAI fine established that blanket ToS consent is insufficient for model training. Layered consent with explicit opt-in for AI training is becoming the enforcement standard. |
| This quarter | Audit any data inherited through acquisitions. Assess the original consent basis and determine whether it covers your current use. | Meridian’s DataPulse data was the highest-risk finding in the entire diagnostic: data of unknown provenance already embedded in model training. If you have acquired companies with their own data practices, this risk is likely sitting in your pipeline today. |
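The three-tier consent design in the table above can be made concrete as a consent record whose defaults encode the policy. This is a minimal sketch of the layered pattern; the tier names and GDPR article mappings are illustrative assumptions, not Meridian's exact design.

```python
from dataclasses import dataclass

@dataclass
class ConsentRecord:
    # Tier 1: processing needed to deliver the service
    # (GDPR Art. 6(1)(b), performance of a contract)
    service: bool = True
    # Tier 2: product analytics and improvement
    # (illustratively Art. 6(1)(f), legitimate interest, with opt-out)
    analytics: bool = True
    # Tier 3: use of customer data for AI model training
    # (Art. 6(1)(a), explicit consent) -- MUST default to off
    ai_training: bool = False

def may_train_on(record: ConsentRecord) -> bool:
    """Training pipelines check the explicit opt-in; they never infer it
    from service consent or blanket ToS acceptance."""
    return record.ai_training

print(may_train_on(ConsentRecord()))  # False: new users are opted out of training
```

Encoding the default in the type itself means every code path that constructs a consent record without an explicit opt-in gets the enforcement-safe behavior for free.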
What Comes Next
Meridian now has the Foundation and Control layers in place: classified data, specific retention schedules, layered consent, and a public sub-processor registry. But the Allianz DPIA is not fully answered. Five questions about cross-border transfers remain open. The EU AI Act risk classification for Copilot has not been completed. No privacy-enhancing technology assessment has been conducted. And the governance model, the structure that ensures these artifacts stay current as the product evolves, does not exist yet.
Part 6 walks through building the remaining four framework components: cross-border transfer documentation for AI workloads, the EU AI Act risk classification for Copilot, a PET assessment, and the hub-and-spoke governance model that ties everything together. Meridian’s privacy infrastructure is half built. The second half is what turns it into an operational program.
Sources & References
- EU AI Act - Article 10: Data and Data Governance (2024)
- GDPR Article 22 - Automated Decision-Making (2016)
- EDPB Opinion 28/2024 on AI Models and Personal Data (2024)
- Italian Garante Fines OpenAI EUR 15M for GDPR Violations (2024)
- noyb WIN: Dutch Authority Fines Netflix EUR 4.75M (2024)
- CNIL Recommendations for AI System Development (2024)
- California AB 2013 - AI Training Data Transparency (2024)
- Crowell & Moring - California AB 2013 Disclosure Requirements (2025)
- EDPB 2025 Coordinated Enforcement - Right to Erasure (2025)
- Clearview AI - Dutch DPA EUR 30.5M Fine (2024)
- GDPR Article 35 - Data Protection Impact Assessment (2016)
- ICO Guide to DPIAs (2024)
- Consent in AI Applications - GDPR Local (2025)
- IAPP-EY Annual Governance Report 2023 (2023)
- Allianz DPIA Requirements for Vendors (2025)
- GDPR Article 28 - Processor Requirements (2016)
- GDPR Article 30 - Records of Processing Activities (2016)