Willison's Agentic Engineering Patterns: What Data Practitioners Should Steal
Bad code crashes visibly. Bad data looks plausible. That asymmetry makes agent-assisted data work riskier than software, and it is why Simon Willison's Agentic Engineering Patterns guide matters for data practitioners. His Red/Green TDD maps to data contracts before transformation. His testing discipline gives teams a framework for agent verification. But some patterns need adaptation: “pipelines are cheap” is only half true, and hoarding knowledge is harder when institutional context lives in people's heads, not in code.
Why Data Practitioners Should Care
When a coding agent produces bad software, it usually crashes. A test fails, a build breaks, a user sees an error. The feedback is immediate and visible.
When an agent produces bad data, nothing crashes. The pipeline runs. The dashboard updates. The numbers look plausible. Then someone makes a decision based on those numbers, and the failure surfaces days or weeks later, far from the point where the error was introduced. This is the fundamental asymmetry that makes agent-assisted data work riskier than agent-assisted software development.
Simon Willison, the co-creator of Django, started publishing Agentic Engineering Patterns in February 2026 as a living guide. It now spans 15 chapters across six sections, and it is the most specific, example-driven reference for working with coding agents that exists today. Software engineers are reading it. Data practitioners mostly are not.
I have spent the past month building data pipelines, governance workflows, and even this blog’s daily briefing agent with AI coding agents. Not every Willison pattern translates equally to data work. Some are directly applicable; others need adaptation; a few miss aspects of data that have no software equivalent. This article focuses on the patterns that genuinely transfer and names where the translation breaks down.
Two Principles That Reframe Everything
Principle 1: “Writing Code Is Cheap Now”
Willison frames this as the central disruption. Code production cost has dropped to near zero. This breaks engineering habits at two levels:
At the macro level, planning and feature prioritization assumed coding was expensive. When it is not, the ROI calculus for every feature, refactoring task, and quality improvement changes.
At the micro level, daily decisions about documentation, edge-case testing, and cleanup assumed each addition cost hours. When an agent handles them asynchronously, the default shifts from “skip it” to “run the agent and check the result.”
The nuance Willison adds is critical: “Delivering new code has dropped in price to almost free, but delivering good code remains significantly more expensive than that.” The attributes that separate cheap code from good code (verification, error handling, testing, documentation, security) still require human judgment.
For data practitioners: “Pipelines are cheap now” is the equivalent reframe, with a caveat. Authoring a new dbt model, writing a Data Quality check, generating a schema migration: the cost of writing these artifacts has collapsed. But data pipelines have a second cost that software does not: compute. An agent can write SQL in seconds; executing it against a 10TB table still costs real money and clock time. If cheap authoring leads to more pipelines running more compute, the total infrastructure cost goes up, not down. The part that translates cleanly: delivering good data (accurate, complete, timely, consistent) still requires someone who knows what good looks like. The cost of producing has dropped. The value of evaluating has risen.
Principle 2: “Hoard Things You Know How to Do”
This is Willison’s most practical advice. His argument: agents can execute any pattern you have seen work. Your job is to know what patterns exist and which solution fits the current problem. He maintains a public blog, TIL notes, and a large tools collection, each a working artifact built through real projects. Every one is a seed an agent can grow into a working solution.
His compound pattern is powerful: feed an agent two working examples, and it combines them into something new. He demonstrated this by combining Tesseract.js (browser OCR) and PDF.js (PDF rendering) into a browser-based PDF OCR tool, all by giving an agent both working examples as context.
For data practitioners: “Hoard Data Quality institutional knowledge” is the translation, but this is where the hoarding problem is harder than in software. Willison hoards code repos, blog posts, and working tools: artifacts that already exist as files. In data work, the most valuable institutional knowledge often lives in people’s heads, not in code. Why did we exclude that vendor from the aggregation last quarter? What is the correct join key between these two systems that share no common identifier? When the finance team says “revenue,” do they mean gross or net? These are business rules encoded in tribal memory, not in dbt models. Hoarding for data means documenting the undocumented: writing down the business logic that no test suite captures. Without it, you are asking the agent to reinvent your organization’s data context from scratch every session.
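One way to start hoarding is to turn tribal memory into a machine-readable artifact an agent can load as context at the start of every session. A minimal sketch in plain Python; every rule name, field, and example below is hypothetical, and in practice this might live in a dbt `meta` block or a governance wiki instead:

```python
# Hypothetical catalog of business rules that live in tribal memory,
# encoded so an agent can be given them as prompt context.
BUSINESS_RULES = {
    "revenue_definition": {
        "rule": "When finance says 'revenue', they mean NET revenue "
                "(gross minus refunds and chargebacks).",
        "owner": "finance",
        "decided": "2025-Q3",
    },
    "vendor_exclusion": {
        "rule": "Vendor ACME is excluded from quarterly aggregations "
                "pending contract renegotiation.",
        "owner": "data-governance",
        "decided": "2025-Q4",
    },
}

def rules_as_prompt_context(rules: dict) -> str:
    """Render the rule catalog as plain text to prepend to an agent prompt."""
    lines = []
    for name, meta in sorted(rules.items()):
        lines.append(
            f"- [{name}] {meta['rule']} "
            f"(owner: {meta['owner']}, decided: {meta['decided']})"
        )
    return "\n".join(lines)

print(rules_as_prompt_context(BUSINESS_RULES))
```

The format matters less than the habit: once the rules exist as a file, they are a seed the agent can grow from, exactly as Willison's hoarded repos and TILs are for code.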
This connects to what I described in Judgment-in-the-Loop: “The person with judgment in the loop knows enough about the domain to recognize when AI output looks right but is wrong.” Willison’s hoarding pattern is judgment-in-the-loop expressed as a concrete habit.
The Patterns That Matter Most for Data Work
Red/Green TDD: Data Contracts Before Transformation
“Use red/green TDD” is a succinct prompt that embeds substantial discipline. Write tests first, confirm they fail, then implement until they pass.
For data work, this translates to: define expected outputs (schema contracts, row counts, value distributions) before the agent generates the transformation. Run the contract against the current state to confirm it fails. Then let the agent implement. When the contract passes, you have a strong verification baseline and can reduce, not eliminate, line-by-line review.
What this looks like in practice. Pick one dbt model. Write three expectations before the agent touches it: expected row count range, non-null columns, and one business rule (e.g., “revenue is never negative”). Run them against the current output to confirm they pass; for a refactoring you start green rather than red, because the expectations characterize behavior the agent must preserve. Now let the agent refactor. If the expectations still pass, you have evidence the refactoring preserved correctness.
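The three expectations above can be sketched in plain Python, using a list of dicts as a stand-in for the model's output. In practice these would be dbt tests or a Great Expectations suite; the column names and thresholds here are hypothetical:

```python
def check_contract(rows):
    """Return a list of violated expectations (empty list = contract passes)."""
    failures = []
    # Expectation 1: row count within the expected range.
    if not (100 <= len(rows) <= 10_000):
        failures.append(f"row count {len(rows)} outside [100, 10000]")
    # Expectation 2: key columns are never null.
    for col in ("order_id", "customer_id"):
        if any(r.get(col) is None for r in rows):
            failures.append(f"null values in non-null column {col}")
    # Expectation 3: one business rule -- revenue is never negative.
    if any(r["revenue"] < 0 for r in rows):
        failures.append("negative revenue found")
    return failures

# Run against the CURRENT output before the agent refactors...
current = [{"order_id": i, "customer_id": i % 7, "revenue": 9.99}
           for i in range(500)]
assert check_contract(current) == []  # green baseline
# ...then re-run the same contract against the agent's refactored output.
```

The contract is deliberately dumb: three checks a reviewer can hold in their head. The leverage comes from running the identical checks before and after the agent's change.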
This is Data Quality engineering adapted for the agentic era. The contract is the evaluation criterion. The agent is the implementer. The human defines what “correct” means; the agent figures out how to get there. In data work, passing contracts and tests raises confidence, but semantic correctness still requires domain review because the hardest failures are often business-logic failures, not syntax failures.
The Testing Discipline: Baseline, Verify, Review
Data engineering already has its own testing tradition: Great Expectations, dbt tests, Soda, Monte Carlo. What Willison adds is not the tools but the discipline of making them the primary interface with agents. Three practices:
Baseline before you build. Willison starts every agent session with “first run the tests.” For data work: “First run the Data Quality checks.” Before any agent-assisted pipeline modification, let the agent discover existing quality rules and establish a baseline. If a check was passing before and fails after, you know exactly where the problem was introduced.
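The baseline-then-diff discipline can be sketched in a few lines of plain Python. The check names and rows here are hypothetical; in a real pipeline you would call your dbt, Soda, or Great Expectations runner instead:

```python
def run_checks(rows):
    """Run named quality checks; return {check_name: passed}."""
    return {
        "not_empty": len(rows) > 0,
        "no_null_ids": all(r.get("id") is not None for r in rows),
        "amount_non_negative": all(r["amount"] >= 0 for r in rows),
    }

def regressions(baseline, after):
    """Checks that passed before the agent's change but fail after it."""
    return [name for name, ok in baseline.items() if ok and not after[name]]

# 1. Capture the baseline BEFORE the agent edits anything.
before_rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 0.0}]
baseline = run_checks(before_rows)
# 2. After the agent's change, diff against the baseline.
after_rows = [{"id": 1, "amount": 10.0}, {"id": None, "amount": -5.0}]
print(regressions(baseline, run_checks(after_rows)))
# -> ['no_null_ids', 'amount_non_negative']
```

A check that was already failing before the change is not the agent's fault; only the regressions list points at problems the agent introduced.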
Manual review is not optional. When agents generate code you have not reviewed line by line, manual inspection remains essential because automated checks can pass while business logic is still wrong. For data work, this means looking at actual output rows: valid syntax but wrong business logic, correct aggregations at the wrong granularity, plausible numbers that miss a recently changed business rule. This is the “almost right” problem from the Data Quality article. It passes every automated check and fails at 2 AM when the business user trusts it.
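Looking at actual output rows does not have to be ad hoc. A minimal sketch of a reproducible spot check, assuming a list-of-dicts output; the columns are hypothetical, and the fixed seed means a reviewer and a teammate see the same sample:

```python
import random

def review_sample(rows, n=5, seed=42):
    """Return a reproducible random sample of output rows for manual review."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))

output = [{"region": f"r{i % 3}", "net_revenue": round(i * 1.5, 2)}
          for i in range(200)]
for row in review_sample(output):
    # A human checks: right granularity? plausible magnitudes?
    # Does "net_revenue" match what finance means by revenue?
    print(row)
```

The sample cannot prove the pipeline is right; it exists to give the "almost right" failures a chance to be seen by someone who knows what right looks like.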
The agent-era restatement: Automated tests verify what you already know to check. Manual testing catches what you did not think to specify. In data work, the gap between those two categories is where the costliest errors live.
No unreviewed agent output in production. Willison frames this as his primary anti-pattern. For data, the stakes are higher: bad code crashes visibly; bad data looks plausible until a downstream consumer makes a decision based on it. The anti-pattern is not using agents. The anti-pattern is trusting agent output without verification.
Willison is not the only voice describing this shift. Karpathy arrives at the same conclusion from AI research (covered in From Vibe Coding to Agentic Engineering). The difference: Karpathy demonstrates what the workflow inversion looks like; Willison provides the how with specific, named patterns. For data practitioners, the combination matters: the shift is real, and Willison’s patterns are the most concrete guide for navigating it safely.
What Willison Gets Right That Others Miss
Quality is a choice, not a consequence. Most discourse frames agent-generated code as inherently lower quality. Willison rejects this. Quality degradation happens when teams do not invest in verification. When agents handle the execution, the time saved can be redirected to verification. The net quality can go up, not down.
The cost-benefit recalculation is the real disruption. Decades of engineering habits (skip the refactoring, skip the documentation, skip the edge-case test) were rational responses to high costs. When those costs drop, the rational response changes. For data teams: decades of deferred Data Quality improvements (fix the naming conventions, backfill the missing metadata, align the schemas) can now be delegated to agents. The excuse of “not enough time” disappears.
Do Next
| Priority | Action | Why it matters |
|---|---|---|
| This week | Read Willison’s guide, starting with “Writing code is cheap now” and “Hoard things you know how to do” | These two chapters establish the mental model. Everything else builds on them. |
| This week | Apply “first run the Data Quality checks” to one existing pipeline before modifying it with an agent | Establishing a quality baseline before agent execution is the single highest-leverage pattern for data work. |
| This month | Adopt red/green TDD for one dbt project: define schema contracts and row count expectations before letting the agent generate transformations | Data contracts before transformation is the data equivalent of Willison’s most important testing pattern. |
| This month | Start a “hoarding” practice: document three working patterns from your data work (a validated quality check, a schema migration approach, a governance decision template) | Without documented patterns, you are asking agents to reinvent your institutional knowledge every session. |
| This quarter | Document the undocumented: write down three business rules that live in tribal memory but not in any test suite or dbt model | These are the rules agents cannot discover from code. Until they are written down, every agent session starts without your organization’s most important context. |
Sources & References
- Simon Willison: Agentic Engineering Patterns (full guide), 2026
- Simon Willison: “Writing Code Is Cheap Now” (Chapter 1), 2026
- Simon Willison: “Hoard Things You Know How to Do” (Chapter 2), 2026
- Simon Willison: “Red/Green TDD” (Chapter 6), 2026
- Simon Willison: “First Run the Tests” (Chapter 7), 2026
- Simon Willison: “Agentic Manual Testing” (Chapter 8), 2026
- Simon Willison: “Anti-Patterns: Things to Avoid” (Chapter 15), 2026
- Simon Willison: Tesseract.js + PDF.js OCR Case Study, 2026
- Simon Willison's Newsletter: Agentic Engineering Patterns, 2026
- Simon Willison: Fireside Chat at Pragmatic Summit, 2026
- Kent Beck: Augmented Coding and TDD with AI Agents, 2025
- Ben Werdmuller: Commentary on Agentic Engineering Patterns, 2026