AI Products & Strategy · March 25, 2026 · 8 min read

Pike's Five Rules Are Now the Five Rules of Agent Development

Rob Pike wrote five rules of programming in 1989 at Bell Labs. Thirty-seven years later, they map onto AI agent development with striking precision: measure before tuning, start simple, and get the data right. Nobody has made this connection explicitly. Here is the mapping, the evidence, and the framework it gives you.

By Vikas Pratap Singh
#ai-agents #agent-development #context-engineering #agentic-engineering #data-quality #software-engineering

Part 3 of 12: The Practitioner’s Guide to AI Agents

In Article 1, we defined what an agent is. If you have not read it: an agent is a system that uses an LLM to decide which actions to take in a loop until a goal is met. That definition tells you what to build. This article tells you how to think about building it.

The framework comes from an unexpected place: a document written in 1989.

I learned this through a side project: a document analyzer that extracts structured data from uploaded files. The UI was simple. Upload a document, get a JSON object back with the key fields extracted. My first instinct was to throw a bigger model at the problem. I started with Qwen 72B, then moved to Qwen 235B, assuming that a larger model would produce more accurate extractions.

It did not. The improvement was marginal. What actually moved the needle was two changes that had nothing to do with model size: tuning the extraction prompt to be more specific about the output schema, and increasing the resolution of the uploaded documents so the model could read the text more clearly. The 72B model with a better prompt and cleaner input data outperformed the 235B model with the original prompt and lower-resolution scans.

The bottleneck was never the model. It was the data entering the context window and the instructions shaping the extraction. I did not have the vocabulary for it at the time, but what I was learning was a violation of every rule in this article.
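The prompt fix is easy to show in miniature. Here is a hedged sketch of a schema-pinned extraction prompt; the fields are illustrative, not the analyzer's actual schema:

```python
import json

# Illustrative schema: the analyzer's real fields are not shown in this article.
SCHEMA = {
    "invoice_number": "string or null",
    "total": "number or null",
    "issue_date": "YYYY-MM-DD or null",
}

def extraction_prompt(document_text: str) -> str:
    """Pin the output schema: name every key, its type, and the missing-value rule."""
    return (
        "Extract the fields below from the document.\n"
        f"Return ONLY a JSON object with exactly these keys: {json.dumps(SCHEMA)}\n"
        "If a field is absent in the document, use null. Never invent values.\n\n"
        f"Document:\n{document_text}"
    )
```

The vague version ("extract the key fields as JSON") leaves the model to guess keys, types, and missing-value behavior. Pinning all three moved accuracy more than tripling the parameter count did.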

1989, Bell Labs, and a Document That Won’t Die

Rob Pike published “Notes on Programming in C” while working at Bell Labs. It was not a paper. It was not a book. It was a set of notes, circulated informally, containing five rules of programming that fit on an index card.

The rules said: do not guess where your program spends its time; measure before you optimize; prefer simple algorithms, because fancy ones are slow when n is small and n is usually small; avoid fancy algorithms, because they have big constants and are buggier; and get the data structures right, because once you do, the algorithms become self-evident.

For 37 years, these rules circulated among programmers as received wisdom. They appeared on university syllabi and coding interviews. They were quoted in blog posts and conference talks. And in March 2026, someone posted them on Hacker News, where they reached 901 points and generated hundreds of comments.

The top-voted comments were not nostalgic. They were noting, with some surprise, that the rules felt more relevant in 2026 than they did in 1989. Several commenters pointed to AI and agents specifically.

That signal deserves attention. When a 37-year-old set of five rules about C programming trends on the front page of HN the same month that agent development is the dominant topic in AI engineering, something structural is happening. The pattern recognition is not coincidence; it is convergence. Agent development is repeating the exact mistakes Pike’s rules were designed to prevent.

Nobody has mapped the rules explicitly to agent development. So I will, rule by rule, with the evidence that makes each mapping concrete.

Rule 1: “You Can’t Tell Where a Program Is Going to Spend Its Time”

Pike’s original point: Your intuition about performance bottlenecks is wrong. Do not guess. Profile.

The agent translation: Teams assume the bottleneck is the model. They spend months on model selection, upgrade from Sonnet to Opus, double the context window from 128K to 1M, and invest in fine-tuning. The actual bottleneck is almost always somewhere else.

The AgentDrift study (March 2026) demonstrated this precisely. Across 1,563 contaminated tool-output turns and seven LLMs, standard quality metrics stayed stable while safety violations appeared in 65-93% of turns. The bottleneck was not model capability. It was Data Quality at the tool-result boundary, the place where information enters the context window from external tools. Every model tested, including the most capable ones, was equally vulnerable. A better model did not fix the problem.

Karpathy reinforced this on the No Priors podcast in March 2026, calling agent failures “skill issues” rather than model limitations. The human’s ability to structure context is the bottleneck, not the model’s ability to reason.

The agent-era restatement: You cannot tell where an agent is going to fail. Do not upgrade the model until you have proven the model is the problem.

Rule 2: “Measure. Don’t Tune for Speed Until You’ve Measured”

Pike’s original point: Profiling before optimization. Without measurement, you are guessing.

The agent translation: Most teams building agents have no evaluation framework. They tune prompts by feel, adjust context by intuition, and declare success when the output “looks right.” This is the equivalent of optimizing a C program by staring at the source code and guessing which function is slow.

The METR randomized controlled trial (2025) is the starkest evidence. Researchers tracked 16 experienced developers using AI coding tools on real tasks. The developers were 19% slower with AI assistance but believed they were 20% faster. A 39-percentage-point perception gap, invisible without measurement.

The Stack Overflow 2025 Developer Survey found that 66% of developers say their biggest frustration with AI coding tools is “solutions that are almost right, but not quite.” “Almost right” is the category that measurement was invented to catch. Without evals, every output feels close enough.

Hamel Husain’s framework captures the hierarchy: “Documentation tells the agent what to do. Telemetry tells it whether it worked. Evals tell it whether the output is good.” Most teams have documentation. Some have telemetry. Very few have evals.

An assertion-based eval for agent behavior:

```python
def eval_agent_step(agent_output: dict, expectations: dict) -> dict:
    """Evaluate a single agent step against expectations."""
    results = {}
    if "expected_tool" in expectations:
        results["correct_tool"] = agent_output.get("tool_called") == expectations["expected_tool"]
    if "required_fields" in expectations:
        results["has_fields"] = all(
            f in agent_output.get("result", {}) for f in expectations["required_fields"]
        )
    if "max_tokens" in expectations:
        results["within_budget"] = agent_output.get("tokens", 0) <= expectations["max_tokens"]
    passed = all(results.values())
    return {"passed": passed, "checks": results}

# Usage
result = eval_agent_step(
    agent_output={"tool_called": "get_weather", "result": {"temp": 72, "city": "Chicago"}, "tokens": 450},
    expectations={"expected_tool": "get_weather", "required_fields": ["temp", "city"], "max_tokens": 1000},
)
# {"passed": True, "checks": {"correct_tool": True, "has_fields": True, "within_budget": True}}
```

The agent-era restatement: Measure your agent’s output quality before tuning anything. Without evals, you are optimizing by hallucination: yours, not the model’s.

Rule 3: “Fancy Algorithms Are Slow When n Is Small, and n Is Usually Small”

Pike’s original point: Simple solutions outperform complex ones for most real-world inputs. The overhead of a sophisticated algorithm only pays off at scale, and most problems are not at scale.

The agent translation: Teams reach for multi-agent orchestration, RAG pipelines, vector databases, and complex tool chains when a single well-prompted agent with good context would solve the problem.

The Karpathy Loop is the purest expression of this rule. Karpathy’s AutoResearch project uses one agent, one file (program.md), one metric (training loss), and a fixed time constraint.

AutoResearch works like this: the agent reads program.md (the experiment log), proposes a code change to train.py, runs the experiment for five minutes, records the result, and updates program.md. That is the entire architecture. It ran roughly 700 experiments and found roughly 20 genuine improvements, producing an 11% training speedup. No orchestration framework. No multi-agent handoffs. No vector database. One agent, iterating.
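That architecture fits in a dozen lines. A hedged sketch, not Karpathy's actual code; the two callables and their signatures are assumptions standing in for the agent and the training harness:

```python
from pathlib import Path

def karpathy_loop(iterations, propose_change, run_experiment, log_path="program.md"):
    """One agent, one file, one metric: read the log, try a change, record the result.

    propose_change(log_text) -> str   # candidate edit to train.py (assumed interface)
    run_experiment(change)   -> float # training loss after a fixed-time run (assumed)
    """
    log = Path(log_path)
    log.touch()
    best_loss = float("inf")
    for i in range(iterations):
        change = propose_change(log.read_text())  # the log is the agent's only memory
        loss = run_experiment(change)             # fixed time budget per experiment
        verdict = "improvement" if loss < best_loss else "no gain"
        best_loss = min(best_loss, loss)
        with log.open("a") as f:                  # append-only experiment record
            f.write(f"\n## Experiment {i}: {verdict} (loss={loss:.4f})\n{change}\n")
    return best_loss
```

Everything the agent knows between iterations lives in one human-readable file. That is the whole design: no orchestration layer to debug, and the experiment history doubles as the audit trail.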

Anthropic’s own “Building Effective AI Agents” guide recommends the same: start with the simplest possible agent architecture and add complexity only when measured performance demands it.

Simon Willison’s agentic engineering patterns point to the same conclusion from a different angle. His premise that “writing code is cheap now” implies that the right response to complexity is not a more complex architecture but more iterations with a simple one.

The agent-era restatement: Start with one agent, one prompt, one evaluation metric. Add complexity only when measurement proves the simple version is insufficient. Most agent problems are prompt problems, not architecture problems.

Rule 4: “Fancy Algorithms Have Big Constants and Are Buggier”

Pike’s original point: Complex code has more hiding places for bugs. It is harder to implement correctly, harder to debug, and harder to maintain.

The agent translation: Every additional agent, tool, hop, or memory layer in a multi-agent architecture introduces a failure surface. The math is unforgiving. If each step in an agent workflow has 85% accuracy, a ten-step workflow succeeds only about 20% of the time (0.85^10 ≈ 0.20). Each tool call, each context injection, each agent handoff is a potential corruption point.
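The compounding arithmetic is worth internalizing, because it is the argument against every extra hop:

```python
def compound_reliability(per_step: float, steps: int) -> float:
    """Probability that a sequential workflow succeeds end to end,
    assuming independent failures at each step."""
    return per_step ** steps

# Ten steps at 85% each: the workflow fails four times out of five.
print(round(compound_reliability(0.85, 10), 2))  # ~0.2
# Even 99%-reliable steps lose roughly one run in ten over ten hops.
print(round(compound_reliability(0.99, 10), 2))  # ~0.9
```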

ReliabilityBench (January 2026) demonstrated this by applying chaos engineering principles to agent evaluation. Success rates dropped from 96.9% to 88.1% with relatively minor perturbations: network delays, API format changes, partial tool failures. In production, perturbations are not the exception. They are the norm.

The core finding from real-world agent architectures: three of four data-flow boundaries have validation. The one between tool results and the context window, where most failures originate, has none.

The agent-era restatement: Every component you add to an agent architecture multiplies the failure surface. Simple agents with good context are more reliable than complex agents with poor context. Reliability beats capability.

The Full Landscape of Agent Failures

Rules 3 and 4 focus on complexity as a failure multiplier. But complexity is not the only source of failure. Agents break in six distinct ways, and conflating them leads to the wrong fix.

  1. Planning failures. The agent decomposes the task incorrectly. It picks wrong sub-goals, oversimplifies a multi-step problem, or overcomplicates a simple one. A planning failure means the agent’s strategy is wrong before it calls a single tool.

  2. Tool selection failures. The agent picks the wrong tool for the job. Not that the tool returns bad data, but that the agent chose a calculator when it needed a search engine, or queried the wrong database entirely.

  3. Loop pathologies. Infinite loops, premature stopping, oscillation between two states, retry storms. The agent’s control flow breaks down. It does the same thing over and over or quits before the task is complete.

  4. Cost explosion. The agent enters a reasoning loop with a large context window and burns through tokens before anyone notices. A single runaway agent can consume hundreds of dollars in minutes.

  5. State management failures. The context window fills up and critical information gets pushed out. Summary compression loses key details. Cached data goes stale mid-session.

  6. Context quality failures. The data entering the context window is wrong, stale, contradictory, or missing. This is the primary focus of the series, covered in depth in Articles 5 through 8.

This series focuses on context quality (category 6) because it is the least understood and the most consequential. The other five categories are real and common. They are also more visible: a loop that never terminates is obvious; stale data that corrupts reasoning is silent.

When Agents Break: Five Recovery Patterns

When failures do happen, the agent needs a way to recover rather than crash or silently produce garbage. Five patterns cover most recovery scenarios.

  1. Timeouts. Set a timeout per tool call. If a tool hangs, log it, skip it, and continue with what you have.

  2. Malformed responses. When the LLM returns an unexpected stop_reason or a tool_use block with invalid JSON, catch it, log it, and retry once. If it fails again, return a partial result.

  3. Context overflow. Track cumulative tokens. When approaching the limit, summarize older context and continue with the compressed version. Never silently truncate.

  4. Loop termination. Set max_iterations. When reached, return the best result so far with a note that the agent did not fully converge.

  5. Partial recovery. Checkpoint progress by writing intermediate results to a file or state object. If the agent crashes, resume from the last checkpoint rather than restarting from scratch.

These patterns are not optional in production. An agent without timeouts and loop limits is a liability. Article 4 in this series implements all five patterns in working code.
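To make the list concrete, here is a hedged sketch of how patterns 1, 2, and 4 compose around a single agent step. The `agent_step` callable and its JSON-string return type are assumptions, and a production version would add the logging and checkpointing described above:

```python
import json
from concurrent.futures import ThreadPoolExecutor, TimeoutError as StepTimeout

def run_with_recovery(agent_step, max_iterations: int = 10,
                      step_timeout_s: float = 30.0, retries: int = 1):
    """Wrap an agent loop with a per-step timeout, one retry on malformed
    output, and a hard iteration cap.

    agent_step() -> str  # one loop turn, returning a JSON result (assumed interface)
    """
    best = None
    with ThreadPoolExecutor(max_workers=1) as pool:
        for _ in range(max_iterations):            # pattern 4: loop termination
            for _attempt in range(retries + 1):    # pattern 2: retry once, then degrade
                try:
                    # pattern 1: timeout (note: a timed-out call keeps running
                    # in its worker thread; a real system would also cancel it)
                    raw = pool.submit(agent_step).result(timeout=step_timeout_s)
                    result = json.loads(raw)
                except StepTimeout:
                    result = None                  # log and skip in a real system
                    break
                except json.JSONDecodeError:
                    continue                       # malformed: retry
                break
            else:
                result = None                      # all retries malformed
            if result is not None:
                best = result                      # pattern 5 would checkpoint here
                if result.get("done"):
                    return best
    return best                                    # best-so-far, possibly unconverged
```

The point of the sketch is the shape, not the specifics: every failure path ends in a logged, bounded fallback rather than a crash or an infinite loop.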

Rule 5: “Data Dominates”

Pike’s full quote: “If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self-evident.”

Pike’s original point: The choice of data structures matters more than the choice of algorithms. Get the data right and the code follows.

The agent translation: This is the most powerful mapping, and the one that connects directly to everything I have written about agent quality. In AI agents, the context window IS the data structure. The model is the algorithm. If you fill the context window with the right information, the model’s reasoning follows naturally. If you fill it with noise, stale data, or contradictions, no model upgrade can compensate.

Karpathy called the context window “the LLM’s RAM” in 2025. The analogy understates the point. The context window is not just storage. Its contents determine behavior. It is closer to a program than to memory.

The Context Engineering paper (March 2026) formalizes this, defining five quality criteria for context: relevance, sufficiency, isolation, economy, and provenance. It frames context as “the agent’s operating system,” the substrate that determines what the model can and cannot do.

Chroma’s “context rot” research adds the constraint: more context does not mean better performance. Across 18 models, performance degraded continuously as context grew. The right data, not more data, is what matters.
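A minimal gate for two of the five criteria, provenance and freshness, can sit at the tool-result boundary. The field names (`source`, `fetched_at`) are hypothetical stand-ins for whatever your tools actually return:

```python
import time

def admit_to_context(tool_result: dict, max_age_s: float = 300.0):
    """Gate at the tool-result boundary: reject results with no provenance
    or stale timestamps before they enter the context window.
    Field names are illustrative, not a standard."""
    if "source" not in tool_result:
        return False, "rejected: no provenance"
    age = time.time() - tool_result.get("fetched_at", 0.0)
    if age > max_age_s:
        return False, f"rejected: stale ({age:.0f}s old)"
    return True, "admitted"
```

Relevance, sufficiency, and economy need task-specific checks, but even a crude gate like this validates the one boundary that, per the AgentDrift finding, usually has no validation at all.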

The three-part agent quality series on this blog (Data Quality problem, missing quality layer, judgment-in-the-loop) is, in retrospect, an extended argument for Pike’s Rule 5 applied to agents. The context window is the unmonitored data pipeline. The missing quality layer is the absent data structure. The human role is the judgment that decides what enters the context and what does not.

The agent-era restatement: Context dominates. If you have chosen the right context and organized it well, the model’s reasoning will almost always be self-evident. Context Engineering, not model selection, is the central discipline of agent development.

The Synthesis

The five rules form a progression, not a checklist. Rules 1 and 2 say: do not guess, measure. Rules 3 and 4 say: start simple. Rule 5 says: get the data right and everything else follows.

Pike's Five Rules mapped to Agent Development: Measure First, Start Simple, Get the Data Right

Applied to agents, the progression becomes: do not guess where agents fail (measure with evals), start with the simplest possible architecture (one agent, one loop), and invest your time in context quality rather than model capability.

| Pike’s Rule (1989) | C Programming | Agent Development (2026) | Series Deep Dive |
| --- | --- | --- | --- |
| 1. Can’t tell where time is spent | Profile before optimizing | The bottleneck is context, not the model | Article 6: Evals |
| 2. Measure before tuning | Profiler before compiler flags | Evals before prompt tuning | Article 6: Evals |
| 3. Fancy algorithms are slow when n is small | Simple solutions first | One agent, one prompt, one metric first | Article 8: Self-Improving Agents |
| 4. Fancy algorithms are buggier | Complexity = bugs | Complexity = compound error | Article 7: Guardrails and Safety |
| 5. Data dominates | Data structures > algorithms | Context Engineering > model selection | Article 5: Context Is the Program |

Some of these mappings are direct translations. Rules 1, 2, and 4 describe the same engineering discipline whether the program is C or an agent. Rule 3 is a strong analogy: “n is usually small” translates to “most agent problems are simpler than teams think.” Rule 5 is the strongest mapping of all. It is not an analogy. The context window IS a data structure. Pike’s claim that data structures determine algorithmic behavior is literally true for agents: the context determines the model’s reasoning.

Where the Series Goes from Here

This article gives you the decision framework. The remaining six articles in the series apply it.

Article 4: Build a Real Agent This Weekend bridges the gap between theory and working code. A complete research assistant agent with three tools, error handling, context management, and a basic eval.

Article 5: Context Is the Program takes Rule 5 and makes it operational. What enters the context window, why it matters, and what the five quality criteria from the Context Engineering paper mean in practice. Includes a code example showing what happens when a tool returns stale data and how a freshness check catches it.

Article 6: Evals: How to Know If Your Agent Actually Works applies Rules 1 and 2. Why most agent teams have no evals, what to measure, and how to build a basic evaluation pipeline. Includes the METR perception gap, Hamel Husain’s framework in depth, and a code example for scoring agent output.

Article 7: Guardrails and Safety embodies Rule 4. The compound error problem at scale, the three layers of guardrails (input, reasoning, output), Simon Willison’s “lethal trifecta,” and why simpler architectures are inherently safer. Includes a prompt injection detector.

Article 8: The Self-Improving Agent applies Rules 3 and 4 to learning. Start with the Karpathy Loop (one agent, one file, one metric), then add an inner/outer loop architecture for agents that improve over time. Covers where automation should stop and human judgment should begin.

Each article stands alone. But read in sequence, they form a single argument: the principles that governed good software in 1989 govern good agent development in 2026. The technology changed. The engineering discipline did not.

Do Next

If you are exploring agents for the first time

| Priority | Action | Why it matters |
| --- | --- | --- |
| This week | Read the five agent-era restatements in the synthesis table above. You do not need the original 1989 C document. Then re-read them with “context window” in place of “data structure” and “model” in place of “algorithm.” | The mapping trains your instincts before you write your first agent. When you encounter agent development advice, you will have a filter for separating signal from hype. |
| This week | Pick one agent task and define what “correct output” looks like before you run the agent. Write it down. | This is Rule 2 in its simplest form. Most people skip this step and evaluate output by feel. Defining correctness first is the habit that separates productive agent use from aimless experimentation. |
| This month | Read Article 1 to understand agent components, then try the simplest possible agent: one model, one tool, one clear objective. | Rule 3 says start simple. The temptation will be to add RAG, multi-agent routing, or a vector database. Resist it. Get one agent working reliably first. |

If you are building agents at work

| Priority | Action | Why it matters |
| --- | --- | --- |
| This week | For your most important agent workflow, list every step where data enters the context window. Count the steps. Calculate 0.85^n for that count. | This makes Rule 4 visceral. If your agent workflow has 8 steps, the compound reliability at 85% per step is 27%. You cannot improve what you have not quantified. |
| This month | Implement one eval for your highest-stakes agent output. It does not need to be sophisticated: compare agent output against a known-good answer for 20 test cases. | Rule 2 demands measurement. A basic eval with 20 test cases catches more problems than months of prompt tuning by feel. Start here. |
| This quarter | Audit whether your team is spending more time on model selection or context quality. If the ratio favors model selection, flip it. | Rule 5 says context dominates. In practice, most teams spend 80% of effort on model evaluation and 20% on what actually enters the context window. The leverage is in the 20%. |

If you are leading an agent program

| Priority | Action | Why it matters |
| --- | --- | --- |
| This month | Establish a “simplicity budget” for agent architectures on your team. Start with a budget of three tools and two handoffs. Adjust based on your measured reliability at each boundary. Any design that exceeds the budget requires written justification showing the simpler version was tried first. | Rules 3 and 4 together. Complexity creep in agent architectures is the default. An explicit policy that requires proving simplicity was insufficient before adding complexity changes the decision dynamic. |
| This quarter | Build an eval pipeline that runs on every agent deployment, not just initial development. Track accuracy, latency, and cost over time. | Rules 1 and 2 applied to production. Agent performance drifts. Models get updated. Tool APIs change. Without continuous measurement, you are flying blind after deployment. |
| This quarter | Invest in Context Engineering as a named discipline on your team. Assign someone to own the quality of what enters agent context windows, the same way you assign ownership for Data Quality in pipelines. | Rule 5 as an organizational principle. Context quality does not improve without ownership. If nobody owns it, nobody measures it, and it degrades silently. |

This is Part 3 of 12 in The Practitioner’s Guide to AI Agents. ← Previous: When NOT to Build an Agent · Next: Build a Real Agent This Weekend →

Sources & References

  1. Rob Pike: Notes on Programming in C (1989)
  2. Rob Pike's 5 Rules of Programming, University of Texas (1989)
  3. Hacker News: Pike's Rules (March 2026)
  4. AgentDrift: Probing Agent Influence on LLM Safety and Quality (2026)
  5. METR: Measuring the Impact of AI on Developer Productivity, RCT (2025)
  6. Stack Overflow Developer Survey (2025)
  7. Karpathy: AutoResearch (2026)
  8. Anthropic: Building Effective AI Agents (2025)
  9. Chroma Research: Context Rot (2026)
  10. Context Engineering for AI Agents (2026)
  11. Hamel Husain: Evals Skills for Coding Agents (2026)
  12. ReliabilityBench: Evaluating AI Agent Robustness (2026)
  13. Simon Willison: Agentic Engineering Patterns (2026)
