The Practitioner's Guide to AI Agents
A twelve-part series that takes you from “what is an agent?” to building self-improving systems. Pick your starting point based on where you are today.
Why This Guide Exists
Every major AI company has published an agent-building guide in the last twelve months. Anthropic released “Building Effective Agents.” OpenAI published “A Practical Guide to Building Agents.” Google shipped its Agents white paper. LangChain, CrewAI, and a dozen startups have their own tutorials.
Those guides share a blind spot. They teach you how to build agents with their tools. They do not teach you how to think about agents as an engineering discipline. They show you the happy path: pick a model, connect some tools, write a system prompt, deploy. They do not show you the failure modes that emerge at scale, the design decisions that determine whether your agent is reliable or merely impressive in a demo, or the governance questions that will land on your desk six months after launch.
This guide fills that gap. It is written by a practitioner, not a vendor. The principles are vendor-neutral: they apply whether you use Claude, GPT, Gemini, or an open-source model. What matters is the engineering discipline, not the SDK.
What You Will NOT Find in Vendor Guides
When not to build an agent at all. No vendor will tell you that most agent projects should be scripts, workflows, or simple API calls instead. Article 2 gives you a decision framework with six disqualifiers.
The compound error math. If each step in an agent workflow has 85% accuracy, a ten-step workflow succeeds only 20% of the time. Vendor guides do not quantify this risk because it undermines the “agents can do anything” narrative. Article 3 introduces design principles that address compound error directly.
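The arithmetic is easy to verify yourself: when every step in a chain must succeed, end-to-end reliability is the product of the per-step accuracies.

```python
# End-to-end success probability for a workflow where every step must succeed.
# With 85% per-step accuracy over ten steps, reliability collapses below 20%.

def chain_success_rate(step_accuracy: float, steps: int) -> float:
    """Probability that all steps in an agent workflow succeed."""
    return step_accuracy ** steps

print(f"{chain_success_rate(0.85, 10):.1%}")  # → 19.7%
```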
Context quality as the primary lever. Vendor guides focus on model selection and prompt engineering. This guide argues, with evidence, that the data entering the agent’s context window matters more than the model processing it. Article 6 lays out five engineering criteria for context quality.
Prompt specification, not prompt art. Most prompt failures in production are specification failures: vague criteria, missing examples, schemas without escape hatches. Article 5 covers the five patterns that fix this.
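To give a taste of what “schemas with escape hatches” means, here is a minimal sketch (the category names and helper are illustrative, not Article 5's actual code): a classification schema with a nullable field, and an enum whose fallback value means the model is never forced to invent a label.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Category(str, Enum):
    BILLING = "billing"
    TECHNICAL = "technical"
    ACCOUNT = "account"
    OTHER = "other"  # escape hatch: an honest "none of the above"

@dataclass
class TicketClassification:
    category: Category
    customer_id: Optional[str]  # nullable: missing data stays missing instead of being invented

def parse_classification(raw: dict) -> TicketClassification:
    """Parse model output; unknown labels degrade to OTHER rather than raising mid-pipeline."""
    try:
        category = Category(raw.get("category", "other"))
    except ValueError:
        category = Category.OTHER
    return TicketClassification(category=category, customer_id=raw.get("customer_id"))
```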
Evaluation beyond vibes. Most teams shipping agents have no systematic way to measure whether the output is good. Article 7 covers the eval hierarchy from assertions to red teaming.
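The base of that hierarchy is nothing exotic: cheap, deterministic assertions on agent output. A minimal sketch (the checks and sample output are illustrative, not the series' actual eval suite):

```python
# Assertion-level evals: deterministic checks that run on every agent output.
# These catch gross failures cheaply before any model-graded or human evaluation.

def eval_research_summary(output: str) -> list[str]:
    """Return the list of failed checks; an empty list means the output passed."""
    failures = []
    if len(output.split()) < 20:
        failures.append("too short to be a real summary")
    if "http" not in output:
        failures.append("no source link cited")
    return failures

sample = "A short summary citing https://example.com " + "word " * 20
assert eval_research_summary(sample) == []
```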
Observability for non-deterministic systems. Your monitoring says 200 OK. The agent returned the wrong answer. Article 9 covers the five dimensions of agent observability that traditional APM cannot provide.
What This Series Covers
When I started building agent-based workflows, I made every mistake this series warns against. I over-engineered, I skipped evals, I bolted on multi-agent complexity before proving that a single agent was insufficient. The series is the playbook I wish I had read first. It uses Rob Pike’s five rules of programming (1989) as the decision framework that ties everything together. Pike was a systems programmer at Bell Labs whose design principles shaped Go, Unix, and Plan 9. His rules are famous because they keep being right.
Twelve articles. Each one stands on its own, but they build on each other.
| # | Article | What you will learn | Read time |
|---|---|---|---|
| 1 | What Is an AI Agent (and What Isn’t)? | The agent loop, tool calling, tool design principles, the spectrum from chatbot to autonomous agent | 11 min |
| 2 | When NOT to Build an Agent | Decision framework: when agents are the wrong choice and what to build instead | 10 min |
| 3 | Pike’s Five Rules for Agent Development | Five principles from 1989 that predict every agent failure mode in 2026 | 8 min |
| 4 | Build a Real Agent This Weekend | End-to-end: a working research agent with structured error handling, context management, and evals | 18 min |
| 5 | Prompt Engineering for Production Agents | Five patterns that separate production prompts from tutorial-grade prompting, including explicit criteria, few-shot examples, nullable fields, and enum-with-fallback | 14 min |
| 6 | Context Is the Program | Why the data inside the context window matters more than the model, plus context placement tactics | 16 min |
| 7 | Evals: How to Know If Your Agent Works | How to measure agent quality, catch “almost right” outputs, validation patterns, and build eval pipelines | 13 min |
| 8 | Guardrails and Safety | Input safety, output filtering, escalation patterns, workflow gates, and why simpler architectures are safer | 15 min |
| 9 | Observability: Seeing What Your Agent Actually Does | Five dimensions of agent observability, the tooling landscape, and a week-by-week instrumentation plan | 16 min |
| 10 | Multi-Agent Systems: When One Agent Isn’t Enough | Four signals you actually need multi-agent, three orchestration patterns, task decomposition, and why debugging is the real cost | 8 min |
| 11 | The Self-Improving Agent | Inner loops, outer loops, the Karpathy Loop, and where automation stops | 15 min |
| 12 | From Problem to Agent: Implementation Reference Guide | End-to-end walkthrough: applying the full series framework to a real problem | 17 min |
Where to Start Based on Where You Are
“I have never built an agent. I want to understand what the fuss is about.”
Start with Articles 1 and 2 (understand what agents are, then decide when not to build one). Then read Articles 3 and 4 (design principles, then build a real agent). Total: about 47 minutes. You will go from zero to a working agent with a principled foundation.
“I am experimenting with agents on my own time. I want to know if I am on the right track.”
Start with Article 3 (Pike’s rules as a decision framework), then Article 7 (evals). Most people experimenting with agents skip evaluation entirely and tune by feel. Article 7 gives you the measurement discipline that separates useful agents from impressive demos. If your agents produce confident but wrong answers, read Article 6 (context quality). If your prompts produce inconsistent results, read Article 5 (prompt engineering). Total: about 50 minutes.
“I am building or deploying agents at work. I want to validate my architecture.”
Read Articles 6, 7, and 8 in order. Context quality (6) tells you where your agent’s data pipeline is leaking. Evals (7) tells you how to measure the leak. Guardrails (8) tells you how to prevent the failures evals detect. Then read Article 9 (observability) to see what your agent actually does in production, Article 10 (multi-agent) only if you have hit the limits of a single agent, and Article 12 for the full implementation walkthrough. Total: about 75 minutes. These articles add the data quality, governance, and operational lens that vendor documentation omits.
“I want the full picture. Give me everything.”
Read all twelve in order. Articles 1-2 set the foundation. Articles 3-5 cover principles, building, and prompt specification. Article 6 covers context quality. Articles 7-8 cover measurement and safety. Article 9 covers observability. Article 10 covers multi-agent systems. Article 11 covers self-improvement. Article 12 ties it all together. Total: about 2.5 hours of reading, plus time with the diagrams, tables, and code examples.
Choosing a Framework
One of the most common questions when starting with agents: which framework should I use? The honest answer is that the framework matters less than the thinking. But here is a practical comparison.
| Framework | Best for | Control level | Learning curve | When to use |
|---|---|---|---|---|
| Anthropic SDK (direct) | Simple agents, full control, learning | Maximum | Low (just Python + API) | Default starting point. This series uses it. |
| OpenAI Agents SDK | OpenAI-native teams, tool-heavy agents | High | Low-Medium | When your team is already on OpenAI |
| LangGraph | Complex stateful workflows, multi-agent | Medium-High | Medium | When you need explicit state machines |
| CrewAI | Role-based multi-agent teams | Medium | Medium | When your task naturally decomposes into roles |
| No framework | Scripts, pipelines, deterministic tasks | N/A | N/A | When you should not build an agent (see Article 2) |
This series uses the Anthropic SDK directly. The principles apply regardless of framework. If you have read Article 4 and built the research agent, you already have the patterns that frameworks automate. Add a framework when you outgrow the manual loop, not before.
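For orientation, the “manual loop” that frameworks automate can be sketched in a few lines. This is a framework-agnostic illustration, not the SDK code from Article 4: `call_model` and the tool registry are stand-ins for a real LLM call and real tools.

```python
# A framework-agnostic sketch of the core agent loop that frameworks automate:
# call the model, execute any tool it requests, feed the result back, repeat.

def call_model(messages: list[dict]) -> dict:
    # Stand-in for a real LLM call; a real implementation would hit a model API
    # and return either a final answer or a tool request.
    return {"type": "final", "content": "done"}

TOOLS = {
    "search": lambda query: f"results for {query!r}",  # illustrative tool
}

def run_agent(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):  # bounded loop: the agent never runs forever
        reply = call_model(messages)
        if reply["type"] == "final":  # model answered directly: we are done
            return reply["content"]
        result = TOOLS[reply["tool"]](reply["input"])   # execute the requested tool
        messages.append({"role": "tool", "content": result})  # feed the result back
    raise RuntimeError("step budget exhausted")
```

Once you have written this loop by hand and felt its sharp edges (step budgets, tool errors, context growth), you are in a position to judge what a framework actually buys you.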
What This Series Is Not
This is not a replacement for Anthropic’s documentation, OpenAI’s API guides, or LangChain’s tutorials. Those resources tell you how to use specific tools. This series tells you how to think about agent development: which problems to solve first, which complexity to avoid, and which principles hold regardless of which framework you choose.
Code examples use Python with the Anthropic SDK, but the principles apply regardless of framework.
Every article includes a diagram, comparison tables, and a “Do Next” section with actions tiered for your experience level.