The Practitioner's Guide to AI Agents
A twelve-part series that takes you from “what is an agent?” to building self-improving systems. Pick your starting point based on where you are today.
Why This Guide Exists
Every major AI company has published an agent-building guide in the last twelve months. Anthropic released “Building Effective Agents.” OpenAI published “A Practical Guide to Building Agents.” Google shipped its Agents white paper. LangChain, CrewAI, and a dozen startups have their own tutorials.
Those guides share a blind spot. They teach you how to build agents with their tools. They do not teach you how to think about agents as an engineering discipline. They show you the happy path: pick a model, connect some tools, write a system prompt, deploy. They do not show you the failure modes that emerge at scale, the design decisions that determine whether your agent is reliable or merely impressive in a demo, or the governance questions that will land on your desk six months after launch.
This guide fills that gap. It is written by a practitioner, not a vendor. The principles are vendor-neutral: they apply whether you use Claude, GPT, Gemini, or an open-source model. What matters is the engineering discipline, not the SDK.
What You Will NOT Find in Vendor Guides
When not to build an agent at all. No vendor will tell you that most agent projects should be scripts, workflows, or simple API calls instead. Article 2 gives you a decision framework with six disqualifiers.
The compound error math. If each step in an agent workflow has 85% accuracy, a ten-step workflow succeeds only 20% of the time. Vendor guides do not quantify this risk because it undermines the “agents can do anything” narrative. Article 3 introduces design principles that address compound error directly.
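The arithmetic is easy to verify yourself: when every step in a chain must succeed, end-to-end reliability is the product of the per-step accuracies.

```python
# End-to-end success probability for a workflow where every step must succeed.
# With 85% per-step accuracy over ten steps, reliability collapses below 20%.

def chain_success_rate(step_accuracy: float, steps: int) -> float:
    """Probability that all steps in an agent workflow succeed."""
    return step_accuracy ** steps

print(f"{chain_success_rate(0.85, 10):.1%}")  # → 19.7%
```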
Context quality as the primary lever. Vendor guides focus on model selection and prompt engineering. This guide argues, with evidence, that the data entering the agent’s context window matters more than the model processing it. Article 6 lays out five engineering criteria for context quality.
Prompt specification, not prompt art. Most prompt failures in production are specification failures: vague criteria, missing examples, schemas without escape hatches. Article 5 covers the five patterns that fix this.
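To give a taste of what “schemas with escape hatches” means, here is a minimal sketch (the category names and helper are illustrative, not Article 5's actual code): a classification schema with a nullable field, and an enum whose fallback value means the model is never forced to invent a label.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Category(str, Enum):
    BILLING = "billing"
    TECHNICAL = "technical"
    ACCOUNT = "account"
    OTHER = "other"  # escape hatch: an honest "none of the above"

@dataclass
class TicketClassification:
    category: Category
    customer_id: Optional[str]  # nullable: missing data stays missing instead of being invented

def parse_classification(raw: dict) -> TicketClassification:
    """Parse model output; unknown labels degrade to OTHER rather than raising mid-pipeline."""
    try:
        category = Category(raw.get("category", "other"))
    except ValueError:
        category = Category.OTHER
    return TicketClassification(category=category, customer_id=raw.get("customer_id"))
```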
Evaluation beyond vibes. Most teams shipping agents have no systematic way to measure whether the output is good. Article 7 covers the eval hierarchy from assertions to red teaming.
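The base of that hierarchy is nothing exotic: cheap, deterministic assertions on agent output. A minimal sketch (the checks and sample output are illustrative, not the series' actual eval suite):

```python
# Assertion-level evals: deterministic checks that run on every agent output.
# These catch gross failures cheaply before any model-graded or human evaluation.

def eval_research_summary(output: str) -> list[str]:
    """Return the list of failed checks; an empty list means the output passed."""
    failures = []
    if len(output.split()) < 20:
        failures.append("too short to be a real summary")
    if "http" not in output:
        failures.append("no source link cited")
    return failures

sample = "A short summary citing https://example.com " + "word " * 20
assert eval_research_summary(sample) == []
```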
Observability for non-deterministic systems. Your monitoring says 200 OK. The agent returned the wrong answer. Article 9 covers the five dimensions of agent observability that traditional APM cannot provide.
What This Series Covers
When I started building agent-based workflows, I made every mistake this series warns against. I over-engineered, I skipped evals, I bolted on multi-agent complexity before proving that a single agent was insufficient. The series is the playbook I wish I had read first. It uses Rob Pike’s five rules of programming (1989) as the decision framework that ties everything together. Pike was a systems programmer at Bell Labs whose design principles shaped Go, Unix, and Plan 9. His rules are famous because they keep being right.
Twelve articles. Each one stands on its own, but they build on each other.
| # | Article | What you will learn | Read time |
|---|---|---|---|
| 1 | What Is an AI Agent (and What Isn’t)? | The agent loop, tool calling, tool design principles, the spectrum from chatbot to autonomous agent | 11 min |
| 2 | When NOT to Build an Agent | Decision framework: when agents are the wrong choice and what to build instead | 10 min |
| 3 | Pike’s Five Rules for Agent Development | Five principles from 1989 that predict every agent failure mode in 2026 | 8 min |
| 4 | Build a Real Agent This Weekend | End-to-end: a working research agent with structured error handling, context management, and evals | 18 min |
| 5 | Prompt Engineering for Production Agents | Five patterns that separate production prompts from tutorial-grade prompting, including explicit criteria, few-shot examples, nullable fields, and enum-with-fallback | 14 min |
| 6 | Context Is the Program | Why the data inside the context window matters more than the model, plus context placement tactics | 16 min |
| 7 | Evals: How to Know If Your Agent Works | How to measure agent quality, catch “almost right” outputs, validation patterns, and build eval pipelines | 13 min |
| 8 | Guardrails and Safety | Input safety, output filtering, escalation patterns, workflow gates, and why simpler architectures are safer | 15 min |
| 9 | Observability: Seeing What Your Agent Actually Does | Five dimensions of agent observability, the tooling landscape, and a week-by-week instrumentation plan | 16 min |
| 10 | Multi-Agent Systems: When One Agent Isn’t Enough | Four signals you actually need multi-agent, three orchestration patterns, task decomposition, and why debugging is the real cost | 8 min |
| 11 | The Self-Improving Agent | Inner loops, outer loops, the Karpathy Loop, and where automation stops | 15 min |
| 12 | From Problem to Agent: Implementation Reference Guide | End-to-end walkthrough: applying the full series framework to a real problem | 17 min |
Where to Start Based on Where You Are
“I have never built an agent. I want to understand what the fuss is about.”
Start with Articles 1 and 2 (understand what agents are, then decide when not to build one). Then read Articles 3 and 4 (design principles, then build a real agent). Total: about 47 minutes. You will go from zero to a working agent with a principled foundation.
“I am experimenting with agents on my own time. I want to know if I am on the right track.”
Start with Article 3 (Pike’s rules as a decision framework), then Article 7 (evals). Most people experimenting with agents skip evaluation entirely and tune by feel. Article 7 gives you the measurement discipline that separates useful agents from impressive demos. If your agents produce confident but wrong answers, read Article 6 (context quality). If your prompts produce inconsistent results, read Article 5 (prompt engineering). Total: about 50 minutes.
“I am building or deploying agents at work. I want to validate my architecture.”
Read Articles 6, 7, and 8 in order. Context quality (6) tells you where your agent’s data pipeline is leaking. Evals (7) tells you how to measure the leak. Guardrails (8) tells you how to prevent the failures evals detect. Then read Article 9 (observability) to see what your agent actually does in production, Article 10 (multi-agent) only if you have hit the limits of a single agent, and Article 12 for the full implementation walkthrough. Total: about 75 minutes. These articles add the data quality, governance, and operational lens that vendor documentation omits.
“I want the full picture. Give me everything.”
Read all twelve in order. Articles 1-2 set the foundation. Articles 3-5 cover principles, building, and prompt specification. Article 6 covers context quality. Articles 7-8 cover measurement and safety. Article 9 covers observability. Article 10 covers multi-agent systems. Article 11 covers self-improvement. Article 12 ties it all together. Total: about 2.5 hours of reading, plus time with the diagrams, tables, and code examples.
Choosing a Framework
One of the most common questions when starting with agents: which framework should I use? The honest answer is that the framework matters less than the thinking. But here is a practical comparison.
| Framework | Best for | Control level | Learning curve | When to use |
|---|---|---|---|---|
| Anthropic SDK (direct) | Simple agents, full control, learning | Maximum | Low (just Python + API) | Default starting point. This series uses it. |
| OpenAI Agents SDK | OpenAI-native teams, tool-heavy agents | High | Low-Medium | When your team is already on OpenAI |
| LangGraph | Complex stateful workflows, multi-agent | Medium-High | Medium | When you need explicit state machines |
| CrewAI | Role-based multi-agent teams | Medium | Medium | When your task naturally decomposes into roles |
| No framework | Scripts, pipelines, deterministic tasks | N/A | N/A | When you should not build an agent (see Article 2) |
This series uses the Anthropic SDK directly. The principles apply regardless of framework. If you have read Article 4 and built the research agent, you already have the patterns that frameworks automate. Add a framework when you outgrow the manual loop, not before.
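For orientation, the “manual loop” that frameworks automate can be sketched in a few lines. This is a framework-agnostic illustration, not the SDK code from Article 4: `call_model` and the tool registry are stand-ins for a real LLM call and real tools.

```python
# A framework-agnostic sketch of the core agent loop that frameworks automate:
# call the model, execute any tool it requests, feed the result back, repeat.

def call_model(messages: list[dict]) -> dict:
    # Stand-in for a real LLM call; a real implementation would hit a model API
    # and return either a final answer or a tool request.
    return {"type": "final", "content": "done"}

TOOLS = {
    "search": lambda query: f"results for {query!r}",  # illustrative tool
}

def run_agent(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):  # bounded loop: the agent never runs forever
        reply = call_model(messages)
        if reply["type"] == "final":  # model answered directly: we are done
            return reply["content"]
        result = TOOLS[reply["tool"]](reply["input"])   # execute the requested tool
        messages.append({"role": "tool", "content": result})  # feed the result back
    raise RuntimeError("step budget exhausted")
```

Once you have written this loop by hand and felt its sharp edges (step budgets, tool errors, context growth), you are in a position to judge what a framework actually buys you.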
What This Series Is Not
This is not a replacement for Anthropic’s documentation, OpenAI’s API guides, or LangChain’s tutorials. Those resources tell you how to use specific tools. This series tells you how to think about agent development: which problems to solve first, which complexity to avoid, and which principles hold regardless of which framework you choose.
Code examples use Python with the Anthropic SDK, but the principles apply regardless of framework.
Every article includes a diagram, comparison tables, and a “Do Next” section with actions tiered for your experience level.