AI Products & Strategy March 25, 2026 · 11 min read

What Is an AI Agent (and What Isn't)?

An AI agent is a system that uses an LLM to decide which actions to take in a loop until a goal is met. This article breaks down the four components every agent shares, the spectrum from chatbot to autonomous agent, what tool calling actually looks like in code, and the design principles that separate good tool definitions from bad ones.

By Vikas Pratap Singh
#ai-agents #agent-architecture #tool-calling #ai-fundamentals

Part 1 of 9: The Practitioner’s Guide to AI Agents

The Word Everyone Uses, Nobody Defines

Every vendor deck in 2026 has the word “agent” somewhere on it. Salesforce has Agentforce. Microsoft has Copilot agents. Google has Gemini agents. Startups that were “AI-powered” in 2024 are “agentic” in 2026. The word has become so overloaded that it risks meaning nothing at all.

I saw this firsthand while reviewing a design spec for a data AI agent. The team had written detailed requirements, architecture diagrams, and a project timeline. The document said “AI agent” on every page. But as I read through the spec, the design was something different: a chat interface where the user manually selected an action from a pre-defined list, provided the relevant context, and an LLM processed the request and returned a response. That response then fed into another chat session for the next step.

It was a well-built LLM-powered workflow. It was not an agent. There was no loop. No autonomous tool selection. No goal-directed iteration. The user was making every decision; the LLM was formatting the output. The team had spent weeks building toward something they called an “AI agent” because the term was everywhere in 2026. The confusion was not about capability. It was about vocabulary.

This article fixes that. I will give you a precise definition, break it into components you can inspect, and show you what separates an agent from a chatbot, a copilot, and an autonomous system. No frameworks to memorize, no vendor narratives. Just the mechanical reality of what these systems do.

This is the first article in a nine-part series. Everything that follows, from when not to build an agent to context quality to evals to guardrails to self-improving agents, builds on what we establish here.

One Sentence

An AI agent is a system that uses an LLM to decide which actions to take in a loop until a goal is met.

That sentence does a lot of work. Let me unpack each piece.

“A system”: not a single model call. An agent is composed of multiple components working together. The LLM is one part of it, not the whole thing.

“Uses an LLM to decide”: the model is the reasoning engine. It reads context, weighs options, and picks the next step. This is what makes agents different from traditional automation, which follows hardcoded rules. The LLM can adapt to novel situations because it reasons about them rather than matching them against a decision tree.

“Which actions to take”: agents act on the world. They call APIs, query databases, write files, send messages. A system that only generates text is not an agent. The ability to take actions through tools is what gives agents their power and their risk.

“In a loop”: one action is not enough. The agent acts, observes the result, and decides what to do next. This iteration is the core mechanical difference between an agent and a single LLM call. A chatbot generates one response. An agent generates a response, evaluates whether the goal is met, and continues until it is.

“Until a goal is met”: agents are goal-directed. They have a stopping condition. A well-designed agent knows when it has succeeded and when it should ask for help. A poorly designed agent loops forever or gives up too early.

This definition is intentionally narrow. It excludes many things that get marketed as “agents” but are really single-step LLM calls with a good prompt. That exclusion is the point.
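The one-sentence definition compresses into a few lines of control flow. Here is a sketch of it in Python, with toy stand-ins rather than real APIs: `llm_decide`, `execute_tool`, and the state dict are illustrative placeholders for the LLM call, the tool layer, and memory.

```python
# A minimal sketch of "an LLM deciding which actions to take in a loop
# until a goal is met". llm_decide and execute_tool are toy stand-ins.

def llm_decide(state):
    """Stand-in for the LLM: reads the state, picks the next action."""
    if "result" not in state:
        return {"action": "call_tool", "tool": "search", "args": {"q": state["goal"]}}
    return {"action": "finish", "answer": f"Found: {state['result']}"}

def execute_tool(tool, args):
    """Stand-in for tool execution (an API call, a query, a file write)."""
    return f"top hit for '{args['q']}'"

def run(goal, max_steps=10):
    state = {"goal": goal}            # memory: everything the agent knows
    for _ in range(max_steps):        # loop: iterate until the goal is met
        decision = llm_decide(state)  # LLM: decide the next action
        if decision["action"] == "finish":
            return decision["answer"]
        state["result"] = execute_tool(decision["tool"], decision["args"])  # tools: act
    return "gave up"                  # stopping condition: never loop forever
```

Every real agent framework is an elaboration of this skeleton: better decision-making, richer tools, smarter memory, safer stopping conditions.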

A note on scope: the word “agent” has a 30-plus year history in AI and computer science. FIPA standards defined agent communication protocols in the 1990s. BDI (belief-desire-intention) architectures formalized goal-directed reasoning. Robotics and reinforcement learning have their own agent traditions, where an agent is anything that perceives an environment and acts on it.

This series focuses specifically on LLM-based agents because that is the pattern practitioners are building in 2025-2026. If you come from an RL or robotics background, this definition is deliberately narrower than what you are used to. The narrowing is intentional, not ignorant.

For practitioners: If you have built agents before, the value of this article is the vocabulary it establishes for the rest of the series. The four components and the spectrum table are referenced in every subsequent article. Skim the definitions and move to Article 2.

The Four Components

Every agent, from a weekend prototype to a production deployment at scale, shares four components. The sophistication varies; the structure does not.

1. LLM: The Reasoning Engine

The LLM reads the current state of the world (the context window), decides what to do next, and formulates the action. It is the brain of the agent, but “brain” oversells it. A more accurate analogy: the LLM is a pattern-matching engine that produces plausible next steps given everything it can see. It does not understand the world. It predicts useful actions based on context.

This distinction matters. When an agent fails, the instinct is to blame the model. More often, the model was reasoning correctly over bad input. The context was wrong, not the reasoning. We will go deep on this in Article 5 of this series, where context quality becomes the central concern.

2. Tools: The Hands

Tools are functions the agent can call to interact with the outside world. A search API. A database query. A calculator. A code interpreter. A file system. Without tools, the LLM can only generate text. With tools, it can act.

The Anthropic tool use documentation describes the mechanism: you define a set of tools with names, descriptions, and parameter schemas. The LLM decides when to call a tool, which tool to call, and what arguments to pass. The system executes the tool and returns the result to the LLM for further reasoning.

Tools are what make agents useful. They are also what make agents dangerous. An agent with access to a database can read data. An agent with write access can delete it. The scope of an agent’s tools defines the boundary of what it can do, and what it can break.

3. Memory: The State

Memory is everything the agent knows at a given point in its execution. This includes the conversation history, the results of previous tool calls, any retrieved documents, and the system prompt that defines the agent’s role.

Memory comes in two forms. Short-term memory is the context window: the rolling buffer of text that the LLM can see during a single session. Long-term memory is persistent storage that the agent can read from and write to across sessions: a database, a file, or a vector store (a database optimized for finding similar text).

Most agent failures trace back to memory problems. The context window fills up and critical information gets pushed out. Retrieved documents are stale. Prior tool results contradict new ones. Memory management is not glamorous, but it is where most production agent bugs live.
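The "context window fills up" failure mode above has a simple mitigation: trim the oldest turns when the buffer exceeds a budget. A hedged sketch, assuming a pinned system message and using a crude word count in place of a real tokenizer:

```python
# Keep a rolling message buffer under a budget by dropping the oldest turns.
# The system message stays pinned; len(content.split()) is a crude token
# estimate, not a real tokenizer.

def trim_history(messages, budget=1000):
    system, turns = messages[0], messages[1:]
    def cost(m):
        return len(m["content"].split())
    total = cost(system) + sum(cost(m) for m in turns)
    while turns and total > budget:
        dropped = turns.pop(0)   # oldest turn falls out of short-term memory
        total -= cost(dropped)
    return [system] + turns
```

Note what this sketch silently loses: the dropped turns may contain the one fact the agent needs ten steps later. That is exactly the failure mode described above, now visible in code.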

4. Loop: The Iteration

The loop is what ties the other three together. It is the control flow that repeats: observe the current state, think about what to do, act by calling a tool, observe the result, and decide whether to continue or stop.

Without the loop, you have a single LLM call. With the loop, you have an agent. The loop is what allows agents to handle multi-step tasks, recover from errors, and refine their approach based on intermediate results.

Anthropic’s guide to building effective agents makes a point worth internalizing: the most effective agent architectures are often the simplest ones. A single LLM in a loop with well-defined tools frequently outperforms complex multi-agent orchestration frameworks. Simplicity is not a limitation. It is a design choice that reduces failure surfaces.

The Spectrum: Chatbot to Autonomous Agent

Not everything that uses an LLM is an agent. The industry has been sloppy with terminology, so here is a precise spectrum.

|  | Chatbot | Copilot | Agent | Autonomous Agent |
| --- | --- | --- | --- | --- |
| Loop | No loop. Single turn: prompt in, response out. | No loop. Suggests actions within a human workflow. | Yes. Iterates until goal is met. | Yes. Runs independently for extended periods. |
| Tools | None (or very limited). Generates text only. | Limited. Can read context (files, tabs) but rarely writes. | Yes. Calls APIs, queries databases, writes files. | Yes, with broader scope and fewer restrictions. |
| Autonomy | None. Human drives every interaction. | Low. Human accepts or rejects each suggestion. | Medium. Runs a task, may ask for confirmation at key steps. | High. Runs continuously with minimal human oversight. |
| Decision-making | Reactive. Answers questions. | Assistive. Suggests next steps within a human-owned workflow. | Goal-directed. Chooses actions to achieve an objective. | Self-directed. Sets sub-goals and allocates resources. |
| Example | ChatGPT in a simple Q&A conversation | GitHub Copilot suggesting code completions; Excel Copilot suggesting formulas based on your data | Claude Code executing a multi-file refactor | A research agent that runs experiments for days (Karpathy’s AutoResearch) |
| Risk profile | Low. Output is text; human decides what to do with it. | Low-medium. Suggestions can be accepted uncritically. | Medium-high. Agent takes actions that change state. | High. Extended autonomy means errors compound over time. |

The boundaries are not sharp. A copilot that can execute code is moving toward agent territory. A chatbot with web search has a tool but no loop. The spectrum is useful not for classification but for understanding what you are building and what safeguards it needs.

The key threshold is the loop. Once a system iterates, observes results, and decides its own next step, it has crossed from assistance to agency. That crossing changes the risk profile fundamentally, because the system is now making decisions that the human did not explicitly approve in advance.

If you have used Claude Code to refactor files across a project, you have already seen the agent loop in action. Claude Code observes your codebase, decides which files to edit, makes changes, checks the result, and iterates. The loop you watched is the same loop described above.

The Agent Loop, Visualized

The loop is the defining mechanism. Here is what it looks like.

The agent loop: observe context, think about next action, act by calling a tool, check if the goal is met, and loop back if not

The agent starts by observing: reading the user’s request, the conversation history, and any available context. It then thinks: the LLM reasons about what action to take next. It acts: calling a tool (an API, a database query, a code interpreter). The tool returns a result, and the agent checks: is the goal met? If yes, it returns the final result. If no, the result feeds back into the observation step, and the loop continues.

This loop is the reason agents can handle tasks that would require multiple manual steps. A research agent does not just search once. It searches, reads the results, identifies gaps, searches again with refined queries, synthesizes, and checks whether the synthesis answers the original question. Each iteration adds information and refines the approach.

What Tool Calling Actually Looks Like

The mechanism that makes agents possible is tool calling: the LLM’s ability to output structured function calls instead of plain text. Here is what that looks like with the Anthropic SDK.

You will need an Anthropic API key (free tier available at console.anthropic.com) set as the ANTHROPIC_API_KEY environment variable to run this code.

A single tool call with Claude:
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
        }
    }],
    messages=[{"role": "user", "content": "What's the weather in Chicago?"}]
)

# Claude responds with: tool_use block calling get_weather(city="Chicago")

The key insight: you define the tools, and the LLM decides when and how to use them. You did not write an if-statement that says “if the user asks about weather, call the weather API.” The LLM read the user’s message, saw that a weather tool was available, and decided on its own that calling it was the right next step.

This is a single tool call, not yet an agent. It becomes an agent when you wrap it in a loop: call the tool, feed the result back to the LLM, let the LLM decide whether to call another tool or return a final answer.

The “Hello World” Agent: Weather Lookup

The simplest possible agent is one with a single tool and a loop. Here is a weather agent that decides which action to take based on the user’s question.

A minimal weather agent with a loop:
import anthropic
import json

client = anthropic.Anthropic()
tools = [{
    "name": "get_weather",
    "description": "Get current weather for a city",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
    }
}]

def call_weather_api(city):
    return {"temperature": "72F", "conditions": "sunny", "city": city}

def run_agent(user_message):
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514", max_tokens=1024,
            tools=tools, messages=messages
        )
        if response.stop_reason == "tool_use":
            tool_call = next(b for b in response.content if b.type == "tool_use")
            result = call_weather_api(tool_call.input["city"])  # your API call
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": [{
                "type": "tool_result", "tool_use_id": tool_call.id,
                "content": json.dumps(result)  # tool results must be text, not a raw dict
            }]})
        else:
            return response.content[0].text  # agent is done

This is fifteen lines of logic (excluding the API client setup). The while True loop is the agent loop. The if response.stop_reason == "tool_use" check is where the agent decides to act. When the LLM stops generating because it wants to call a tool, the code executes that tool and feeds the result back. When the LLM stops because it has a final answer, the loop exits.

Ask this agent “What’s the weather in Chicago?” and it calls get_weather("Chicago") once and returns the answer. Ask it “Compare the weather in Chicago and Tokyo” and it calls get_weather twice, once for each city, before synthesizing a comparison. The LLM decides the plan. The loop executes it.

This tiny agent already demonstrates all four components: the LLM (Claude) reasons about which tool to call, the tool (get_weather) interacts with an external API, memory accumulates in the messages list as the conversation grows, and the loop iterates until the goal is met.
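One hardening step worth adding before this pattern goes anywhere near production: the loop above runs unbounded, so a confused model can burn tokens forever. A hedged sketch of an iteration cap, with `step` as a toy stand-in for one LLM call plus tool execution:

```python
# A bounded agent loop: stop after max_iterations rather than trusting the
# model to always reach a final answer. step() is a stand-in for one
# LLM call + tool execution and returns (done, answer).

def run_bounded(step, max_iterations=5):
    for i in range(max_iterations):
        done, answer = step(i)
        if done:
            return answer
    raise RuntimeError(f"agent exceeded {max_iterations} iterations")
```

In a real deployment the cap raises an alert or escalates to a human instead of raising an exception, but the principle is the same: the stopping condition is yours, not the model’s.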

Tool Design Principles

The weather agent above has one tool with a one-line description: “Get current weather for a city.” That is sufficient for a demo. It is not sufficient for a production agent with ten or twenty tools. When an agent has many tools, the LLM reads every tool description on every iteration to decide which one to call. The quality of those descriptions determines whether the agent picks the right tool.

Four principles separate well-designed tools from tools that cause silent failures.

Tell the Model WHEN, Not Just WHAT

Most tool descriptions say what the tool does: “Get current weather for a city.” Better descriptions say when to use it: “Use this when the user asks about current weather conditions, temperature, or forecast for a specific location. Do not use this for historical weather data; use get_historical_weather instead.”

The WHEN framing matters because agents choose between tools. If two tools have similar WHAT descriptions, the LLM cannot distinguish them reliably. WHEN descriptions create decision boundaries. They tell the model the trigger conditions, the constraints, and what the tool is not for.

# Weak: what it does
{"description": "Search for customer data"}

# Strong: when to use it
{"description": (
    "Look up a customer by ID or email. Use this when the user provides "
    "a customer identifier and needs account details. Returns name, email, "
    "plan tier, and account status. If you have a company name but no "
    "customer ID, use list_customers first to find the right ID."
)}

The strong description does three things the weak one does not: it specifies the input trigger (customer ID or email), the return shape (name, email, plan, status), and the cross-reference (use list_customers if you only have a company name). That cross-reference prevents the agent from calling the wrong tool and receiving an unhelpful error.

Name Tools as Verb-Noun Pairs

Tool names should read as commands: get_weather, search_customers, create_invoice, validate_schema. The verb tells the LLM the action type (read, write, compute), and the noun tells it the target. Vague names like process_data or handle_request give the model no signal about when to use the tool.

Names also matter for debugging. When you read agent logs and see save_note was called, you know what happened. When you see do_thing_2 was called, you have to dig into the implementation.

Know When to Split, When to Merge

A common question: should search_customers and get_customer_by_id be one tool or two? The decision criterion is whether the inputs and outputs differ meaningfully.

Split when the tool has distinct modes that take different inputs and return different shapes. A search that takes a query string and returns a list of matches is a different operation from a lookup that takes an exact ID and returns a single record. Merging them forces the LLM to figure out which mode it is in, and it gets it wrong often enough to matter.

Merge when the operations are closely related and share the same input shape. If get_customer_email and get_customer_phone both take a customer ID and return a single field, merge them into get_customer_details that returns both. Splitting fine-grained tools that always get called together wastes iterations and tokens.

The test: if you find the agent frequently calling Tool A immediately followed by Tool B with the same input, merge them. If you find the agent confused about which mode of a tool to use, split it.
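As a concrete sketch of the merge case, the two single-field lookups collapse into one tool definition. The names, fields, and description here are illustrative, not from a real API:

```python
# Illustrative merged tool: one lookup returning both fields, replacing
# get_customer_email and get_customer_phone. Names and shapes are hypothetical.
get_customer_details = {
    "name": "get_customer_details",
    "description": (
        "Look up a customer's contact details by customer ID. Use this when "
        "you need the customer's email, phone number, or both. Returns both "
        "fields in one call; do not call it twice for the same customer."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string", "description": "Exact customer ID"}
        },
        "required": ["customer_id"],
    },
}
```

Note that the description still follows the WHEN pattern from earlier, including the instruction not to call it twice, which is exactly the wasted-iteration behavior the merge is meant to eliminate.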

Document Edge Cases in the Schema

Tool schemas define the parameters, but they rarely document what happens at the boundaries. What does search_customers return when the query matches nothing? What happens if get_weather receives a city name that does not exist? What is the maximum length for the query parameter?

These edge cases end up in the tool description, not the parameter schema.

{
    "name": "search_customers",
    "description": (
        "Search customers by name, email, or company. Returns up to 20 "
        "matching records sorted by relevance. Returns an empty list (not "
        "an error) when no customers match. Queries longer than 200 "
        "characters are truncated."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Search term. Partial matches are supported."
            }
        },
        "required": ["query"]
    }
}

The description tells the model three things it cannot learn from the parameter schema alone: the result limit (20), the empty-result behavior (empty list, not error), and the input limit (200 characters). Without these, the agent may retry an empty search thinking it failed, or send a paragraph as a query and wonder why results are poor.

Article 4 puts these principles into practice with a three-tool research agent. The tool descriptions there show the WHEN pattern, and the structured error handling section shows how to communicate edge case behavior through error categories.

Why This Matters for Data Practitioners

If you work in Data Governance, Data Architecture, Data Engineering, or Data Quality, agents are about to become relevant to your daily work in a very specific way: they are becoming the interface layer between humans and data platforms.

Consider what is already happening. OpenAI’s practical guide to building agents describes agents that query databases, run analyses, and return insights in natural language. Google’s agent documentation focuses on agents that orchestrate API calls across enterprise systems. The pattern is consistent: instead of building dashboards and reports for humans to consume, organizations are building agents that consume data platforms directly and deliver answers.

This shift has three implications.

Your data platform’s next power user is an LLM. The queries hitting your warehouse, the API calls hitting your metadata catalog, the retrieval requests hitting your document store: increasingly, these will come from agents, not humans. The quality, latency, and schema stability of your data products now affect agent performance directly.

Data Quality is no longer just about dashboards. When a human reads a stale number on a dashboard, they might notice it looks off and dig deeper. When an agent receives a stale number from an API, it reasons over it as if it were fresh. It has no instinct for “that doesn’t look right.” The quality of your data is the quality of the agent’s reasoning. Gartner’s prediction that over 40% of agentic AI projects will be canceled by 2027 points directly to this gap between agent ambition and data readiness.

Governance extends inside the agent. Traditional Data Governance focuses on who can access what data and how it should be handled. Agent governance adds new questions: which tools can the agent call? What data can it write? How do you audit a decision chain that spans five tool calls and three reasoning steps? The AI Governance frameworks that organizations are building need to cover agent behavior, not just model training.
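One concrete way to make a multi-step decision chain auditable is to log every tool call as a structured record at the point where the loop executes it. A minimal sketch with hypothetical field names; adapt the schema to whatever your governance stack actually requires:

```python
# Append a structured audit record for every tool call the agent makes.
# Field names are illustrative; adapt to your logging/governance stack.
import json
import time

audit_log = []

def record_tool_call(agent_id, tool_name, arguments, result_summary):
    entry = {
        "ts": time.time(),
        "agent": agent_id,
        "tool": tool_name,
        "args": arguments,
        "result": result_summary,  # a summary, not raw data, to avoid logging PII
    }
    audit_log.append(entry)
    return json.dumps(entry)       # one JSON line per call, easy to ship downstream
```

With records like these, the question “how do you audit a decision chain that spans five tool calls?” becomes a query over the log rather than an archaeology exercise through model transcripts.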

What Comes Next in This Series

This article gives you the vocabulary. The rest of the series gives you the practice.

  • Article 2 asks the question most teams skip: should this even be an agent?
  • Article 3 introduces the five design principles that should guide every agent architecture decision.
  • Article 4 builds a real agent from scratch, with full error handling and a basic eval.
  • Article 5 goes deep on context quality: why the data entering the agent’s context window matters more than the model you choose.
  • Article 6 covers evals, because you cannot improve what you cannot measure.
  • Article 7 addresses guardrails and safety, the boundaries every agent needs before it touches production.
  • Article 8 brings it together with self-improving agents: systems that learn from their own execution.
  • Article 9 walks through a complete implementation, applying every concept from the series to one problem.

Each article stands alone, but they build on each other. Start here, then go where your need is greatest.

Do Next

| Priority | Action | Why it matters |
| --- | --- | --- |
| No experience | Ask Claude or ChatGPT to use a tool: try “search the web for X” or “run this Python code.” Watch how the model decides to call a tool, executes it, and reasons over the result. | You just saw an agent loop in action. The model observed your request, decided to use a tool, acted, and incorporated the result. That is the four-component cycle from this article. |
| Learning | Build a single-tool agent this weekend. Use the code snippet above as a starting point. Pick any API (weather, news, a public dataset) and write a loop that calls it. | The gap between reading about agents and building one is where real understanding forms. Fifteen lines of code and an afternoon will teach you more about tool calling, memory accumulation, and loop control than any whitepaper. |
| Practitioner | Audit your existing agents against the four-component framework. For each agent: what is the LLM? What tools does it have? How is memory managed? What controls the loop and its stopping condition? | Most production agents have implicit answers to these questions buried in code. Making them explicit reveals gaps: tools with no error handling, memory with no size limits, loops with no maximum iteration count. The audit surfaces the risks before they surface in production. |

This is Part 1 of 9 in The Practitioner’s Guide to AI Agents. Next: When NOT to Build an Agent →

Sources & References

  1. Anthropic: Tool Use with Claude (2025)
  2. Anthropic: Building Effective Agents (2024)
  3. OpenAI: A Practical Guide to Building Agents (2025)
  4. LangChain: What Is an AI Agent? (2025)
  5. Google DeepMind: Agents White Paper (2025)
  6. Andrej Karpathy: No Priors Podcast on Code Agents and the Loopy Era of AI (2026)
  7. Gartner: Predicts Over 40% of Agentic AI Projects Will Be Canceled by 2027 (2025)
  8. Simon Willison: Building Effective Agents (Commentary) (2024)
