
What Is the Claude Agent SDK? A Developer's Guide

The Claude Agent SDK is Claude Code's engine, programmable in Python and TypeScript. A working engineer's guide to what it does, when to use it, and a complete hello-world agent you can run today.

agent-sdk · agents · api · tutorial
Kev Gary
Senior Software Engineer, Credit Karma at Intuit

The Claude Agent SDK is the most under-appreciated part of Anthropic's developer platform. Most engineers I talk to have used Claude Code, have heard of the Messages API, and have a vague sense that "agents" are a thing — but have never actually wired up the Agent SDK to do something useful in their own app. That's a mistake, and this post exists to fix it.

Here's the one-sentence version: the Agent SDK is Claude Code's engine, made programmable. Everything Claude Code does — the tool use loop, the permission system, the file operations, the command execution, the MCP support — is exposed as a library you can embed in your own Python or TypeScript program.

That means you can build custom agents that do what Claude Code does, but shaped to your workflow. A code review bot. A CI fixer. An internal dev tool. A scheduled agent that runs overnight and opens PRs. This post will tell you exactly when to use it, when not to, and how to build your first one.

What the Agent SDK is

The Agent SDK gives you a programmatic interface to Claude's agentic loop: receive a goal, pick a tool, execute, read the result, decide what's next, repeat until done. Everything you see in Claude Code — file operations, bash execution, search, MCP, the permission layer — is available to the SDK too.

You import it, configure it, give it a prompt, and it runs. The SDK handles:

  • The tool use loop (model decides which tool to call, you don't write the state machine)
  • File operations (read, write, edit, glob, grep)
  • Shell command execution via a Bash tool
  • Permissions and allow-lists
  • CLAUDE.md loading (yes, the same file)
  • MCP server integration
  • Streaming output and token accounting
  • Session state if you want to persist conversations

You focus on what the agent should do. The SDK handles how to run the loop that gets it there.

Agent SDK vs Client SDK vs Managed Agents — when to use each

This is the part that confuses people, so let's be precise. Anthropic ships three different "SDK" things, and they serve different purposes.

Anthropic Client SDK (the Messages API client).

The lowest-level building block. A direct wrapper around the Messages API — you send messages, you get responses, you handle everything yourself. If you want tool use, you define the tool schemas, you catch the model's tool call requests, you execute them, and you feed the results back. No agent loop is provided — you write it.

Use the Client SDK when:

  • You want full control over every turn of the conversation.
  • You're building a custom chat interface in your product.
  • Tool use is simple or absent.
  • You want the smallest possible dependency footprint.
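To make the "no agent loop is provided" point concrete, here is roughly the state machine you end up writing yourself at this level. This is a sketch against the real `anthropic` Python client's Messages API; `tools` (your tool schemas) and `run_tool` (your executor) are stand-ins you must supply yourself.

```python
# The hand-written tool-use loop the Client SDK leaves to you.
# `client` is an anthropic.Anthropic() instance; `tools` is your list of
# tool schemas; `run_tool(name, tool_input)` is your own executor.

def run_agent_loop(client, model, tools, run_tool, user_prompt, max_turns=20):
    """Drive the model until it stops requesting tools, then return its text."""
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        response = client.messages.create(
            model=model, max_tokens=4096, tools=tools, messages=messages
        )
        # No tool requested: the model is done; collect its text blocks.
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")
        # Tool requested: execute each call and feed the results back in.
        messages.append({"role": "assistant", "content": response.content})
        tool_results = [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": run_tool(block.name, block.input),
            }
            for block in response.content
            if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": tool_results})
    raise RuntimeError("agent exceeded max_turns without finishing")
```

Every error path, retry, and permission decision in that loop is also yours. That is the work the Agent SDK absorbs.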

Agent SDK (Claude Code's engine).

The Agent SDK sits on top of the Client SDK and gives you the full agentic loop, built-in tools, file operations, shell access, and MCP integration. You're essentially embedding Claude Code as a library. You give it a prompt, it runs until the task is done or it stops for a permission check.

Use the Agent SDK when:

  • You want agentic behavior: multi-step, tool-using, goal-oriented.
  • You want the built-in tools (file ops, bash, web search, etc.) without rewriting them.
  • You want CLAUDE.md, MCP, and the permission model without rebuilding them.
  • You're running agents inside your own infrastructure (your server, your CI, your laptop).

Managed Agents API (hosted).

Managed Agents is the "I don't want to run this myself" tier. You define the agent's behavior, environment, and tools via an API, and Anthropic runs the agent loop on their infrastructure. You get endpoints for creating agents, environments, and sessions. Agents run in Anthropic's execution environment with isolated filesystems, network access, and tool support — no servers of your own required.

Use Managed Agents when:

  • You want agents in production without operating the infrastructure.
  • You need isolated execution environments (one per user, per tenant, per task).
  • You want long-running agent sessions with persistent state managed for you.
  • You're shipping a product feature where reliability matters more than customization.

Here's the cheat sheet:

I want to...                                   Use
Directly call Claude with low-level control    Client SDK
Run an agentic loop in my own infra            Agent SDK
Run agents in production with zero ops         Managed Agents API

The three stack in layers. Managed Agents uses patterns from the Agent SDK, which uses the Client SDK, which hits the Messages API. You can drop down a layer when you need more control, or stay at the top when you want less.

A "hello world" Agent SDK agent

Let's build a real one. We'll create an agent that reads files, runs commands, and can do a real task — in this case, explain a codebase and suggest improvements. This will run locally, in Python, and use Claude Code's built-in tools.

First, install the SDK:

pip install anthropic-agent-sdk

Then set your API key:

export ANTHROPIC_API_KEY=sk-ant-...

Now the code. This is a complete, runnable agent:

from anthropic_agent_sdk import Agent, allow_tools
 
# Create an agent configured with a set of built-in tools.
agent = Agent(
    model="claude-sonnet-4-6",
    tools=allow_tools(["Read", "Glob", "Grep", "Bash"]),
    cwd=".",  # the directory the agent operates in
    system_prompt=(
        "You are a helpful code reviewer. Keep responses concise. "
        "Never modify files — this is a read-only review."
    ),
)
 
# Give it a goal and let it run the loop.
result = agent.run(
    "Read the src/ directory structure, then pick the three files that "
    "look most important and summarize what each one does. Finish with "
    "two concrete suggestions for improvement."
)
 
print(result.final_message)
print(f"\n--- Used {result.total_tokens} tokens, {result.tool_calls} tool calls ---")

Run it:

python agent.py

Watch what happens. The agent will:

  1. Use Glob to list the repo structure.
  2. Use Read to open the most interesting-looking files.
  3. Maybe use Grep to look for patterns.
  4. Maybe use Bash to run a command like wc -l to understand file sizes.
  5. Decide it has enough context.
  6. Produce the final summary.

The whole thing is an agentic loop — the model picks tools, the SDK runs them, the results go back to the model, the model decides whether it's done. You didn't write any of that state machine. That's what the SDK gives you.

Built-in tools

The Agent SDK ships with the same core tools that Claude Code uses. You allow-list the ones your agent needs:

  • Read — read file contents.
  • Write — create new files.
  • Edit — modify existing files with exact string replacements.
  • Glob — pattern match file paths.
  • Grep — ripgrep-backed content search.
  • Bash — execute shell commands.
  • WebFetch / WebSearch — fetch and search the web.
  • TodoWrite — the task-list tool Claude Code uses for multi-step plans.

You'll typically allow-list a subset per use case. A code reviewer probably wants Read, Glob, Grep, and nothing else. An automated fixer wants Read, Edit, Bash (to run tests), and Grep. A research agent might want WebFetch and WebSearch.

The allow-list is how you bound the agent's blast radius. If you give an agent Bash, it can (with permission) run any shell command. If you give it Read only, it physically cannot modify files.
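One way to keep that discipline is to make the allow-lists data rather than ad-hoc arguments scattered through your code. A minimal sketch — the tool names are the SDK's built-ins from the list above, while the role names and subsets are illustrative:

```python
# Smallest-possible tool sets per role, mirroring the examples above.
# Tool names are the SDK built-ins; the role names are illustrative.
TOOLSETS = {
    "reviewer": ["Read", "Glob", "Grep"],          # read-only: cannot touch files
    "fixer":    ["Read", "Edit", "Grep", "Bash"],  # can edit and run the tests
    "research": ["WebFetch", "WebSearch"],         # web access, no filesystem
}

def tools_for(role):
    """Look up a role's allow-list; fail loudly rather than defaulting wide."""
    if role not in TOOLSETS:
        raise ValueError(f"no toolset defined for role {role!r}")
    return list(TOOLSETS[role])  # return a copy so callers can't mutate policy
```

Failing loudly on an unknown role matters: the dangerous default is an agent that silently gets every tool because someone typo'd a role name.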

Configuration and safety

A few patterns I use in every Agent SDK deployment.

Allow-list tools tightly. Default to the smallest set that lets the task succeed. If the agent only needs to read, don't give it write access. If it needs to run commands, think hard about whether you can scope those commands with a pre-approved allow-list.

Use CLAUDE.md. The SDK respects CLAUDE.md the same way Claude Code does. A well-tuned CLAUDE.md is as valuable for an SDK-based agent as it is for interactive use — maybe more, because an SDK agent doesn't have a human to correct mistakes mid-flight.

Scope the CWD. Run agents from the narrowest possible working directory. If your agent only needs to touch src/lib, run it from src/lib, not from the repo root. File operations respect the CWD.

Set turn limits. Agents can loop if they're confused or if the task is underspecified. Always set a max turn count via agent.run(..., max_turns=20) so a runaway agent stops itself.

Log everything. Every tool call, every token count, every model response. When you're debugging why an agent did something surprising, detailed logs are the difference between a 10-minute fix and an hour of guesswork.
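A cheap way to get that visibility is to wrap whatever executes your tool calls. A sketch, where `run_tool` is a hypothetical stand-in for your tool executor:

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

def logged(run_tool):
    """Wrap a tool executor so every call, its input, and its duration
    land in the logs before the result goes back to the model."""
    def wrapper(name, tool_input):
        start = time.monotonic()
        result = run_tool(name, tool_input)
        log.info("tool=%s input=%s took=%.2fs",
                 name, json.dumps(tool_input), time.monotonic() - start)
        return result
    return wrapper
```

Wrap once at construction time (`run_tool = logged(run_tool)`) and every tool call in every session is accounted for.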

Use the Advisor Tool pattern. More on this in a minute — it's the best cost/quality pattern I know.

Real use cases

Here's where the Agent SDK actually shines in production. These are patterns I've either built or seen built.

Automated code review bot. A GitHub Action that runs on every PR. It checks out the diff, uses the Agent SDK with Read, Glob, Grep, Bash (to run tests and linters), and produces a review posted back as a PR comment. The CLAUDE.md encodes team standards, and a custom system prompt focuses the reviewer on the specific issues you care about — bugs, security, missing tests.

Internal dev tools. "I want to run natural language queries against our internal API docs" → build an SDK agent with MCP access to a vector store, expose it as a Slack slash command, done.

CI fixers. Tests failing? Type errors? Lint issues? Kick off an SDK agent in CI that checks out the branch, tries to fix the issue, runs the tests again, and pushes a commit with the fix. This is useful for small, mechanical failures where a human looking at it would say "just run the formatter."
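The shape of that job is worth pinning down: let the agent attempt a fix, then gate the commit on the tests actually passing. A sketch in which `agent_run` is a hypothetical callable that kicks off your SDK agent:

```python
import subprocess

def try_autofix(agent_run, test_cmd=("pytest", "-q")):
    """Ask the agent for the smallest fix, then only commit if the suite
    is green afterwards. Returns True if a fix was committed."""
    agent_run(
        "The test suite is failing. Make the smallest change that fixes it. "
        "Do not refactor anything unrelated."
    )
    passed = subprocess.run(list(test_cmd)).returncode == 0
    if passed:
        subprocess.run(["git", "commit", "-am", "fix: automated CI repair"])
    return passed
```

The key design choice is that the agent never decides whether its own fix worked; the test suite does.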

Scheduled dependency upgrades. Cron job, once a week, runs an SDK agent that runs pnpm outdated, picks safe upgrades, runs the test suite after each, and opens PRs for the ones that pass.

Research + summarization agents. Give the agent WebFetch, WebSearch, and a narrow goal. "Look up the latest benchmark scores for LLMs in the X category and summarize." Great for daily briefings or weekly roundups.

None of these are science projects. They're pragmatic automations that save real time.

The Advisor Tool pattern

This is one of the highest-ROI patterns I've found in production. The idea: pair two models. A cheap fast executor does the work; an expensive smart advisor reviews it.

In practice, an executor agent (typically Sonnet 4.6) does the main loop — reads files, runs tools, drafts the answer. Then at key decision points, the executor calls an "advisor tool" that routes the current state to an advisor agent (typically Opus 4.6) for a higher-quality second opinion. The advisor isn't in the loop every turn — only when you decide it matters.

The benefit is twofold:

  1. Cost. You're only paying Opus rates for the moments that actually need Opus. The bulk of the loop is on Sonnet.
  2. Quality. You get Opus-level judgment on the calls that matter most — architectural choices, risky edits, ambiguous trade-offs — without paying for Opus on every turn.

The Agent SDK supports this pattern directly via the Advisor Tool. You configure the executor and the advisor, tell the SDK how you want them paired, and you get the cost/quality trade-off of a two-model system without writing the orchestration yourself.
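Stripped of any SDK wiring, the routing logic itself is small enough to sketch. Here `advise` is a hypothetical callable that invokes your Opus-backed advisor, and `step` is whatever state your executor tracks; everything else is plain Python:

```python
def consult_if_risky(step, advise):
    """Gate advisor calls: only pay the expensive model's rates when the
    executor has flagged the current step as high stakes."""
    if not step.get("risky"):
        return None  # the cheap executor proceeds on its own
    prompt = (
        "You are a senior reviewer. Give a short, decisive recommendation.\n\n"
        f"Context:\n{step['context']}\n\n"
        f"Question: {step['question']}"
    )
    return advise(prompt)
```

The interesting part isn't the code; it's choosing what counts as "risky." Architectural choices, destructive edits, and anything irreversible are good defaults.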

This is the pattern I'd recommend for any agent that's making high-stakes decisions at production scale.

Managed Agents: when to promote your agent to production

Once you've built an Agent SDK agent and it's working, you may hit a point where running it yourself is more trouble than it's worth. That's where Managed Agents comes in.

Managed Agents gives you a hosted API for creating agents (reusable definitions), environments (isolated execution sandboxes with files, tools, and network), and sessions (individual conversations with state). Anthropic runs the loop on their infrastructure. You don't manage compute, you don't worry about scaling, you don't debug why a long-running agent ate a server.

The trade-off: less flexibility. If your agent needs weird custom infrastructure, you're better off running it yourself with the Agent SDK. If it's a fairly standard "read files, use tools, produce an answer" loop, Managed Agents will save you operational overhead.

The practical play: start with the Agent SDK on your own infrastructure to prototype, then promote to Managed Agents when you want production reliability. The mental model transfers one-to-one — same tools, same CLAUDE.md compatibility, same permission concepts.

Where to go next

If this is your first time seeing the Agent SDK, the fastest path is:

  1. Install it and run the hello world agent above. Feel the loop.
  2. Swap in a real task you care about — a code reviewer, a log summarizer, a test writer. Make the agent do something useful.
  3. Read the built-in tools docs to see what's available. The set has grown a lot in the last year and there are probably tools you don't know about.
  4. Try the Advisor Tool pattern once you have a working agent. It's the easiest cost/quality win.
  5. Promote to Managed Agents when you're ready for production.

For more context, start with my Claude Code tutorial: the Agent SDK is literally Claude Code's engine, and understanding Claude Code first makes the SDK easier to reason about. If you're curious how teams deploy Claude across engineering orgs, I also wrote about enterprise Claude adoption.

And if you want the whole picture — the Agent SDK, Managed Agents, custom MCP servers, multi-agent orchestration, the Advisor Tool, and building a real integration in your own app — that's Day 3 of Claude Camp, the 3-day live cohort bootcamp. We build real agents together, live. See the full curriculum, or join the waitlist for the next cohort.


Published April 2, 2026