5 Agent Design Patterns for Robust, Cheap AI Systems
Five agent design patterns for reliable, low-cost AI systems: ReAct, Plan-and-Execute, Reflection, Router, and Tool-First, with trade-offs for each.
TL;DR — Five patterns carry most production agents: ReAct (reason + act in a loop), Plan-and-Execute (plan once, run cheap), Reflection (self-check before commit), Router (cheap model triages, expensive model finishes), and Tool-First (push work out of the model). None is universal. Pick by your latency budget, error tolerance, and token bill.
Why Patterns, Not Frameworks
Most “how to build an agent” advice starts with a framework. That’s backwards. The framework is plumbing. What actually decides whether your agent ships or stalls is the control flow: how it decides what to do next, when it stops, and how much it spends getting there.
Agent design patterns are the reusable shapes of that control flow. I’ve rebuilt the same five enough times across LangGraph, CrewAI, and hand-rolled loops that I now reach for them before I touch a library. They’re framework-agnostic on purpose. Get the pattern right and the implementation is a weekend; get it wrong and no framework saves you.
Here are the five, what each costs, and where each falls apart.
1. ReAct: Reason, Act, Repeat
ReAct is the default loop. The model thinks, picks a tool, reads the result, thinks again. It’s the pattern behind most chat agents and the one everyone learns first, formalized in the ReAct paper.
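```python
# ReAct loop, stripped to essentials.
# call_model stands in for your provider's client; tools maps tool names to callables.
def react_loop(query, tools, max_steps=8):
    history = [{"role": "user", "content": query}]
    for step in range(max_steps):
        response = call_model(history, tools=tools)
        if response.tool_calls:
            # Keep the assistant turn so the next call sees which tools were requested
            history.append({"role": "assistant", "content": response.content, "tool_calls": response.tool_calls})
            for call in response.tool_calls:
                result = tools[call.name](**call.args)
                history.append({"role": "tool", "name": call.name, "content": str(result)})
        else:
            return response.content  # model decided it's done
    return "Hit step limit without a final answer."
```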
Where it shines: open-ended tasks where you can’t predict the steps. Research, debugging, “find me X and summarize it.” The model adapts as it learns.
Where it breaks: cost and loops. Every step is a full model call carrying the entire growing history, so input tokens are billed again on each step. An 8-step ReAct run on a long context can cost 5-10x a single call, depending on how much of the prompt your provider caches. Worse, agents get stuck repeating the same failed action. Always cap max_steps, and log the step count, because a ReAct agent that quietly takes 15 steps instead of 3 is a budget leak you won’t see until the invoice.
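To make the step-count effect concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, no prompt caching, a 4,000-token starting prompt, and 200 tokens appended per step; the function name and all numbers are hypothetical, and output tokens are ignored.

```python
def react_input_tokens(base_tokens: int, tokens_per_step: int, steps: int) -> int:
    """Total input tokens billed across a ReAct run.

    Assumes step k (0-indexed) re-sends the base prompt plus everything
    the k earlier steps appended to the history.
    """
    return sum(base_tokens + k * tokens_per_step for k in range(steps))


single_call = react_input_tokens(4_000, 200, steps=1)    # 4,000
three_steps = react_input_tokens(4_000, 200, steps=3)    # 12,600
eight_steps = react_input_tokens(4_000, 200, steps=8)    # 37,600
fifteen_steps = react_input_tokens(4_000, 200, steps=15) # 81,000

print(eight_steps / single_call)    # 9.4
print(fifteen_steps / three_steps)  # roughly 6.4x the 3-step run
```

Under these assumptions the gap between a 3-step and a 15-step run is more than 6x in input tokens alone, which is why the step count belongs in your logs.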
2. Plan-and-Execute: Think Once, Run Cheap
Instead of reasoning at every step, the agent makes a plan upfront, then executes the steps with a cheaper model (or no model at all).
```text
[Planner: expensive model] → produces ordered step list
        ↓
[Executor: cheap model / plain code] → runs each step
        ↓
[Re-plan only if a step fails]
```
The economics are the appeal. One expensive planning call, then a series of cheap executions. If your task decomposes cleanly (ETL jobs, multi-step form filling, scripted research), this can cut cost 60-80% versus ReAct because you’re not paying a frontier model to re-read the whole history on every action.
Where it breaks: brittle plans. The planner commits before it sees reality. If step 3 returns something unexpected, a rigid executor plows ahead anyway. The fix is a re-plan trigger: on failure or surprising output, kick back to the planner. That hybrid (plan, execute, re-plan on deviation) is what most mature systems actually run. The self-improving loops in agents like Hermes are essentially Plan-and-Execute with a learned re-planner.
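A minimal sketch of that plan, execute, re-plan loop, under stated assumptions: `plan` and `execute` are hypothetical callables you supply (a frontier-model call and a cheap model or plain function, respectively), and a failed step is signalled by an exception. Detecting "surprising output" would need an extra check of your own and is not shown.

```python
from typing import Callable, List

Planner = Callable[[str, List[str]], List[str]]  # (goal, notes so far) -> remaining steps
Executor = Callable[[str], str]                   # runs one step, raises on failure


def plan_and_execute(goal: str, plan: Planner, execute: Executor,
                     max_replans: int = 2) -> List[str]:
    notes: List[str] = []              # results of the steps that succeeded
    steps = list(plan(goal, notes))    # the only expensive call if nothing fails
    replans = 0
    while steps:
        step = steps.pop(0)
        try:
            notes.append(execute(step))
        except Exception as err:
            if replans == max_replans:
                raise RuntimeError(f"gave up after {replans} re-plans") from err
            replans += 1
            # Tell the planner what worked and what broke, then adopt its new steps
            steps = list(plan(goal, notes + [f"FAILED {step}: {err}"]))
    return notes


# Stub planner and executor, for illustration only
def demo_plan(goal: str, notes: List[str]) -> List[str]:
    if any(n.startswith("FAILED") for n in notes):
        return ["load from backup"]
    return ["fetch", "load from primary"]


def demo_execute(step: str) -> str:
    if step == "load from primary":
        raise ConnectionError("primary down")
    return f"done: {step}"


result = plan_and_execute("sync data", demo_plan, demo_execute)
print(result)  # ['done: fetch', 'done: load from backup']
```

Capping re-plans matters for the same reason capping ReAct steps does: an uncapped re-planner is just a slower loop.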
3. Reflection: Check Your Own Work
The model generates an answer, then a second pass critiques it against the goal before anything is committed. It’s the cheapest reliability win available.
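```python
# call_model is a placeholder for your model client.
def reflect(query, draft):
    # Pass 1: a strict reviewer lists concrete errors, or replies 'PASS'
    critique = call_model([
        {"role": "system", "content": "You are a strict reviewer. List concrete errors only. If none, reply 'PASS'."},
        {"role": "user", "content": f"Task: {query}\n\nDraft:\n{draft}"}
    ])
    # Exact match: a critique that merely mentions "PASS" must not count as a pass
    if critique.content.strip().strip("'\".").upper() == "PASS":
        return draft
    # Pass 2: rewrite, restating the task so the fix stays on target
    return call_model([
        {"role": "user", "content": f"Task: {query}\n\nFix these issues:\n{critique.content}\n\nDraft:\n{draft}"}
    ]).content
```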
Where it shines: code generation, structured output, anything with a verifiable spec. One reflection pass catches a surprising share of “looks right, is wrong” outputs (the Reflexion paper quantifies the gains). For a deeper version that persists what it learns across sessions, see building a self-correcting agent with reflection.
Where it breaks: diminishing returns and self-agreement. Two reflection passes help; five just burn tokens. And a model reviewing its own work shares its blind spots. When it matters, reflect with a different model or against a real checker (a compiler, a test suite, a JSON schema validator) instead of more LLM opinion.
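As one way to reflect against a real checker, the sketch below validates a JSON draft with the standard library instead of asking a model for an opinion. The `revise` callable is a hypothetical stand-in for a model call that takes the draft and the checker's error message; the required-keys check is a deliberately simple substitute for a full schema validator.

```python
import json
from typing import Callable, Optional, Set


def check_json(draft: str, required_keys: Set[str]) -> Optional[str]:
    """Return None if the draft passes, else a concrete error message."""
    try:
        data = json.loads(draft)
    except json.JSONDecodeError as err:
        return f"invalid JSON: {err.msg} at position {err.pos}"
    if not isinstance(data, dict):
        return "top-level value must be an object"
    missing = required_keys - set(data)
    if missing:
        return f"missing keys: {sorted(missing)}"
    return None


def reflect_with_checker(draft: str, revise: Callable[[str, str], str],
                         required_keys: Set[str], max_passes: int = 2) -> str:
    for _ in range(max_passes):
        error = check_json(draft, required_keys)
        if error is None:
            return draft
        draft = revise(draft, error)  # hypothetical model call: (draft, error) -> new draft
    if check_json(draft, required_keys) is not None:
        raise ValueError("draft still fails validation after reflection")
    return draft
```

The checker's error message doubles as the critique, so a clean draft costs zero extra model calls and a broken one gets a precise, non-negotiable complaint.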
4. Router: Right Model for the Job
Not every request needs a frontier model. A router classifies the incoming task and sends it to the cheapest model that can handle it. Simple FAQ → small fast model. Hard reasoning → the expensive one.
| Tier | Model class | Cost (relative) | Handles |
|---|---|---|---|
| Triage | Small/fast | 1x | Classification, simple Q&A, formatting |
| Standard | Mid-tier | 5-10x | Most tasks, tool use, summaries |
| Heavy | Frontier | 20-40x | Multi-step reasoning, hard code, planning |
The router itself should be cheap. Use a small model or even a classifier, not a frontier model, to decide. This pattern is where a lot of real cost savings live: if 70% of your traffic is simple and you route it to a model that’s 10x cheaper, your bill drops by more than half without users noticing. Picking which models go in which tier is its own decision, covered in Claude Sonnet 4 vs GPT-4o for agents.
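The savings claim is easy to check with a few lines of arithmetic. This sketch assumes every request costs about the same before routing and ignores the cost of the router itself; the function name is made up for illustration.

```python
def routed_bill_fraction(easy_share: float, cheap_price_ratio: float) -> float:
    """Bill after routing, as a fraction of the bill when everything hits the expensive model.

    easy_share: fraction of traffic sent to the cheap model (0.70 = 70%)
    cheap_price_ratio: cheap model price relative to the expensive one (0.10 = 10x cheaper)
    """
    return easy_share * cheap_price_ratio + (1.0 - easy_share)


fraction = routed_bill_fraction(easy_share=0.70, cheap_price_ratio=0.10)
print(f"bill after routing: {fraction:.0%} of before")  # bill after routing: 37% of before
```

So the 70/10x scenario leaves you paying about 37% of the original bill, a cut of roughly 63%. The hard 30% of traffic dominates what remains, which is why misroutes in that slice are the thing to watch.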
Where it breaks: misroutes. Send a hard problem to the cheap tier and you get a confident wrong answer. Build in an escape hatch: if the cheap model signals low confidence or the output fails validation, escalate to the next tier and eat the cost.
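The escape hatch can be sketched as a loop over tiers. Everything here is an assumption made for illustration: handlers return `None` to signal low confidence, `classify` returns a starting tier index, and `validate` is whatever output check you already trust.

```python
from typing import Callable, List, Optional, Tuple

# A handler answers the request, or returns None to signal "not confident"
Handler = Callable[[str], Optional[str]]


def route_with_escalation(request: str,
                          tiers: List[Tuple[str, Handler]],
                          classify: Callable[[str], int],
                          validate: Callable[[str], bool]) -> Tuple[str, str]:
    """Start at the tier the router picks; climb until an answer validates."""
    start = min(max(classify(request), 0), len(tiers) - 1)  # clamp the router's pick
    for name, handler in tiers[start:]:
        answer = handler(request)
        if answer is not None and validate(answer):
            return name, answer
    raise RuntimeError("every tier declined or failed validation")


# Stub tiers, for illustration only
tiers = [
    ("triage", lambda r: None if "prove" in r else "short answer"),
    ("heavy", lambda r: "long answer"),
]
always_cheap = lambda r: 0          # a router that always tries the cheap tier first
non_empty = lambda a: len(a) > 0

print(route_with_escalation("what are your hours?", tiers, always_cheap, non_empty))
# ('triage', 'short answer')
print(route_with_escalation("prove this lemma", tiers, always_cheap, non_empty))
# ('heavy', 'long answer')
```

Returning the tier name alongside the answer lets you log how often escalation fires, which is the number that tells you whether the router is earning its keep.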
5. Tool-First: Get Work Out of the Model
The most underrated pattern. The model is bad at arithmetic, exact lookups, and deterministic transforms, and it’s expensive to use for them. Tool-First means: anything a function can do reliably, a function should do. The model orchestrates; it doesn’t compute.
Don’t ask the model to sort a list, sum a column, or parse a date. Give it a tool. Every deterministic operation you move out of the model is one fewer chance to hallucinate and a lot fewer tokens. This is also the cleanest path to testability, since tools are ordinary code you can unit-test. For how the tool interface itself should look, see the MCP vs function calling breakdown.
Where it breaks: tool sprawl. Fifty tools with verbose schemas can blow your context budget before the task even starts. Keep the tool set tight, write terse descriptions, and group related actions behind one tool with a mode parameter rather than ten near-identical tools.
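Here is a small sketch of the "one tool with a mode parameter" idea, covering the sort, sum, and date examples from earlier in this section. The tool name, modes, and schema are all hypothetical; the schema dict follows the JSON Schema style that function-calling APIs commonly accept, so adapt the wrapper to your provider.

```python
from datetime import date
from typing import List, Union


def list_tool(mode: str, values: List[str]) -> Union[List[str], float, str]:
    """One tool, several deterministic operations selected by `mode`."""
    if mode == "sort":
        return sorted(values)
    if mode == "sum":
        return sum(float(v) for v in values)
    if mode == "earliest_date":
        return min(date.fromisoformat(v) for v in values).isoformat()
    raise ValueError(f"unknown mode: {mode!r}")


# Terse description the model sees: one schema instead of three near-identical ones
LIST_TOOL_SCHEMA = {
    "name": "list_tool",
    "description": "Sort, sum, or find the earliest ISO date in a list.",
    "parameters": {
        "type": "object",
        "properties": {
            "mode": {"type": "string", "enum": ["sort", "sum", "earliest_date"]},
            "values": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["mode", "values"],
    },
}

print(list_tool("sum", ["1.5", "2.5"]))                           # 4.0
print(list_tool("earliest_date", ["2024-03-01", "2023-12-31"]))   # 2023-12-31
```

Because `list_tool` is ordinary code, each mode can be unit-tested without a model in the loop.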
Combining Patterns
Real systems layer these. A production agent often looks like: a Router up front, ReAct or Plan-and-Execute in the middle, Reflection before commit, and Tool-First throughout.
```mermaid
flowchart LR
    A[Request] --> R[Router]
    R -->|simple| S[Small model + tools]
    R -->|complex| P[Plan-and-Execute]
    P --> RF[Reflection check]
    S --> RF
    RF -->|pass| O[Response]
    RF -->|fail| P
```
The mistake is reaching for all five on day one. Start with the simplest thing that works (often ReAct plus Tool-First), measure where it hurts, and add patterns to fix specific pain. Add Reflection when quality is the problem. Add a Router when cost is the problem. Add Plan-and-Execute when latency from too many sequential model calls is the problem.
A Quick Decision Guide
| Your problem | Reach for |
|---|---|
| Unpredictable, exploratory tasks | ReAct |
| Repeatable multi-step workflows | Plan-and-Execute |
| “Looks right, is wrong” outputs | Reflection (against a real checker) |
| Bill too high, traffic mostly easy | Router |
| Hallucinated math / lookups | Tool-First |
FAQ
Which pattern should I start with? ReAct plus Tool-First. ReAct gives you a working loop, Tool-First keeps it accurate and cheap. Add the others only when you have a measured problem they solve.
Do agent frameworks implement these for me? Partly. LangGraph gives you the graph to wire any of them; CrewAI leans toward role-based Plan-and-Execute. But the patterns are decisions you make, not features you turn on. See the best open-source agent frameworks for what each gives you out of the box.
How much does Reflection actually cost? Roughly one extra model call per item reviewed, plus a second call for the rewrite if the review fails. For a code agent that’s often worth it; for a high-volume chatbot, reflect selectively (only on low-confidence or high-stakes outputs).
Isn’t a Router just added complexity? Only if your traffic is uniform. If most requests are easy and a few are hard, routing is the single biggest lever on your token bill. If everything is hard, skip it.
How do I stop ReAct agents from looping forever? Cap steps, detect repeated identical actions, and add a “you’ve tried this, try something else or stop” nudge to the prompt when you see a repeat. Always log step counts so a runaway loop shows up before the invoice does.
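Repeat detection can be as small as a counter keyed on the tool name and its arguments. This is a sketch under assumptions: the class name, the nudge wording, and the choice to raise on the third identical call are all illustrative, and arguments are assumed to be JSON-serializable (anything else falls back to `str`).

```python
import json
from collections import Counter
from typing import Optional

NUDGE = "You've tried this exact call already. Try something else or stop."


class RepeatGuard:
    """Counts identical (tool, args) calls inside one agent run."""

    def __init__(self, max_repeats: int = 2):
        self.max_repeats = max_repeats
        self.counts: Counter = Counter()

    def check(self, tool_name: str, args: dict) -> Optional[str]:
        """Return None on a first attempt, a nudge on a repeat; raise past the limit."""
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} count as the same call
        key = (tool_name, json.dumps(args, sort_keys=True, default=str))
        self.counts[key] += 1
        seen = self.counts[key]
        if seen > self.max_repeats:
            raise RuntimeError(f"{tool_name} called {seen} times with identical arguments")
        return NUDGE if seen > 1 else None


guard = RepeatGuard(max_repeats=2)
first = guard.check("search", {"q": "agent patterns", "limit": 5})    # None
second = guard.check("search", {"limit": 5, "q": "agent patterns"})   # NUDGE
```

Call `check` just before running each tool; when it returns the nudge, append it to the history as an extra message, and let the exception end the run.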


