Building a Self-Correcting AI Agent with Reflection
How to build a self-correcting AI agent using the reflection pattern and persistent memory. A runnable Python loop that critiques and fixes its own output.
TL;DR — A self-correcting agent adds two things to a normal agent: a reflection step where it critiques its own output before committing, and a memory of past mistakes so it stops repeating them. Reflection catches errors within a task; memory prevents them across tasks. This walkthrough builds both in plain Python — no framework required — and shows where each one actually helps and where it just burns tokens.
What “Self-Correcting” Actually Means
The phrase gets thrown around loosely. Concretely, a self-correcting agent does two distinct things:
- Reflection (intra-task): Before finalizing an answer, the agent reviews its own draft against the goal, finds flaws, and revises. This catches mistakes within a single task. The technique is formalized in the Self-Refine and Reflexion papers.
- Memory of failures (inter-task): When the agent gets something wrong and learns the fix, it records that lesson and recalls it on future tasks. This prevents repeating the same mistake across tasks.
These solve different problems. Most “self-correcting agent” tutorials only do reflection and call it done. The memory half is what actually makes an agent improve over time. We covered the storage side in agent memory architectures; here we wire it into a correction loop.
The Reflection Loop
The core pattern is generate → critique → revise. Here’s a minimal version you can run.
from openai import OpenAI
client = OpenAI(base_url="https://api.sandbase.ai/v1", api_key="sk-...")
MODEL = "anthropic/claude-sonnet-4"
def generate(task: str, prior_feedback: str = "") -> str:
prompt = task
if prior_feedback:
prompt += f"\n\nA previous attempt had this problem:\n{prior_feedback}\nFix it."
resp = client.chat.completions.create(
model=MODEL,
messages=[{"role": "user", "content": prompt}],
)
return resp.choices[0].message.content
def critique(task: str, draft: str) -> tuple[bool, str]:
"""Return (is_good, feedback). The agent judges its own work."""
prompt = (
f"Task: {task}\n\nDraft answer:\n{draft}\n\n"
"Critique this draft. If it fully satisfies the task, reply exactly 'PASS'. "
"Otherwise, list the specific problems to fix."
)
resp = client.chat.completions.create(
model=MODEL,
messages=[{"role": "user", "content": prompt}],
)
feedback = resp.choices[0].message.content.strip()
return feedback == "PASS", feedback
def solve(task: str, max_iterations: int = 3) -> str:
feedback = ""
for i in range(max_iterations):
draft = generate(task, feedback)
ok, feedback = critique(task, draft)
if ok:
return draft
return draft # return best effort after max iterations
The structure matters more than the prompts. Three rules from running this in production:
Cap the iterations. Without max_iterations, a fussy critic loops forever. Three is usually enough; past three you’re polishing diminishing returns at full token cost.
The critic needs the goal, not just the draft. A critique step that only sees the draft will invent problems. Give it the original task so it judges against the actual requirement.
Make PASS an exact token. Fuzzy “looks good!” responses are hard to parse. Force a literal PASS so your control flow is deterministic.
When Reflection Helps and When It Doesn’t
Reflection is not free — it at least doubles your token cost and latency. It pays off unevenly:
| Task type | Does reflection help? | Why |
|---|---|---|
| Code generation | Strongly | The critic catches bugs, missing edge cases |
| Math / logic | Strongly | Self-check finds arithmetic and reasoning slips |
| Structured extraction | Moderately | Catches schema violations, missing fields |
| Creative writing | Weakly | ”Better” is subjective; the critic adds little |
| Simple lookups | No | Nothing to reflect on; pure waste |
The lesson: gate reflection behind task type. Don’t reflect on every call. A trivial classification doesn’t need a self-critique; a code-generation step does. This is the same selective-spend discipline that separates reliable, cost-effective agents from ones that burn money.
Adding Memory: Learning Across Tasks
Reflection fixes the current task. It does nothing for the next one. If the agent keeps making the same mistake (wrong date format, forgetting to validate input, misreading an API), reflection re-discovers and re-fixes it every single time. That’s wasteful.
The fix: when the critic finds a real problem, store the lesson. On future tasks, load relevant lessons into the prompt.
import json
from pathlib import Path
LESSONS_FILE = Path("lessons.json")
def load_lessons() -> list[str]:
if LESSONS_FILE.exists():
return json.loads(LESSONS_FILE.read_text())
return []
def save_lesson(lesson: str):
lessons = load_lessons()
if lesson not in lessons:
lessons.append(lesson)
LESSONS_FILE.write_text(json.dumps(lessons, indent=2))
def extract_lesson(task: str, feedback: str) -> str:
"""Turn a specific critique into a reusable rule."""
prompt = (
f"A task had this problem: {feedback}\n"
"Write one short, general rule (one sentence) to avoid this in future tasks."
)
resp = client.chat.completions.create(
model=MODEL,
messages=[{"role": "user", "content": prompt}],
)
return resp.choices[0].message.content.strip()
def solve_with_memory(task: str, max_iterations: int = 3) -> str:
lessons = load_lessons()
lesson_text = "\n".join(f"- {l}" for l in lessons)
augmented_task = task
if lesson_text:
augmented_task = f"{task}\n\nLessons from past tasks:\n{lesson_text}"
feedback = ""
for i in range(max_iterations):
draft = generate(augmented_task, feedback)
ok, feedback = critique(task, draft)
if ok:
return draft
# Persist the lesson so future tasks benefit
save_lesson(extract_lesson(task, feedback))
return draft
The key transformation is in extract_lesson: it turns a specific critique (“you used MM/DD/YYYY but the API wants ISO 8601”) into a general rule (“always use ISO 8601 date format for API calls”). Specific feedback doesn’t transfer; general rules do.
This is a deliberately simple memory: a flat JSON list loaded into every prompt. It works until the lesson list grows past what fits comfortably in context, at which point you switch to retrieval — embed the lessons, fetch only the relevant ones per task. That’s the warm-vs-cold memory split in action.
The Honest Trade-offs
Self-correction is not a free upgrade. The costs:
- Token multiplier. Reflection roughly 2-3x’s your tokens per task. Memory adds a smaller, constant overhead (loading lessons).
- Latency. Each reflection iteration is another round-trip. A 3-iteration loop can triple wall-clock time.
- The critic can be wrong. A flawed critic rejects good answers or approves bad ones. Your correction is only as good as your critique step.
- Lesson pollution. Bad lessons accumulate. An over-general rule (“always double-check everything”) adds noise without value. Prune the lesson store.
The teams that get value from this don’t apply it everywhere. They apply reflection to high-stakes, error-prone steps (code, structured output, multi-step plans) and let cheap, low-risk calls run once.
FAQ
Is reflection the same as chain-of-thought?
No. Chain-of-thought is reasoning before an answer, in one pass. Reflection is critiquing a completed answer and revising it, across multiple passes. You can combine them — reason step-by-step, then reflect on the result — but they’re distinct techniques.
Should the critic use the same model as the generator?
Often yes, but a different model as critic can catch errors the generator is blind to. A practical pattern is generate with a strong model, critique with the same or a cheaper one. The critic’s job (find flaws) is sometimes easier than the generator’s (produce a correct answer).
How do I stop the agent from looping forever?
Hard-cap iterations (3 is a good default) and return the best attempt when you hit the cap. A self-critic will always find something to improve, so an unbounded loop never terminates. The cap is non-negotiable.
Where should I store lessons in production?
Start with a JSON file or a database row per user/agent. Once lessons exceed what fits in context, move to a vector store and retrieve only relevant lessons per task. See agent memory architectures for the retrieval design.
Does this work with any model?
Yes. The pattern is model-agnostic — it’s just structured prompting and a control loop. Stronger models produce better critiques and need fewer iterations. Through an OpenAI-compatible gateway you can swap models without changing the loop.
Key Takeaways
- Self-correction has two parts: reflection fixes mistakes within a task, memory prevents repeating them across tasks. Most tutorials skip the memory half.
- The reflection loop is generate → critique → revise, with a hard iteration cap and a critic that sees the original goal, not just the draft.
- Gate reflection by task type. It strongly helps code and logic, does nothing for simple lookups, and roughly 2-3x’s your token cost.
- Memory works by converting specific critiques into general rules. Start with a JSON list, move to retrieval when it outgrows the context window.


