Building a Self-Correcting AI Agent with Reflection

TL;DR — A self-correcting agent adds two things to a normal agent: a reflection step where it critiques its own output before committing, and a memory of past mistakes so it stops repeating them. Reflection catches errors within a task; memory prevents them across tasks. This walkthrough builds both in plain Python — no framework required — and shows where each one actually helps and where it just burns tokens.

What “Self-Correcting” Actually Means

The phrase gets thrown around loosely. Concretely, a self-correcting agent does two distinct things:

Reflection (intra-task): Before finalizing an answer, the agent reviews its own draft against the goal, finds flaws, and revises. This catches mistakes within a single task. The technique is formalized in the Self-Refine and Reflexion papers.
Memory of failures (inter-task): When the agent gets something wrong and learns the fix, it records that lesson and recalls it on future tasks. This prevents repeating the same mistake across tasks.

These solve different problems. Most “self-correcting agent” tutorials only do reflection and call it done. The memory half is what actually makes an agent improve over time. We covered the storage side in agent memory architectures; here we wire it into a correction loop.

The Reflection Loop

The core pattern is generate → critique → revise. Here’s a minimal version you can run.

from openai import OpenAI

client = OpenAI(base_url="https://api.sandbase.ai/v1", api_key="sk-...")
MODEL = "anthropic/claude-sonnet-4"

def generate(task: str, prior_feedback: str = "") -> str:
    prompt = task
    if prior_feedback:
        prompt += f"\n\nA previous attempt had this problem:\n{prior_feedback}\nFix it."
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content or ""  # content may be None

def critique(task: str, draft: str) -> tuple[bool, str]:
    """Return (is_good, feedback). The agent judges its own work."""
    prompt = (
        f"Task: {task}\n\nDraft answer:\n{draft}\n\n"
        "Critique this draft. If it fully satisfies the task, reply exactly 'PASS'. "
        "Otherwise, list the specific problems to fix."
    )
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    # content may be None, so fall back to an empty string before stripping
    feedback = (resp.choices[0].message.content or "").strip()
    return feedback == "PASS", feedback

def solve(task: str, max_iterations: int = 3) -> str:
    feedback = ""
    draft = ""  # defined up front so max_iterations < 1 cannot raise UnboundLocalError
    for _ in range(max_iterations):
        draft = generate(task, feedback)
        ok, feedback = critique(task, draft)
        if ok:
            return draft
    return draft  # cap reached: return the last draft as a best effort

The structure matters more than the prompts. Three rules from running this in production:

Cap the iterations. Without max_iterations, a fussy critic can loop forever. Three is usually enough; past three you’re paying full token cost for diminishing returns.

The critic needs the goal, not just the draft. A critique step that only sees the draft will invent problems. Give it the original task so it judges against the actual requirement.

Make PASS an exact token. Fuzzy “looks good!” responses are hard to parse. Force a literal PASS so your control flow is deterministic.
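One caveat on the exact-match rule: models sometimes reply "PASS." or "**PASS**" even when told to reply exactly PASS, and a strict == comparison treats those as failures. A minimal sketch of a slightly more forgiving parser follows. The name parse_verdict is hypothetical, not part of the code above, and the choice to reject "PASS, but..." replies is an assumption about what you want.

```python
import re
from typing import Optional

# Hypothetical helper: accepts "PASS", "pass", "PASS." or "**PASS**",
# but not a longer reply that merely starts with or mentions the word.
_PASS_RE = re.compile(r"^\W*pass\W*$", re.IGNORECASE)

def parse_verdict(reply: Optional[str]) -> tuple[bool, str]:
    """Return (is_good, feedback) from the critic's raw reply."""
    text = (reply or "").strip()
    if not text:
        # An empty reply is not approval; treat it as a failed critique.
        return False, "Critic returned an empty reply."
    return bool(_PASS_RE.match(text)), text
```

This keeps the control flow deterministic while tolerating stray punctuation. A reply such as "PASS, but tighten the intro" still counts as a failure, so the extra commentary goes back to the generator as feedback.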

When Reflection Helps and When It Doesn’t

Reflection is not free — it at least doubles your token cost and latency. It pays off unevenly:

Task type	Does reflection help?	Why
Code generation	Strongly	The critic catches bugs, missing edge cases
Math / logic	Strongly	Self-check finds arithmetic and reasoning slips
Structured extraction	Moderately	Catches schema violations, missing fields
Creative writing	Weakly	“Better” is subjective; the critic adds little
Simple lookups	No	Nothing to reflect on; pure waste

The lesson: gate reflection behind task type. Don’t reflect on every call. A trivial classification doesn’t need a self-critique; a code-generation step does. This is the same selective-spend discipline that separates reliable, cost-effective agents from ones that burn money.
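Here is one way the gate could look. This is a sketch under assumptions: the task-type labels and the iteration budgets are illustrative values loosely based on the table, and how a task gets its label (a rule, a classifier, the caller) is left out. The name run_task and both callable parameters are hypothetical.

```python
from typing import Callable

# Assumed labels and budgets; tune these to your own workload.
REFLECT_ITERATIONS = {
    "code": 3,
    "math": 3,
    "extraction": 2,
    "creative": 0,
    "lookup": 0,
}

def run_task(
    task: str,
    task_type: str,
    single_pass: Callable[[str], str],
    reflective: Callable[[str, int], str],
) -> str:
    """Route a task to one plain call or to the reflection loop."""
    iterations = REFLECT_ITERATIONS.get(task_type, 0)  # unknown types run once
    if iterations == 0:
        return single_pass(task)
    return reflective(task, iterations)
```

With the functions from the reflection loop, a call would look like run_task(task, "code", generate, solve). Passing the two strategies in as arguments keeps the gate testable without a live model.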

Adding Memory: Learning Across Tasks

Reflection fixes the current task. It does nothing for the next one. If the agent keeps making the same mistake (wrong date format, forgetting to validate input, misreading an API), reflection re-discovers and re-fixes it every single time. That’s wasteful.

The fix: when the critic finds a real problem, store the lesson. On future tasks, load relevant lessons into the prompt.

import json
from pathlib import Path

LESSONS_FILE = Path("lessons.json")

def load_lessons() -> list[str]:
    if LESSONS_FILE.exists():
        return json.loads(LESSONS_FILE.read_text())
    return []

def save_lesson(lesson: str):
    lessons = load_lessons()
    if lesson not in lessons:
        lessons.append(lesson)
        LESSONS_FILE.write_text(json.dumps(lessons, indent=2))

def extract_lesson(task: str, feedback: str) -> str:
    """Turn a specific critique into a reusable rule."""
    prompt = (
        f"Task: {task}\n\nA draft answer had this problem: {feedback}\n"
        "Write one short, general rule (one sentence) to avoid this in future tasks."
    )
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

def solve_with_memory(task: str, max_iterations: int = 3) -> str:
    lessons = load_lessons()
    lesson_text = "\n".join(f"- {lesson}" for lesson in lessons)
    augmented_task = task
    if lesson_text:
        augmented_task = f"{task}\n\nLessons from past tasks:\n{lesson_text}"

    feedback = ""
    draft = ""  # defined up front in case max_iterations < 1
    for _ in range(max_iterations):
        draft = generate(augmented_task, feedback)
        ok, feedback = critique(task, draft)
        if ok:
            return draft
        # Persist the lesson so future tasks benefit
        save_lesson(extract_lesson(task, feedback))
    return draft

The key transformation is in extract_lesson: it turns a specific critique (“you used MM/DD/YYYY but the API wants ISO 8601”) into a general rule (“always use ISO 8601 date format for API calls”). Specific feedback doesn’t transfer; general rules do.

This is a deliberately simple memory: a flat JSON list loaded into every prompt. It works until the lesson list grows past what fits comfortably in context, at which point you switch to retrieval — embed the lessons, fetch only the relevant ones per task. That’s the warm-vs-cold memory split in action.
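To make the "fetch only the relevant ones" step concrete without pulling in an embedding model, here is a sketch that ranks lessons by word overlap with the task. Word overlap is a stand-in for embedding similarity, not the real thing, and the name relevant_lessons and the top_k default are assumptions.

```python
import re

def _tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def relevant_lessons(task: str, lessons: list[str], top_k: int = 3) -> list[str]:
    """Return up to top_k lessons ranked by Jaccard word overlap with the task."""
    task_words = _tokens(task)
    scored = []
    for lesson in lessons:
        lesson_words = _tokens(lesson)
        union = task_words | lesson_words
        score = len(task_words & lesson_words) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, lesson))
    # Highest overlap first; list.sort is stable, so ties keep stored order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [lesson for _, lesson in scored[:top_k]]
```

In solve_with_memory this would replace the raw load_lessons() result, for example relevant_lessons(task, load_lessons()). Common words such as "the" count as overlap here, which is one reason to move to real embeddings once the store grows.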

The Honest Trade-offs

Self-correction is not a free upgrade. The costs:

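Token multiplier. Reflection roughly doubles or triples your tokens per task. Memory adds a smaller overhead: the loaded lessons in every prompt, which grows with the list, plus one extract_lesson call per failed critique.
Latency. Each reflection iteration adds two sequential round-trips, one to generate and one to critique. A 3-iteration loop can make up to six calls where a plain agent makes one, so worst-case wall-clock time grows several-fold.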
The critic can be wrong. A flawed critic rejects good answers or approves bad ones. Your correction is only as good as your critique step.
Lesson pollution. Bad lessons accumulate. An over-general rule (“always double-check everything”) adds noise without value. Prune the lesson store.
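Pruning can start very simply. The sketch below drops near-duplicate rules, drops rules too short to say anything concrete, and caps the list size. The name prune_lessons and both thresholds are assumptions, and word count is only a rough proxy for "over-general".

```python
def prune_lessons(
    lessons: list[str], min_words: int = 5, max_lessons: int = 50
) -> list[str]:
    """Drop duplicates and very short rules, then keep the newest max_lessons."""
    seen = set()
    kept = []
    for lesson in lessons:
        # Normalize case, spacing, and trailing punctuation before comparing.
        key = " ".join(lesson.lower().split()).rstrip(".!")
        if key in seen:
            continue  # same rule with different case, spacing, or punctuation
        if len(key.split()) < min_words:
            continue  # too short to carry a concrete instruction
        seen.add(key)
        kept.append(lesson)
    # Lessons are appended in order, so the tail holds the most recent ones.
    return kept[-max_lessons:] if max_lessons > 0 else []
```

With the default threshold, "Always double-check everything" is dropped for having only three words, while "Always use ISO 8601 date format for API calls." survives. A periodic human review of the store is still worth doing, since a length check cannot tell a vague rule from a wrong one.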

The teams that get value from this don’t apply it everywhere. They apply reflection to high-stakes, error-prone steps (code, structured output, multi-step plans) and let cheap, low-risk calls run once.

FAQ

Is reflection the same as chain-of-thought?

No. Chain-of-thought is reasoning before an answer, in one pass. Reflection is critiquing a completed answer and revising it, across multiple passes. You can combine them — reason step-by-step, then reflect on the result — but they’re distinct techniques.

Should the critic use the same model as the generator?

Often yes, but a different model as critic can catch errors the generator is blind to. A practical pattern is to generate with a strong model and critique with the same or a cheaper one. The critic’s job (find flaws) is sometimes easier than the generator’s (produce a correct answer).
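A sketch of the split, assuming a hypothetical complete(model, prompt) wrapper around whatever chat-completion call you already use. The wrapper, the function name solve_two_models, and the model arguments are all illustrative; the prompts mirror the ones in the reflection loop.

```python
from typing import Callable

# Hypothetical signature: complete(model, prompt) -> reply text.
Complete = Callable[[str, str], str]

def solve_two_models(
    task: str,
    complete: Complete,
    generator_model: str,
    critic_model: str,
    max_iterations: int = 3,
) -> str:
    """Generate with one model and critique with another."""
    feedback = ""
    draft = ""
    for _ in range(max_iterations):
        prompt = task
        if feedback:
            prompt += f"\n\nA previous attempt had this problem:\n{feedback}\nFix it."
        draft = complete(generator_model, prompt)
        critic_prompt = (
            f"Task: {task}\n\nDraft answer:\n{draft}\n\n"
            "Critique this draft. If it fully satisfies the task, reply exactly 'PASS'. "
            "Otherwise, list the specific problems to fix."
        )
        verdict = complete(critic_model, critic_prompt).strip()
        if verdict == "PASS":
            return draft
        feedback = verdict
    return draft  # cap reached: last draft as a best effort
```

Because the model names are plain arguments, trying a cheaper critic is a one-line change at the call site.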

How do I stop the agent from looping forever?

Hard-cap iterations (3 is a good default) and return the last attempt when you hit the cap. A self-critic can almost always find something to improve, so an unbounded loop may never terminate. The cap is non-negotiable.

Where should I store lessons in production?

Start with a JSON file or a database row per user/agent. Once lessons exceed what fits in context, move to a vector store and retrieve only relevant lessons per task. See agent memory architectures for the retrieval design.

Does this work with any model?

Yes. The pattern is model-agnostic — it’s just structured prompting and a control loop. Stronger models produce better critiques and need fewer iterations. Through an OpenAI-compatible gateway you can swap models without changing the loop.

Key Takeaways

Self-correction has two parts: reflection fixes mistakes within a task, memory prevents repeating them across tasks. Most tutorials skip the memory half.
The reflection loop is generate → critique → revise, with a hard iteration cap and a critic that sees the original goal, not just the draft.
Gate reflection by task type. It strongly helps code and logic, does nothing for simple lookups, and roughly doubles or triples your token cost.
Memory works by converting specific critiques into general rules. Start with a JSON list, move to retrieval when it outgrows the context window.
