Cron-Driven Agents: Autonomous Workflows on a Schedule

TL;DR — A cron-driven agent runs on a schedule instead of waiting for you to prompt it — daily summaries, monitoring, recurring research. The architecture is simple (scheduler triggers agent run), but the hard parts are idempotency (what if it runs twice?), failure handling (what if the LLM is down at 3am?), and cost control (an agent that wakes up every minute gets expensive fast). Here’s how to build one that won’t surprise you with a bill.

The Shift from Reactive to Proactive

Most agents are reactive: you ask, they answer. A cron-driven agent flips this — it wakes up on a schedule and does work without anyone prompting it. Build a cron-driven agent and you move from “tool you operate” to “colleague who handles things while you’re away.”

Concrete examples I’ve built or seen work well:

A 7am agent that reads overnight GitHub issues, triages them, and posts a summary to Slack
A nightly agent that reviews the day’s work and consolidates memory (this is essentially what Anthropic’s “Dreaming” does)
A weekly agent that researches a topic and drafts a report
A monitoring agent that checks a metric every 15 minutes and pings you only when something’s wrong

Hermes Agent ships a cron system as one of its five pillars precisely because scheduled autonomy is what separates an assistant from a tool. Let’s build the pattern.

Reactive and cron-driven agents differ on a few axes worth being clear about:

	Reactive agent	Cron-driven agent
Trigger	User prompt	Schedule / timer
Failure visibility	Immediate (you see it)	Silent (3am, nobody watching)
Cost driver	Per user request	Per scheduled run
Main risk	Bad answer	Double-run, silent failure, runaway cost

The Basic Architecture

At its simplest, a cron-driven agent is a scheduler plus an agent run:

Scheduler (cron) --> trigger --> Agent run --> action (Slack post, email, etc.)
     |                              |
     |                              +--> persist results to memory
     +--> next scheduled time

The scheduler can be literal cron, a cloud scheduler, or a framework’s built-in cron (like Hermes’). When it fires, it kicks off an agent run with a predefined task (“summarize overnight issues”), the agent does its work using whatever tools it has, takes an action, and persists anything worth remembering.

# A cron-driven agent definition
jobs:
  - name: morning-triage
    schedule: "0 7 * * *"   # 7am daily
    task: |
      Read GitHub issues opened since 6pm yesterday.
      Triage by severity. Post a summary to #eng-standup.
    tools: [github, slack]
    model: anthropic/claude-sonnet-4
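
If the scheduler is plain system cron, the piece it invokes can be a small entry point that looks up the job by name and hands it to the agent. The sketch below is illustrative only and is not any framework's actual API: `Job`, `JOBS`, `run_job`, and the `agent` and `persist` callables are hypothetical names, and the agent is stubbed so the flow is visible without a model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Job:
    """In-memory mirror of one entry in the job list (hypothetical shape)."""
    name: str
    schedule: str
    task: str
    tools: tuple[str, ...]
    model: str


JOBS = {
    "morning-triage": Job(
        name="morning-triage",
        schedule="0 7 * * *",  # interpreted by the scheduler, not by this runner
        task=("Read GitHub issues opened since 6pm yesterday. "
              "Triage by severity. Post a summary to #eng-standup."),
        tools=("github", "slack"),
        model="anthropic/claude-sonnet-4",
    ),
}


def run_job(name: str,
            agent: Callable[[Job], str],
            persist: Callable[[str, str], None]) -> str:
    """Run one scheduled job: look it up, call the agent, persist the result."""
    job = JOBS[name]            # raises KeyError for an unknown job name
    result = agent(job)         # the agent does its work and takes its action
    persist(job.name, result)   # keep anything worth remembering
    return result


def stub_agent(job: Job) -> str:
    """Stand-in for a real agent run, which would call the model with job.task."""
    return f"[{job.model}] ran '{job.name}'"


if __name__ == "__main__":
    memory: dict[str, str] = {}
    print(run_job("morning-triage", stub_agent, memory.__setitem__))
```

Passing the agent and the persistence step in as callables keeps the runner testable without network access.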

That’s the happy path. Now the parts that actually matter.

Idempotency: What If It Runs Twice?

Schedulers retry. Networks hiccup. Your 7am job might fire twice if the first run times out and the scheduler retries. If your agent isn’t idempotent, that means two Slack posts, two emails, or worse — two purchases, two deployments.

Design every cron agent assuming it might run more than once for a single scheduled slot:

Use idempotency keys. Tag each scheduled run with a unique key (e.g., morning-triage-2026-06-04). Before acting, check whether that key already completed.
Make actions check-then-act. “Post summary if not already posted today” beats “post summary.”
Separate compute from side effects. The agent can re-run its reasoning safely; gate only the irreversible action (the Slack post) behind the idempotency check.
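
The three rules above can be sketched with SQLite as the key store. This is a minimal sketch under assumptions: the `run_keys` table, `run_key`, `init_store`, `post_once`, and the `post` callable are hypothetical names, and a real deployment would use whatever shared store every worker can reach.

```python
import sqlite3
from datetime import date
from typing import Callable


def run_key(job_name: str, slot: date) -> str:
    """Idempotency key for one scheduled slot, e.g. morning-triage-2026-06-04."""
    return f"{job_name}-{slot.isoformat()}"


def init_store(conn: sqlite3.Connection) -> None:
    """Create the key table if it does not exist yet."""
    conn.execute("CREATE TABLE IF NOT EXISTS run_keys (key TEXT PRIMARY KEY)")
    conn.commit()


def post_once(conn: sqlite3.Connection, key: str,
              post: Callable[[], None]) -> bool:
    """Run the side effect at most once per key. Returns True if it ran now."""
    # Claim the key atomically: the PRIMARY KEY turns a second insert into a no-op.
    cur = conn.execute("INSERT OR IGNORE INTO run_keys (key) VALUES (?)", (key,))
    conn.commit()
    if cur.rowcount == 0:
        return False  # an earlier run already claimed this slot
    try:
        post()  # the irreversible action, e.g. the Slack post
    except Exception:
        # Release the claim so a later retry of this slot can try again.
        conn.execute("DELETE FROM run_keys WHERE key = ?", (key,))
        conn.commit()
        raise
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_store(conn)
    key = run_key("morning-triage", date(2026, 6, 4))
    print(post_once(conn, key, lambda: print("posting summary")))
    print(post_once(conn, key, lambda: print("posting summary")))
```

Claiming the key before acting favors at-most-once delivery: a crash after the claim but before the post leaves that slot unposted rather than double-posted, which is usually the safer failure for irreversible actions.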

This is the single most common way cron agents go wrong in production. The agent logic is fine; it just ran twice and double-posted.

Failure Handling: What If the LLM Is Down at 3am?

A reactive agent’s failure is visible — you’re sitting there, you see the error. A cron agent fails silently at 3am while you sleep. You find out when the morning summary never arrives.

Build for unattended failure:

Retries with backoff for transient errors (provider 503, timeout). Cap the number of attempts, and don’t retry permanent errors such as a malformed request at all: they will fail the same way every time.
Dead-man’s switch. If a job that should run daily hasn’t succeeded in 25 hours, alert. The absence of success is the signal, not the presence of an error.
Provider fallback. A single-provider agent dies if that provider has a 3am outage. This is where an LLM gateway with automatic fallback earns its keep — if one provider is down, the run routes to another instead of failing.
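
The first two items can be sketched in a few lines each. This is a sketch under assumptions: `TransientError`, `with_retries`, and `is_overdue` are hypothetical names, and mapping real provider errors (503, timeout) onto `TransientError` is left to the caller.

```python
import random
import time
from datetime import datetime, timedelta
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (provider 503, timeout)."""


def with_retries(call: Callable[[], T],
                 max_attempts: int = 4,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry transient failures with exponential backoff and jitter.

    Any other exception (e.g. a malformed request) propagates immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                raise  # cap reached: give up and let the job fail
            delay = base_delay * 2 ** (attempt - 1)       # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay / 2))   # jitter spreads retries out
    raise ValueError("max_attempts must be at least 1")


def is_overdue(last_success: Optional[datetime], now: datetime,
               max_gap: timedelta = timedelta(hours=25)) -> bool:
    """Dead-man's switch: True when no success was recorded within max_gap."""
    return last_success is None or now - last_success > max_gap
```

The `is_overdue` check has to run from something other than the job itself, such as a separate watcher on its own schedule, because a job that never starts cannot report its own absence.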

# Cron agent pointed at a gateway with fallback
OPENAI_API_BASE=https://api.sandbase.ai/v1
OPENAI_API_KEY=your-sandbase-api-key
MODEL=anthropic/claude-sonnet-4

With SandBase, a 3am provider outage doesn’t kill your nightly job — traffic fails over automatically. For unattended automation, that resilience is the difference between “ran fine” and “silently broken for a week.”
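
To make the idea of fallback concrete, here is the general shape of trying providers in order. This is an illustration of the technique only, not SandBase's implementation or API: `ProviderDown`, `complete_with_fallback`, and the provider callables are hypothetical.

```python
from typing import Callable, Sequence


class ProviderDown(Exception):
    """Hypothetical error raised when a provider is unreachable."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (provider_name, response)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderDown as exc:
            errors.append(f"{name}: {exc}")  # remember why, then try the next one
    raise ProviderDown("all providers failed: " + "; ".join(errors))


if __name__ == "__main__":
    def primary(prompt: str) -> str:
        raise ProviderDown("503 at 3am")

    def secondary(prompt: str) -> str:
        return f"summary of {prompt}"

    print(complete_with_fallback("overnight issues",
                                 [("primary", primary), ("secondary", secondary)]))
```

Returning the provider name alongside the response lets the run log which provider actually served it, which helps when a quiet failover would otherwise hide a long outage.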

Cost Control: The Trap of Always-On

Here’s the bill-shock scenario: you set a monitoring agent to run every minute, each run loads 3,000 tokens of context and calls a frontier model, and a month later you’re staring at a surprising invoice. An agent that wakes up 1,440 times a day consumes about 4.3 million input tokens daily at that context size, before counting any output.
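
To put a rough dollar figure on that scenario, the arithmetic is worth writing down. The price below is an assumption chosen only to make the numbers concrete, not a quote from any provider, and `monthly_input_cost` is a hypothetical helper that ignores output tokens.

```python
def monthly_input_cost(runs_per_day: int, tokens_per_run: int,
                       usd_per_million_tokens: float, days: int = 30) -> float:
    """Input-token cost of a scheduled agent over `days` days."""
    tokens = runs_per_day * tokens_per_run * days
    return tokens / 1_000_000 * usd_per_million_tokens


# ASSUMED price in USD per million input tokens; check your provider's pricing.
PRICE = 3.00

every_minute = monthly_input_cost(24 * 60, 3_000, PRICE)   # 1,440 runs/day
every_15_min = monthly_input_cost(24 * 4, 3_000, PRICE)    # 96 runs/day

print(f"every minute:     ${every_minute:,.2f}/month")
print(f"every 15 minutes: ${every_15_min:,.2f}/month")
```

Under that assumed price, the every-minute schedule costs about $388.80 a month in input tokens alone, against about $25.92 at a 15-minute interval. The 15x difference comes entirely from the schedule, not from the agent doing anything more useful.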

How to keep cron agents cheap:

Right-size the schedule. Does it really need to run every minute, or every 15? Most monitoring tolerates a coarser interval than you’d reflexively set.
Use cheap models for routine runs. A nightly summary doesn’t need your most expensive model. Route scheduled grunt work to a budget model and reserve the strong model for when reasoning quality matters.
Trim context. Cron agents often reload the same bulky context every run. Load only what the task needs. (The layered memory approach from our memory architectures guide applies here — warm context only, skip the expensive cold lookups unless needed.)
Short-circuit early. A monitoring agent should cheaply check “is anything wrong?” before spinning up expensive reasoning. Most runs should exit early having done almost nothing.

Routing scheduled work through SandBase lets you assign a cheap model to high-frequency jobs and a strong one to the occasional heavy task, all from the same setup.

A Pattern That Ties It Together

The cron agents that work in production share a shape: a cheap, fast “should I do anything?” check that runs frequently, escalating to expensive reasoning only when warranted. The monitoring agent that pings you only when a metric breaks is the canonical example: the large majority of runs (say 99%) are a cheap no-op, and only the rare remainder are real alerts.

This keeps cost proportional to actual events rather than to schedule frequency, which is the whole game for always-on automation.
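
A single scheduled run in that shape might look like the sketch below. All names here (`monitor_tick`, `read_metric`, `investigate`, `alert`) are hypothetical, and the expensive step is a plain callable standing in for wherever the LLM run would go.

```python
from typing import Callable, Optional


def monitor_tick(read_metric: Callable[[], float],
                 threshold: float,
                 investigate: Callable[[float], str],
                 alert: Callable[[str], None]) -> Optional[str]:
    """One scheduled run: cheap threshold check first, expensive reasoning only on breach."""
    value = read_metric()          # cheap: no model call
    if value <= threshold:
        return None                # the common case: exit having done almost nothing
    report = investigate(value)    # expensive: where the agent's reasoning would run
    alert(report)                  # ping a human only when something is wrong
    return report


if __name__ == "__main__":
    def explain(value: float) -> str:
        return f"error rate {value:.1%} is above threshold"

    print(monitor_tick(lambda: 0.01, 0.05, explain, print))  # quiet run
    print(monitor_tick(lambda: 0.12, 0.05, explain, print))  # escalates and alerts
```

Because the early return happens before any model call, the token cost of a quiet run is zero regardless of how often the schedule fires.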

FAQ

Q: What’s the difference between a cron agent and a regular scheduled script?

A scheduled script runs fixed code. A cron agent runs an LLM reasoning loop on a schedule, so it can adapt to what it finds, use tools dynamically, and make judgment calls a static script can’t. The tradeoff is that it’s less predictable and costs tokens on every run.

Q: How do I stop a cron agent from doing something twice?

Idempotency keys. Tag each scheduled run with a unique identifier and check whether it already completed before taking any irreversible action. Gate side effects (posts, emails, purchases) behind that check, even if you let the reasoning re-run freely.

Q: What happens if my LLM provider is down when the job fires?

With a single provider, the job fails silently. Use an LLM gateway with automatic fallback (like SandBase) so the run routes to another provider instead of dying. Also add a dead-man’s switch that alerts when an expected run hasn’t succeeded.

Q: How do I keep scheduled agents from getting expensive?

Right-size the schedule (every 15 min beats every minute), use cheap models for routine runs, trim the context you load each run, and short-circuit early so most runs do almost nothing. Cost should track real events, not schedule frequency.

Q: Can existing frameworks do this, or do I build it?

Hermes Agent has a built-in cron system. For others, you can pair a scheduler (system cron, cloud scheduler) with an agent run. The scheduling is the easy part — idempotency, failure handling, and cost control are what you actually need to design.
