AI observers: observability and monitoring for autonomous agents

Every AI agent system I have seen works the same way. You talk to it. It does something. It stops.

That is fine for a chatbot. It is not fine for a system that runs your business.

I run 18 AI agents. They learn from the web, pitch me ideas, evaluate their own work, and even dream at night. But until now, they only worked when triggered. A cron job fires at 07:00, agents learn. Another fires at 09:00, agents pitch. A message arrives, agents respond.

Between triggers, nothing. The agents are asleep.

The next step is obvious: agents that are always watching.

From batch to always-on

Right now, the system runs on cron. Every job has a schedule. Learning at 07:00. Pitching at 09:00. Solar forecasts at 07:30. Apartment scouting every 30 minutes. Heartbeat checks every 5 minutes.

This works, but it creates blind spots. If a support ticket arrives at 14:23, it waits until someone checks. If a server metric spikes at 03:00, nobody notices until the morning digest. If a deal goes stale in the CRM, it sits there until the next scheduled review.

The agents have the tools to handle all of this. They just are not watching.

The event bus is already there

We built an event bus months ago. Every significant action in the system emits an event:

pitch.approved: when I approve an agent's proposal
learning.complete: when an agent finishes its morning research
task.created: when work gets assigned
heartbeat.failed: when a monitor stops responding

These events get logged to the database. They are useful for auditing and debugging. But nothing listens to them in real time. They are a record of what happened, not a trigger for what should happen next.
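
To make that gap concrete, here is a minimal sketch of an emit function that only records events. The EventLog class and its method names are assumptions for illustration, not the actual FlatNine code, and an in-memory array stands in for the database table.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the event log; the real system writes to a database.
final class EventLog
{
    /** @var list<array<string, mixed>> */
    private array $rows = [];

    // Record an event and return its incrementing id.
    public function emit(string $type, array $payload = []): int
    {
        $id = count($this->rows) + 1;
        $this->rows[] = [
            'id' => $id,
            'type' => $type,
            'payload' => $payload,
            'created_at' => time(),
        ];
        return $id;
    }

    // Read back everything recorded so far, for example for auditing.
    public function all(): array
    {
        return $this->rows;
    }
}

$log = new EventLog();
$log->emit('pitch.approved', ['pitch_id' => 42]);
$log->emit('heartbeat.failed', ['monitor' => 'web-01']);

// The log can be queried after the fact, but nothing reacts when emit() runs.
foreach ($log->all() as $row) {
    echo $row['id'], ' ', $row['type'], PHP_EOL;
}
```

In this sketch emit() has no subscribers to notify. The only way to find out that heartbeat.failed happened is to read the log later.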

The observer pattern

The idea is simple. Instead of agents waiting to be called, they observe. Each agent declares what events it cares about, and when those events fire, the agent wakes up and handles them.

Think of it as moving from pull (cron asks "anything to do?") to push (event tells the agent "this happened, respond").

Some examples:

Betty (Support) observes support.ticket.created. A customer submits a ticket, Betty responds immediately. Not at the next cron cycle. Now.
Keith (SysAdmin) observes heartbeat.failed. A monitor stops responding, Keith investigates the server, checks logs, and sends a Telegram alert with a diagnosis.
Chick (Security) observes deployment.completed. Every time code ships, Chick scans the diff for exposed secrets or security regressions.
Bryan (Health) observes meal.logged. I log a meal, Bryan analyzes the nutritional content in the context of my weekly intake.
Ella (Finance) observes payment.received and subscription.cancelled. Revenue events trigger immediate financial tracking updates.

The agents already have all the tools for this. The missing piece is the listener, a persistent process that watches the event bus and dispatches agents when their triggers fire.
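
The subscriptions themselves could be as simple as a lookup table. The sketch below assumes a plain PHP array keyed by agent name; the agents and event types come from the examples above, but the SUBSCRIPTIONS constant and this implementation of getSubscribedAgents are illustrative guesses, not the real code.

```php
<?php
declare(strict_types=1);

// Hypothetical subscription table: agent name => event types it observes.
const SUBSCRIPTIONS = [
    'Betty' => ['support.ticket.created'],
    'Keith' => ['heartbeat.failed'],
    'Chick' => ['deployment.completed'],
    'Bryan' => ['meal.logged'],
    'Ella'  => ['payment.received', 'subscription.cancelled'],
];

/**
 * Return the names of all agents subscribed to the given event type.
 *
 * @return list<string>
 */
function getSubscribedAgents(string $eventType): array
{
    $agents = [];
    foreach (SUBSCRIPTIONS as $agent => $types) {
        if (in_array($eventType, $types, true)) {
            $agents[] = $agent;
        }
    }
    return $agents;
}

print_r(getSubscribedAgents('payment.received'));
print_r(getSubscribedAgents('pitch.approved')); // nobody observes this one yet
```

An event type with no subscribers returns an empty list, so the listener can skip it without dispatching anything.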

What changes architecturally

Not much, actually. The event bus exists. The agent dispatch mechanism exists (it is the same one that handles approved pitches). The tools exist.

The new piece is a daemon process that:

Watches the agent_events table (or a Redis pub/sub channel) for new events
Matches each event against agent subscriptions
Dispatches the relevant agent with the event context
Logs the response and any follow-up events the agent emits

This is not a rewrite. It is a loop.

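// Start from "now" so old events are not replayed on startup.
$lastChecked = now();

while (true) {
    // Assumes getNewEvents returns events oldest first.
    $events = getNewEvents($lastChecked);
    foreach ($events as $event) {
        $agents = getSubscribedAgents($event->type);
        foreach ($agents as $agent) {
            dispatch($agent, $event);
        }
        // Advance the cursor from the event itself (created_at is an assumed
        // field name), so events logged during a slow dispatch are not skipped.
        $lastChecked = $event->created_at;
    }
    sleep(1);
}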
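
For the polling step, one possible shape of the query is sketched below. It uses an auto-incrementing id as the cursor instead of a timestamp, which avoids ties between events logged in the same second. Only the agent_events table name comes from this post; the columns, the getEventsAfter helper, and the in-memory SQLite database (which needs the pdo_sqlite extension) are assumptions made so the example runs on its own.

```php
<?php
declare(strict_types=1);

// Assumed schema: only the table name agent_events is from the post.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE agent_events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    type TEXT NOT NULL,
    payload TEXT NOT NULL
)');

/**
 * Fetch events with an id greater than the cursor, oldest first.
 *
 * @return list<array<string, mixed>>
 */
function getEventsAfter(PDO $db, int $lastId): array
{
    $stmt = $db->prepare(
        'SELECT id, type, payload FROM agent_events WHERE id > :last ORDER BY id'
    );
    $stmt->bindValue(':last', $lastId, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Seed two events, as the rest of the system would.
$insert = $db->prepare('INSERT INTO agent_events (type, payload) VALUES (?, ?)');
$insert->execute(['heartbeat.failed', '{"monitor":"web-01"}']);
$insert->execute(['support.ticket.created', '{"ticket":7}']);

$lastId = 0;
foreach (getEventsAfter($db, $lastId) as $event) {
    echo $event['id'], ' ', $event['type'], PHP_EOL;
    $lastId = (int) $event['id']; // advance the cursor per event
}

// A second poll with the advanced cursor finds nothing new.
var_dump(getEventsAfter($db, $lastId) === []);
```

Persisting the last id somewhere durable would also let the daemon resume after a restart without replaying or skipping events.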

The complexity is not in the architecture. It is in the economics. Every dispatch costs tokens. An always-on system that fires agents on every event could get expensive fast. The trick is being selective about what triggers a full agent dispatch versus what gets handled by a lightweight rule.

The cost question

This is the real constraint. Running agents on cron is cheap because you control exactly how many times they fire. An event-driven system could fire hundreds of times a day if you are not careful.

The solution is a tiered response:

Tier 1: Rules (free). Simple pattern matching. "If heartbeat fails and it was healthy 5 minutes ago, send a Telegram alert." No LLM needed.
Tier 2: Cheap model (fractions of a cent). Quick triage. "Is this support ticket urgent?" If yes, escalate to tier 3.
Tier 3: Full agent (cents). The agent wakes up with full context, tools, and reasoning. Reserved for events that actually need intelligence.

Most events can be handled at tier 1 or 2. Only the interesting ones need a full agent dispatch. This should keep costs comparable to the current cron-based system while making the agents genuinely responsive.
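
The routing decision could be sketched roughly as follows. The tier constants, the chooseTier function, the was_healthy and urgent fields, and the fake triage callback are all hypothetical. In a real system the callback would be the cheap-model call from tier 2.

```php
<?php
declare(strict_types=1);

// Hypothetical labels for the three response levels described above.
const TIER_RULE = 1;        // plain pattern matching, no LLM call
const TIER_CHEAP_MODEL = 2; // quick triage by a small model
const TIER_FULL_AGENT = 3;  // full agent with context and tools

/**
 * Decide which tier should handle an event.
 *
 * $triage stands in for the cheap-model call: it receives the event and
 * returns true when the event needs a full agent.
 */
function chooseTier(array $event, callable $triage): int
{
    // Tier 1: events a fixed rule can fully handle.
    if ($event['type'] === 'heartbeat.failed' && ($event['was_healthy'] ?? false)) {
        return TIER_RULE;
    }
    // Tier 2 runs the triage and escalates to tier 3 only when needed.
    return $triage($event) ? TIER_FULL_AGENT : TIER_CHEAP_MODEL;
}

// Fake triage for the example: tickets marked urgent need a full agent.
$fakeTriage = fn(array $event): bool => ($event['urgent'] ?? false) === true;

$events = [
    ['type' => 'heartbeat.failed', 'was_healthy' => true],
    ['type' => 'support.ticket.created', 'urgent' => false],
    ['type' => 'support.ticket.created', 'urgent' => true],
];

foreach ($events as $event) {
    echo $event['type'], ' -> tier ', chooseTier($event, $fakeTriage), PHP_EOL;
}
```

Putting the rule check first means the cheapest path is tried before any model is called, which is where most of the savings would come from.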

Why this matters

The difference between a batch system and an always-on system is the difference between checking your email once a day and having notifications.

Batch processing is fine when nothing is urgent. But the whole point of having AI agents is to handle things you cannot handle yourself. If they only check in at scheduled intervals, they are not really autonomous. They are just automated scripts with better language skills.

Always-on observers turn agents from workers who clock in and out into team members who are always present. They notice things. They respond in real time. They catch problems before they become incidents.

The event bus we built for orchestration becomes the nervous system. Events are stimuli. Agents are reflexes. The system does not just process work. It pays attention.

This is part of a series on building FlatNine Ensemble, a team of AI agents that runs a business. Previous posts: AI pitches, AI self learning, Evaluating AI Work, AI REM, Lossless orchestration.