yarnnnyarnnn
← Back to blog

The Agent Convergence, Part 2: The Sleep-Wake Architecture

·9 min read·Kevin Kim

At a Glance

Answer: Session-based agents forget everything when you close the tab. Always-on agents burn compute watching nothing happen. The right architecture is neither — it's...

This article covers:

  • Three Architectures, Three Failure Modes
  • Why Persistence Matters More Than Autonomy
  • The Graduated Response
  • Many Specialists, Not One Generalist
  • The Meta-Agent Question

This is Part 2 of "The Agent Convergence" — a three-part series on why every company shipping agents simultaneously is less interesting than the architectural question underneath. Part 1 covered the convergence and the AC/DC question. This part gets into architecture.

I spend a lot of time thinking about how agents should work. Not in the abstract — I'm building an agent platform, so "how should agents work" is the same question as "what should I build next." And the more I study what everyone's shipping, the more I think the industry is optimizing for the wrong axis.

The current debate frames agent architecture as a spectrum from "assistant" to "autonomous." On one end, you have chatbots that do what you ask. On the other, you have fully autonomous agents that do work without being asked. Most of the products shipping this week — Copilot Cowork, Claude Cowork, Notion's agents — sit somewhere in the middle. You hand off a task; the agent executes it; you get the result.

That's the wrong spectrum. The spectrum that matters isn't about how much autonomy the agent has. It's about how much the agent remembers.

Three Architectures, Three Failure Modes

Session-based agents are the most common pattern right now. You open a chat, describe what you need, the agent executes, you get the result, and the session ends. Next time you need something, you start from zero.

This is Copilot Cowork's model. It's Claude Cowork's model. It's ChatGPT's model. It works — sometimes impressively well — for one-shot tasks. Write this email. Analyze this spreadsheet. Prep this meeting.

The failure mode is repetition. I've experienced this firsthand: every Monday, I need a client update that synthesizes Slack conversations, email threads, and calendar context from the previous week. In a session-based model, I describe this task from scratch every week. The agent doesn't know I did the same thing last Monday. It doesn't know what I emphasized. It doesn't know that my client prefers bullet points over paragraphs, or that the project shifted priorities three weeks ago. It executes the task competently, but generically. Week 50 looks exactly like week 1.

Always-on agents are the other extreme. OpenClaw runs a persistent agent with a heartbeat — it's always watching, always processing, maintaining continuous awareness. The idea is compelling: an agent that never sleeps, that sees everything, that maintains perfect continuity.

The failure mode is cost and noise. An always-on agent processing your Slack, email, calendar, and docs in real-time burns compute constantly — most of it watching nothing happen. Ninety percent of your Slack messages don't require agent action. Ninety-five percent of your emails are irrelevant to any specific task. The always-on model pays for full-time attention to generate occasional value.

Sleep-wake agents are the architecture I think actually works, and it's less intuitive than the other two. The idea: each agent is a persistent specialist that sleeps between executions. It has its own memory, its own instructions, its own learned preferences. When it wakes — on schedule, in response to an event, or when it proactively decides to — it wakes with full context. It executes. It accumulates what it learned. It goes back to sleep.

The idle cost is zero. The continuity is total. And the execution quality improves over time because every wake cycle adds to the agent's accumulated knowledge.

Why Persistence Matters More Than Autonomy

Let me make this concrete. Say you have an agent that writes a weekly competitive analysis. Here's what changes across the three architectures:

Session-based: Every week, you describe the task. Every week, the agent searches your connected sources. Every week, it produces a competent but generic analysis. If a competitor made a pricing change three weeks ago that you noted was important, the agent doesn't know that unless you re-explain it. The output is correct but shallow — it can't distinguish between what's new and what's been trending.

Always-on: The agent monitors competitor signals continuously. Every blog post, every press release, every pricing page change gets processed in real time. By Friday, the analysis is rich with accumulated context. But you're paying for five days of continuous processing to generate one output. And the agent is processing your entire information stream, not just competitive signals — it's watching your unrelated Slack channels, your personal email, your calendar. Compute scales with attention, not with value.

Sleep-wake: The agent sleeps all week. Your perception pipeline — the background sync that watches your connected platforms — accumulates signals. When the agent wakes on Friday, it doesn't need real-time processing. It reads from the knowledge base. But critically, it also has its own memory. It remembers that three weeks ago, the competitor's pricing change was significant. It remembers that you edited last week's draft to emphasize market positioning over feature comparison. It remembers the feedback loop. The 12th weekly analysis is structurally different from the 1st — not because the prompt is different, but because the agent has accumulated 11 weeks of operational knowledge.

This is the distinction I keep coming back to. Autonomy is a feature. Persistence is an architecture. And persistence is what makes agents actually useful for recurring work.

The Graduated Response

There's a design pattern inside the sleep-wake architecture that I think is underappreciated: not every signal warrants action. Most signals should feed memory, not trigger execution.

Think about how a good human analyst works. They don't write a report every time something happens. They observe. They accumulate. They develop a sense of what's important. And when the pattern reaches a threshold — or when the scheduled checkpoint arrives — they synthesize everything they've been accumulating into a coherent output.

This is the graduated response pattern: most events get logged as observations in the agent's memory. Some events, if they're important enough, trigger immediate execution. The agent decides — based on the kind of agent it is, and what it's been told to care about — whether to act, observe, or ignore.

A competitive analysis agent sees a competitor's blog post. It doesn't write a full report. It notes: "Competitor X published about enterprise pricing, third post on this topic in two weeks." Next week, when it wakes for the scheduled analysis, that observation is part of its context. The pattern emerges naturally.

A client relationship agent sees a Slack message from a key account. If it's routine, it observes. If the message suggests frustration or an escalation risk, it might trigger an immediate alert or draft a response. The agent's accumulated memory of this client — tone shifts, past issues, relationship history — informs the threshold.

This is fundamentally different from both the session-based model (no memory, so no pattern recognition) and the always-on model (processes everything at equal weight). The graduated response means the agent's intelligence is distributed across time — it's always accumulating, rarely acting, and when it acts, the action is informed by everything it's observed.

Many Specialists, Not One Generalist

The other architectural bet I think matters: many sleeping agents, each specializing in one domain, rather than one general-purpose agent trying to do everything.

This runs counter to the direction most platforms are going. Microsoft's Copilot Cowork is one agent per workspace. Claude Cowork is one agent per session. The implicit model is: you have an AI, and you give it tasks.

I think the right model is closer to how organizations actually work. You don't have one employee who does everything. You have a marketing specialist, a competitive analyst, a client relationship manager, a financial forecaster. Each one has deep domain knowledge. Each one has their own working memory, their own priorities, their own judgment about what matters.

The agent equivalent: twenty sleeping specialists, each with its own memory, its own instructions, its own learned preferences. A competitive analysis agent that knows your market. A client update agent that knows each client's preferences. A meeting prep agent that knows your calendar patterns and who needs what context.

Each agent accumulates knowledge in its own domain. None of them is general-purpose. Together, they cover the full surface area of your work. And because they sleep between executions, twenty specialists cost the same at idle as one generalist: zero.

The Meta-Agent Question

Once you have many specialists, you need something to coordinate them. This is where the architecture gets genuinely interesting — and where I think most agent platforms haven't thought far enough ahead.

If your competitive analysis agent notices a pattern that's relevant to your client relationship agent, how does that signal get routed? If your meeting prep agent discovers context that your project status agent needs, how do those agents communicate?

The answer I've arrived at is what I think of as a coordinator agent — a meta-agent whose job is to review the state of other agents and decide whether something needs to happen. It doesn't do the domain work itself. It reads across agents, spots cross-domain patterns, and triggers the right specialist when the pattern warrants action.

This is multi-agent orchestration, but with a specific design constraint: the coordinator operates on the same sleep-wake pattern as everything else. It wakes periodically, reviews the landscape, decides whether to trigger other agents or just observe, and goes back to sleep. No always-on processing. No real-time event stream. Just periodic, intelligent review with accumulated context.

Why This Matters Now

I'm going deep on architecture because I think the decisions being made this month — by Microsoft, by Anthropic, by Google, by startups like mine — will compound for years. The session-based products shipping this week are impressive demos. They'll get real usage. But they're building on an architecture that doesn't learn.

If I'm wrong, and session-based is good enough — if people don't need agents that remember, that specialize, that improve over time — then the current products win. They have distribution, they have enterprise relationships, they have brand trust.

But if I'm right that the 50th execution should be meaningfully better than the 1st, then the architecture that compounds will eventually outperform the architecture that doesn't, regardless of who has more distribution today.

That's the bet. And in Part 3, I'll make the business case for why knowledge accumulation isn't just an architectural nicety — it's the only durable moat in an age of commoditized intelligence.


This is Part 2 of "The Agent Convergence." Part 1: Everyone's Shipping Agents covers the convergence and the AC/DC question. Part 3: Knowledge Accumulation as Competitive Moat makes the business case for why accumulated knowledge is the only durable advantage.

Series Navigation

  1. Part 1: The Agent Convergence, Part 1: Everyone's Shipping Agents — That's Not the Interesting Part
  2. Part 2: The Agent Convergence, Part 2: The Sleep-Wake Architecture (current)
  3. Part 3: The Agent Convergence, Part 3: Knowledge Accumulation as Competitive Moat

Related Reading