yarnnnyarnnn
← Back to blog

OpenClaw vs Claude Code vs YARNNN: Three Models of Agent Orchestration

·9 min read·Kevin Kim

I've spent the last year building an AI agent product. I use Claude Code daily. I've studied OpenClaw's agent model closely — it's one of the most interesting open-source agent architectures out there, and it accumulated 17,830 GitHub stars in 24 hours for good reason. And I'm building YARNNN, which takes a fundamentally different approach from both.

There are three ways to build an AI agent right now. Each one makes a different bet about what matters most. And the bet you make determines whether your AI gets smarter over time — or starts from zero every single session.

The Tool Model: Claude Code

Claude Code is the best tool-model agent shipping today. You open a session, give it a task, it executes with access to your codebase, and the session ends. Next time you open it, it reads your CLAUDE.md file and starts fresh.

The strengths are real. It's predictable — the agent does nothing you didn't ask for. It's auditable — every action is in a session log. It's cost-efficient — you only pay for compute when you're actively using it. And it's enterprise-friendly, because stateless tools don't accumulate liability.

But here's what I kept hitting: I'd have a great session where Claude Code understood my project deeply, made smart architectural decisions, navigated tradeoffs perfectly. Then I'd close the terminal and open it again the next day. Gone. All of it. The understanding, the patterns it noticed, the preferences it learned about how I structure code. I'd re-explain the same context, re-establish the same constraints, re-teach the same preferences.

Claude Code's CLAUDE.md is a clever workaround — a static file that provides project-level context. But it's a file I maintain manually. It doesn't learn. It doesn't update itself based on what happened in the last 50 sessions. The quality of session 51 is essentially identical to session 1, minus whatever I remembered to write down.

The product thesis: the LLM is the product. Better models equal better output. Context is the user's problem.

The Agent Model: OpenClaw

OpenClaw goes the other direction entirely. The agent is a colleague — it has persistent identity (SOUL.md), behavioral instructions (AGENTS.md), accumulated memory (MEMORY.md plus daily logs), and continuous awareness through heartbeats. A unified gateway routes all inputs — messages, crons, hooks, webhooks, heartbeats — to a Lane Queue for serial execution per workspace.

The agent "lives" in its workspace. It remembers what happened yesterday. It notices patterns across sessions. It builds up understanding of your project, your preferences, your communication style — automatically, without you maintaining a static file.

I've studied this model closely, and the always-on, always-aware agent is genuinely compelling when it works. The moment your AI proactively says "I noticed you've been dealing with this recurring issue" — that's magic. That's the moment it stops feeling like a tool and starts feeling like something more.

But I also know the costs. Heartbeats burn compute even when nothing changed. The single-workspace assumption means one agent per context — if you need 20 specialized work products, you need 20 workspaces, each with their own always-on process. The personification adds product complexity that users don't always want. And scaling? Architecturally expensive.

The product thesis: the agent relationship is the product. Accumulated context equals moat. The agent understands you.

The Deliverable Model: YARNNN

YARNNN makes a different bet. The agent isn't a tool you invoke or a colleague that lives in your workspace. It's a network of purpose-built specialists — and they sleep.

Each deliverable in YARNNN is a lightweight, self-contained agent. It has its own instructions, its own accumulated memory, its own data sources, its own schedule, and its own output history. When it's time to execute — on a schedule, on a trigger, on demand — the deliverable wakes up with full context of everything it has ever learned about this specific work product. It produces output. Then it goes back to sleep.

I have about 20 deliverables running right now. A weekly work status digest. A competitive research brief. A calendar preview. Meeting prep flows. Each one has executed dozens of times. Each one is measurably better than it was on its first run — not because I tuned prompts, but because the deliverable accumulated memory about what I actually care about, what format I prefer, what level of detail is useful.

The Monday digest knows what a good Monday digest looks like for me. The meeting prep knows which meetings actually need prep and which don't. This isn't general intelligence getting smarter. It's 20 specialized agents, each compounding quality in their own domain.

The product thesis: accumulated, specialized context is the product. Quality compounds per work product, not per session.

The Technical Comparison

Here's where the models diverge architecturally:

| Dimension | Claude Code | OpenClaw | YARNNN | |-----------|------------|----------|--------| | State persistence | None (session-scoped) | Full (workspace-level) | Per-deliverable (scoped accumulation) | | Context source | User-provided (CLAUDE.md) | Auto-accumulated (MEMORY.md + logs) | Auto-accumulated per specialist | | Idle cost | Zero | Continuous (heartbeats) | Zero (agents sleep) | | Multi-specialist | No (one session, one task) | No (one agent per workspace) | Yes (20+ deliverables = 20+ specialists) | | Quality compounding | Resets each session | Compounds globally | Compounds per deliverable | | Execution trigger | User-initiated | Gateway-routed (any input) | Schedule, signal, or on-demand | | Memory model | Static file (CLAUDE.md) | Living logs (MEMORY.md + daily) | Scoped memory per deliverable + global user memory | | A2A readiness | None (no persistent identity) | Partial (single agent identity) | Native (each deliverable is an agent card) |

The Architecture That Matters

<!-- Architecture Diagram (Mermaid): graph TB subgraph "Claude Code: Tool Model" U1[User] -->|session| CC[Claude Code]
    CC -->|reads| CMD[CLAUDE.md]
    CC -->|output| O1[Response]
    style CC fill:#1a1a2e
end

subgraph "OpenClaw: Agent Model"
    U2[User] --> GW[Gateway]
    CR[Crons] --> GW
    WH[Webhooks] --> GW
    HB[Heartbeats] --> GW
    GW --> LQ[Lane Queue]
    LQ --> OC[Agent]
    OC -->|reads/writes| MEM[MEMORY.md]
    OC -->|identity| SOUL[SOUL.md]
    OC -->|output| O2[Response]
    style OC fill:#1a1a2e
end

subgraph "YARNNN: Deliverable Model"
    U3[User] --> TP[Thinking Partner]
    SCH[Scheduler] --> D1[Deliverable 1]
    SIG[Signals] --> D2[Deliverable 2]
    DEM[On-demand] --> D3[Deliverable N]
    D1 -->|own memory| M1[Memory 1]
    D2 -->|own memory| M2[Memory 2]
    D3 -->|own memory| M3[Memory N]
    PC[Platform Content] --> D1
    PC --> D2
    PC --> D3
    style D1 fill:#1a1a2e
    style D2 fill:#1a1a2e
    style D3 fill:#1a1a2e
end

-->

Think of it this way. Claude Code is a brilliant contractor you hire for the day. They do great work. But tomorrow, they've never met you. OpenClaw is a full-time employee who lives in your office 24/7, even on weekends, even when there's nothing to do. YARNNN is a team of specialists on retainer — each one remembers everything about their domain, shows up exactly when needed, and doesn't charge you for sleeping.

What the Industry Gets Wrong

The dominant narrative in AI agents right now is: more autonomy = better agent. Give the agent more tools, more access, more freedom to act. The logical endpoint is an always-on, fully autonomous digital colleague.

I think this gets the vector wrong. The thing that makes AI output useful isn't autonomy — it's context. An agent with full autonomy but zero context about your work will produce confident, well-structured, completely generic output. An agent with limited autonomy but deep accumulated context about your specific work products will produce something you'd actually use.

After building in all three paradigms, here's what I believe: the model that wins isn't the smartest or the most autonomous. It's the one that knows your work best after 90 days of use. That's the real test. Not "can it do X?" but "is it meaningfully better at doing X for you specifically than it was three months ago?"

Claude Code's answer is no — by design. OpenClaw's answer is yes — but at continuous cost. YARNNN's answer is yes — per specialist, at zero idle cost.

Why This Matters for What Comes Next

There's a reason I care about this beyond product positioning. The Agent-to-Agent future — where AI agents negotiate, collaborate, and hand off work to each other — is coming. Google's A2A protocol, MCP tool use, multi-agent orchestration. In that world, agents need to describe themselves, advertise capabilities, and carry domain knowledge.

A YARNNN deliverable is already an agent card. It has a name, a purpose, defined capabilities, accumulated domain knowledge, a trigger model, and an output history. Twenty deliverables means twenty specialized agents ready to participate in an A2A network.

A single-agent model has to figure out how to expose sub-capabilities from one monolithic identity. A stateless tool model has no persistent identity to advertise at all.

The deliverable-as-agent architecture isn't just a product choice. It's a bet on how the multi-agent landscape actually plays out — not as a few big generalist agents, but as networks of specialized, context-rich specialists.

I could be wrong. But after studying all three, this is the architecture I'd bet on.


References