The Clicking Isn't the Hard Part

Anthropic just shipped computer use — AI that controls your desktop, clicks your buttons, navigates your browser. It's a genuine breakthrough. And it reveals a fundamental architectural question the agent ecosystem hasn't resolved: should AI work through your computer, or instead of your computer?

I watched the demo this morning. Claude opens Finder, navigates Chrome, fills in spreadsheets. 15 million views in hours. The excitement is deserved — this is the most visceral demonstration of AI capability since GPT-4's launch. You assign a task from your phone, come back to finished work on your Mac.

But I keep returning to a distinction that matters more than it seems: computer use is not agent-native. And that distinction determines the entire trajectory of what agents become.

Why is computer use fundamentally not agent-native?

An agent-native system is one where AI operates as a persistent, autonomous entity — with identity, memory, and accumulated context that compound over time. The agent doesn't emulate a human sitting at a desk. It works in a domain it understands, on a cadence it manages, producing output that improves with tenure.

Computer use is the opposite architecture. It's session-based: you assign a task, Claude executes it by controlling your screen, the session ends. It's desktop-bound: the agent needs your machine, your apps, your screen. It's stateless: session 47 has the same zero context as session 1. And it's human-initiated: you're still the trigger, the context-holder, the quality gate.

The strengths are real. Computer use solves the last-mile integration problem — when there's no API, no connector, no MCP server, the agent just looks at the screen and clicks. That's powerful. It means Claude can interact with any software, not just software built for AI. For ad-hoc tasks — filling forms, navigating legacy apps, moving files — it's genuinely magical.

But the ceiling is also real. An agent that navigates your desktop is fundamentally doing RPA with a better model. It handles more ambiguity, recovers from more errors, adapts to more interfaces. That's meaningful progress on the interface layer. It doesn't touch the intelligence layer — the layer where an agent knows which spreadsheet matters, why this week's report should emphasize different metrics, and how your stakeholders actually want information delivered.

That knowing comes from accumulated context. Not from screenshots of your desktop.

What's the open question between local access and cloud intelligence?

Here's where I have to be honest about the bet we're making at YARNNN — because it's a real bet with real risk.

Computer use assumes the agent needs access to your local environment. Your files, your apps, your browser sessions, your credentials. That's a massive practical advantage: it works with everything you already use, exactly as you use it.

The cloud-native agent model — which is what YARNNN is building — assumes the opposite. Agents don't need your desktop. They connect to platforms (Slack, Notion, your work tools) via APIs, accumulate context from those platforms over time, and produce output independently. No desktop required. No human triggering each run.

The risk is obvious: if the world decides that local desktop access is the primary mode of AI work, cloud-native agents lose. If people want AI that is them — clicking their buttons, using their apps, operating in their environment — then the entire thesis of persistent, cloud-based, autonomous agents is wrong.

I'm betting it's not. Here's why: local access doesn't compound. Your desktop doesn't get smarter because an AI used it yesterday. But a cloud agent that's been connected to your Slack for 90 days, accumulating observations, learning from feedback, building a model of your domain — that agent produces qualitatively different output on day 91 than day 1. That's a moat. Local access is a capability; accumulated context is a compounding asset.

But I could be wrong. And YARNNN's entire trajectory depends on this bet resolving in favor of cloud-native intelligence over local desktop control. That's the honest version.

Where do developmental agents fit in the landscape?

The agent ecosystem is fracturing into distinct architectural philosophies. Claude Code treats AI as a tool — brilliant per-session, stateless by design. OpenClaw and the open-source agent movement treat AI as a collaborative colleague — agent-to-agent communication, shared context, emergent coordination. Computer use treats AI as a desktop operator — your hands, but faster.

YARNNN is betting on a fourth model: AI as a developmental knowledge worker. Agents that have identity, that accumulate context from real work platforms, that receive feedback and adjust, that get promoted from simple digests to complex synthesis as they earn trust. Not tools. Not operators. Not colleagues. Employees — in the sense that their value increases with tenure.

The teams building containers are solving the infrastructure problem. The teams building computer use are solving the interface problem. The teams building agent communication protocols are solving the coordination problem. The teams building developmental agents are solving the intelligence problem — and the intelligence problem is the one that compounds.

Computer use is impressive. It may become a useful primitive inside larger agent systems. But the hard part of knowledge work was never navigating your browser or clicking the right button. The hard part was always knowing what to do — and knowing it better than you did last month.

The clicking isn't the hard part. The knowing is. And knowing is what accumulates.

Kevin Kim is the founder of YARNNN, a platform for developmental AI agents that accumulate context and improve with tenure.

The Clicking Isn't the Hard Part

At a Glance

Why is computer use fundamentally not agent-native?

What's the open question between local access and cloud intelligence?

Where do developmental agents fit in the landscape?

Related Reading