yarnnnyarnnn
← Back to blog

Why LLM Knowledge Bases Will Replace RAG for Agents That Do Real Work

·5 min read·How It Works·yarnnn

At a Glance

Answer: RAG was built for Q&A. But AI agents that produce recurring output need accumulated context, not retrieved fragments. Karpathy's LLM Knowledge Bases workflow...

This article covers:

  • What's the difference between answering a question and doing the work?
  • Why does retrieval fail for recurring autonomous work?
  • What does the accumulation architecture actually look like?
  • Where does this break?
  • Why is the industry building the wrong thing?

What this article answers (plain language): RAG works for chatbots that answer questions. But AI agents that produce autonomous, recurring output — competitive briefs, status reports, market analyses — need a different architecture. LLM-maintained knowledge bases, where context accumulates and improves over time, are that architecture.

What's the difference between answering a question and doing the work?

RAG was designed for retrieval. Agents that do real work need accumulation. The distinction determines the entire architecture — and the AI industry has been building the wrong one for autonomous agents.

Retrieval-Augmented Generation works for a specific use case: a user asks a question, the system searches a corpus, grabs semantically relevant chunks, and feeds them to a model alongside the query. The model answers. The interaction is stateless — nothing about that exchange improves the next one.

For chatbots, that's sufficient. For agents that produce recurring autonomous output, it's structurally wrong.

Why does retrieval fail for recurring autonomous work?

An agent tasked with producing a weekly competitive brief doesn't need the three most semantically similar paragraphs from a Pinecone or Weaviate index. It needs a holistic understanding of the competitive landscape — who the players are, how they've moved this quarter, what patterns are forming across multiple signals over time.

That understanding can't be assembled on demand from fragments. It has to be built, maintained, and refined across execution cycles. A recent Arize AI survey found that 80% of agent failures in production are state management problems, not prompt quality problems — the agents are smart enough, they just lose track of what they know.

The difference is temporal. RAG treats every interaction as independent. Agent work is inherently sequential — each execution should build on what the agent learned in previous ones. A competitive brief written in week 12 should reflect everything the agent has accumulated since week 1, not just whatever happens to match a retrieval query at the moment of execution.

Andrej Karpathy recently described a personal workflow that captures this distinction precisely. He builds LLM-maintained knowledge bases: raw documents compiled into structured markdown wikis, with auto-maintained index files, cross-references, and — critically — outputs filed back into the wiki so every query enriches the knowledge base for the next one.

His observation that surprised even him: the auto-maintained indexes made "fancy RAG" unnecessary at working scale. The LLM, given structured files with summaries and cross-links, could navigate the knowledge base without vector retrieval.

What does the accumulation architecture actually look like?

The pattern emerging independently across multiple teams follows the same structure.

Structured domains, not flat corpora. Knowledge isn't dumped into a single vector store. It's organized into typed directories — competitive intelligence in one domain, market research in another, relationship tracking in a third. Each domain has its own entity structure: per-company folders for competitors, per-contact folders for relationships, per-segment folders for market analysis.

Auto-maintained indexes. Every domain has a tracker file — a deterministic index rebuilt after each execution cycle. The tracker lists every entity, its last update, its current status. Zero LLM cost. The agent reads the tracker first to understand what exists before reading individual files. This is Karpathy's insight made mechanical: you don't need embedding-based retrieval when the LLM has a structured index that tells it exactly where everything is.

The accumulation loop. After producing output, the agent writes entity updates back into the knowledge base. A competitive brief that surfaces a new market entrant creates an entity file for that competitor. The next cycle reads that file and builds on it. Over weeks and months, the knowledge base becomes a compound asset — richer, more connected, more valuable with every execution.

Feedback as knowledge. User edits to agent output aren't just corrections — they're signal. Edit patterns get distilled into preference files that the agent reads on future executions. The workspace doesn't just accumulate facts. It accumulates taste.

Where does this break?

The honest answer: at scale. The architecture works when knowledge bases are compact — hundreds of files, not thousands. Current implementations read a fixed number of files per domain (typically the most recently updated), which means older-but-relevant files become invisible as the corpus grows.

Recency is a poor proxy for relevance. A three-month-old competitor profile is highly relevant to a comparative analysis, but it gets pushed out by newer files about different entities. The index files already describe everything in the workspace. The missing step is using those indexes for selective retrieval — letting the agent read the tracker, decide what it needs for the current task, and pull those specific files.

This is where accumulation and retrieval eventually converge. Not RAG-style semantic retrieval against a flat corpus, but index-driven selection within a structured, accumulated knowledge base. The accumulation architecture provides the substrate. Intelligent retrieval provides the scaling mechanism. Neither works without the other.

Why is the industry building the wrong thing?

The AI infrastructure market has spent two years optimizing for the chatbot use case. Vector databases raised over $700M in 2023-2024 alone — Pinecone, Weaviate, Chroma, Qdrant. All built for the assumption that an AI system's job is to find the right fragment in response to a query.

But the highest-value use case for AI agents isn't answering questions. It's producing work. Recurring briefings, ongoing research, scheduled reports, continuous monitoring. Work that requires persistent understanding, not on-demand retrieval.

Karpathy's workflow — built independently by one of the most respected practitioners in AI — validates this thesis. He didn't reach for a vector store. He reached for markdown files, directory structure, and an accumulation loop.

The question for the industry isn't RAG or accumulation. It's whether the next generation of agent platforms will recognize that the knowledge base is the product — not the model, not the retrieval pipeline, not the prompt chain. The knowledge base, accumulated and maintained by AI, readable by humans, improving with every cycle.

That's the architecture Karpathy described. The teams that recognize it early will own the category.

Related Reading

The Clicking Isn't the Hard Part

Claude Computer Use lets AI control your desktop. But computer use is fundamentally not agent-native — it's session-based, stateless, and desktop-bound. The...