Draw the Autonomy Line at Goal-Choice
At a Glance
Answer: The safest place to draw an AI autonomy boundary is at goal-choice. An agent can own tempo, allocation, and method without owning direction. YARNNN encodes that...
This article covers:
- Why is goal-choice the right place to draw the line?
- What does that boundary look like in the substrate?
- Doesn't this just limit what the agent can do?
- When would you move the line?
- What this means
What this article answers (plain language): Every autonomous AI needs a boundary that says "this far and no further." The strongest place to draw it is at goal-choice. An agent can be trusted to decide how to do the work, when to do it, and how to get better at it — while the why stays with the operator. YARNNN encodes that boundary as configuration the agent reads but cannot rewrite.
The most important line in an autonomous system is the one between deciding how to act and deciding what to act toward. Put the boundary there — let the agent own method, tempo, and self-improvement, but reserve direction for the operator — and you get a system that can be genuinely autonomous without ever being unsupervised in the way that matters. Draw it anywhere else and you either cripple the autonomy or you hand away the one thing the operator should never delegate.
A recent Anthropic essay on AI's capability trajectory locates this line precisely. It names "research taste" — judgment in choosing goals — as the part of the work that doesn't automate, and it treats the moment an AI starts choosing its own direction as the threshold where the risk picture changes. YARNNN's architecture draws its autonomy boundary right at that threshold. On purpose.
Why is goal-choice the right place to draw the line?
Autonomy is not one dial. It's at least four: what to do (direction), how to do it (method), when to do it (tempo), and how to get better at it (self-improvement). Most discussions of AI autonomy collapse these into a single slider from "assistant" to "fully autonomous," which is why they produce so much anxiety — sliding the one slider seems to give away everything at once.
Separate the dials and the picture clarifies. Method, tempo, and self-improvement are exactly the dials an agent should own, because they're the ones where the agent's speed and consistency beat a human's. Direction is the one dial where the operator's judgment is irreplaceable — and where an agent's optimization pressure, pointed at a goal it chose for itself, is most likely to drift somewhere the operator never intended.
So the line goes between the three the agent should hold and the one it shouldn't. The agent decides how to execute, when to wake, and how to tune its own cadence. The operator decides what the operation is for.
What does that boundary look like in the substrate?
In YARNNN it's two files the operator authors and the agent cannot rewrite.
The mandate holds standing intent — what this operation exists to do, who it serves, what counts as success. We've made the full case for this shape in Mandate-Driven AI: standing intent authored by the operator, autonomous execution governed by it. The Reviewer — the independent judgment seat — reads the mandate at every decision. It reasons within it. It never edits it.
The autonomy config holds the second half: how far decisions are allowed to bind before a human has to see them. Spend ceilings, what auto-approves, what always escalates. This file is governance-locked. The agent reads it to know its limits; it has no authority to widen them.
Together, these two files are the boundary. Everything above the line — direction, and how far autonomy extends — is operator-authored and agent-read-only. Everything below it — method, tempo, self-improvement — is the agent's to exercise. The line isn't a runtime check bolted onto the agent's reasoning. It's a property of who can write which file.
Doesn't this just limit what the agent can do?
This is the objection worth taking seriously: isn't a fenced agent a weaker agent? If the Reviewer can author its own cadence (which it can — see A Self-Improving Loop You Can Actually Watch) but not its own mandate, isn't that an arbitrary stopping point?
No — and the Anthropic essay is the reason why. The essay's whole caution is about the closed loop where AI improves AI past the point of human understanding, and it identifies goal-choice as the specific capability that, once automated, removes the human from the part of the loop that matters most. Fencing the recursion below direction-setting isn't a limitation the architecture failed to overcome. It's the architecture refusing to cross the exact line the essay flags as the danger threshold.
The agent is not weaker for it. It gets faster, more consistent, and better-calibrated at serving the operation — within a direction the operator set. What it doesn't get is the authority to decide the operation should be doing something else. That's not a capability gap. That's the supervision model working as designed — the same bet we made in Why We Build for Supervision, drawn at the sharpest possible point.
When would you move the line?
Being explicit that the boundary is a choice means being honest that it could move. You could, in principle, let an agent propose changes to its own mandate. The question is what verification you'd want first.
The Anthropic essay is the argument for moving it slowly. If goal-choice is the threshold where misalignment can compound least-observably, then handing an agent authority over its own direction is precisely the change you'd want gated behind heavy verification — outcome history on a substrate where ground truth is honest, a track record of the agent's judgment holding up against reality, and the ability to revert. YARNNN's current answer is to keep direction with the operator and let everything below it improve under measurement. That's the conservative line. The architecture makes the line explicit so that moving it would be a deliberate, verifiable decision rather than a default that crept in.
What this means
If you're building or buying an autonomous AI system, ask where its autonomy boundary is drawn — and whether it's drawn in a place the agent can move. A system where the agent's method, tempo, and self-improvement are autonomous but its direction is operator-authored and agent-read-only is a system you can let run. A system where the agent can quietly redefine its own goal is a system you have to watch by hand, forever.
Draw the line at goal-choice. Give the agent the dials it's good at. Keep the one dial that decides what "good" even means.
Key Takeaways
- Autonomy is at least four dials: direction, method, tempo, self-improvement. Treat them separately.
- Method, tempo, and self-improvement are the agent's to own. Direction is the operator's to keep.
- In YARNNN the boundary is two operator-authored, agent-read-only files: the mandate (standing intent) and the autonomy config (how far decisions bind).
- The line is enforced by write authority over substrate, not by a runtime check inside the agent's reasoning.
- Fencing self-improvement below goal-choice is the architecture refusing to cross the threshold Anthropic names as the danger line — a deliberate choice, not a missing feature.
- Related: A Self-Improving Loop You Can Actually Watch and Mandate-Driven AI.
Related Reading
Anthropic warns that self-improving AI could compound misalignment until we lose control. A self-improving loop is safe when the agent improves against ground...
The AI industry races toward full autonomy. yarnnn deliberately chose human-in-the-loop supervision as the end state, not a stepping stone. Here's the...
Anthropic calls research taste the last human moat in AI. The answer isn't scaling until taste emerges — it's a substrate where judgment is authored by the...