Whatever You Prefer: The Hidden Cost of Politeness in Agent Autonomy
We're in the middle of a shift: from "click to complete" (AI assists, you decide) to "delegate to complete" (AI decides for you). Your purchasing agent negotiates prices. Your scheduling agent books meetings. Your coding agent ships implementations.
The shift raises a fundamental question: Who does your agent work for? Whose interests are being served?
The obvious answer is "you, the user." But here's the problem: LLMs are trained to be helpful assistants, and they serve those trained behaviors more reliably than they serve your interests. The gap becomes starkly visible the moment there isn't a human actively directing them.
The Deference Problem
LLMs are post-trained with preference optimization to be accommodating, engaging, and conversational. They're trained on feedback where being helpful, flexible, and agreeable gets rewarded. This manifests in the familiar phrases with which LLM agents end their turns:
- What works best for you?
- Which option are you interested in?
- Does that sound good?
This works perfectly when a human is in the loop. The AI defers, the human decides. Everyone's happy. But what happens when you remove the human? When the agent, operating autonomously, negotiates with a salesperson, reviews its own code, or coordinates with another AI?
The agent still seeks to please. But who?
The Looped Agent
Consider a recent development as an example: the Ralph Wiggum script runs coding agents in autonomous loops, where the agent writes code, reviews it, and iterates without human oversight. It's easier than constantly clicking "continue". But it's also where the deference problem becomes obvious.
Scenario (Made-up): The agent checks its own code and finds a subtle bug.
What should happen:
"Bug detected in line 47. The loop termination condition will fail on empty arrays. Fixing now."
What actually happens:
"This looks good overall! One thing that might be worth considering—the loop termination could potentially cause issues in edge cases, but I'm not certain. What do you think?"
Who is the agent asking? Itself. The human isn't there. The agent was trained to hedge, defer, and seek approval. In an assistant context, the human would say "fix it." In an autonomous context, the bug ships.
Each hedge seems harmless in isolation. In aggregate, they produce systems and outcomes that reflect the averaged preferences of the training data, because no one is actually making the decisions.
Why This Matters: The Training-Deployment Mismatch
| Assistant Mode | Autonomous Mode |
|---|---|
| Human provides direction | Agent must decide |
| Deference is appropriate | Deference is dangerous |
| "What do you prefer?" is helpful | "What do you prefer?" abandons duty |
| Mistakes corrected in real-time | Mistakes compound silently |
We train agents for one mode. We deploy them in the other. An agent trained to say "I'm flexible, what works for you?" is perfectly calibrated for assisting a human. That same phrase, deployed in a negotiation or autonomous loop, is a capitulation.

The Training-Deployment Gap: Agents trained in controlled environments to serve and work with humans face a chaotic reality when left to their own devices.
The Principal-Agent Problem, Reimagined
In classical economics, the principal-agent problem asks: how do you ensure an agent acts in your interest, not their own? With AI, there's a new failure mode: the agent has no interests of its own; it's just following its training. And because that training is opaque, you don't know what implicit questions are being asked and answered, let alone whose interests are being served.
When you deploy an autonomous agent, whoever it interacts with becomes the de facto "human in the loop" that it tries to please. The counterparty gets the deference that should go to you. This creates three predictable failures:
- The Accommodation Trap: The agent accommodates the counterparty (salesperson, another AI, a system) instead of serving your constraints.
- The Hedge Spiral: Without a human to break ties, the agent satisfies its accommodation drive through linguistic politeness alone, hedging, qualifying, expressing flexibility without ever committing.
- Constraint Abandonment: The agent's trained reward for being accommodating outweighs instruction following. Budget violated. Timeline extended. Quality compromised. But everyone was polite!
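Of the three, constraint abandonment is the easiest to guard against mechanically. As a minimal sketch (the `Constraints` and `Outcome` structures and the limits are illustrative assumptions, not a real agent framework), a hard check outside the model can reject any outcome that violates the principal's constraints, no matter how agreeable the negotiation transcript sounds:

```python
from dataclasses import dataclass

@dataclass
class Constraints:
    """The principal's non-negotiable limits, set before delegation."""
    max_price: float
    latest_deadline_days: int

@dataclass
class Outcome:
    """What the agent actually agreed to."""
    price: float
    deadline_days: int

def violations(c: Constraints, o: Outcome) -> list[str]:
    """Return a list of constraint violations; empty means the deal stands."""
    problems = []
    if o.price > c.max_price:
        problems.append(f"price {o.price} exceeds budget {c.max_price}")
    if o.deadline_days > c.latest_deadline_days:
        problems.append(f"deadline {o.deadline_days}d past limit {c.latest_deadline_days}d")
    return problems

# A politely negotiated deal that quietly runs 7% over budget.
check = violations(Constraints(max_price=1000.0, latest_deadline_days=14),
                   Outcome(price=1070.0, deadline_days=10))
print(check)  # non-empty: the deal is rejected however polite the conversation was
```

The point is architectural: the check lives outside the model's accommodation drive, so politeness cannot negotiate it away.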
What Needs to Change
The solution isn't to make agents aggressive. It's to make them principal-aware. Agents deployed as delegates need:
- Explicit modeling: A clear representation of whether the agent is operating autonomously (e.g., interacting with another AI or a system) or assisting a human
- Deference detection: Systems and protocols that can flag when agents defer or violate constraints to maintain politeness
- Advocacy training: Training agents to work under delegation and advocate for the principal's interests
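Deference detection can start simple. As a sketch (the phrase list and the matching approach are illustrative assumptions, not a validated taxonomy), an output filter can flag hedging language in an agent's autonomous turns before they ship:

```python
import re

# Illustrative hedge markers; a real detector would need a broader,
# context-aware taxonomy rather than a hand-written list.
HEDGE_PATTERNS = [
    r"\bwhat (?:do you think|works best for you)\b",
    r"\bdoes that sound good\b",
    r"\bmight be worth considering\b",
    r"\bI'?m (?:not certain|flexible)\b",
    r"\bcould potentially\b",
]

def flag_deference(agent_output: str) -> list[str]:
    """Return the hedge phrases found in an agent's turn."""
    found = []
    for pattern in HEDGE_PATTERNS:
        match = re.search(pattern, agent_output, flags=re.IGNORECASE)
        if match:
            found.append(match.group(0))
    return found

turn = ("This looks good overall! One thing that might be worth "
        "considering could potentially cause issues, but I'm not "
        "certain. What do you think?")
print(flag_deference(turn))
```

In an autonomous loop, a non-empty result would trigger a retry with an instruction to commit to a decision, rather than letting the hedged turn stand as the final answer.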

Assistant Mode vs Delegate Mode: When to defer and when to advocate depends on whom you're serving.
Conclusion
The shift from assistant to delegate is accelerating:
| Domain | Assistant | Delegate |
|---|---|---|
| Shopping | "Here are options" | "I bought this for you" |
| Scheduling | "Here are times" | "I scheduled the meeting" |
| Coding | "Here's a solution" | "I deployed this" |
| Trading | "Here's analysis" | "I executed the trade" |
As agents move right in this table, the cost of misaligned deference grows. A purchasing agent that accepts 7% over budget. A scheduling agent that reschedules an important meeting to accommodate another one. A coding agent that hedges on critical decisions.
We're building AI systems optimized to assist humans who make decisions. We're deploying them in contexts where they are the decision-makers. The training context and deployment context have diverged. Deference trained for one doesn't work for the other. This opens up directions for research and engineering with several new challenges and opportunities.
This post explores dynamics emerging as AI systems transition from assistive to autonomous roles. The example draws from controlled experiments where deference patterns emerged consistently across different model configurations. The implications extend to any context where AI agents operate with reduced human oversight.