YOLO Mode Is How You Build Fast. Auditable Control Is How You Ship Faster.
Sandboxing coding agents is a critical first step, but it’s an incomplete solution. The real blocker to developer velocity isn’t containment; it’s the collapse of identity.
In a recent post, “Living dangerously with Claude,” Simon Willison makes the case for “Why you should always use --dangerously-skip-permissions.”
YOLO mode is a developer’s dream. As Willison notes, it gives you the ability to “leave Claude alone to solve all manner of hairy problems while you go and do something else entirely.” This is the ROI enterprises are chasing: autonomous coding agents accelerating development to outpace the competition.
But that flag has “dangerously” in its name for a reason.
This new velocity is on a collision course with a foundational security principle. The primary blocker to enterprise adoption isn’t just the risk of an attack. It’s also the architectural lack of identity that makes YOLO mode challenging to secure.
An RCE with No Culprit
When a developer uses YOLO mode, the agent acts as the user. It inherits their credentials, their permissions, and their identity.
This ambiguity is the critical vulnerability. New research from Trail of Bits, “Prompt injection to RCE in AI agents,” demonstrates how “argument injection” attacks can trick an agent into using a “safe” command like go test to achieve Remote Code Execution (RCE).
For a CISO or CTO, the technical details of the RCE are only half the problem. The other half is what happens next:
Your SIEM alerts: User ‘developer.name’ spawned a bash shell from ‘go test’ and opened a reverse shell to an unknown IP.
Your EDR quarantines the developer’s machine.
Your GRC team flags a massive compliance breach.
Your entire security stack, built on the bedrock of user identity, blames the developer for the agent’s action. You have no auditable log, no forensic path, and no way to prove what really happened. This attribution failure makes it impossible to adopt YOLO mode with confidence, because you can’t distinguish a malicious insider from a hijacked agent.
Why Sandboxing Is Containment, Not Control
The table-stakes solution, as Willison identifies, is the sandbox. He rightly calls it the “only solution that’s credible” to provide basic containment.
But a sandbox alone doesn’t solve the attribution problem. It’s a necessary wall, but it’s a blind one.
Modern sandboxes and EDRs are good at seeing system-level events, like a syscall or a process fork. But they lack application-layer context. They can’t see the intent that connects a user’s prompt to a chain of agentic actions, and then finally to a malicious syscall.
The Trail of Bits research proves why this behavioral blindness is so dangerous. A sandbox sees go test running. It has no context to know that this “safe” command has been weaponized by an agent. It can’t tell a benign go test from a malicious go test -exec `...`. As the ToB team notes, trying to filter all possible bad arguments is a “cat-and-mouse game of unsupportable proportions.”
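To make the filtering problem concrete, here is a minimal sketch of the kind of naive command allowlist many harnesses use. Everything here is illustrative: the allowlist contents and function names are hypothetical, not from any real agent sandbox. The point is that a filter keyed on the executable name passes the weaponized invocation just as readily as the benign one, because the payload rides in the arguments (`go test -exec` genuinely runs the test binary under an arbitrary wrapper program).

```python
# Hypothetical sketch of a naive command allowlist, illustrating the
# "cat-and-mouse" problem Trail of Bits describes. Names are invented
# for illustration; real sandboxes and agent harnesses differ.
import shlex

ALLOWED_COMMANDS = {"go", "npm", "git"}

def is_allowed(command_line: str) -> bool:
    """Allow a command if its executable name is on the allowlist.
    This checks the binary, not the arguments -- which is exactly
    the blind spot argument injection exploits."""
    argv = shlex.split(command_line)
    return bool(argv) and argv[0] in ALLOWED_COMMANDS

# A benign test run and a weaponized one look identical to this filter:
print(is_allowed("go test ./..."))                 # True: benign
print(is_allowed("go test -exec ./evil-wrapper .")) # True: the RCE payload rides along
```

Closing the gap by enumerating dangerous flags per tool is exactly the “cat-and-mouse game of unsupportable proportions” the ToB team warns about.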
While a necessary first step, sandboxes alone don’t give a business the auditable confidence needed to move fast.
The Inevitable Next Layer: From Containment to Auditable Control
A sandbox is a necessary wall, but it does not provide control. Control is impossible without attribution. Solving this gap will require a new, purpose-built layer in the enterprise stack. This emerging control plane must be built on two foundational architectural principles:
Provable Attribution: The layer must bind a verifiable, auditable identity to every agent’s runtime. This finally separates the agent’s actions from the user’s, solving the attribution crisis. But identity alone is not enough. This identity must be fused with deep contextual awareness—the ability to differentiate a low-risk action (an agent running go test in a CI pipeline) from the exact same action in a high-risk context (an ad-hoc agent in a chat prompt).
Context-Aware Policy Enforcement: Once you have provable attribution (who and where), you can finally move to effective governance (what). This layer must enforce granular policy based on this rich, combined context. The true violation in the Trail of Bits attack is not just the bash process. The real violation is the full, observable behavior: an agent identity (who) operating in a chat context (where) spawned a shell (what).
Knowing who, where, and what is the auditable standard for enforceable governance of coding agents. It’s how we move from blind containment to auditable control, and it’s the only way to give developers YOLO mode while giving security and GRC teams the definitive proof they require around coding agents.
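The who/where/what model above can be sketched as a policy check. This is a hypothetical illustration of the principle, not a real product API: the identity strings, context labels, and rule shapes are all assumptions. What it shows is that the same action can be allowed in one context and denied in another, and that every decision carries an auditable record naming the agent rather than the user.

```python
# Illustrative sketch of context-aware policy enforcement using the
# who/where/what model. All identifiers and rules are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentAction:
    who: str    # verifiable agent identity, distinct from the user's
    where: str  # runtime context, e.g. "ci-pipeline" or "chat-session"
    what: str   # observed behavior, e.g. "spawn-shell"

# Policy: spawning a shell is tolerated in a vetted CI pipeline but
# denied for an ad-hoc agent in a chat session -- same action, different context.
DENY_RULES = {
    ("chat-session", "spawn-shell"),
}

def evaluate(action: AgentAction) -> str:
    """Return an auditable decision that records who, where, and what."""
    verdict = "deny" if (action.where, action.what) in DENY_RULES else "allow"
    return f"{verdict}: agent={action.who} context={action.where} action={action.what}"

print(evaluate(AgentAction("agent-7f3a", "ci-pipeline", "spawn-shell")))   # allow
print(evaluate(AgentAction("agent-7f3a", "chat-session", "spawn-shell")))  # deny
```

The decision string doubles as the forensic record: when the SIEM fires, the log attributes the shell to an agent identity in a specific context, not to ‘developer.name’.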
Build Faster, Ship Faster, Win the Market
Willison is right. YOLO mode is the future of developer productivity. But the Trail of Bits research is a non-negotiable warning: this new power comes with a sophisticated attack surface that breaks our core security assumptions.
Sandboxing is the necessary first step. But you can’t manage what you can’t see: true velocity comes from auditable control over the agents building your products. That control is what lets you keep YOLO mode on.
Auditable control is how you ship faster and win the market.