Last week's AI Engineer World's Fair in SF reinforced to me that 2025 will be the year of agents.
And the path to reliability in production? Post-training with verifiable RL to achieve strong performance on domain-specific tasks.
Beyond this main theme, agent security was called out in several keynotes and had its own dedicated track alongside the more popular ones like MCP, SWE Agents, Reasoning+RL, and Agent Reliability.
Here are my takeaways from the talks I saw IRL (mostly Security):
Anatomy of an AI Breach: In his keynote on the pelican riding a bicycle benchmark, Simon Willison highlighted the recent Github MCP exploit as another example of "The Lethal Trifecta for prompt injection: access to private data, exposure to malicious instructions and the ability to exfiltrate information."
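The trifecta is easy to check for mechanically. Here is a minimal, hypothetical sketch (the capability names are my own labels, not from any real framework) that flags an agent configuration combining all three legs:

```python
# Hypothetical sketch: flag agent configurations that combine all three
# legs of Willison's "Lethal Trifecta". Capability names are illustrative.
from dataclasses import dataclass

@dataclass
class AgentCapabilities:
    reads_private_data: bool       # access to private data
    ingests_untrusted_input: bool  # exposure to malicious instructions
    can_exfiltrate: bool           # e.g., outbound HTTP, email, public issues

def is_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """All three together make prompt-injection exfiltration possible."""
    return (caps.reads_private_data
            and caps.ingests_untrusted_input
            and caps.can_exfiltrate)

# A GitHub MCP-style setup: private repo access, attacker-writable public
# issues, and the ability to open public PRs.
mcp_agent = AgentCapabilities(True, True, True)
print(is_lethal_trifecta(mcp_agent))  # True: remove at least one leg
```

The mitigation follows directly from the check: removing any one leg (most often the exfiltration channel) breaks the attack.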
The System-Level Control Imperative: Fouad Matin from OpenAI's Agent Robustness and Control team outlined essential safeguards for large-scale agent deployment: sandboxing, limited internet access by default, and human review for consequential operations. You can control things at the model level, but ultimately your most deterministic control is going to be at the system level. In the coming months they will release more tooling for ML and systems controls for agents.
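Those three safeguards can be expressed as a deterministic policy gate around every tool call. This is an illustrative sketch, not OpenAI tooling; the tool names and policy values are assumptions:

```python
# Illustrative sketch (not OpenAI's tooling): system-level policy around
# agent tool calls -- deny-by-default network access and mandatory human
# review for consequential operations.
from typing import Optional

CONSEQUENTIAL = {"delete_repo", "send_payment", "deploy"}  # assumed examples
ALLOWED_HOSTS: set[str] = set()  # limited internet access by default

def guard_tool_call(tool: str, host: Optional[str] = None,
                    approved: bool = False) -> str:
    """Return the policy decision for a single tool invocation."""
    if host is not None and host not in ALLOWED_HOSTS:
        return "denied: host not on allowlist"
    if tool in CONSEQUENTIAL and not approved:
        return "pending: human review required"
    return "allowed"

print(guard_tool_call("read_file"))                  # allowed
print(guard_tool_call("send_payment"))               # pending: human review required
print(guard_tool_call("fetch", host="example.com"))  # denied: host not on allowlist
```

The point of the sketch is the placement: the check runs in ordinary code outside the model, so it holds no matter what the model outputs.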
The Promise and Limits of Private Compute: Jonathan Mortensen sketched out Apple's Private Cloud Compute architecture—his company Confident Security has built their own implementation. PCC provides trusted remote servers for computations through enforceable guarantees like stateless computation and verifiable transparency, though it’s limited by single-party reliance and closed-source code.
When Standard AI Evaluations Fail: Leonard Tang of Haize Labs demonstrated why standard evaluations fail for testing AI applications and how a standalone LLM-as-a-judge is biased and inconsistent. With Verdict, you can scale judge-time compute by composing weaker LLMs into ensembles that outperform larger models; Tang cited case studies with major banks.
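The composition idea can be shown with a toy ensemble. This is a minimal sketch of the concept, not the real Verdict API: each "judge" is a stand-in rule rather than an LLM call, and the aggregation is a simple majority vote.

```python
# Toy sketch of judge composition (not the real Verdict API): several
# weak, cheap judges vote, and the majority verdict is returned. In
# practice each judge would be an LLM call with a narrow rubric.
from collections import Counter

def judge_length(answer: str) -> str:       # stand-in for judge #1
    return "pass" if len(answer) > 10 else "fail"

def judge_reasoning(answer: str) -> str:    # stand-in for judge #2
    return "pass" if "because" in answer else "fail"

def judge_style(answer: str) -> str:        # stand-in for judge #3
    return "pass" if answer.endswith(".") else "fail"

def ensemble_verdict(answer: str) -> str:
    """Majority vote over the weak judges."""
    votes = Counter(j(answer) for j in (judge_length, judge_reasoning, judge_style))
    return votes.most_common(1)[0][0]

print(ensemble_verdict("It is correct because the sum telescopes."))
```

Because each judge evaluates one narrow criterion, individual biases tend to wash out in the vote, which is the intuition behind weak ensembles beating a single large judge.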
Solving the Agent Identity Crisis: Bobby Tiernay and Kam Sween from Auth0 explained how agents need clear identity and properly delegated access to act safely. Their Auth0 AI for JS dispatches user notifications for approval on sensitive actions, ensuring explicit consent.
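The approval-on-sensitive-actions pattern is straightforward to sketch. This is a hypothetical illustration of the idea, not Auth0's actual SDK; the decorator and callback names are mine:

```python
# Hypothetical illustration (not Auth0's SDK): gate sensitive agent
# actions behind an explicit user-approval callback, so consent is
# collected before the action runs.
from functools import wraps

def requires_approval(get_approval):
    """Decorator: run the action only if the approval callback returns True."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            if not get_approval(fn.__name__, args, kwargs):
                raise PermissionError(f"{fn.__name__} was not approved")
            return fn(*args, **kwargs)
        return wrapper
    return decorator

def auto_deny(action, args, kwargs):
    # Stand-in for a real notification flow (push, email, in-app prompt).
    return False

@requires_approval(auto_deny)
def transfer_funds(amount: int) -> str:
    return f"transferred {amount}"

try:
    transfer_funds(100)
except PermissionError as e:
    print(e)  # transfer_funds was not approved
```

In a real deployment the callback would block on an out-of-band user notification, and the agent's identity (distinct from the user's) would be attached to the request.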
Containing the Software-Building Agent: As someone building product, I'm excited about Dagger's Container Use ("C U later, agent!"), which provides modular, local environment isolation for SWE agents while treating model providers as a swappable commodity.
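The isolation idea itself doesn't require any particular tool. A minimal sketch (not Container Use's API) is to run each agent command in a throwaway container with networking disabled, here just assembling the `docker run` invocation:

```python
# Sketch of the isolation idea (not Dagger's Container Use): build a
# `docker run` command that executes an agent's step in a disposable,
# network-less container. The image choice is an assumption.
def isolation_args(cmd: list[str], image: str = "python:3.12-slim") -> list[str]:
    """Return argv for running `cmd` in an isolated, throwaway container."""
    return [
        "docker", "run",
        "--rm",              # throwaway: remove the container on exit
        "--network=none",    # no internet access from inside the sandbox
        image,
        *cmd,
    ]

print(isolation_args(["python", "-c", "print('hello from the sandbox')"]))
```

The resulting argv could be handed to `subprocess.run`; the key property is that a misbehaving agent can neither persist state nor reach the network.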
A New Framework for Agent Reasoning: My favorite research talk was Nathan Lambert defining a taxonomy for next-generation reasoning models: skills, calibration, strategy, and abstraction. It provided inspiration for my own side-quest exploring RLAIF for detection engineering.
The throughline connecting every security conversation was that the industry's focus is moving beyond the model itself and toward the broader systems in which agents operate. This system-level view acknowledges that the most significant challenges lie in governing what agents do after the prompt.
As a result, the most interesting work is happening on new foundational layers: creating distinct identities for agents, building contained environments for them to act in, and developing new methods to evaluate their behavior when standard tests fall short.