Two AI Cybersecurity Products Walk Into a Bar

Last week, Anthropic quietly announced Mythos. A model so capable at finding vulnerabilities that they decided not to release it publicly. Days later, OpenAI raised its hand and said it has one too.

The coverage framed this as a cybersecurity breakthrough moment. AI that can spot flaws that evaded human detection for nearly three decades. Models that will reshape how enterprises think about security.

They are not wrong. But they are describing a different problem than the one most engineering and security teams are about to walk into.

What Mythos and its equivalents actually do

Vulnerability discovery tools, at their core, do static and dynamic analysis at scale. They read your codebase, your dependencies, your configurations. They pattern-match against known vulnerability classes and reason about novel attack surfaces. They are extraordinarily good at finding things that were always there but required expert eyes and time to surface.

This is genuinely valuable. A model that finds a 28-year-old memory safety bug before a threat actor does is worth deploying.

But notice what these tools operate on: code at rest, infrastructure configurations, historical CVE patterns. The threat surface they are targeting is the software you have already built.

The threat surface that nobody is talking about

Here is what is happening in parallel, at most of the same enterprises that will adopt Mythos-style tooling.

They are deploying agents.

Not demos. Not internal hackathon projects. Production agents. Agents with tool access to Salesforce, to internal APIs, to file systems, to code execution environments. Agents that make decisions, take actions, and chain calls across systems in ways that are difficult to fully predict from the prompt that initiated them.

The attack surface this creates is not in your code. It is in the runtime behaviour of systems that were not designed with a fixed execution path. You cannot statically analyse an agent the way you analyse a codebase, because the agent's behaviour is a function of its context window, its tool responses, its model version, and the sequence of external inputs it receives during a session.

Mythos finds the vulnerability in your authentication library. It does not tell you what happens when your customer-facing agent is prompt-injected through a malicious document it was asked to summarise, escalates privileges through a misconfigured tool allowlist, and exfiltrates data through a series of individually-innocuous API calls.

Those are different problems. They require different solutions.

The three gaps that static analysis cannot close

1. Tool access control at runtime

An agent is only as dangerous as the tools it can call. But tool allowlists in most agent frameworks are defined at the framework configuration level, not enforced at the network or policy layer. This means a compromised or manipulated agent can attempt calls that the developer intended to be off-limits, and the only thing stopping it is a config file that the agent itself may have influence over.

Runtime governance means policy enforcement that sits outside the agent, at the call boundary, and cannot be overridden by the agent's own reasoning.

2. Audit trails that survive the session

Agents are stateful across a session but stateless across sessions in most implementations. When something goes wrong, reproducing the chain of events requires logs that capture not just the final tool call, but the full context: the prompt chain, the intermediate reasoning steps, the tool responses that informed each decision.

Most observability tooling captures input and output. It does not capture the causal chain. This matters enormously when you are trying to understand whether an anomalous action was a genuine compromise, a model hallucination, or intended behaviour that had unexpected downstream effects.

3. Exec approval gates for high-stakes actions

Some actions are reversible. Some are not. Deleting a record, sending an external communication, executing a financial transaction, modifying access controls: these are categories of action where human approval in the loop is not a UX preference but a risk management requirement.

Static analysis cannot tell you which tool calls in a live session are about to cross a threshold that requires escalation. That requires a policy engine that understands the action being requested, the agent context requesting it, and the approval chain configured for that combination.

Why the labs cannot solve this for you

There is a structural reason why Anthropic and OpenAI are building vulnerability discovery tools and not agent runtime governance platforms.

Vulnerability discovery is a problem they can solve without touching your infrastructure. They train a model, you point it at your codebase, it returns findings. The model never needs to know anything about your deployment topology, your agent framework choices, your internal tool schemas, or your approval workflows.

Runtime governance is the opposite. It requires sitting inside your deployment boundary, understanding your specific agent configurations, integrating with your tool layer, and enforcing policy at the point where decisions become actions. It is inherently on-premise, inherently vendor-agnostic across agent frameworks, and inherently specific to each enterprise's risk posture.

OpenAI cannot build a governance layer for agents running on LangGraph calling your internal APIs. Anthropic cannot enforce approval gates on a multi-agent workflow that uses Claude in one step and Mistral in another. Not because they lack the capability, but because the architectural position required to do it sits inside your network boundary, not theirs.

That is the gap. That is where Tropic sits.

What this means practically

If you are deploying agents in production today, the question is not whether to adopt AI-assisted vulnerability scanning. You probably should, eventually.

The question is what you have in place right now for the agents that are already running. What tool calls can they make. What they are logging. What happens when one of them does something unexpected at 2am. Who gets paged. What the approval chain looks like for the actions you have defined as high-stakes.

If the answer to any of those is “we are figuring it out” or “it is handled at the framework level,” that is the gap worth closing first.

Mythos will find the bug in your library. It will not govern the agent that is calling that library at runtime, on behalf of your customers, with access to their data.

That is a different problem. And it is the one most enterprises have not solved yet.

Tropic is a vendor-agnostic agent governance and security control plane. It provides runtime policy enforcement, audit trails, cost attribution, and exec approval gates for enterprises deploying agentic AI at scale. Early access at tropic.bot.