The Three Columns Behind the Ten Walls

Ajey Gore published a piece yesterday called The Ten Walls. It maps what happens when a technical person tries to self-host an AI agent past the getting-started phase. Infrastructure that turns into a second job. Memory that quietly gives up. Security as a moving target you didn't sign up to track. Token bills with a personality. Each wall is real, each wall is solvable on its own, and that is precisely the problem. You solve one and the next three show up.

I read it and recognised every wall. I've hit all of them. Tropic exists because I'm a technical person, I tried to run agents properly, and I found it unreasonably hard. Not the getting-started part. The keeping-it-running part. The part where you notice you've spent another weekend on Docker instead of the thing the agent was meant to help you with.

For a while now I've thought about agent security in three columns: Infrastructure, Authority, Influence. Infrastructure is the host and the runtime, the things that run the agent. Authority is what the agent is allowed to do and who can make it do things. Influence is what shapes the agent's behaviour: its memory, its context, its coordination with other agents.

Reading Gore's piece, what struck me was how cleanly his ten walls fall into those three columns. So that's how I want to walk through them. Not as ten separate problems to solve, but as three columns, with Authority as the one that has to come first.

Authority: the column that has to come first

Gore's walls three, five, nine, and ten (security, the skills marketplace, team sharing, invisible agent errors) all sit in the authority column. They feel like different problems. A firewall question, a supply chain question, a permissions question, a hallucination question. They're not. They're one question asked four ways. Can you tell what your agent is allowed to do, who is allowed to make it do things, what it actually did, and that it couldn't have done worse?

If you can't answer that, nothing else matters. Not the infrastructure, not the token costs, not the memory. Because the moment your agent does something you didn't expect, and it will (Gore's benchmark of 5% tool error is one in twenty actions), you won't know until someone downstream points it out. By which time it's already happened.
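
To put the one-in-twenty figure in context, here is a back-of-envelope sketch. It assumes each action fails independently at the 5% rate, which is my simplification for illustration and not something Gore's piece claims; real tool errors are probably correlated. The function name is made up for the example.

```python
def p_at_least_one_error(n_actions: int, error_rate: float = 0.05) -> float:
    """Chance that at least one of n_actions goes wrong.

    Assumes every action fails independently with probability error_rate
    (an illustrative simplification, not a measured property of any agent).
    """
    return 1.0 - (1.0 - error_rate) ** n_actions


if __name__ == "__main__":
    for n in (1, 20, 100):
        print(n, round(p_at_least_one_error(n), 3))
```

Under that assumption, a session of twenty actions has roughly a 64% chance of containing at least one error, which is why "you'll notice when it happens" isn't a plan.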

This is the column I've spent the most time on, because solving it is what makes the rest worth solving.

Tropic enforces two layers of policy on every agent action, and they work differently on purpose.

The base security layer is deterministic. Before the agent runs any action, that action passes through a policy engine that enforces hard-coded rules against known-dangerous behaviours. These aren't toggles. They aren't prompt-engineered hints. They're rules the agent can't talk around, no matter how creatively a prompt injection asks. An rm -rf on your home directory doesn't run. Exfiltration to an unknown host doesn't run. The rules that map to OWASP Agentic, MITRE ATLAS, and NIST AI 600-1 are on because they should be, not because you remembered to turn them on. That's defence in depth, not perimeter security.

On top of that, there's an application-level policy layer that you control. This is where you shape the agent to your workflow. Three tiers: Allow, Require Confirm, Deny. Want the agent to push to main without asking? Allow. Want to see it first? Require Confirm. Never? Deny. These are your knobs, and they work on whatever tool calls or action patterns matter to you.

Tropic policy editor showing toggleable security policy packs. Base Protection with 62 rules, System Protection with 24 rules, OWASP Agentic with 43 rules, and Lockdown mode. — Base rules are on by default. Application-level policies are yours to tune.

The split matters. Base security is what protects you from the agent. Application policy is what lets you shape the agent without weakening the protection. Most “AI guardrails” I've seen collapse the two into one prompt-engineered layer, which means you're either over-restricted or under-protected. Tropic keeps them separate.
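
As a rough illustration of why the two layers stay separate, here is a minimal sketch of that evaluation order. It is not Tropic's engine: the class names, the regex patterns, and the default decision are all hypothetical and exist only to show a fixed base layer running before a user-tunable one.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_CONFIRM = "require_confirm"
    DENY = "deny"


@dataclass(frozen=True)
class Action:
    tool: str     # which tool the agent is calling, e.g. "shell" or "git"
    command: str  # the raw command the agent wants to run


# Base layer: fixed rules, not user-editable. Pattern is illustrative only.
BASE_DENY_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+(~|\$HOME)(/|\s|$)"),
]

# Application layer: user-tunable rules, first match wins (hypothetical examples).
APP_RULES = [
    (re.compile(r"^git push .*\bmain\b"), Decision.REQUIRE_CONFIRM),
    (re.compile(r"^git status"), Decision.ALLOW),
]


def evaluate(action: Action, app_default: Decision = Decision.REQUIRE_CONFIRM) -> Decision:
    # 1. Base rules run first; nothing in the application layer can override them.
    if any(p.search(action.command) for p in BASE_DENY_PATTERNS):
        return Decision.DENY
    # 2. Application rules only shape what the base layer let through.
    for pattern, decision in APP_RULES:
        if pattern.search(action.command):
            return decision
    return app_default
```

The point of the ordering is that loosening an application rule to Allow can never reopen something the base layer denies.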

The skills marketplace is where authority meets supply chain. Tropic proxies ClawHub. Every skill your OpenClaw installs routes through the Tropic marketplace first, which lets me run my own scans on skills before they ever touch your agent. Gore's numbers on the state of community skills (a third with vulnerabilities, hundreds outright malicious) become a lot less alarming when something between you and the marketplace is doing the work.

The proxy also removes the friction of credentials. For any skill that supports OAuth, you click Connect and consent to Tropic access. No API keys to generate, rotate, or paste into config files. No keys ending up in Slack messages or committed to a repo by accident.

And if a skill slips through scanning, or a previously-clean skill gets compromised later, the base policy layer is still there catching anything it tries to do that shouldn't be allowed.

Team sharing is authority in a different sense. Who is allowed to make your agent do things. Tropic has three roles: Owner, Admin, Member. Invite by email, assign a role, done. Members can chat with agents and use instances but can't delete resources, manage credentials, or install paid skills. Admins get full operational control. Owners handle billing. Credentials are encrypted at rest and managed centrally.

Tropic workspace members page showing invite form with role selector and a table of current members with their roles. — Invite teammates, assign roles. Works for a team of two, works for a team of fifty.
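
The role split can be pictured as a small permission table. This is a sketch under assumptions: the permission names are labels I've invented for the capabilities listed above, and treating Owner as a superset of Admin is my reading of the description, not a documented rule.

```python
from enum import Enum


class Role(Enum):
    OWNER = "owner"
    ADMIN = "admin"
    MEMBER = "member"


# Hypothetical permission labels for the capabilities described in the text.
MEMBER_PERMS = frozenset({"chat_with_agents", "use_instances"})
ADMIN_PERMS = MEMBER_PERMS | {"delete_resources", "manage_credentials", "install_paid_skills"}
# Assumption: owners can do everything admins can, plus billing.
OWNER_PERMS = ADMIN_PERMS | {"manage_billing"}

PERMISSIONS = {
    Role.MEMBER: MEMBER_PERMS,
    Role.ADMIN: ADMIN_PERMS,
    Role.OWNER: OWNER_PERMS,
}


def can(role: Role, permission: str) -> bool:
    """Return True if the role includes the named permission."""
    return permission in PERMISSIONS[role]
```

A check like can(Role.MEMBER, "manage_credentials") comes back False, which is the whole point of having a Member role.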

The invisible-errors wall is where the application layer earns its keep. Require Confirm is the built-in approval gate for the actions you care about. Set it on anything where you want a human in the loop and the agent can't proceed until you say yes. Set it tight while you're learning what the agent does, loosen it as trust builds. Underneath it all, every tool call, every LLM output, every blocked action, every approval ends up in an audit log. Silent errors stop being silent.

Tropic audit log showing a timeline of agent actions. Tool calls, LLM outputs, blocked actions, with expandable JSON details. — Every action, every blocked attempt, every reason. Searchable after the fact.
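
To make the Require Confirm and audit log pairing concrete, here is a minimal sketch of a confirmation gate that records every step. The function names, event kinds, and record fields are hypothetical; Tropic's actual log schema isn't described in this post.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Optional

# In-memory stand-in for an audit log; a real one would be durable and append-only.
audit_log = []


def record(kind: str, **details) -> None:
    """Append one timestamped event to the audit log."""
    audit_log.append({"ts": datetime.now(timezone.utc).isoformat(), "kind": kind, **details})


def run_with_confirm(
    tool: str,
    args: dict,
    execute: Callable[[dict], str],
    approve: Callable[[str, dict], bool],
) -> Optional[str]:
    """Run a tool call only after a human approves it, logging every step."""
    record("approval_requested", tool=tool, args=args)
    if not approve(tool, args):
        record("blocked", tool=tool, args=args, reason="human declined")
        return None
    record("approved", tool=tool, args=args)
    result = execute(args)
    record("tool_call", tool=tool, args=args, result=result)
    return result


if __name__ == "__main__":
    run_with_confirm("git_push", {"branch": "main"}, lambda a: "pushed", lambda t, a: False)
    print(json.dumps(audit_log, indent=2))
```

The declined request still leaves two entries behind, the request and the block, so a refusal is as traceable as an approval.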

Tropic also runs automated benchmarks against NIST AI 600-1, OWASP LLM Top 10, and MITRE ATLAS. You get a scorecard you can actually look at, and show to a security team if you're the person who has to do that.

Worth saying clearly: this column doesn't care where your agent runs. Tropic-provisioned VM, a Mac mini in your closet, your laptop, a VPS at Agent37 or Hetzner, doesn't matter. The policy engine, the audit logs, the benchmarks, all of it hooks into your existing OpenClaw through a lightweight agent on the host and a tunnel back to Tropic. You don't have to move your setup to get authority right. That's the point.

Infrastructure: the column that decides whether month two happens

Gore's walls one, four, seven, and eight (infra as a second job, token costs, observability, upgrades) all sit in the infrastructure column. They're the walls that don't kill you on day one. They kill you gradually. Any one of them is a bad weekend. All four together is the reason people quietly stop using the agent.

The honest pattern here is the one Gore describes. First week you're excited. Second week you're troubleshooting. Third week you check your provider dashboard for an unrelated reason and close the laptop. The agent stops being a tool and starts being a tax. Somewhere in there you make a decision without noticing you've made it, and you're back to doing things the old way.

Tropic handles each of these as platform features, not weekend projects.

Provisioning is one click if you want it. Pick a region, click launch, you have a running agent in under three minutes. No Docker files, no YAML, no SSH. The VM boots from a custom AMI with everything pre-installed and is hardened on boot. You get a browser-based terminal if you ever need direct access, but you shouldn't.

Tropic dashboard showing one-click VM provisioning with region selection and a progress bar. — Pick a region, click launch. That's it.

If you'd rather keep your own setup, that works too. Tropic connects to an OpenClaw instance running on your Mac mini, your laptop, an Agent37 box, or anywhere else you've already got one. You get the same authority layer, the same observability, the same cost tracking. The only pieces that don't fully apply to BYO setups are the ones tied to the VM itself, like one-click backups, and closing that gap is on the roadmap.

Cost tracking is unified. Compute and API token spend in one place, broken down by model and time range. Spending limits are set in your settings. Idle instances stop after 30 minutes of inactivity. If you want to trade availability for spend, spot instances cost roughly 40% of on-demand.

Tropic metrics dashboard showing cost breakdown by model, token usage charts, and time-range selectors. — Compute plus API spend, one view. You can actually see yesterday.
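
The cost controls reduce to a few simple checks. The sketch below uses the two numbers from the text (30 minutes idle, spot at roughly 40% of on-demand, read literally as a 0.40 multiplier); the function names, the hourly price, and the way the limit is applied are assumptions made for illustration.

```python
from datetime import datetime, timedelta

IDLE_TIMEOUT = timedelta(minutes=30)  # from the text
SPOT_PRICE_RATIO = 0.40               # "roughly 40% of on-demand", read literally


def should_stop(last_activity: datetime, now: datetime) -> bool:
    """True once an instance has been idle for the full timeout."""
    return now - last_activity >= IDLE_TIMEOUT


def estimated_compute_cost(hours: float, on_demand_hourly: float, spot: bool = False) -> float:
    """Compute cost for a number of hours; the hourly price is a hypothetical input."""
    rate = on_demand_hourly * (SPOT_PRICE_RATIO if spot else 1.0)
    return hours * rate


def over_limit(compute_cost: float, token_cost: float, spending_limit: float) -> bool:
    """Unified check: compute plus API token spend against one limit."""
    return compute_cost + token_cost > spending_limit
```

Keeping compute and token spend in the same comparison is the design choice that matters here: a limit that only watches one of the two will miss the other.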

Observability is built in from day one. The logs page shows a real-time event stream. Tool calls, LLM outputs, lifecycle events, cost events, blocked actions, all filterable and searchable, every event expandable with full JSON. Under the hood every running agent has Prometheus instrumentation and Grafana dashboards, but you don't need to know that. You just see the charts. The three weekends Gore says you didn't know you owed? You don't owe them.

Upgrades get a version selector for the underlying OpenClaw runtime, optional auto-updates, and full VM backups with one-click restore on Tropic-provisioned instances. If an update breaks something, you roll back to a known-good state. No more upgrade anxiety, no more staying on a vulnerable version because the new one might break your workflow.

Tropic backup management page showing backups with status, size, and expiration dates, with a restore-to-new-VM button. — Always have a known-good state to fall back to.

None of this is the interesting part of having an agent. All of it is what determines whether you still have one in three months.

Influence: the column that extends beyond his map

Gore's walls two and six (memory, multi-agent coordination) sit in the influence column. What I want to say honestly is that this column is bigger than the two walls he catalogued, and the industry is still figuring most of it out.

Influence is what shapes the agent's behaviour over time. Not what it's allowed to do (that's authority), but how it decides what to do within those bounds. Memory is part of that. So is the context it accumulates across sessions. So is how it coordinates with other agents, how it decides which sub-agent should handle a given task, and how human feedback gets folded back into its future behaviour. Gore's piece catches two threads of this, and catches them well, but the column as a whole has more in it than two walls.

Memory is where I've made the most progress. Flat memory architectures force permanent decisions about what matters, and those decisions are almost always wrong as context shifts. Tropic agents have a built-in memory system that persists across sessions, is searchable and browsable, and is automatically maintained. You can see exactly what your agent remembers and delete what it shouldn't keep. I wrote a piece on LOD memory as a solution to this.

You shouldn't have to re-explain yourself to your own agent.

Multi-agent coordination is the one I'm still working on. Gore is right that fewer than 10% of teams maintain multi-agent setups, and the honest answer is that Tropic doesn't solve orchestration today. What it gives you is the foundation. Per-agent audit logs that show every tool call, every response, every blocked action, across every agent in your workspace. When coordination fails, you can trace it. That matters more than it sounds, because most multi-agent systems today fail invisibly.

Mission control, a shared kanban board, is one way I'm trying to solve this across multiple agents, the same way it already works for a single OpenClaw instance with multiple subagents. But it's important that each agent has a well-defined purpose and is called upon for the right task.

The rest of the influence column (human feedback loops, cross-agent context, skill selection reasoning) is genuinely frontier. Anyone claiming to have solved it is overselling. What I'm confident about is that authority has to come first. Until you trust what the agent is doing and why, influence-level improvements are just faster ways to arrive at the wrong answer.

None of this is your job

Gore's closing line is the one that landed hardest for me. None of them are the user's job.

He's right. You shouldn't need WireGuard expertise to run an agent. You shouldn't need to configure fail2ban, build a Grafana dashboard, audit skill packages, or invent an approval system to catch hallucinations. You shouldn't need to spend your weekend becoming an infrastructure engineer, a security engineer, a memory architect, and an observability expert, just to have an AI assistant that actually assists you.

That's why I built Tropic. Not because the agent frameworks were lacking (they weren't) or because hosting platforms weren't shipping VMs fast enough (they were). Because nobody was solving the operational middle. The authority layer, the infrastructure layer, the early parts of the influence layer. The stuff that's boring individually and collectively decides whether your agent is still running in three months. And importantly, none of it requires you to move your setup. Tropic hooks into whatever OpenClaw instance you're already running, wherever you're running it.

Gore's ten walls are real. They fall into three columns: Infrastructure, Authority, Influence. That's the framing I've been building Tropic around since day one.

If you're hitting one of these walls, or you've already walked away from an agent because of them, the walls aren't your fault and they aren't your problem to solve. You shouldn't have to build the ladders yourself.