Codex Cage: agents in the SDLC, safely

A CLI that gives an AI agent an issue, an isolated Docker workspace, real verification, and a separate review before any pull request is opened.

Engineering notes·3 min read

Codex Cage is a CLI for delegating a GitHub or Linear issue to an AI coding agent without handing that agent your machine. It creates a disposable Docker workspace, lets the agent implement the change, runs the project's verification commands and a secret scan, then sends the diff to a separate read-only reviewer. Only the host orchestrator commits and opens the pull request, and only after those gates pass. The goal is simple: let agents do useful development work while keeping the blast radius small and the merge criteria explicit. It is written in TypeScript and licensed under Apache-2.0.

The bet behind it: an agent should earn a merge by passing real checks, not by being trusted up front.

How it works

Everything starts from the command line. You give the agent a task, and Codex Cage drops it into a throwaway Docker workspace with its own volume, network and container. That box runs as a non-root user and has no path back to your machine: it can't see your working tree, your Docker socket or your SSH keys. So the agent is free to build things and break things, and none of it leaks onto your laptop.
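As a rough illustration of that isolation, the sketch below builds the argument list for a `docker run` call that uses a named volume, a dedicated network and a non-root user, with no bind mounts from the host. The function names, the `cage-` naming scheme, the UID and the extra hardening flags (`--cap-drop`, `--security-opt`) are assumptions made for this example, not Codex Cage's actual implementation.

```typescript
// Hypothetical sketch: names, UID and hardening flags are assumptions,
// not taken from the Codex Cage source.
interface SandboxSpec {
  runId: string;
  image: string;
}

// Build `docker run` arguments for a throwaway sandbox container.
// The per-run network is assumed to exist already (docker network create).
function buildSandboxArgs(spec: SandboxSpec): string[] {
  const name = `cage-${spec.runId}`;
  return [
    "run", "--rm", "--detach",
    "--name", name,
    "--user", "1000:1000",            // non-root user inside the container
    "--network", `${name}-net`,       // per-run network
    "--mount", `type=volume,source=${name}-vol,target=/workspace`, // named volume, not a host path
    "--cap-drop", "ALL",
    "--security-opt", "no-new-privileges",
    spec.image,
  ];
}

// Return every argument that would expose a host path to the container.
function findHostMounts(args: string[]): string[] {
  return args.filter((arg, i) => {
    const prev = i > 0 ? args[i - 1] : undefined;
    const isBindMount = prev === "--mount" && arg.includes("type=bind");
    const isHostPathVolume =
      (prev === "-v" || prev === "--volume") &&
      (arg.startsWith("/") || arg.startsWith(".") || arg.startsWith("~"));
    return isBindMount || isHostPathVolume;
  });
}
```

A check like `findHostMounts` could run before the container starts, so a configuration that tries to mount the working tree, the Docker socket or `~/.ssh` is rejected rather than silently accepted.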

The sandbox is a real working environment, not an empty shell. The agent gets its own scoped environment variables and secrets, plus a proper data layer through Docker Compose. If a task needs Postgres or Redis, those services come up alongside the agent and it reaches them by name, the same way the app would in production. That lets it actually run the code and the tests against real dependencies while staying fully contained.
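To make the "reaches them by name" point concrete, here is a minimal sketch that assembles a Compose-style service map for a run. The `buildCompose` function, the image tags, the throwaway password and the `DATABASE_URL` / `REDIS_URL` variable names are all hypothetical; the only idea taken from the text is that the agent addresses each dependency by its Compose service name.

```typescript
// Hypothetical sketch: the shape mirrors a Compose file's `services` map,
// but image tags, credentials and variable names are illustrative assumptions.
type Dependency = "postgres" | "redis";

interface ComposeService {
  image: string;
  user?: string;
  environment?: Record<string, string>;
  depends_on?: string[];
}

interface ComposeFile {
  services: Record<string, ComposeService>;
}

function buildCompose(agentImage: string, deps: Dependency[]): ComposeFile {
  const services: Record<string, ComposeService> = {};
  const agentEnv: Record<string, string> = {};

  for (const dep of deps) {
    if (dep === "postgres") {
      services["postgres"] = {
        image: "postgres:16",
        environment: { POSTGRES_PASSWORD: "sandbox-only" }, // throwaway credential
      };
      // The host part of the URL is the service name, resolved on the Compose network.
      agentEnv["DATABASE_URL"] = "postgres://postgres:sandbox-only@postgres:5432/postgres";
    } else {
      services["redis"] = { image: "redis:7" };
      agentEnv["REDIS_URL"] = "redis://redis:6379";
    }
  }

  services["agent"] = {
    image: agentImage,
    user: "1000:1000",
    environment: agentEnv,
    depends_on: [...deps],
  };
  return { services };
}
```

Serialising the returned object to YAML would give a file that `docker compose` can read, with only the services the task asked for.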

Inside that box the agent works the way a developer does. It writes a change, runs it, runs the tests, reads the output and tries again, looping on the implementation until the task's checks pass. Because each run gets its own container, volume and network and nothing depends on your local setup, two runs never step on each other.

When the implementation looks done, a different agent reviews it. The reviewer starts fresh in its own session and reads only the diff, so it judges the change on its merits instead of trusting the agent that wrote it. It checks the change against the acceptance criteria (the verify commands plus a review policy) and returns a clear pass-or-block decision. The review is deliberately read-only: if the code changes underneath it mid-review, the run fails instead of shipping.
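One way to enforce the read-only rule is to capture the diff before the review and compare it with the diff afterwards. The sketch below assumes two hypothetical callbacks, `readDiff` and `reviewer`, and compares the diff text directly; a real implementation might compare hashes or commit IDs instead. None of these names come from the Codex Cage codebase.

```typescript
// Hypothetical sketch of a read-only review gate; all names are illustrative.
type Verdict =
  | { decision: "pass" }
  | { decision: "block"; findings: string[] };

type ReviewOutcome = Verdict | { decision: "failed"; reason: string };

async function reviewReadOnly(
  readDiff: () => Promise<string>,
  reviewer: (diff: string) => Promise<Verdict>,
): Promise<ReviewOutcome> {
  const before = await readDiff();        // snapshot the change under review
  const verdict = await reviewer(before); // reviewer only ever sees the diff
  const after = await readDiff();         // re-read once the review is done

  // If the workspace moved during the review, the verdict no longer
  // describes the code that would be shipped, so the run fails.
  if (before !== after) {
    return { decision: "failed", reason: "diff changed during review" };
  }
  return verdict;
}
```

Note that a changed diff fails the run even when the reviewer said "pass", because the verdict applies to code that no longer exists.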

If verification fails or the reviewer blocks, the work doesn't just stop. The findings become feedback and go back to the implementation step, and the develop-then-review loop runs again, up to a set number of cycles. The agent keeps going until it genuinely meets the criteria, or it stops cleanly and tells you it couldn't.
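The loop described above can be sketched as a bounded retry in which each failed gate turns into feedback for the next implementation attempt. The `Steps` interface, the `runCycles` function and the result shapes are assumptions for illustration; the source only states that findings feed back into implementation and that the number of cycles is capped.

```typescript
// Hypothetical sketch of the develop-then-review loop; names are illustrative.
type GateResult = { ok: true } | { ok: false; findings: string[] };

interface Steps {
  implement(feedback: string[]): Promise<void>;
  verify(): Promise<GateResult>;
  review(): Promise<GateResult>;
}

type RunResult =
  | { status: "ready"; cycles: number }
  | { status: "gave_up"; cycles: number; findings: string[] };

async function runCycles(steps: Steps, maxCycles: number): Promise<RunResult> {
  let feedback: string[] = [];

  for (let cycle = 1; cycle <= maxCycles; cycle++) {
    await steps.implement(feedback);

    // Verification runs first; the reviewer is only consulted on code that passes it.
    const verified = await steps.verify();
    if (!verified.ok) {
      feedback = verified.findings;
      continue;
    }

    const reviewed = await steps.review();
    if (reviewed.ok) {
      return { status: "ready", cycles: cycle };
    }
    feedback = reviewed.findings;
  }

  // Out of cycles: stop cleanly and report the last findings.
  return { status: "gave_up", cycles: maxCycles, findings: feedback };
}
```

Returning a `gave_up` result instead of throwing keeps "the agent could not meet the criteria" as an ordinary outcome that the CLI can report.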

Publishing is the one thing the agent never does itself. Only the host orchestrator commits the change and opens the pull request, and only after every gate has passed. While all of this is running you can watch it: the CLI shows the status of each run and phase and the Docker resources it's managing, run metadata is kept in SQLite with larger artifacts on disk, and a cleanup command tears the sandboxes down when you're finished.
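The publishing rule amounts to a guard on the host side: nothing is committed or opened unless every gate is recorded as passed. In the sketch below, the gate names follow the checks mentioned earlier (verification, secret scan, review), while `RunRecord`, `blockingGates`, `publish` and the `openPullRequest` callback are hypothetical names, not Codex Cage's real API.

```typescript
// Hypothetical sketch of a host-side publish guard; names are illustrative.
type GateName = "verify" | "secretScan" | "review";
type GateStatus = "passed" | "failed" | "pending";

interface RunRecord {
  id: string;
  gates: Record<GateName, GateStatus>;
}

// List every gate that still prevents publishing.
function blockingGates(run: RunRecord): GateName[] {
  return (Object.keys(run.gates) as GateName[]).filter(
    (gate) => run.gates[gate] !== "passed",
  );
}

// Only the host calls this; the agent never receives the callback.
function publish(run: RunRecord, openPullRequest: (runId: string) => string): string {
  const blocking = blockingGates(run);
  if (blocking.length > 0) {
    throw new Error(`run ${run.id} is not publishable: ${blocking.join(", ")}`);
  }
  return openPullRequest(run.id);
}
```

Treating `pending` the same as `failed` means a gate that never ran cannot be mistaken for one that passed.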

Future work

The direction I want to take it is to make the agent feel like a teammate you delegate to, not a tool you babysit. That means richer per-project review policies and acceptance criteria, more issue providers and task sources, live monitoring of running sandboxes, and a sharper feedback loop between review and implementation. It also means tighter CI/CD integration, so that an agent picking up a ticket is just another contributor whose PR you review.

Conclusion

Codex Cage is my concrete answer to “agents in the SDLC, but safe”: isolate the agent, give it its own environment and data layer, make it earn a merge by passing verification and an independent review against the acceptance criteria, and keep the human at the one decision that matters, which is whether to merge the PR. Autonomy in the middle; hard checks and people at the edges.