AI Sec Watch: A Security Intelligence Platform for AI Systems

Luu, T.J.

Running Codex safely at OpenAI

info, news, LLM-Specific

safety, security

Source: OpenAI Blog, May 8, 2026

Summary

OpenAI's Codex is a coding agent that can autonomously perform tasks like reviewing code and running commands, which creates security risks that need careful control. To deploy Codex safely, OpenAI uses sandboxing (technical boundaries limiting where the agent can write and what it can access), approval policies (requiring human review for risky actions), network restrictions (blocking unexpected connections), and audit logging (recording what the agent does). These controls work together to let Codex move quickly on routine, low-risk tasks while stopping for review on higher-risk actions.
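To make the idea of a write boundary concrete, here is a minimal sketch in Python of the check a sandbox conceptually performs before an agent writes a file. It is not OpenAI's implementation: real sandboxes enforce this at the operating-system level, and the function name `is_write_allowed` and the `writable_roots` parameter are hypothetical names chosen for illustration.

```python
from pathlib import Path


def is_write_allowed(target: str, writable_roots: list[str]) -> bool:
    """Return True if `target` resolves to a path inside a writable root.

    Hypothetical helper for illustration only; an actual sandbox would
    enforce the boundary in the OS rather than in application code.
    """
    # Resolve ".." segments and symlinks so a path cannot escape the root
    # by traversal (e.g. "workspace/../etc/passwd").
    resolved = Path(target).resolve()
    for root in writable_roots:
        root_path = Path(root).resolve()
        # Compare whole path components, so "/tmp/ws-evil" does not
        # count as being inside "/tmp/ws".
        if resolved == root_path or root_path in resolved.parents:
            return True
    return False
```

Under these assumptions, a write to `/tmp/ws/src/a.py` with `/tmp/ws` as the only writable root would be permitted, while `/tmp/ws/../etc/passwd` would be rejected because it resolves outside the root.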

Solution / Mitigation

OpenAI's stated mitigations are: sandboxing to define execution boundaries; approval policies that require human review for higher-risk actions, with an auto-approval mode for routine low-risk requests; managed network policies that allow expected destinations and block unwanted ones; secure credential storage in the OS keyring; mandatory authentication through ChatGPT, tied to enterprise workspace controls; command rules that let benign commands run without approval while blocking or requiring approval for dangerous ones; and agent-native telemetry and audit trails that give visibility into agent behavior.
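The command rules, network policy, and audit trail described above can be sketched together in a few lines of Python. This is an illustrative model only, not Codex's configuration format or API: the rule table, the host allowlist, and the names `Policy`, `check_command`, and `check_host` are all assumptions made for this example.

```python
import json
import shlex
from dataclasses import dataclass, field
from enum import Enum
from urllib.parse import urlparse


class Decision(Enum):
    ALLOW = "allow"  # run without asking
    ASK = "ask"      # pause for human approval
    BLOCK = "block"  # never run


# Hypothetical rules: (argv prefix, decision). First matching prefix wins.
COMMAND_RULES = [
    (("git", "status"), Decision.ALLOW),
    (("ls",), Decision.ALLOW),
    (("git", "push"), Decision.ASK),
    (("rm", "-rf"), Decision.BLOCK),
]

# Hypothetical allowlist of expected network destinations.
ALLOWED_HOSTS = {"pypi.org", "github.com"}


@dataclass
class Policy:
    audit_log: list = field(default_factory=list)

    def _record(self, kind: str, subject: str, decision: Decision) -> Decision:
        # Every decision is appended to the audit trail as one JSON line.
        entry = {"kind": kind, "subject": subject, "decision": decision.value}
        self.audit_log.append(json.dumps(entry))
        return decision

    def check_command(self, command: str) -> Decision:
        argv = shlex.split(command)
        for prefix, decision in COMMAND_RULES:
            if tuple(argv[: len(prefix)]) == prefix:
                return self._record("command", command, decision)
        # Unknown commands default to human review rather than silent approval.
        return self._record("command", command, Decision.ASK)

    def check_host(self, url: str) -> Decision:
        host = urlparse(url).hostname
        decision = Decision.ALLOW if host in ALLOWED_HOSTS else Decision.BLOCK
        return self._record("network", url, decision)
```

In this sketch, `Policy().check_command("git status")` is allowed immediately, `"rm -rf /"` is blocked, and an unrecognized command such as `"curl http://x"` falls through to the approval path. Defaulting unknown commands to review is a design choice made here for illustration; the source does not specify how Codex treats unmatched commands.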