Install and authenticate.
Codex CLI is OpenAI's terminal-native coding agent, tuned for GPT-5.3 Codex and long tool-call loops.
    # npm (macOS / Linux / WSL)
    npm install -g @openai/codex

    # Authenticate with ChatGPT Plus/Pro or an API key
    codex login

    # Or set the API key directly
    export OPENAI_API_KEY="sk-..."

    # Run from the repo root
    cd my-app && codex
Authenticate with a ChatGPT Plus/Pro subscription or an OpenAI API key. API usage is metered per token; subscription plans may include a bundled allowance, depending on the plan. Before a week of heavy use, run your expected turn count through the session cost calculator with GPT-5.3 Codex selected.
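If you want a back-of-the-envelope number before opening the calculator, the arithmetic is just turns times tokens times price. The sketch below is a minimal illustration: the interface names, the token counts, and above all the per-million prices are placeholder assumptions, not published OpenAI rates. Swap in the real figures for your plan.

```typescript
// Rough session cost estimate. Every number here is a placeholder assumption.
interface Pricing {
  inputPerMillionUsd: number; // assumed price per 1M input tokens
  outputPerMillionUsd: number; // assumed price per 1M output tokens
}

interface SessionProfile {
  turns: number; // expected agent turns
  inputTokensPerTurn: number; // average prompt + context tokens per turn
  outputTokensPerTurn: number; // average generated tokens per turn
}

function estimateSessionCostUsd(profile: SessionProfile, pricing: Pricing): number {
  const inputTokens = profile.turns * profile.inputTokensPerTurn;
  const outputTokens = profile.turns * profile.outputTokensPerTurn;
  return (
    (inputTokens / 1e6) * pricing.inputPerMillionUsd +
    (outputTokens / 1e6) * pricing.outputPerMillionUsd
  );
}

// Hypothetical heavy day: 200 turns, 20k tokens in and 1.5k tokens out per turn.
const placeholderPricing: Pricing = { inputPerMillionUsd: 2, outputPerMillionUsd: 10 };
const heavyDay: SessionProfile = {
  turns: 200,
  inputTokensPerTurn: 20000,
  outputTokensPerTurn: 1500,
};

console.log(estimateSessionCostUsd(heavyDay, placeholderPricing).toFixed(2));
```

Input tokens usually dominate in agent loops because the context is resent on every turn, so trimming what the agent reads tends to move the total more than trimming what it writes.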
Project config typically lives in AGENTS.md plus .codex/config.toml or codex.config.json, depending on version. The cross-vendor guidance from the AGENTS.md guide applies here too: keep one canonical context file, short and concrete.
Workspace and repo context.
Codex CLI indexes the working tree on launch. Treat the repo like you would for any agent: a tight AGENTS.md, an ignore file for paths the agent should never read, and explicit commands in the file map.
    # build output
    dist/
    build/
    .next/
    out/
    node_modules/

    # secrets
    .env
    .env.*

    # lockfiles (token bloat, low signal)
    package-lock.json
    pnpm-lock.yaml

    # large assets (one pattern per extension; gitignore-style globs have no brace expansion)
    public/**/*.mp4
    public/**/*.webm
    public/**/*.pdf
Codex excels when the task has a clear done state: "add Playwright tests for every route under src/app/," "migrate all fetch calls to server actions," "upgrade Next 15 to 16 and fix breaking changes." Vague prompts produce vague diffs.
Goal mode and long runs.
Goal mode (also called autonomous or batch mode in docs) is Codex CLI's killer feature: you describe an outcome, set constraints, and walk away. The agent plans sub-steps, executes tool calls, and reports when done or blocked.
When goal mode pays off for web dev:
- Scaffolding a full test suite from an existing app map
- Mechanical API renames across 40+ files
- Dependency major-version upgrades with fix-forward
- Generating missing TypeScript types from runtime schemas
When it doesn't:
- UI polish and spacing tweaks (you need to see pixels)
- Product copy and information architecture
- Anything requiring stakeholder taste calls mid-run
    # Goal
    Add Playwright e2e tests for every public route listed in AGENTS.md.

    # Constraints
    - Use existing `tests/e2e/` patterns (copy auth.setup.ts).
    - Do not add new npm dependencies.
    - Run `pnpm exec playwright test` after each route; fix failures before continuing.
    - Stop and report if more than 3 routes fail for the same root cause.

    # Done when
    All routes have a spec file and the full suite is green.
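If your team writes a lot of these, it can help to keep the three-part shape (goal, constraints, done state) in a small template so nobody ships a goal without a stop condition. The sketch below is one way to do that; `GoalSpec` and `renderGoalPrompt` are hypothetical names for illustration, not part of Codex CLI.

```typescript
// Hypothetical helper: renders a goal prompt in the Goal / Constraints / Done when shape.
interface GoalSpec {
  goal: string;
  constraints: string[];
  doneWhen: string;
}

function renderGoalPrompt(spec: GoalSpec): string {
  // Refuse to render a prompt that lacks any of the three parts.
  if (spec.goal.trim() === "") throw new Error("Goal is empty");
  if (spec.constraints.length === 0) throw new Error("Add at least one constraint");
  if (spec.doneWhen.trim() === "") throw new Error("Done state is empty");

  const constraintLines = spec.constraints.map((c) => `- ${c.trim()}`).join("\n");
  return [
    "# Goal",
    spec.goal.trim(),
    "",
    "# Constraints",
    constraintLines,
    "",
    "# Done when",
    spec.doneWhen.trim(),
  ].join("\n");
}

const playwrightGoal = renderGoalPrompt({
  goal: "Add Playwright e2e tests for every public route listed in AGENTS.md.",
  constraints: [
    "Use existing `tests/e2e/` patterns (copy auth.setup.ts).",
    "Do not add new npm dependencies.",
    "Stop and report if more than 3 routes fail for the same root cause.",
  ],
  doneWhen: "All routes have a spec file and the full suite is green.",
});

console.log(playwrightGoal);
```

The validation is the point, not the string joining: a goal with no constraints or no done state is exactly the kind of prompt that produces a sprawling diff.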
Long runs burn tokens. Set a budget ceiling in your head (or in org policy) and use /compact or fresh sessions between unrelated goals. The model picker guide has GPT-5.3 Codex on the agentic tier for a reason.
Security review workflow.
Codex CLI has no dedicated security slash command like GitHub Copilot CLI's /security-review (see our Copilot CLI security review coverage). Instead, run a read-only audit via prompt plus the Security Review skill and the checklist in the securing AI-generated code guide.
| Check | What Codex should find | Human still verifies |
|---|---|---|
| Secret exposure | Hardcoded keys, .env in diffs, logged tokens | Git history, CI secret scanning |
| XSS / injection | dangerouslySetInnerHTML, unsanitized user HTML | CSP headers, production payloads |
| Dependency risk | New packages without lockfile justification | Registry provenance, Socket/Snyk gates |
| Auth boundaries | Missing server-side checks on "protected" routes | Session fixation, OAuth flows |
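
Some of the "What Codex should find" column can be pre-checked mechanically before you spend tokens on an audit. The sketch below scans the added lines of a unified diff for two of the patterns in the table. The function name and the two regexes are illustrative assumptions (the `sk-` pattern in particular is a rough guess at key shape), and the parser is deliberately naive about edge cases such as content lines that themselves begin with `+++ ` or `--- `.

```typescript
// Naive pre-scan of a unified diff for risky added lines. Illustrative rules only.
interface RiskHit {
  file: string;
  line: number; // line number in the new version of the file
  rule: string;
  text: string;
}

const RISK_RULES: { name: string; pattern: RegExp }[] = [
  { name: "dangerouslySetInnerHTML", pattern: /dangerouslySetInnerHTML/ },
  { name: "possible hardcoded key", pattern: /sk-[A-Za-z0-9_-]{16,}/ },
];

function scanUnifiedDiff(diff: string): RiskHit[] {
  const hits: RiskHit[] = [];
  let file = "";
  let newLine = 0;

  for (const raw of diff.split("\n")) {
    if (raw.slice(0, 4) === "+++ ") {
      // New-file header, e.g. "+++ b/src/app/page.tsx"
      file = raw.slice(4).replace(/^b\//, "");
      continue;
    }
    if (raw.slice(0, 4) === "--- ") continue; // old-file header

    const hunk = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/.exec(raw);
    if (hunk) {
      newLine = Number(hunk[1]); // first new-file line covered by this hunk
      continue;
    }

    if (raw.charAt(0) === "+") {
      const text = raw.slice(1);
      for (const rule of RISK_RULES) {
        if (rule.pattern.test(text)) {
          hits.push({ file, line: newLine, rule: rule.name, text: text.trim() });
        }
      }
      newLine++;
    } else if (raw.charAt(0) === " ") {
      newLine++; // context line: present in the new file, so it advances the counter
    }
    // Removed lines ("-") do not exist in the new file and do not advance the counter.
  }
  return hits;
}

const sampleDiff = [
  "--- a/src/app/page.tsx",
  "+++ b/src/app/page.tsx",
  "@@ -10,2 +10,2 @@",
  " const html = props.html;",
  "-return <div>{html}</div>;",
  "+return <div dangerouslySetInnerHTML={{ __html: html }} />;",
].join("\n");

console.log(scanUnifiedDiff(sampleDiff));
```

A scan like this only catches literal patterns. It complements the read-only audit prompt rather than replacing it, and it says nothing about the "Human still verifies" column.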
    Read-only security audit of the diff against `main`.
    Use the review-security skill checklist.

    Report:
    1. Critical (must fix before merge)
    2. High (fix this sprint)
    3. Medium (track)

    Do not edit files. Cite file:line for every finding.
    If you need to run commands, use read-only git and grep only.
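The "cite file:line for every finding" rule is easy to enforce after the fact. The sketch below assumes you have already copied the findings into a simple array; the `Finding` shape, the function names, and the citation regex are all assumptions for illustration, since Codex's report format is free text.

```typescript
// Hypothetical post-check on an audit report. The Finding shape is an assumption.
type Severity = "Critical" | "High" | "Medium";

interface Finding {
  severity: Severity;
  text: string;
}

// Loose match for a path with an extension followed by ":<line>", e.g. src/lib/auth.ts:42
const FILE_LINE_CITATION = /[\w.\/-]+\.[A-Za-z0-9]+:\d+/;

// Findings with no file:line citation; send these back to the agent for evidence.
function uncitedFindings(findings: Finding[]): Finding[] {
  return findings.filter((f) => !FILE_LINE_CITATION.test(f.text));
}

// Critical findings are "must fix before merge", so any one of them blocks the merge.
function blocksMerge(findings: Finding[]): boolean {
  return findings.some((f) => f.severity === "Critical");
}

const report: Finding[] = [
  { severity: "Critical", text: "src/lib/auth.ts:42 session token written to logs" },
  { severity: "Medium", text: "Checkout flow may be missing a CSRF check" },
];

console.log(uncitedFindings(report).map((f) => f.text));
console.log(blocksMerge(report));
```

An uncited finding is not necessarily wrong, but it is unverifiable, which for a security review amounts to the same thing until someone finds the line.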
MCP servers.
Codex CLI loads MCP servers from project or user config. Same protocol, same servers as Claude Code and Cursor.
High-value pair for web teams:
- Playwright MCP — verify UI changes during autonomous test generation.
- GitHub MCP — read PR context, post review comments (with token scoped to repo).
    {
      "mcpServers": {
        "playwright": {
          "command": "npx",
          "args": ["-y", "@playwright/mcp"]
        }
      },
      "model": "gpt-5.3-codex"
    }

Codex CLI vs the alternatives.
Honest comparison for web dev workflows. See also the OpenCode comparison table and the dedicated Claude Code guide.
| Dimension | Codex CLI | Claude Code | Cursor | OpenCode |
|---|---|---|---|---|
| Best at | Long batch jobs, test gen | Interactive terminal + subagents | Editor-integrated pair programming | Provider freedom, local models |
| Model lock-in | OpenAI | Anthropic | Bundled multi-vendor models | Any |
| Goal / autonomous mode | Excellent | Excellent | Excellent (Cloud agents) | Solid |
| Security review | Prompt + skills | Via skills + permissions | Via skills + modes | Via agent config |
| MCP | Yes | Yes | Yes | Yes |
Common pitfalls.
"Make the app better" is not a goal. Codex will interpret "better" as "rewrite half the repo." Specify files, constraints, and how you verify success.
A 200-file autonomous run can look green in CI and still ship subtle auth bugs. Review the diff summary and spot-check high-risk paths (auth, payments, uploads).
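One cheap way to decide where to spot-check is to filter the changed-file list down to the risky areas first. The sketch below is a minimal version: the function name is hypothetical, the keyword list is just the three areas named above, and the input is assumed to be the output of something like `git diff --name-only main` split into lines.

```typescript
// Hypothetical triage helper: which changed files touch high-risk areas?
const HIGH_RISK_SEGMENTS = ["auth", "payments", "uploads"]; // extend per repo

function highRiskFiles(changedFiles: string[]): string[] {
  return changedFiles.filter((file) => {
    // Split on path and name separators so "auth" matches "auth/login.ts"
    // and "auth.ts" but not "author/bio.tsx".
    const segments = file.toLowerCase().split(/[\/._-]/);
    return segments.some((s) => HIGH_RISK_SEGMENTS.indexOf(s) !== -1);
  });
}

const changed = [
  "src/app/auth/login.ts",
  "src/components/Button.tsx",
  "src/lib/payments.ts",
  "src/author/bio.tsx",
];

console.log(highRiskFiles(changed));
```

Exact segment matching keeps false positives down, but it also means `payment.ts` or `authz/` would slip through unless you add them to the list. Treat the output as a reading order, not a guarantee that everything else is safe.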
Layout and visual polish need a browser and your eyes. Use Cursor or Claude Code interactively, or pair Codex with Playwright MCP screenshots you review manually.