Dr. Zero for Codex — Autonomous Repository Improvement

Status: Experimental Alpha (Internal Use Only)

Dr. Zero for Codex brings the autonomous repository improvement workflow to the Codex CLI as a native skill bundle. It uses the same two-phase architecture as the Claude Code plugin (GRPO/HRPO dual scoring from arXiv:2601.07055), adapted to the Codex agent framework with TOML-based custom agents and skill-template entry points.

Dr. Zero Codex Skills

The bundle includes 16 workflow skills, each a user-facing entry point that dispatches to Codex custom agents:

Skill	Description	Example
drzero-help	Show setup steps, modes, and examples.	`@drzero help`
drzero	Default autonomous improvement loop.	`@drzero /drzero`
drzero-autonomous	Explicit autonomous propose/solve loop.	`@drzero drzero-autonomous "Improve test coverage"`
drzero-swarm	Coordinated multi-agent build work.	`@drzero drzero-swarm "Build a Pokemon API in FastAPI"`
drzero-morty	Small direct tasks with minimal ceremony.	`@drzero drzero-morty "Update the README"`
drzero-pickle	Smallest viable CI or fix change.	`@drzero drzero-pickle "Make CI pass with the smallest diff"`
drzero-council	Architecture or design debate before implementation.	`@drzero drzero-council "Should this be a plugin skill or script?"`
drzero-cronenberg	Parallel implementation variants for comparison.	`@drzero drzero-cronenberg "Try three caching approaches"`
drzero-portal-gun	Cross-repo coordination.	`@drzero drzero-portal-gun "Update auth across these services"`
drzero-citadel	Governed release-sensitive work with quality gates.	`@drzero drzero-citadel "Prepare payment changes for release"`
drzero-unity	Peer-to-peer parallel cleanup.	`@drzero drzero-unity "Fix lint errors across the repo"`
drzero-analysis	Read-only architecture, quality, or risk analysis.	`@drzero drzero-analysis "Review test coverage gaps"`
drzero-execution	Execute a specific WorkItem through domain routing.	`@drzero drzero-execution "Implement WorkItem DZ-12"`
drzero-status	Show current Dr. Zero session status.	`@drzero drzero-status`
drzero-config	Initialize, show, validate, and edit shared YAML config.	`@drzero drzero-config validate`
drzero-ping	Health check plugin config, scripts, and runtime.	`@drzero drzero-ping`

Agent Architecture

Dr. Zero for Codex uses 19 custom agents defined as TOML files in codex/agents/:

Agent	Role
dr0-architecture	Architecture review and design decisions
dr0-backend	Backend service implementation
dr0-compliance	Regulatory and policy compliance
dr0-database	Database schema and query optimization
dr0-devops	CI/CD pipelines and tooling
dr0-documentation	Documentation generation and maintenance
dr0-frontend	Frontend implementation and UX
dr0-gitops	Git workflow and branch management
dr0-implementation	General-purpose implementation
dr0-infrastructure	Cloud infrastructure and IaC
dr0-monitoring	Observability, alerting, and dashboards
dr0-networking	Network configuration and connectivity
dr0-orchestrator	Coordinates swarm execution and parallel dispatch
dr0-performance	Performance profiling and optimization
dr0-proposer	Generates WorkItems in Phase 1 (scored by HRPO)
dr0-secrets	Secrets management and rotation
dr0-security	Three-headed review: Security, Quality, Savage
dr0-solver	Attempts WorkItems in Phase 1 (scored by GRPO)
dr0-testing	Test strategy, generation, and coverage

Each agent is a Codex custom agent (codex/agents/dr0-{domain}.toml) dispatched by the orchestrator or solver based on WorkItem domain routing.
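
The routing rule above can be pictured with a minimal sketch. It relies only on the codex/agents/dr0-{domain}.toml naming stated on this page. The `agent_file_for` helper, the fallback to the general-purpose implementation agent, and the choice to leave the proposer and solver out of the routing set are illustrative assumptions, not the bundle's actual dispatch code.

```python
from pathlib import PurePosixPath

# Domains taken from the agent table above. dr0-proposer and dr0-solver are
# assumed here to be phase roles rather than routing targets, so they are
# left out of this set.
KNOWN_DOMAINS = {
    "architecture", "backend", "compliance", "database", "devops",
    "documentation", "frontend", "gitops", "implementation",
    "infrastructure", "monitoring", "networking", "orchestrator",
    "performance", "secrets", "security", "testing",
}

# Hypothetical fallback: unknown domains go to the general-purpose agent.
FALLBACK_DOMAIN = "implementation"


def agent_file_for(domain: str) -> PurePosixPath:
    """Map a WorkItem domain to the path of its agent definition file."""
    key = domain.strip().lower()
    if key not in KNOWN_DOMAINS:
        key = FALLBACK_DOMAIN
    return PurePosixPath("codex/agents") / f"dr0-{key}.toml"
```

Under these assumptions, a WorkItem tagged `database` resolves to codex/agents/dr0-database.toml, while an unrecognized tag falls back to codex/agents/dr0-implementation.toml.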

Two-Phase Architecture

Phase 1: Dual-Scoring Curriculum Learning (arXiv:2601.07055)

Proposer analyzes CI failures, lint errors, and test gaps to generate WorkItems
Solver attempts each WorkItem, producing code patches and running acceptance tests
HRPO scores the proposer: format compliance + difficulty calibration
GRPO scores the solver: binary acceptance-test success
Difficulty auto-adjusts to target a 50% success rate
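
As a rough illustration of the scoring loop above, the sketch below shows one way a binary solver reward, a proposer reward built from format compliance plus difficulty calibration, and a difficulty update toward the 50% target could fit together. The function names, the equal weighting, the linear calibration curve, and the step size are all assumptions made for illustration. The real scores are computed by the dr0 Python package, and this sketch does not reproduce the formulas from arXiv:2601.07055.

```python
from typing import Sequence


def solver_reward(tests_passed: bool) -> float:
    """Binary reward: 1.0 if the acceptance test passed, else 0.0."""
    return 1.0 if tests_passed else 0.0


def proposer_reward(
    format_ok: bool,
    solver_outcomes: Sequence[bool],
    target: float = 0.5,
) -> float:
    """Equal-weight sum of format compliance and difficulty calibration.

    Calibration is 1.0 when the observed solver success rate equals the
    target and falls linearly to 0.0 at the farthest possible rate.
    The weights and the curve shape are assumptions.
    """
    format_score = 1.0 if format_ok else 0.0
    if solver_outcomes:
        rate = sum(solver_outcomes) / len(solver_outcomes)
        calibration = 1.0 - abs(rate - target) / max(target, 1.0 - target)
    else:
        calibration = 0.0  # no attempts yet, so nothing to calibrate against
    return 0.5 * format_score + 0.5 * calibration


def adjust_difficulty(
    difficulty: float,
    success_rate: float,
    target: float = 0.5,
    step: float = 0.1,
) -> float:
    """Raise difficulty when the solver succeeds too often, lower it otherwise.

    The result is clamped to the range [0.0, 1.0].
    """
    updated = difficulty + step * (success_rate - target)
    return min(1.0, max(0.0, updated))
```

With these definitions, a well-formed WorkItem that the solver passes on half of its attempts earns the maximum proposer reward, and one that is always solved or never solved earns only the format share.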

Phase 2: Agent Swarm Execution

Orchestrator coordinates domain specialist agents via Codex subagent dispatch
Domain agents execute domain-specific tasks (up to 6 concurrent)
The security agent (dr0-security) reviews changes through three lenses: security, quality, and savage
Output: Production-ready changes with tests passing
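
The concurrency cap above (up to 6 domain agents at once) is the kind of limit a semaphore expresses well. The following is a minimal sketch using Python's asyncio. The `dispatch` and `fake_agent` names are hypothetical, and the sketch says nothing about how the Codex subagent system enforces the limit internally.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def dispatch(
    tasks: Iterable[str],
    run_one: Callable[[str], Awaitable[str]],
    max_threads: int = 6,  # mirrors agents.max_threads (default: 6)
) -> List[str]:
    """Run every task, keeping at most `max_threads` in flight at once."""
    limiter = asyncio.Semaphore(max_threads)

    async def guarded(task: str) -> str:
        # Each task waits here until one of the slots is free.
        async with limiter:
            return await run_one(task)

    # gather() returns results in the same order as the input tasks.
    return await asyncio.gather(*(guarded(t) for t in tasks))


async def fake_agent(task: str) -> str:
    """Stand-in for a real domain agent call (hypothetical)."""
    await asyncio.sleep(0.01)
    return f"done: {task}"


if __name__ == "__main__":
    results = asyncio.run(dispatch([f"task-{i}" for i in range(10)], fake_agent))
    print(results)
```

A semaphore keeps the cap in one place, so changing the limit is a single parameter rather than a change to how tasks are batched.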

Differences from Claude Code Plugin

Feature	Claude Code	Codex
Asset type	Plugin (`.claude-plugin/`)	Skill bundle (`codex/skills-templates/`)
Agent format	Markdown (`.md`)	TOML (`.toml`)
Entry points	Commands (`/drzero:`)	Commands (`@drzero /drzero`)
Agent dispatch	Task tool with SDK precedence	Codex subagent system
Configuration	`drzero.yml` in project root	`drzero.yml` in project root
Installation	Claude Code Plugin Install	Codex Marketplace Plugin Install
Parallelism	`agents.max_threads` (default: 6)	`agents.max_threads` (default: 6)

Configuration

Settings are stored in drzero.yml in the project root:

# DrZero + Agent Swarm Configuration
version: "1.0"

dr_zero:
  max_iterations: 3
  tasks_per_iteration: 3
  target_success_rate: 0.5

  proposer:
    type: heuristic
    persona: healthcare

  solver:
    backend: codex
    temperature: 0.7

  production_features:
    enable_checkpoints: true
    enable_rbac: true
    enable_audit: true
    enable_telemetry: true
    enable_cat_defense: true
    enable_rate_limiting: true

  terminal:
    all_tests_pass: true
    lint_clean: true
    coverage_threshold: 80
    max_diff_lines: 500

agent_swarm:
  domains:
    - orchestration
    - architecture
    - implementation
    - testing
    - security
    - documentation
    - monitoring
    - secrets
    - gitops
    - devops
    - infrastructure
    - database
    - frontend
    - backend
    - compliance
    - performance
    - networking

  orchestrator:
    agent: orchestration
    quality_reviewer: security
    definition_of_done:
      - tests_pass
      - lint_clean
      - docs_updated
      - security_cleared
      - pr_ready

  turns:
    default: 1
    max: 5

  prompts:
    next_step: |
      Based on the current state of the codebase and the work completed in the previous turn:
      1. Review the changes made
      2. Identify remaining work
      3. Propose the next actionable step

      Focus on incremental progress toward the definition of done.

    quality_check: |
      Review the implementation for:
      - Test coverage and passing status
      - Code quality and lint cleanliness
      - Security vulnerabilities
      - Documentation completeness

      Provide specific, actionable feedback.

    done_criteria: |
      Evaluate if the implementation is PR ready to merge:
      - All tests pass
      - Lint is clean
      - Security cleared
      - Documentation updated
      - No obvious regressions

github:
  enabled: false
  issue_labels:
    - dr-zero
    - autonomous
  auto_create_pr: false

observability:
  audit_log_dir: logs/
  metrics_dir: logs/metrics/
  log_level: INFO
  export_formats:
    - prometheus
    - json
    - csv
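
A few consistency checks on this configuration can be sketched in Python. The example assumes the file has already been parsed into a dictionary (for instance with a YAML parser such as PyYAML's `yaml.safe_load`). The `validate_config` function and the specific rules it applies are illustrative guesses, not the checks that `drzero-config validate` actually performs.

```python
from typing import Any, Dict, List


def validate_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means none were found."""
    problems: List[str] = []

    dr_zero = cfg.get("dr_zero", {})
    rate = dr_zero.get("target_success_rate")
    if not isinstance(rate, (int, float)) or not 0.0 < rate < 1.0:
        problems.append("dr_zero.target_success_rate must be between 0 and 1")

    coverage = dr_zero.get("terminal", {}).get("coverage_threshold")
    if not isinstance(coverage, (int, float)) or not 0 <= coverage <= 100:
        problems.append("dr_zero.terminal.coverage_threshold must be 0-100")

    swarm = cfg.get("agent_swarm", {})
    domains = set(swarm.get("domains", []))
    orchestrator = swarm.get("orchestrator", {})
    # The orchestrator and reviewer should both name a configured domain.
    for key in ("agent", "quality_reviewer"):
        if orchestrator.get(key) not in domains:
            problems.append(f"agent_swarm.orchestrator.{key} is not a listed domain")

    turns = swarm.get("turns", {})
    if turns.get("default", 1) > turns.get("max", 1):
        problems.append("agent_swarm.turns.default exceeds agent_swarm.turns.max")

    return problems
```

Applied to the values shown above, these checks find no problems: the target rate is 0.5, the coverage threshold is 80, both orchestrator entries appear in the domain list, and the default turn count (1) is below the maximum (5).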

Security Features

Scope boundary validation: Prevents path traversal and CAT attacks
Command whitelisting: Validates acceptance_test commands against a safe tool list
Anti-hallucination enforcement: All GRPO/HRPO scores are computed only by the dr0 Python package
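
The first two protections can be sketched as follows. This is a minimal illustration under stated assumptions: the `SAFE_TOOLS` set is hypothetical (the real safe tool list is not shown on this page), and both helper functions are invented for the example rather than taken from the plugin.

```python
import shlex
from pathlib import Path

# Hypothetical allowlist; the real safe tool list is not shown on this page.
SAFE_TOOLS = {"pytest", "ruff", "mypy", "npm", "go", "cargo"}

# Characters that let a shell chain, redirect, or substitute commands.
SHELL_METACHARS = set(";&|<>`$\n")


def is_within_scope(repo_root: Path, candidate: str) -> bool:
    """True if `candidate` resolves to a path inside `repo_root`."""
    root = repo_root.resolve()
    # resolve() collapses ".." segments, and an absolute candidate replaces
    # the root entirely, so both escape routes end up outside `root`.
    target = (root / candidate).resolve()
    return target == root or root in target.parents


def is_safe_command(command: str) -> bool:
    """True if the executable is allowlisted and no shell metacharacters appear."""
    # Reject metacharacters outright so chained commands cannot slip through.
    # This is deliberately strict and also rejects quoted metacharacters.
    if any(ch in SHELL_METACHARS for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar parse errors
        return False
    return bool(tokens) and tokens[0] in SAFE_TOOLS
```

Under these assumptions, `pytest -q tests/` is accepted, while `pytest -q; rm -rf /` and a write to `../outside.txt` are both rejected.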

Documentation

Paper Alignment: arXiv:2601.07055
Claude Code Plugin: See the Claude Code Dr. Zero plugin page

License

Internal Use Only — Optum Tech Compute
