# codex-universal
The _codex_ image now centers on reasoning agents. Use this document as the top-level map for roadmap milestones,
architecture references, and the bespoke-model hosting workflow that underpins every guided rollout.
## Orientation
| Goal | Where to start |
|---|---|
| Understand the reasoning roadmap | Reasoning milestones |
| Skim architecture dependencies | Architecture diagrams |
| Launch a bespoke model | Hosting bespoke reasoning models |
| Train/evaluate/deploy | Guided pipelines |
## Repo Map (Reasoning-Focused)
You can now surface a reasoning-focused repository map with `codex repo-map --reasoning`. The output highlights reasoning overlays, evaluation presets, and trace-capture knobs.
## Reasoning roadmap milestones
| Milestone | Focus | Target signal |
|---|---|---|
| M0: Observability baseline | Boot instrumented inference traces across offline smoke runs. | Trace coverage ≥95% on curated reasoning templates. |
| M1: Curriculum-first training | Establish first-principles curricula and replay strategies. | reasoning.win_rate ≥0.55 on benchmarks/cot-lite. |
| M2: Model hosting hardening | Promote bespoke models into hermetic serving pods. | Shadow-hosted latency p95 ≤ 700 ms with parity alerts. |
| M3: Flywheel automation | Continuous evaluation + redeploy gates orchestrated via Codex. | Weekly redeploy cadence with zero manual overrides. |
Track milestone burndown using the `reasoning_status` table exported by the status-report CLI (`codex_ml.cli.codex_cli status-report`). The same report can be sliced by category (for example, rollout rings and curricula). For backlog triage, anchor discussions in `docs/guides/reasoning_overview.md`.
## Control surface knobs and promotion checklist
`codex repo-map --reasoning` surfaces a shared set of knobs defined in
`configs/training/reasoning/baseline.yaml`:

- `trace_mode`
- `curriculum.preset`
- `evaluation.preset`
- `deployment.preset`
- `metadata.rollout_ring`
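As a concrete reference point, the knobs above might be laid out as follows. This is an illustrative sketch only: the knob names come from the list above, but the nesting and the specific values (`full`, `starter`, `cot-lite`, `shadow`, `ring0`) are assumptions, not the shipped defaults.

```yaml
# Illustrative sketch of configs/training/reasoning/baseline.yaml.
# Knob names are from the documented control surface; values are assumptions.
trace_mode: full
curriculum:
  preset: starter
evaluation:
  preset: cot-lite
deployment:
  preset: shadow
metadata:
  rollout_ring: ring0
```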
Every smoke run of the training loop writes machine-readable artifacts under `runs/train_loop/`:

- `run_metadata.json` — captures `metadata.*`, the selected presets, and the rollout ring.
- `reasoning.json` — snapshot of the reasoning harness configuration plus a runtime summary.
- `evaluation.json` — the evaluation preset enforced for the run.
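A downstream script can load all three artifacts in one pass. The sketch below is a minimal, self-contained illustration: the file names match the list above, but the field names inside each JSON document (`metadata.rollout_ring`, `presets`, `trace_mode`, `preset`) are assumptions rather than a documented schema, and the demo directory is fabricated so the example runs end to end.

```python
import json
from pathlib import Path

def load_run_artifacts(run_dir):
    """Load the three machine-readable artifacts a smoke run writes.

    Field names used by callers are illustrative assumptions, not a
    documented schema.
    """
    run_dir = Path(run_dir)
    metadata = json.loads((run_dir / "run_metadata.json").read_text())
    reasoning = json.loads((run_dir / "reasoning.json").read_text())
    evaluation = json.loads((run_dir / "evaluation.json").read_text())
    return metadata, reasoning, evaluation

# Build a throwaway run directory so the sketch is runnable end to end.
demo = Path("runs/train_loop/demo")
demo.mkdir(parents=True, exist_ok=True)
(demo / "run_metadata.json").write_text(json.dumps(
    {"metadata": {"rollout_ring": "ring0"}, "presets": {"curriculum": "starter"}}))
(demo / "reasoning.json").write_text(json.dumps({"trace_mode": "full"}))
(demo / "evaluation.json").write_text(json.dumps({"preset": "cot-lite"}))

metadata, reasoning, evaluation = load_run_artifacts(demo)
print(metadata["metadata"]["rollout_ring"])  # ring0
```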
Promotion toward `main` requires:

- The evaluation preset to pass (or carry explicit sign-off in status reports).
- `metadata.rollout_ring` declared in the training config and matching the target pod ring.
- `codex deploy --dry-run` to succeed, which enforces the ring match between training output and `configs/deploy/reasoning_pod.yaml`.
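The ring-parity gate amounts to a simple equality check between the ring declared by the training run and the ring the target pod is configured for. This is a hypothetical sketch of that check, not the actual `codex deploy --dry-run` implementation; the dictionary shape and ring names are assumptions.

```python
def rings_match(run_metadata: dict, pod_ring: str) -> bool:
    """Return True when the training run's declared rollout ring equals
    the target pod's ring.  Field layout is an assumption for illustration."""
    return run_metadata.get("metadata", {}).get("rollout_ring") == pod_ring

# A run promoted toward main must declare the same ring as the pod config.
run_metadata = {"metadata": {"rollout_ring": "ring0"}}
print(rings_match(run_metadata, "ring0"))  # True
print(rings_match(run_metadata, "ring1"))  # False
```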
Reviewers preparing a 0D_base_ → main merge should walk the dedicated checklist in docs/ops/promotion_checklist.md.
It also requires attaching the outputs of:

- `codex_ml.cli.codex_cli status-report`
- `codex_ml.cli.codex_cli deploy --dry-run`

and linking the latest `docs/status_updates/survey-<ring>-and-<PR>-<DATESTAMP>.md`.
## Architecture at a glance
The canonical topology is captured in `docs/diagrams/architecture.svg`. Pair it with the
Mermaid source (`architecture.mmd`) when proposing changes so reviewers can diff rendered assets and source together.
Key flows:

- Authoring — Hydra configuration layers resolve reasoning templates from `configs/training/reasoning/*` before model instantiation.
- Training — orchestrated by `src/codex_ml/training/unified_training.py` (deterministic seeding, checkpoint/resume plumbing, continual replay strategy hooks) and `src/codex_ml/train_loop.py` (per-run executor that injects the reasoning harness, logs traces, and rotates checkpoints). Together these modules are "the trainer"; they replace older references to a standalone `codex_ml.trainer.ReasoningTrainer`.
- Deployment — bespoke models are packaged with manifest digests and signed hooks for downstream registries.
When modifying the topology, update both the diagram and `docs/guides/serving_reproducibility.md`.
## Hosting bespoke reasoning models
1. Bootstrap the project.
2. Select a template using `codex reasoning-templates list` (see `codex_cli`). Templates live under `configs/training/reasoning/` and ship default datasets plus evaluator bindings.
3. Materialise runtime overlays. This composes reasoning overrides on top of the legacy defaults so classical experiments keep working.
4. Register the artifact with deterministic metadata before handoff.
For service integrations, adopt the PodSpec defined in `docs/deployment/reasoning_pod.md`.
This PodSpec is a dry-run template, not production hosting. Its job is to make resource
shape, telemetry, curriculum phase, trace capture mode, and rollout ring explicit before
anything moves toward `main`. A dry-run configuration is provided at
`configs/deploy/reasoning_pod.yaml`.
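For orientation, a dry-run PodSpec along these lines would make each of those dimensions explicit. This is a hedged sketch: the five dimensions (resource shape, telemetry, curriculum phase, trace capture mode, rollout ring) come from the description above, but every field name and value below is an assumption, not the shipped `configs/deploy/reasoning_pod.yaml`.

```yaml
# Illustrative dry-run PodSpec sketch — field names and values are assumptions.
resources:          # resource shape
  cpu: "4"
  memory: 16Gi
telemetry:          # what the pod reports back
  traces: enabled
curriculum_phase: starter
trace_mode: full
metadata:
  rollout_ring: ring0   # intent badge, not permission to ship
```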
Link the generated manifest to your rollout plan.
## Guided reasoning pipelines
Follow the deep dives in the new guides:

- `docs/guides/reasoning_overview.md` — systems overview and milestone guardrails.
- `docs/guides/first_principles_curricula.md` — curriculum design and evaluation cadences.
### Training quickstart

```bash
codex-train +reasoning=baseline \
  curriculum.phase_schedule=starter \
  training.max_steps=500 \
  logging.reasoning_trace=true \
  training.output_dir=artifacts/runs/reasoning-starter
```
The `+reasoning=baseline` defaults hook into `configs/training/reasoning/baseline.yaml` and emit trace artefacts that downstream
analysis notebooks can load. Curricula definitions are stored as YAML fragments, so you can diff changes between cohorts.
### Evaluation handoff

```bash
codex evaluate --config configs/evaluation/reasoning.yaml \
  --log-metrics .codex/metrics/reasoning.ndjson \
  --run-id reasoning-milestone-m1
```
Every evaluation appends to the NDJSON ledger with per-phase metrics. Use `codex metrics summarize` for quick trend
checks when preparing milestone readouts.
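An append-only NDJSON ledger is straightforward to aggregate by hand when `codex metrics summarize` is not available. The sketch below is an assumption-laden illustration: the record shape (`run_id` / `phase` / `metrics`) is invented for the example and is not the documented ledger schema, and the demo file is fabricated so the code runs end to end.

```python
import json
from collections import defaultdict
from pathlib import Path

def summarize_ledger(path):
    """Compute the mean of each (phase, metric) pair across ledger records.

    The record shape is an assumption for illustration, not the documented
    ledger schema.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for line in Path(path).read_text().splitlines():
        record = json.loads(line)
        for name, value in record.get("metrics", {}).items():
            key = (record.get("phase"), name)
            sums[key] += value
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Fabricated two-record ledger so the sketch is runnable.
ledger = Path("reasoning_demo.ndjson")
ledger.write_text("\n".join(json.dumps(r) for r in [
    {"run_id": "m1", "phase": "starter", "metrics": {"win_rate": 0.5}},
    {"run_id": "m1", "phase": "starter", "metrics": {"win_rate": 0.6}},
]))
print(summarize_ledger(ledger))  # mean win_rate for the 'starter' phase
```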
### Deployment checks

```bash
codex deploy --config configs/deploy/reasoning_pod.yaml \
  --dry-run

# Optional: if your train loop emits run metadata to a non-default path:
# codex deploy --config configs/deploy/reasoning_pod.yaml \
#   --run-metadata-dir runs/train_loop \
#   --dry-run
```
Keep `--dry-run` in place. The manifest is a review artifact, not a production action, and the embedded
`rollout_ring` is an intent badge rather than permission to ship. Dry runs confirm manifest parity, bundler signatures,
and runtime allowances required by bespoke hosts. Redeployments should always be paired with
`codex reasoning-templates explain <name>` to document why a template was chosen.
### Offline validation helpers

```bash
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 CODEX_MLFLOW_ENABLE=0 WANDB_MODE=offline \
  nox -s tests_offline
```
This run exports standard offline toggles and keeps artefacts under `.codex/` for reproducibility (metrics, checkpoints,
reasoning traces). Combine it with `codex_cli.app checkpoint-smoke` to validate serialization paths without GPUs.
## Next steps
- Align sprint planning with the milestones.
- Review `docs/guides/reasoning_overview.md` before opening architecture PRs.
- Wire bespoke hosting expectations into status reports using the templates in `docs/templates`.