codex-universal¶


The _codex_ image now centers on reasoning agents. Use this document as the top-level map for roadmap milestones, architecture references, and the bespoke-model hosting workflow that underpins every guided rollout.

Orientation¶

| Goal | Where to start |
| --- | --- |
| Understand the reasoning roadmap | Reasoning milestones |
| Skim architecture dependencies | Architecture diagrams |
| Launch a bespoke model | Hosting bespoke reasoning models |
| Train/evaluate/deploy | Guided pipelines |

Repo Map (Reasoning-Focused)¶

You can now surface a reasoning-focused repository map:

```
codex repo-map --reasoning
```

This highlights reasoning overlays, evaluation presets, and trace-capture knobs.

Reasoning roadmap milestones¶

| Milestone | Focus | Target signal |
| --- | --- | --- |
| M0: Observability baseline | Boot instrumented inference traces across offline smoke runs. | Trace coverage ≥95% on curated reasoning templates. |
| M1: Curriculum-first training | Establish first-principles curricula and replay strategies. | `reasoning.win_rate` ≥0.55 on `benchmarks/cot-lite`. |
| M2: Model hosting hardening | Promote bespoke models into hermetic serving pods. | Shadow-hosted latency p95 ≤ 700 ms with parity alerts. |
| M3: Flywheel automation | Continuous evaluation + redeploy gates orchestrated via Codex. | Weekly redeploy cadence with zero manual overrides. |
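
The numeric target signals for M0 to M2 can be treated as simple threshold gates. The following is a minimal sketch, not part of the Codex CLI: the metric keys (`trace_coverage`, `win_rate`, `latency_p95_ms`) and the helper `unmet_milestones` are hypothetical names chosen for illustration, and M3 is left out because its signal is a cadence rather than a single threshold.

```python
import operator

# Thresholds copied from the milestone table. The metric keys are
# hypothetical names for this sketch, not fields of any Codex export.
MILESTONE_GATES = {
    "M0": ("trace_coverage", operator.ge, 0.95),
    "M1": ("win_rate", operator.ge, 0.55),
    "M2": ("latency_p95_ms", operator.le, 700.0),
}


def unmet_milestones(metrics):
    """Return the milestones whose target signal is missing or not met."""
    unmet = []
    for milestone, (key, compare, threshold) in MILESTONE_GATES.items():
        value = metrics.get(key)
        if value is None or not compare(value, threshold):
            unmet.append(milestone)
    return unmet


example = {"trace_coverage": 0.97, "win_rate": 0.52, "latency_p95_ms": 640.0}
print(unmet_milestones(example))  # ['M1']
```

A missing metric counts as unmet, so an incomplete export never reads as a passed gate.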

Track milestone burndown using the reasoning_status table exported by:

```
codex repo-map --reasoning
```

Slice specific categories (for example, rollout rings and curricula) with:

```
codex repo-map --reasoning --include rollout_ring --include curriculum
```

For backlog triage, anchor discussions in docs/guides/reasoning_overview.md.

Control surface knobs and promotion checklist¶

codex repo-map --reasoning surfaces a shared set of knobs defined in configs/training/reasoning/baseline.yaml:

trace_mode
curriculum.preset
evaluation.preset
deployment.preset
metadata.rollout_ring

Every smoke run of the training loop writes machine-readable artifacts under runs/train_loop/:

run_metadata.json — captures metadata.*, the selected presets, and the rollout ring.
reasoning.json — snapshot of the reasoning harness configuration plus runtime summary.
evaluation.json — evaluation preset enforced for the run.
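
Downstream tooling can load these artifacts with the standard library. The sketch below assumes only the three file names listed above. It makes no assumption about their internal schema, and `load_run_artifacts` is a hypothetical helper rather than a Codex API.

```python
import json
from pathlib import Path

# File names as documented above.
ARTIFACT_NAMES = ("run_metadata.json", "reasoning.json", "evaluation.json")


def load_run_artifacts(run_dir="runs/train_loop"):
    """Load the three JSON artifacts of a smoke run, failing fast if any is absent."""
    run_path = Path(run_dir)
    missing = [name for name in ARTIFACT_NAMES if not (run_path / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing artifacts in {run_path}: {missing}")
    return {
        name: json.loads((run_path / name).read_text(encoding="utf-8"))
        for name in ARTIFACT_NAMES
    }
```

Checking for all three files before parsing any of them means a partially written run directory is reported as incomplete.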

Promotion toward main requires:

The evaluation preset to pass (or carry explicit sign-off in status reports).
metadata.rollout_ring declared in the training config and matching the target pod ring.
codex deploy --dry-run to succeed, which enforces the ring match between training output and configs/deploy/reasoning_pod.yaml.
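
The first two requirements can be pre-checked locally before invoking the dry run. This sketch is illustrative and does not replace `codex deploy --dry-run`. The key layout (`metadata.rollout_ring` in the run metadata, a top-level `rollout_ring` in the pod config) and the helper `promotion_blockers` are assumptions made for the example.

```python
def promotion_blockers(run_metadata, pod_config, evaluation_passed, signed_off=False):
    """Return human-readable reasons why a run is not ready for promotion.

    Assumes (hypothetically) that the training ring lives under
    run_metadata["metadata"]["rollout_ring"] and the pod ring under
    pod_config["rollout_ring"].
    """
    blockers = []
    if not (evaluation_passed or signed_off):
        blockers.append("evaluation preset failed without explicit sign-off")

    train_ring = run_metadata.get("metadata", {}).get("rollout_ring")
    pod_ring = pod_config.get("rollout_ring")
    if train_ring is None:
        blockers.append("metadata.rollout_ring not declared")
    elif train_ring != pod_ring:
        blockers.append(f"ring mismatch: training={train_ring!r} pod={pod_ring!r}")
    return blockers


run = {"metadata": {"rollout_ring": "canary"}}
print(promotion_blockers(run, {"rollout_ring": "canary"}, evaluation_passed=True))  # []
```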

Reviewers preparing a 0D_base_ → main merge should walk the dedicated checklist in docs/ops/promotion_checklist.md. The checklist also requires attaching the outputs of:

codex_ml.cli.codex_cli status-report
codex_ml.cli.codex_cli deploy --dry-run

and linking the latest docs/status_updates/survey-<ring>-and-<PR>-<DATESTAMP>.md.

Architecture at a glance¶

The canonical topology is captured in docs/diagrams/architecture.svg. Pair it with the Mermaid source (architecture.mmd) when proposing changes so reviewers can diff rendered assets and source together.

Key flows:

Authoring: Hydra configuration layers resolve reasoning templates from configs/training/reasoning/* before model instantiation.
Training: orchestrated by two modules that together are "the trainer" and replace older references to a standalone codex_ml.trainer.ReasoningTrainer:
src/codex_ml/training/unified_training.py (deterministic seeding, checkpoint / resume plumbing, continual replay strategy hooks),
src/codex_ml/train_loop.py (per-run executor that injects the reasoning harness, logs traces, and rotates checkpoints).
Deployment: Bespoke models are packaged with manifest digests and signed hooks for downstream registries.

When modifying the topology, update both the diagram and docs/guides/serving_reproducibility.md.

Hosting bespoke reasoning models¶

Bootstrap the project

```
uv sync --extra reasoning --extra cli --frozen
source .venv/bin/activate
codex repo-map --reasoning
```

Select a template using codex reasoning-templates list (see codex_cli). Templates live under configs/training/reasoning/ and ship default datasets plus evaluator bindings.
Materialise runtime overlays
```
codex-train +reasoning=baseline curriculum.phase_schedule=starter
```
This composes reasoning overrides on top of the legacy defaults so classical experiments keep working.

Register the artifact with deterministic metadata before handoff:

```
codex register --bundle artifacts/runs/reasoning-baseline \
  --expect manifest.sha256 --tag reasoning/m0/bespoke
```
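
To sanity-check a digest yourself before registering, the standard library's `hashlib` is sufficient. This sketch assumes the expected value is a hex-encoded SHA-256 string. How `codex register` interprets `--expect` is not specified here, and the helpers `sha256_file` and `digest_matches` are hypothetical.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large bundles need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_matches(path, expected_hex):
    """Compare a file's SHA-256 against an expected hex digest (case-insensitive)."""
    return sha256_file(path) == expected_hex.strip().lower()
```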

For service integrations, adopt the PodSpec defined in docs/deployment/reasoning_pod.md. This PodSpec is a dry-run template, not production hosting. Its job is to make resource shape, telemetry, curriculum phase, trace capture mode, and rollout ring explicit before anything moves toward main. A dry-run configuration is provided at configs/deploy/reasoning_pod.yaml. Link the generated manifest to your rollout plan.

Guided reasoning pipelines¶

Follow the deep dives in the new guides:

docs/guides/reasoning_overview.md — systems overview and milestone guardrails.
docs/guides/first_principles_curricula.md — curriculum design and evaluation cadences.

Training quickstart¶

```
codex-train +reasoning=baseline \
  curriculum.phase_schedule=starter \
  training.max_steps=500 \
  logging.reasoning_trace=true \
  training.output_dir=artifacts/runs/reasoning-starter
```

The +reasoning=baseline override loads configs/training/reasoning/baseline.yaml and emits trace artefacts that downstream analysis notebooks can load. Curriculum definitions are stored as YAML fragments so you can diff changes between cohorts.

Evaluation handoff¶

```
codex evaluate --config configs/evaluation/reasoning.yaml \
  --log-metrics .codex/metrics/reasoning.ndjson \
  --run-id reasoning-milestone-m1
```

Every evaluation appends to the NDJSON ledger with per-phase metrics. Use codex metrics summarize for quick trend checks when preparing milestone readouts.
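
Because the ledger is NDJSON (one JSON object per line), ad hoc analysis needs nothing beyond the standard library. The sketch below averages numeric metrics per phase. The record layout (a `phase` string and a `metrics` mapping) is an assumption made for illustration, not the documented schema, and `summarize_ledger` is a hypothetical helper, unrelated to `codex metrics summarize`.

```python
import json
from collections import defaultdict


def summarize_ledger(lines):
    """Average each numeric metric per phase across NDJSON records.

    Assumes (hypothetically) records shaped like
    {"phase": "...", "metrics": {"name": number, ...}}.
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the ledger
        record = json.loads(line)
        phase = record.get("phase", "unknown")
        for name, value in record.get("metrics", {}).items():
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                totals[phase][name] += value
                counts[phase][name] += 1
    return {
        phase: {name: totals[phase][name] / counts[phase][name] for name in names}
        for phase, names in totals.items()
    }


sample = [
    '{"run_id": "reasoning-milestone-m1", "phase": "starter", "metrics": {"win_rate": 0.5}}',
    '{"run_id": "reasoning-milestone-m1", "phase": "starter", "metrics": {"win_rate": 0.75}}',
]
print(summarize_ledger(sample))  # {'starter': {'win_rate': 0.625}}
```

To run it against the real ledger, pass an open file handle, for example `summarize_ledger(open(".codex/metrics/reasoning.ndjson", encoding="utf-8"))`.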

Deployment checks¶

```
codex deploy --config configs/deploy/reasoning_pod.yaml \
  --dry-run

# Optional: if your train loop emits run metadata to a non-default path:
# codex deploy --config configs/deploy/reasoning_pod.yaml \
#   --run-metadata-dir runs/train_loop \
#   --dry-run
```

Always leave --dry-run in place. The manifest is a review artifact, not a production action, and the embedded rollout_ring is an intent badge rather than permission to ship. Dry runs confirm manifest parity, bundler signatures, and runtime allowances required by bespoke hosts. Redeployments should always be paired with codex reasoning-templates explain <name> to document why a template was chosen.

Offline validation helpers¶

```
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 CODEX_MLFLOW_ENABLE=0 WANDB_MODE=offline \
  nox -s tests_offline
```

This run exports standard offline toggles and keeps artefacts under .codex/ for reproducibility (metrics, checkpoints, reasoning traces). Combine it with codex_cli.app checkpoint-smoke to validate serialization paths without GPUs.
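
When the offline session is launched from a Python script rather than a shell, the same toggles can be applied to the child process environment. This is a minimal sketch: the variable names and the nox session come from the command above, while the helpers `offline_env` and `run_offline_tests` are hypothetical.

```python
import os
import subprocess

# Toggles copied from the shell command above.
OFFLINE_TOGGLES = {
    "HF_DATASETS_OFFLINE": "1",
    "TRANSFORMERS_OFFLINE": "1",
    "CODEX_MLFLOW_ENABLE": "0",
    "WANDB_MODE": "offline",
}


def offline_env(base=None):
    """Return a copy of the environment with the offline toggles forced on."""
    env = dict(os.environ if base is None else base)
    env.update(OFFLINE_TOGGLES)
    return env


def run_offline_tests():
    """Run the offline nox session; raises CalledProcessError on failure."""
    return subprocess.run(["nox", "-s", "tests_offline"], env=offline_env(), check=True)
```

Building a copy instead of mutating `os.environ` keeps the toggles scoped to the child process.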

Next steps¶

Align sprint planning with the milestones.
Review docs/guides/reasoning_overview.md before opening architecture PRs.
Wire bespoke hosting expectations into status reports using the templates in docs/templates.