# End-to-End CPU Training
Train, evaluate, and inspect a full Codex symbolic pipeline on CPU — no GPU or cloud required.
## Prerequisites
## 1 — Prepare data
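The pipeline expects three JSONL files. As a minimal sketch of how such files can be generated (the field names `text`, `prompt`, `completion`, `chosen`, and `rejected` are assumptions for illustration, not the pipeline's documented schema), write one JSON object per line:

```python
import json
from pathlib import Path

data_dir = Path("data")
data_dir.mkdir(exist_ok=True)

# Hypothetical record shapes -- adjust the field names to the schema
# deploy_codex_pipeline.py actually expects.
corpus = [{"text": "def add(a, b): return a + b"}]
demos = [{"prompt": "add two numbers", "completion": "def add(a, b): return a + b"}]
prefs = [{"prompt": "add two numbers", "chosen": "a + b", "rejected": "a - b"}]

for name, rows in [("corpus", corpus), ("demos", demos), ("prefs", prefs)]:
    # One JSON object per line: the JSONL convention.
    with open(data_dir / f"{name}.jsonl", "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
```

Each file can then be passed to the pipeline via the corresponding `--corpus`, `--demos`, and `--prefs` flags shown below.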
## 2 — Run the pipeline

```shell
python deploy/deploy_codex_pipeline.py \
  --corpus data/corpus.jsonl \
  --demos data/demos.jsonl \
  --prefs data/prefs.jsonl \
  --output-dir runs/cpu_test
```
This executes three stages in sequence:
| Stage | Script section | Output |
|---|---|---|
| Pretraining | `pretrain()` | `runs/cpu_test/M0/` |
| SFT | `sft()` | `runs/cpu_test/M1/` |
| RLHF | `rlhf()` | `runs/cpu_test/M2/` |
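The staging logic can be sketched as a driver that calls each stage in order and collects metrics. This is a simplified illustration, not the actual script: only the stage names (`pretrain`, `sft`, `rlhf`) and the output layout come from the table above; the bodies and metric values are placeholders.

```python
import json
from pathlib import Path

def pretrain(out: Path) -> dict:
    # Placeholder: the real stage trains M0 from the corpus.
    (out / "M0").mkdir(parents=True, exist_ok=True)
    return {"stage": "pretrain"}

def sft(out: Path) -> dict:
    # Placeholder: supervised fine-tuning on the demos, producing M1.
    (out / "M1").mkdir(parents=True, exist_ok=True)
    return {"stage": "sft"}

def rlhf(out: Path) -> dict:
    # Placeholder: preference optimisation on the prefs, producing M2.
    (out / "M2").mkdir(parents=True, exist_ok=True)
    return {"stage": "rlhf"}

def run_pipeline(output_dir: str) -> None:
    out = Path(output_dir)
    # Each stage runs in sequence and reports its metrics.
    metrics = [stage(out) for stage in (pretrain, sft, rlhf)]
    (out / "metrics.json").write_text(json.dumps(metrics, indent=2))

run_pipeline("runs/cpu_test")
```

Running a later stage on the checkpoint of the previous one is what makes the `M0/`, `M1/`, `M2/` directories a lineage rather than three independent runs.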
## 3 — Evaluate

Expected: all assertions pass, and outputs are identical across runs (seed=0).
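The reproducibility claim rests on fixed seeding: the same seed must yield byte-identical outputs on every run. A minimal sketch of that invariant (using Python's stdlib RNG as a stand-in for the pipeline's randomness):

```python
import random

def sample_run(seed: int, n: int = 5) -> list:
    # An isolated RNG instance, so repeated calls don't share state.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# Same seed -> identical outputs; different seed -> different outputs.
assert sample_run(0) == sample_run(0)
assert sample_run(0) != sample_run(1)
```

If the pipeline's own assertions fail only intermittently, an unseeded source of randomness (or nondeterministic iteration order) is the usual culprit.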
## 4 — Inspect checkpoints

```shell
ls runs/cpu_test/
# M0/  M1/  M2/  metrics.json
python -c "import json; print(json.load(open('runs/cpu_test/metrics.json')))"
```
## 5 — Containerised run

```shell
docker compose run --rm trainer \
  python deploy/deploy_codex_pipeline.py \
  --corpus /data/corpus.jsonl \
  --output-dir /runs/exp1
```

Artifacts are written to the mounted volume at `/runs/`.
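For the command above to work, the compose file needs a `trainer` service with the `/data` and `/runs` mounts. A sketch of what such a service might look like (the service name comes from the command above; the build context and host paths are assumptions, so adapt them to the repo's actual `docker-compose.yml`):

```yaml
services:
  trainer:
    build: .            # assumed: image built from the repo's Dockerfile
    volumes:
      - ./data:/data    # input JSONL files, read-only from the container's view
      - ./runs:/runs    # artifacts land here on the host
```

With the bind mounts in place, `runs/exp1/` appears on the host as soon as the containerised pipeline writes it.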