CLI reference (offline-first)
Training
- Primary: codex-train delegates to Hydra when installed and falls back to a minimal local configuration when Hydra is absent. The fallback still sets up reproducible seeds, generates a local dataset if missing, and logs metrics to artifacts/experiments/.
- Hydra entry: python -m codex_ml.cli.hydra_entry uses the conf/ directory by default. Override values with dotlist syntax, e.g. model.enable_lora=true data.dataset_path=data/sample.jsonl.
Data loading
Use codex_ml.codex_data.load_dataset to read JSONL or text datasets,
shuffle deterministically, and persist cached splits under
artifacts/cache/<dataset>/splits-<hash>.json.
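The deterministic shuffle and the hashed cache file can be pictured with the sketch below. It does not reproduce the signature or internals of codex_ml.codex_data.load_dataset; the function names, the val_fraction parameter, the inputs fed into the hash, and the JSON layout are all assumptions chosen for illustration. Only the artifacts/cache/<dataset>/splits-<hash>.json path pattern comes from the text above.

```python
import hashlib
import json
import random
from pathlib import Path
from typing import Dict, List


def split_cache_path(cache_root: Path, dataset: str, seed: int,
                     val_fraction: float, records: List[str]) -> Path:
    """Build <cache_root>/<dataset>/splits-<hash>.json (hash inputs are assumed)."""
    digest = hashlib.sha256()
    digest.update(f"{seed}:{val_fraction}".encode("utf-8"))
    for record in records:
        digest.update(b"\0")
        digest.update(record.encode("utf-8"))
    return cache_root / dataset / f"splits-{digest.hexdigest()[:12]}.json"


def load_or_create_splits(records: List[str], dataset: str,
                          cache_root: Path = Path("artifacts/cache"),
                          seed: int = 0,
                          val_fraction: float = 0.1) -> Dict[str, List[int]]:
    """Return cached train/val index lists, creating them on first use."""
    path = split_cache_path(cache_root, dataset, seed, val_fraction, records)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))

    indices = list(range(len(records)))
    # A dedicated Random instance keeps the shuffle independent of global state.
    random.Random(seed).shuffle(indices)
    n_val = int(len(indices) * val_fraction)
    splits = {"val": indices[:n_val], "train": indices[n_val:]}

    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(splits), encoding="utf-8")
    return splits
```

Because the hash in this sketch covers the seed, the split fraction, and the records themselves, changing any of them produces a new cache file instead of silently reusing a stale split.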
Plugin and loader entry points
Third-party packages can expose dataset loaders, tokenizers, and reward models without code changes by declaring entry points:
[project.entry-points."codex_ml.datasets"]
my_jsonl = "my_package.loaders:jsonl_loader"
[project.entry-points."codex_ml.tokenizers"]
custom_tokenizer = "my_package.tokenizers:build_tokenizer"
[project.entry-points."codex_ml.reward_models"]
simple_reward = "my_package.rewards:StaticRewardModel"
After the package is installed, the registry functions (codex_ml.data.registry.get_dataset,
codex_ml.plugins.reward_models.get) discover these implementations
automatically at runtime.
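Entry-point discovery of this kind is usually built on the standard library's importlib.metadata. The sketch below shows the general mechanism only; it is not the implementation behind the codex_ml registry functions, and the discover helper is a hypothetical name.

```python
from importlib.metadata import EntryPoint, entry_points
from typing import Dict


def discover(group: str) -> Dict[str, EntryPoint]:
    """Map entry-point names to EntryPoint objects for one group (hypothetical helper)."""
    try:
        found = entry_points(group=group)  # selection API, Python 3.10+
    except TypeError:
        found = entry_points().get(group, [])  # dict-style API on Python 3.8/3.9
    return {ep.name: ep for ep in found}


# Nothing is imported until load() is called, so discovery stays cheap.
datasets = discover("codex_ml.datasets")
for name, ep in datasets.items():
    print(f"{name} -> {ep.value}")
```

Calling ep.load() on one of the returned objects imports the target module and returns the referenced object, for example the jsonl_loader function declared in the configuration above once my_package is installed.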
Experiment analysis
Run python scripts/analyze_experiments.py to generate
artifacts/experiment_summary.md and artifacts/experiment_summary.json.
The python -m codex_ml.cli.detectors run command then consumes the JSON summary as
part of the scorecard.
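For ad hoc inspection, the JSON summary can also be read directly. The sketch below assumes nothing about the summary's schema, which is not documented here; load_summary is a hypothetical helper, and only the default path is taken from the text above.

```python
import json
from pathlib import Path
from typing import Any, Optional, Union


def load_summary(
    path: Union[str, Path] = "artifacts/experiment_summary.json",
) -> Optional[Any]:
    """Return the parsed summary, or None if the file has not been generated yet."""
    path = Path(path)
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

A None result means scripts/analyze_experiments.py has not been run yet, or was run from a different working directory.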