Codex Changelog
2025-10-14 - Evaluation helper & tokenizer adapter refresh
WHY
- Provide the reusable `evaluate_dataloader` helper promised in the audit plan.
- Ensure GPU metrics logging degrades gracefully on CPU-only environments.
- Offer a lightweight Hugging Face `tokenizers` adapter for offline JSON artefacts.
- Surface LoRA defaults in Hydra config while keeping changelog traceability.
Changes
- `src/codex_ml/eval/evaluator.py`: add `_MetricAggregator` utilities and the public `evaluate_dataloader` helper with optional metric hooks.
- `src/codex_ml/callbacks/system_metrics.py`: import-guard NVML and emit zeroed GPU metrics when it is unavailable.
- `src/codex_ml/interfaces/tokenizer.py`: register `HFTokenizerAdapter` around `tokenizer.json` files and expose it via package exports.
- `configs/base/default.yaml`: include explicit nested `training.lora` defaults aligned with audit guidance.
- Tests:
  - `tests/eval/test_evaluate_dataloader_helper.py`: unit tests covering averaging behaviour and torch guardrails.
- Metadata: export `evaluate_dataloader` through `__all__` and document the change in this changelog.
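A minimal sketch of the averaging behaviour the new helper's tests cover, assuming metrics are averaged per sample rather than per batch. `MetricAggregator`, `average_batch_metrics`, and the `step_fn`/`size_fn` callables are hypothetical stand-ins, not the actual `_MetricAggregator`/`evaluate_dataloader` API, and the torch guard is omitted.

```python
from typing import Callable, Dict, Iterable, Mapping


class MetricAggregator:
    """Accumulate per-batch metrics and report sample-weighted means (hypothetical)."""

    def __init__(self) -> None:
        self._weighted_sums: Dict[str, float] = {}
        self._num_samples = 0

    def update(self, metrics: Mapping[str, float], batch_size: int) -> None:
        # Weight each batch by its size so a short final batch does not skew the mean.
        for name, value in metrics.items():
            self._weighted_sums[name] = self._weighted_sums.get(name, 0.0) + float(value) * batch_size
        self._num_samples += batch_size

    def compute(self) -> Dict[str, float]:
        if self._num_samples == 0:
            return {}
        return {name: total / self._num_samples for name, total in self._weighted_sums.items()}


def average_batch_metrics(
    batches: Iterable[Mapping[str, float]],
    step_fn: Callable[[Mapping[str, float]], Mapping[str, float]],
    size_fn: Callable[[Mapping[str, float]], int],
) -> Dict[str, float]:
    """Run step_fn on every batch and return the sample-weighted mean of its metrics."""
    aggregator = MetricAggregator()
    for batch in batches:
        aggregator.update(step_fn(batch), size_fn(batch))
    return aggregator.compute()


batches = [{"loss": 1.0, "n": 3}, {"loss": 3.0, "n": 1}]
print(average_batch_metrics(batches, lambda b: {"loss": b["loss"]}, lambda b: int(b["n"])))  # {'loss': 1.5}
```

A plain per-batch mean would report 2.0 here; weighting by batch size gives 1.5 because the first batch carries three samples.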
Risk
- Minimal. The helper is additive and gated on `torch` presence; the NVML fallback only affects metrics reporting.
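The NVML import guard can be sketched as follows, assuming the `pynvml` binding (shipped by the `nvidia-ml-py` package). The metric names and the `sample_gpu_metrics` function are hypothetical; the real callback in `system_metrics.py` may sample different fields.

```python
from typing import Dict

# Hypothetical metric names; zeros are reported whenever NVML cannot be used.
_ZERO_GPU_METRICS: Dict[str, float] = {"gpu_util_percent": 0.0, "gpu_mem_used_mb": 0.0}


def sample_gpu_metrics(device_index: int = 0) -> Dict[str, float]:
    """Sample GPU utilisation via NVML, or return zeroed metrics on CPU-only hosts."""
    try:
        import pynvml  # optional dependency, provided by nvidia-ml-py

        pynvml.nvmlInit()
    except Exception:
        # Missing package, missing driver, or no GPU: degrade to zeros instead of failing.
        return dict(_ZERO_GPU_METRICS)
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        return {"gpu_util_percent": float(util.gpu), "gpu_mem_used_mb": mem.used / 2**20}
    except Exception:
        return dict(_ZERO_GPU_METRICS)
    finally:
        pynvml.nvmlShutdown()
```

Returning a fresh copy of the zero dict keeps callers from mutating the shared default.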
Rollback
- Remove the new helper/test and delete the tokenizer adapter registration; restore the previous changelog entry order if needed.
Tests/Docs
- Added focused pytest coverage for the evaluation helper. No documentation build required.
2025-10-06 - Unified training + tracking guards + data determinism tests
WHY
- Reduce training-loop drift and standardize resume/grad-clipping hooks.
- Enforce offline-first tracking to avoid accidental remote egress.
- Improve data determinism confidence with simple, fast property tests.
Changes
- `src/codex_ml/training/unified_training.py`: new façade `run_unified_training` with legacy shims emitting `DeprecationWarning`.
- `src/codex_ml/checkpointing/checkpoint_core.py`: add `load_checkpoint(...)` to match `save_checkpoint(...)`.
- `src/codex_ml/tracking/guards.py`: add `decide_mlflow_tracking_uri(...)` returning a structured decision and normalizing URIs.
- Tests:
  - `tests/tracking/test_tracking_guards.py`: parameterized matrix for MLflow/W&B offline gates and allow-remote override.
  - `tests/data/test_dataset_determinism.py`: checksum stability, seed-diff, shard coverage, UTF-8 fallback.
  - `tests/training/test_unified_training_warnings.py`: `DeprecationWarning` assertions on legacy shims.
- Docs:
  - `docs/unified_training.md`, `docs/SEARCH_NOTES.md`.
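The checksum-stability, seed-diff, and coverage properties exercised by the determinism tests can be expressed compactly. `seeded_order` and `dataset_checksum` are hypothetical helpers, not the test module itself; note that `random.Random` shuffles are reproducible for a fixed seed on a given Python version, not necessarily across versions.

```python
import hashlib
import json
import random
from typing import List, Sequence


def seeded_order(records: Sequence[str], seed: int) -> List[str]:
    """Return a shuffled copy whose order depends only on the seed."""
    rng = random.Random(seed)  # local RNG: leaves the global random state untouched
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled


def dataset_checksum(records: Sequence[str]) -> str:
    """SHA-256 over a canonical JSON encoding, stable across runs and platforms."""
    payload = json.dumps(list(records), ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

A checksum test compares two runs with the same seed, a seed-diff test expects a different order for a different seed, and a coverage test checks that sorting the shuffled output recovers the input exactly.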
Risk
- Behavior change for users relying on remote MLflow endpoints when offline signals are set. Mitigated by the `CODEX_ALLOW_REMOTE_TRACKING=1` escape hatch.
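A sketch of how an offline-first tracking decision with this escape hatch might be structured. Only `CODEX_ALLOW_REMOTE_TRACKING` comes from this entry; `decide_tracking_uri`, `TrackingDecision`, and the `CODEX_OFFLINE` flag are hypothetical, and `WANDB_MODE=offline` is used as an example of an existing offline signal. It is not the signature of `decide_mlflow_tracking_uri`.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class TrackingDecision:
    uri: str
    blocked_remote: bool
    reason: str


def decide_tracking_uri(
    requested: Optional[str], env: Mapping[str, str], local_dir: str = "mlruns"
) -> TrackingDecision:
    """Pick an MLflow tracking URI, keeping runs local unless remote use is allowed."""
    local_uri = Path(local_dir).resolve().as_uri()
    allow_remote = env.get("CODEX_ALLOW_REMOTE_TRACKING") == "1"
    # Offline signals: W&B's offline mode plus a hypothetical CODEX_OFFLINE flag.
    offline = env.get("WANDB_MODE") == "offline" or env.get("CODEX_OFFLINE") == "1"
    if not requested:
        return TrackingDecision(local_uri, False, "no URI requested; using local store")
    if "://" not in requested and not requested.startswith("file:"):
        # Normalise bare filesystem paths to file:// URIs.
        return TrackingDecision(Path(requested).resolve().as_uri(), False, "normalised local path")
    if requested.startswith(("http://", "https://")) and offline and not allow_remote:
        return TrackingDecision(local_uri, True, "remote URI blocked while offline")
    return TrackingDecision(requested, False, "requested URI allowed")
```

Returning a structured decision rather than a bare string lets callers log why a remote endpoint was replaced.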
Rollback
- Remove imports/usages of `codex_ml.tracking.guards`; legacy shims will remain no-ops.
- Revert the added modules; legacy training paths are unaffected.
Tests/Docs
- See new `tests/*` and `docs/*` files. No GitHub Actions were added or modified.
2025-09-17 - Checkpoint resume & dataset manifest updates
- Removed duplicate `src/codex/training.py01` after verifying no live references remained.
- Implemented `CheckpointManager.load_latest`, updated the training CLI to auto-discover the latest checkpoint for `--resume-from`, and added regression tests.
- Updated the dataset registry to apply seeded shuffling and emit manifest files, and expanded data loader tests for deterministic coverage.
- Pinned `pre-commit==4.0.1`, `nox==2024.5.1`, and `pytest-cov==7.0.0` across dev requirements and lockfiles to ensure offline availability.
- Extended `codex_setup.py`, `scripts/codex_local_gates.sh`, and `codex_workflow.py` to record gate CLI availability in `.codex/session_logs.db`.
- Hardened `configs/development/noxfile.py` coverage sessions to emit hashed JSON reports under `artifacts/coverage/` and log the artifact metadata.
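The latest-checkpoint discovery behind `--resume-from` could look roughly like this. The `ckpt-<step>` directory layout and the `find_latest_checkpoint` name are assumptions; `CheckpointManager.load_latest` may use a different naming scheme.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical layout: one directory per saved step, named ckpt-<step>.
_CKPT_PATTERN = re.compile(r"^ckpt-(\d+)$")


def find_latest_checkpoint(root: Path) -> Optional[Path]:
    """Return the checkpoint directory with the highest step number, or None."""
    if not root.is_dir():
        return None
    best_step: int = -1
    best_path: Optional[Path] = None
    for child in root.iterdir():
        match = _CKPT_PATTERN.match(child.name)
        # Compare steps numerically so ckpt-10 ranks above ckpt-9.
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path
```

Parsing the step number avoids the lexicographic trap where `ckpt-9` would sort after `ckpt-10`.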
2025-08-28 - Codex offline runner
- Added `tools/codex_run.py` orchestrator with audit fallback and local gates.
- Added `tools/codex_run.sh` wrapper.
2025-08-31 - CLI testing improvements
- Restricted Nox test session to Python 3.12 and installed missing CLI/API dependencies.
- Enabled `importlib` import mode via `pytest.ini` to prevent module name collisions during collection.
2025-11-25 - Static code analysis step
- Added `static_code_analysis` stage to `analysis/audit_pipeline.py` and integrated it with `scripts/codex_local_gates.sh`.
- Logs syntax-check metrics for Python sources.
- Introduced a unit test verifying metric emission.
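A syntax-check metric of this kind can be sketched with the standard-library `ast` module. The `syntax_check_metrics` name and its metric keys are hypothetical; the real `static_code_analysis` stage may collect and name its metrics differently.

```python
import ast
from pathlib import Path
from typing import Dict, Iterable


def syntax_check_metrics(paths: Iterable[Path]) -> Dict[str, int]:
    """Parse each Python file and count how many fail to parse (hypothetical metric names)."""
    metrics = {"files_checked": 0, "syntax_errors": 0}
    for path in paths:
        metrics["files_checked"] += 1
        try:
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (SyntaxError, ValueError):
            # ValueError also covers UnicodeDecodeError and, on older Pythons, null bytes.
            metrics["syntax_errors"] += 1
    return metrics
```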
2025-11-24 - Offline upgrade script
- Added `codex_ast_upgrade.py` to automate tiered parsing setup and offline auditing.
2025-11-23 - Tiered parsing and offline audit pipeline
- Added analysis modules with tiered parsing fallbacks and search providers.
- Added CLI `codex_ml.cli.audit_pipeline` and tests for AST extraction.
- Documented "Fallback Modes & Feature Flags" in README.
- Deferred advanced codemods and online external search; kept AST-only analyzers as fallback.
2025-05-19 - Validation metrics & splits
- Added: `--val-split`/`--test-split` flags and per-epoch validation logging to `metrics.json`.
- Deferred: stratified splits, GPU-heavy metrics, and online trackers.
- Risks: small datasets may skip evaluation when they contain too few tokens.
2025-11-09 - Offline experiment tracking
- Added unified `codex_ml.monitoring.codex_logging` with optional TensorBoard, W&B, and MLflow sinks.
- Patched `engine_hf_trainer.py` and `functional_training.py` to sample CPU/GPU metrics and log per-step scalars.
- Added offline test coverage for logging bootstrap and docs for monitoring and experiment tracking.
- Deferred: online W&B/remote MLflow servers, full Trainer callbacks, and extended NVML telemetry.
2025-08-26 - LoRA and deterministic splits
- Implemented optional LoRA adapter with graceful fallback when `peft` is missing.
- Added grad accumulation and mixed precision helpers to `functional_training.py`.
- Introduced deterministic data splitting utility.
- Generated `requirements/lock.txt` and local test gate script.
- Sanitized external links in README for offline use.
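Hash-based assignment is one way to make a data split deterministic; it is swapped in here for illustration, and the project's utility may instead rely on a seeded shuffle. Hashing with SHA-256 rather than the built-in `hash()` keeps the result independent of `PYTHONHASHSEED`. The function names and the `seed:key` hashing format are hypothetical.

```python
import hashlib
from typing import Dict, List, Sequence


def split_bucket(key: str, seed: int = 0) -> float:
    """Map a record key to a stable value in [0, 1) using SHA-256."""
    digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
    # Keep 53 bits so the division is exact in a float and never rounds up to 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def deterministic_split(
    keys: Sequence[str], val_fraction: float = 0.1, seed: int = 0
) -> Dict[str, List[str]]:
    """Assign each key to train or val by its hash bucket."""
    splits: Dict[str, List[str]] = {"train": [], "val": []}
    for key in keys:
        splits["val" if split_bucket(key, seed) < val_fraction else "train"].append(key)
    return splits
```

Because a record's split depends only on its own key, appending new records never moves existing ones between train and validation.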
CI policy docs - 2025-08-26T20:17:49Z
- Created `/workspace/codex/docs/ci.md` (web search allowed; remote CI disallowed).
Disable remote CI - 2025-08-26T20:17:49Z
- Patched 5 workflow files to `workflow_dispatch` and guarded their jobs.
- Total jobs guarded: 7
2025-08-26 - Δ PR Checklist Applied
New
- Standalone `analysis` package with audit pipeline for offline checks.
Modified
- README includes "Offline CI & Local Parity" policy block.
- `scripts/codex_local_gates.sh` enforces coverage during local tests.
- Workflows guarded with `_codex_guard` job and manual triggers.
Removed
- none
Deferred / Pruned
- Existing analysis utilities under `src/codex_ml/analysis` retained without duplication.
2025-08-28 - Portable workflow tooling
New
- `tools/audit_runner.py` provides dual-path audit execution with optional external CLI.
- `tools/run_precommit.py` adds a verbose pre-commit runner with timeout and cache cleanup.
- `tools/run_tests.py` wraps pytest with optional coverage fallback.
- `tools/codex_workflow.py` and `tools/codex_workflow.sh` orchestrate audit, hooks, and tests locally.
2025-08-29 - Misc bug fixes and utilities
- Added shebang and docs to `tools/label_policy_lint.py`.
- Ensured `git_tag.current_commit` decodes byte output.
- Added setter for SentencePieceAdapter `model_prefix`.
- Guarded MLflow run initialization by checking `MLFLOW_TRACKING_URI`.
- Corrected EarlyStopping patience comparison.
- Introduced importlib-based CLI viewer module.
- Exposed RNG state helpers and best-k retention tests.
- Implemented placeholder keyword risk scoring.
- Added seed-controlled shuffling to data loaders.
- Warned on duplicate registry registrations.
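The entry does not say what the patience comparison bug was, so this is only a sketch of a conventional patience check. The `EarlyStopping` dataclass here, including its `min_delta` field and `step` method, is hypothetical and lower loss is assumed to be better.

```python
import math
from dataclasses import dataclass


@dataclass
class EarlyStopping:
    """Stop once the monitored loss fails to improve for `patience` consecutive checks."""

    patience: int = 3
    min_delta: float = 0.0
    best: float = math.inf
    bad_epochs: int = 0

    def step(self, loss: float) -> bool:
        """Record one validation loss; return True when training should stop."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        # `>=` stops right after `patience` non-improving checks; `>` would wait one more.
        return self.bad_epochs >= self.patience
```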
2025-08-29 - Utilities and test cleanup
- Added standalone `utils.training_callbacks` with EarlyStopping.
  - WHY: share training callback outside the `codex_ml` package.
  - RISK: low; new module.
  - ROLLBACK: revert `src/utils/training_callbacks.py`.
- Improved git tag decoding to try locale and latin-1 fallbacks.
  - WHY: handle non-UTF-8 git outputs gracefully.
  - RISK: minimal; affects only metadata helpers.
  - ROLLBACK: revert changes in `src/codex_ml/tracking/git_tag.py`.
- Fixed missing imports in `label_policy_lint` tests.
  - WHY: ensure lint helper tests run.
  - RISK: none; tests only.
  - ROLLBACK: revert `tests/test_label_policy_lint.py`.
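The UTF-8, then locale, then latin-1 decoding order described above can be sketched like this; `decode_git_output` is a hypothetical name, not the helper in `git_tag.py`.

```python
import locale
from typing import Union


def decode_git_output(raw: Union[bytes, str]) -> str:
    """Decode subprocess output: UTF-8 first, then the locale encoding, then latin-1."""
    if isinstance(raw, str):
        return raw.strip()
    for encoding in ("utf-8", locale.getpreferredencoding(False)):
        try:
            return raw.decode(encoding).strip()
        except (UnicodeDecodeError, LookupError):
            continue
    # latin-1 maps every byte to a code point, so this final step cannot fail.
    return raw.decode("latin-1").strip()
```

Ending with latin-1 guarantees a string is always returned, at the cost of possibly mis-rendering non-Latin characters.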
2025-08-30 - Tokenizer, MLflow, and ingestion utilities
WHY
- Introduce a canonical `HFTokenizer` adapter with batching and decode/pad APIs to unify tokenization usage across the codebase.
- Add/standardize ingestion helpers, including encoding detection (`read_text(..., encoding="auto")`) and deterministic shuffling for reproducible data splits.
- Provide MLflow tracking utilities with a configurable system-metrics toggle and safe no-op behavior when tracking is disabled or the dependency is missing.
- Consolidate ingestion utilities and tooling fixes for consistent execution and packaging.
RISK
- Low: modules are thin wrappers with safe fallbacks and preserve prior interfaces where practical. Optional dependencies (transformers, peft, mlflow, charset-normalizer, pytest-cov, etc.) degrade gracefully.
ROLLBACK
- Remove newly added modules (tokenizer adapter, ingestion helpers, tracking helpers) and revert any changed imports to restore previous behavior.
TEST
- Recommended checks:
  - `pre-commit run --files src/codex_ml/interfaces/tokenizer.py src/codex_ml/tracking/mlflow_utils.py src/ingestion/encoding_detect.py src/ingestion/io_text.py src/ingestion/utils.py tests/interfaces/test_tokenizer_hf.py tests/tracking/test_mlflow_utils.py tests/ingestion/test_io_text.py`
  - `pytest` (note: during initial integration this run reported several collection/compatibility issues, so expect follow-up fixes; some branches historically reported ~13 collection errors).
2025-08-30 - Tokenizer unification and ingestion consolidation (alternate notes)
WHY
- Consolidate tokenization into a single HFTokenizer adapter with batch helpers.
- Merge ingestion utilities with encoding detection and deterministic shuffling.
- Restore MLflow helpers with safe no-op fallbacks and environment-variable-aware handling.
- Add package markers and tooling fixes to improve consistent execution across environments.
RISK
- Low: changes preserve existing interfaces and degrade gracefully when optional dependencies are missing.
ROLLBACK
- Revert this commit to restore the previous module layout.
2025-08-29 - Restore no-op MLflow context manager
WHY
- Maintain previous behaviour where disabled MLflow tracking yields a no-op context manager, so `with start_run(cfg)` remains safe when tracking is off or mlflow is unavailable.
RISK
- Low: ensures backward compatibility for callers that expect a falsy/no-op context manager rather than an exception.
ROLLBACK
- Revert this commit to return `None` instead (if required by alternate compatibility concerns).
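`contextlib.nullcontext()` from the standard library is one straightforward way to get this no-op behaviour. In this sketch `start_run_or_noop` is a hypothetical name rather than the project's `start_run`, the `enable` and `run_name` config keys are assumptions, and `CODEX_ENABLE_MLFLOW` is borrowed from the Phase 3 entry. When tracking is on, it defers to `mlflow.start_run`, whose `ActiveRun` return value is itself a context manager.

```python
import contextlib
import os
from typing import Any, ContextManager, Mapping


def start_run_or_noop(cfg: Mapping[str, Any]) -> ContextManager[Any]:
    """Return an MLflow run context, or a no-op context when tracking is off or mlflow is missing."""
    enabled = bool(cfg.get("enable", False)) or os.environ.get("CODEX_ENABLE_MLFLOW") == "1"
    if not enabled:
        return contextlib.nullcontext()  # `with ... as run` binds run to None
    try:
        import mlflow
    except ImportError:
        return contextlib.nullcontext()
    return mlflow.start_run(run_name=cfg.get("run_name"))
```

Callers can then write `with start_run_or_noop(cfg) as run:` unconditionally and check `run is None` only where they need run metadata.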
2025-08-29 - Local orchestration scripts
WHY
- Add local tooling to run the sequential Codex workflow:
  - `tools/codex_exec.py` and `tools/codex_exec.sh` run the end-to-end local workflow (preparation, scan, suggest patches, capture errors, finalize).
  - Local task runner utilities run pytest-selected tasks and capture failure outputs for later inspection.
RISK
- Low: scripts are optional and operate only on the local repository.
ROLLBACK
- Remove the newly added scripts.
REPRO
- Run `bash tools/codex_exec.sh` to generate local artifacts and reports.
2025-08-29 - Tokenizer & training wiring
WHY
- Thread gradient-accumulation and bf16/fp16 flags into the HF Trainer wrapper.
- Provide optional LoRA integration points and deterministic ingestion helpers for training reproducibility.
- Add utilities to capture failing task output into commit-comment artifacts to help maintainers triage issues.
RISK
- Low: trainer wiring is additive and optional; defaults preserve prior behaviour if optional dependencies are absent.
ROLLBACK
- Revert this commit to remove tokenizer, training, and tracking utilities.
2025-08-29 - Tokenizer and tracking utilities (detailed)
WHY
- Introduce `HFTokenizer` interface with padding/truncation controls and batch encode/decode helpers.
- Add deterministic ingestion helpers (`seeded_shuffle`/`deterministic_shuffle`) and a robust `read_text` with optional automatic encoding detection.
- Simplify MLflow tracking via an optional `start_run` helper with safe no-op behavior and configurable environment flags.
- Expose Trainer precision and LoRA-related options in the HF trainer wrapper.
RISK
- Low: new utilities are optional and default to existing behaviour when optional dependencies are missing.
ROLLBACK
- Revert this commit to restore previous behaviour.
REPRO CHECKLIST
- Set `PYTHONHASHSEED=0` for deterministic ordering where needed.
- Use `seeded_shuffle` (or `deterministic_shuffle`) for reproducible splits.
- Use `read_text(..., encoding="auto")` where encoding may vary.
- `pre-commit run --all-files --verbose`.
- `pytest --cov=src/codex_ml --cov-report=term`.
2025-08-29 - Phase 3 integrations
WHY
- Guard MLflow run initialization behind `CODEX_ENABLE_MLFLOW` and related flags to avoid accidental tracking.
- Resolve CLI viewer resources using `importlib.resources` for packaging safety.
- Provide a lightweight checkpoint manager with RNG persistence and best-K pruning when available.
- Add local task runner utilities (`tools/codex_run_tasks.py`) and optional venv bootstrap helpers.
RISK
- Low: features are optional and protected by environment flags.
ROLLBACK
- Revert this commit to restore previous behavior.
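Best-K pruning and RNG persistence can be sketched with `heapq` and `random.getstate()`/`random.setstate()`. `BestKTracker`, `rng_snapshot`, and `restore_rng` are illustrative names, losses are assumed lower-is-better, and a real checkpoint manager would also capture NumPy/torch RNG state and delete the pruned paths on disk.

```python
import heapq
import random
from typing import Any, Dict, List, Tuple


class BestKTracker:
    """Retain the k lowest-loss checkpoints and report which ones to prune (hypothetical)."""

    def __init__(self, k: int) -> None:
        self.k = k
        # Store negated losses so heapq's min-heap root is the worst retained checkpoint.
        self._heap: List[Tuple[float, str]] = []

    def add(self, loss: float, path: str) -> List[str]:
        """Register a new checkpoint; return the paths that fell out of the top k."""
        heapq.heappush(self._heap, (-loss, path))
        pruned: List[str] = []
        while len(self._heap) > self.k:
            _, worst_path = heapq.heappop(self._heap)
            pruned.append(worst_path)
        return pruned

    def kept(self) -> List[str]:
        """Retained checkpoint paths, best (lowest loss) first."""
        return [path for _, path in sorted(self._heap, reverse=True)]


def rng_snapshot() -> Dict[str, Any]:
    """Capture the Python RNG state so it can be saved alongside a checkpoint."""
    return {"python_random": random.getstate()}


def restore_rng(snapshot: Dict[str, Any]) -> None:
    """Restore the Python RNG state captured by rng_snapshot()."""
    random.setstate(snapshot["python_random"])
```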
Notes and Next Steps
- Multiple entries on 2025-08-29/30 reflect iterative integration steps across tokenizer, ingestion, tracking, and trainer wiring. The changes are intentionally additive and guarded; follow-up PRs should address the remaining test collection errors and tighten compatibility layers (e.g., signature normalization for legacy `read_text` implementations, optional dependency detection, and test environment setup).
- When backporting or cherry-picking these changes to other branches, ensure environment flags (e.g., `CODEX_ENABLE_MLFLOW`, `CODEX_POST_COMMIT_COMMENT`) and optional-dependency handling remain consistent to avoid surprising behavior in CI.
2025-08-28 - Codex Run
- Enforced self-hosted-only gates via `make codex-gates` and `scripts/codex_local_gates.sh`.
- Added doctor script and workflow executor.
- Updated README and docs to recommend self-hosted runners and MLflow tracking.
$(date -u +%Y-%m-%d) - Codex Run
- Added pad_id and eos_id accessors to HFTokenizerAdapter.
- Surfaced monitoring exceptions in functional_training via stderr logging.
- Introduced --grad-accum support and metric logging in train_loop.
- CheckpointManager now writes system.json for hardware metadata.
- Registered codex-ml-cli entry point in pyproject.
- Updated README with offline CI instructions and codex-ml-cli usage.
- Added tests for tokenizer IDs, grad accumulation, and checkpoint system metadata.
- Added codex_seq_runner and run_codex_sequence utilities.
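The `--grad-accum` behaviour can be illustrated with a torch-free scalar sketch: gradients are averaged over a window of micro-batches and a single update is applied per window. `train_with_grad_accum` and its parameters are hypothetical; in a torch loop the equivalent is scaling each micro-batch loss by the window size, calling `backward()` per micro-batch, and calling `optimizer.step()`/`zero_grad()` once per window.

```python
from typing import Callable, List, Sequence


def train_with_grad_accum(
    micro_batches: Sequence[float],
    grad_fn: Callable[[float, float], float],
    weight: float,
    lr: float,
    grad_accum: int,
) -> List[float]:
    """Average gradients over `grad_accum` micro-batches before each SGD update."""
    if grad_accum < 1:
        raise ValueError("grad_accum must be >= 1")
    history: List[float] = []
    grad_sum, pending = 0.0, 0
    for i, batch in enumerate(micro_batches, start=1):
        grad_sum += grad_fn(weight, batch)  # weight stays fixed within a window
        pending += 1
        # Update at the end of each window, and flush a trailing partial window.
        if pending == grad_accum or i == len(micro_batches):
            weight -= lr * (grad_sum / pending)
            history.append(weight)
            grad_sum, pending = 0.0, 0
    return history
```

With the squared-error gradient `2 * (w - b)`, accumulating micro-batches `[1.0, 3.0]` with `grad_accum=2` from `w = 0` and `lr = 0.5` gives one update to `w = 2.0`, the same step a single full batch would take.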
2025-09-02 - Codex Run
- Enabled optional padding/truncation in `HFTokenizerAdapter.encode`.
- Wired `apply_lora` into `run_hf_trainer` with dtype/device placement.
- Added checkpoint resume support via `resume_from` parameter.
[Unreleased] - 2025-09-02
- Format `src/codex_ml/safety/risk_score.py` with Black.
- Correct README typos and complete environment description.
- Pin `peft` dependency to ensure `nox -s tests` passes.
- Applied fallback_patch_4.1-4.8 with safe sanitize→apply fallbacks; preserved intended functionality.
- Normalized line endings/BOM; stripped markdown/email artifacts from patch.
- Conformed to local gates (pre-commit/Black/isort/tests), Codex-only (no GitHub Actions).
- Ensure test dependencies (including `langchain`) are installed so `nox -s tests` passes.
- Add runbook for offline wheelhouse usage at `docs/runbooks/offline_wheelhouse.md`.
- Add smoke test proving `nox -s tests` delegates to `coverage`.
- Add `wheelhouse` alias in `Makefile` for bootstrap script.
- Expand `configs/development/noxfile.py` with `tests_sys` and `tests_ssp` sessions, an optional `uv|virtualenv` backend, and a `PIP_CACHE_DIR` default.
- feat(gates): Add black/isort/bandit/detect-secrets/safety hooks; nox `sec_scan`; Make `sys-tests`/`ssp-tests`.
- feat(deps): Introduce `tools/uv_lock_refresh.sh` to generate `uv.lock` and compiled requirements.
- feat(trainer): Early stopping + scheduler selection wired into `Trainer`.
- feat(logging): Rotating file handler with sane defaults.
- feat(tokenization): SentencePiece adapter padding/truncation shims + `__all__`.
- tests(tokenization): Edge-case test gated by `SPM_TINY_MODEL`.
- Introduced general `TokenizerAdapter` with HuggingFace and whitespace implementations; added basic round-trip tests.
- Added simple dataset loader supporting text/NDJSON/CSV with caching and safety hooks, plus deterministic split utilities.
Unreleased - 2025-09-07
- Updated README references to current configuration structure.
- Integrate safety filter with data loader and add license checker script.
- Generated gaps report for TODOs and stubs.
- Executed pre-commit and nox quality gate sessions.
- Add dataset loader supporting TXT/NDJSON/CSV with caching and safety filtering.
- feat(model): introduce model registry with optional LoRA configuration and tests.
- docs: document tokenizer adapter, Hydra configuration groups, and data loading utilities.
2025-09-26 - Codex-ready task sequence foundation
- Added codex_ready_task_sequence.yaml to document the reproducible workflow specification.
- Replaced codex_task_sequence.py with offline CLI orchestrating preparation, mapping, construction, pruning, and finalization phases.
- Introduced evaluate_dataloader helper and improved validation logging in training/functional_training.py with gradient accumulation checks.
- Added configs/base_config.py and regression tests covering evaluation loop, gradient accumulation, config loading, and MLflow fallback.
- Documented deferred modules and guarded pytest via a new `deferred` marker and skip logic.