Codex Changelog

2025-10-14 – Evaluation helper & tokenizer adapter refresh

WHY

  • Provide the reusable evaluate_dataloader helper promised in the audit plan.
  • Ensure GPU metrics logging degrades gracefully on CPU-only environments.
  • Offer a lightweight Hugging Face tokenizers adapter for offline JSON artefacts.
  • Surface LoRA defaults in Hydra config while keeping changelog traceability.

Changes

  • src/codex_ml/eval/evaluator.py: add _MetricAggregator utilities and the public evaluate_dataloader helper with optional metric hooks.
  • src/codex_ml/callbacks/system_metrics.py: import-guard NVML and emit zeroed GPU metrics when unavailable.
  • src/codex_ml/interfaces/tokenizer.py: register HFTokenizerAdapter around tokenizer.json files and expose via package exports.
  • configs/base/default.yaml: include explicit nested training.lora defaults aligned with audit guidance.
  • Tests:
      • tests/eval/test_evaluate_dataloader_helper.py: unit tests covering averaging behaviour and torch guardrails.
  • Metadata: export evaluate_dataloader through __all__ and document the change in this changelog.
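
The averaging behaviour mentioned above can be illustrated with a minimal sketch. The names MetricAggregatorSketch and evaluate_batches are hypothetical stand-ins, not the actual _MetricAggregator or evaluate_dataloader signatures, and the sketch is pure Python so it runs without torch. It assumes each batch is a sized sequence and that the metric hook returns a per-batch mean.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Mapping, Sequence


class MetricAggregatorSketch:
    """Hypothetical aggregator: accumulates sample-weighted metric sums."""

    def __init__(self) -> None:
        self._totals: Dict[str, float] = defaultdict(float)
        self._counts: Dict[str, int] = defaultdict(int)

    def update(self, metrics: Mapping[str, float], batch_size: int) -> None:
        # Weight each per-batch mean by the batch size so that a short
        # final batch does not skew the overall average.
        for name, value in metrics.items():
            self._totals[name] += float(value) * batch_size
            self._counts[name] += batch_size

    def averages(self) -> Dict[str, float]:
        return {
            name: self._totals[name] / self._counts[name]
            for name in self._totals
            if self._counts[name] > 0
        }


def evaluate_batches(
    batches: Iterable[Sequence[float]],
    metric_fn: Callable[[Sequence[float]], Mapping[str, float]],
) -> Dict[str, float]:
    """Run metric_fn on every batch and return sample-weighted averages."""
    aggregator = MetricAggregatorSketch()
    for batch in batches:
        aggregator.update(metric_fn(batch), batch_size=len(batch))
    return aggregator.averages()


def mean_loss(batch: Sequence[float]) -> Dict[str, float]:
    # Stand-in metric hook: treats the batch values as per-sample losses.
    return {"loss": sum(batch) / len(batch)}


result = evaluate_batches([[1.0, 2.0, 3.0], [4.0]], mean_loss)
```

Weighting by batch size is one reasonable design choice; a plain mean over batches would give 3.0 here instead of 2.5.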

Risk

  • Minimal. The helper is additive and gated on torch presence; NVML fallback only affects metrics reporting.
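
A rough sketch of the NVML fallback, assuming the pynvml bindings are the NVML entry point. The function name sample_gpu_metrics and the metric keys are illustrative assumptions, not the names used in system_metrics.py.

```python
from typing import Dict

try:  # NVML bindings are optional; CPU-only hosts usually lack them
    import pynvml
except ImportError:
    pynvml = None

# Hypothetical metric keys, chosen only for this sketch.
ZERO_GPU_METRICS = {"gpu_util_pct": 0.0, "gpu_mem_used_mb": 0.0}


def sample_gpu_metrics(device_index: int = 0) -> Dict[str, float]:
    """Return GPU utilisation and memory use, or zeros when NVML is unusable."""
    if pynvml is None:
        return dict(ZERO_GPU_METRICS)
    try:
        pynvml.nvmlInit()
    except Exception:
        # Bindings are installed but no driver or GPU is present.
        return dict(ZERO_GPU_METRICS)
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        return {
            "gpu_util_pct": float(util.gpu),
            "gpu_mem_used_mb": mem.used / (1024 * 1024),
        }
    except Exception:
        return dict(ZERO_GPU_METRICS)
    finally:
        pynvml.nvmlShutdown()


metrics = sample_gpu_metrics()
```

Returning zeros rather than omitting the keys keeps downstream log schemas stable on CPU-only machines.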

Rollback

  • Remove the new helper/test and delete the tokenizer adapter registration; restore the previous changelog entry order if needed.

Tests/Docs

  • Added focused pytest coverage for the evaluation helper. No documentation build required.

2025-10-06 – Unified training + tracking guards + data determinism tests

WHY

  • Reduce training-loop drift and standardize resume/grad-clipping hooks.
  • Enforce offline-first tracking to avoid accidental remote egress.
  • Improve data determinism confidence with simple, fast property tests.

Changes

  • src/codex_ml/training/unified_training.py: new façade run_unified_training with legacy shims emitting DeprecationWarning.
  • src/codex_ml/checkpointing/checkpoint_core.py: add load_checkpoint(...) to match save_checkpoint(...).
  • src/codex_ml/tracking/guards.py: add decide_mlflow_tracking_uri(...) returning a structured decision and normalizing URIs.
  • Tests:
      • tests/tracking/test_tracking_guards.py: parameterized matrix for MLflow/W&B offline gates and allow-remote override.
      • tests/data/test_dataset_determinism.py: checksum stability, seed-diff, shard coverage, UTF-8 fallback.
      • tests/training/test_unified_training_warnings.py: DeprecationWarning assertions on legacy shims.
  • Docs:
      • docs/unified_training.md, docs/SEARCH_NOTES.md.
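
The legacy-shim pattern named above might look roughly like the following. The signature of run_unified_training and the shim name legacy_train are assumptions made for illustration; the changelog does not give the real ones.

```python
import warnings
from typing import Any, Dict


def run_unified_training(config: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for the facade; the real signature is not given here."""
    return {"status": "ok", "epochs": config.get("epochs", 1)}


def legacy_train(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical legacy entry point that forwards to the facade."""
    warnings.warn(
        "legacy_train is deprecated; call run_unified_training instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not the shim
    )
    return run_unified_training(config)


# Record warnings explicitly, since DeprecationWarning is hidden by default
# outside of __main__ and test runners.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    outcome = legacy_train({"epochs": 3})
```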

Risk

  • Behavior change for users relying on remote MLflow endpoints when offline signals are set. Mitigated by CODEX_ALLOW_REMOTE_TRACKING=1 escape hatch.
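
As a sketch of how such a guard could behave: only CODEX_ALLOW_REMOTE_TRACKING comes from this entry. The TrackingDecision fields, the CODEX_OFFLINE flag, and the local fallback URI are assumptions chosen for illustration, and the function is deliberately not named decide_mlflow_tracking_uri.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import urlparse

OFFLINE_FLAGS = ("CODEX_OFFLINE",)    # hypothetical offline signal
LOCAL_FALLBACK_URI = "file:./mlruns"  # assumed local fallback


@dataclass(frozen=True)
class TrackingDecision:
    uri: Optional[str]
    blocked: bool
    reason: str


def decide_tracking_uri_sketch(
    requested: Optional[str], env: Optional[Mapping[str, str]] = None
) -> TrackingDecision:
    """Block remote tracking URIs when offline, unless explicitly allowed."""
    env = os.environ if env is None else env
    if not requested or not requested.strip():
        return TrackingDecision(None, False, "no-uri-requested")
    uri = requested.strip()  # minimal normalization for the sketch
    is_remote = urlparse(uri).scheme.lower() in ("http", "https")
    offline = any(env.get(flag) == "1" for flag in OFFLINE_FLAGS)
    allow_remote = env.get("CODEX_ALLOW_REMOTE_TRACKING") == "1"
    if is_remote and offline and not allow_remote:
        return TrackingDecision(LOCAL_FALLBACK_URI, True, "remote-blocked-offline")
    return TrackingDecision(uri, False, "allowed")
```

Passing env explicitly keeps the decision testable without mutating os.environ.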

Rollback

  • Remove imports/usages of codex_ml.tracking.guards; the legacy shims will remain no-ops.
  • Revert the added modules; legacy training paths unaffected.

Tests/Docs

  • See new tests/* and docs/* files. No GitHub Actions were added or modified.

2025-09-17 – Checkpoint resume & dataset manifest updates

  • Removed duplicate src/codex/training.py01 after verifying no live references remained.
  • Implemented CheckpointManager.load_latest, updated the training CLI to auto-discover the latest checkpoint for --resume-from, and added regression tests.
  • Updated the dataset registry to apply seeded shuffling, emit manifest files, and expanded data loader tests for deterministic coverage.
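
Auto-discovery of the latest checkpoint could work along these lines. The step-N directory naming and the function name find_latest_checkpoint are assumptions; CheckpointManager.load_latest may use a different layout.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumed naming scheme: one directory per checkpoint, e.g. "step-120".
_STEP_PATTERN = re.compile(r"^step-(\d+)$")


def find_latest_checkpoint(root: Path) -> Optional[Path]:
    """Return the checkpoint directory with the highest step, if any."""
    if not root.is_dir():
        return None
    best: Optional[Tuple[int, Path]] = None
    for child in root.iterdir():
        match = _STEP_PATTERN.match(child.name)
        if match is None or not child.is_dir():
            continue
        step = int(match.group(1))
        # Compare numerically so that step-10 sorts after step-2.
        if best is None or step > best[0]:
            best = (step, child)
    return None if best is None else best[1]
```

Comparing parsed integers rather than directory names avoids the lexicographic trap where "step-10" sorts before "step-2".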

  • Pinned pre-commit==4.0.1, nox==2024.5.1, and pytest-cov==7.0.0 across dev requirements and lockfiles to ensure offline availability.

  • Extended codex_setup.py, scripts/codex_local_gates.sh, and codex_workflow.py to record gate CLI availability in .codex/session_logs.db.
  • Hardened configs/development/noxfile.py coverage sessions to emit hashed JSON reports under artifacts/coverage/ and log the artifact metadata.

2025-08-28 – Codex offline runner

  • Added tools/codex_run.py orchestrator with audit fallback and local gates.
  • Added tools/codex_run.sh wrapper.

2025-08-31 – CLI testing improvements

  • Restricted Nox test session to Python 3.12 and installed missing CLI/API dependencies.
  • Enabled importlib import mode via pytest.ini to prevent module name collisions during collection.

2025-11-25 – Static code analysis step

  • Added static_code_analysis stage to analysis/audit_pipeline.py and integrated it with scripts/codex_local_gates.sh.
  • Logs syntax-check metrics for Python sources.
  • Introduced a unit test verifying metric emission.
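
A syntax-check metric of this kind can be sketched with the standard ast module. The function name and metric keys below are illustrative assumptions, not those emitted by analysis/audit_pipeline.py.

```python
import ast
from typing import Dict, Mapping


def syntax_check_metrics(sources: Mapping[str, str]) -> Dict[str, int]:
    """Parse each source string and count how many fail with SyntaxError."""
    ok = 0
    failed = 0
    for filename, text in sources.items():
        try:
            ast.parse(text, filename=filename)
            ok += 1
        except SyntaxError:
            failed += 1
    return {
        "files_checked": ok + failed,
        "syntax_ok": ok,
        "syntax_errors": failed,
    }


report = syntax_check_metrics({
    "good.py": "x = 1\n",
    "bad.py": "def f(:\n    pass\n",
})
```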

2025-11-24 – Offline upgrade script

  • Added codex_ast_upgrade.py to automate tiered parsing setup and offline auditing.

2025-11-23 – Tiered parsing and offline audit pipeline

  • Added analysis modules with tiered parsing fallbacks and search providers.
  • Added CLI codex_ml.cli.audit_pipeline and tests for AST extraction.
  • Documented "Fallback Modes & Feature Flags" in README.
  • Deferred advanced codemods and online external search; kept AST-only analyzers as fallback.

2025-05-19 – Validation metrics & splits

  • Added: --val-split/--test-split flags and per-epoch validation logging to metrics.json.
  • Deferred: stratified splits, GPU-heavy metrics, and online trackers.
  • Risks: small datasets may skip evaluation when insufficient tokens.

2025-11-09 – Offline experiment tracking

  • Added unified codex_ml.monitoring.codex_logging with optional TensorBoard, W&B, and MLflow sinks.
  • Patched engine_hf_trainer.py and functional_training.py to sample CPU/GPU metrics and log per-step scalars.
  • Added offline test coverage for logging bootstrap and docs for monitoring and experiment tracking.
  • Deferred: online W&B/remote MLflow servers, full Trainer callbacks, and extended NVML telemetry.

2025-08-26 – LoRA and deterministic splits

  • Implemented optional LoRA adapter with graceful fallback when peft is missing.
  • Added grad accumulation and mixed precision helpers to functional_training.py.
  • Introduced deterministic data splitting utility.
  • Generated requirements/lock.txt and local test gate script.
  • Sanitized external links in README for offline use.
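
A deterministic split can be sketched with a locally seeded random.Random instance. The function name, fraction parameters, and defaults are assumptions rather than the utility's actual interface.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def deterministic_split(
    items: Sequence[T],
    val_fraction: float = 0.1,
    test_fraction: float = 0.1,
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Split items into train/val/test; the same seed gives the same split."""
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    order = list(range(len(items)))
    # A private Random instance avoids touching the global RNG state.
    random.Random(seed).shuffle(order)
    n_val = int(len(items) * val_fraction)
    n_test = int(len(items) * test_fraction)
    val = [items[i] for i in order[:n_val]]
    test = [items[i] for i in order[n_val:n_val + n_test]]
    train = [items[i] for i in order[n_val + n_test:]]
    return train, val, test


data = list(range(20))
first = deterministic_split(data, val_fraction=0.2, test_fraction=0.1, seed=7)
second = deterministic_split(data, val_fraction=0.2, test_fraction=0.1, seed=7)
```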

CI policy docs – 2025-08-26T20:17:49Z

  • Created /workspace/codex/docs/ci.md (web search allowed; remote CI disallowed)

Disable remote CI – 2025-08-26T20:17:49Z

  • Patched 5 workflow file(s) to workflow_dispatch and guarded jobs.
  • Total jobs guarded: 7

2025-08-26 – Δ PR Checklist Applied

New

  • standalone analysis package with audit pipeline for offline checks.

Modified

  • README includes "Offline CI & Local Parity" policy block.
  • scripts/codex_local_gates.sh enforces coverage during local tests.
  • Workflows guarded with _codex_guard job and manual triggers.

Removed

  • none

Deferred / Pruned

  • existing analysis utilities under src/codex_ml/analysis retained without duplication.

2025-08-28 – Portable workflow tooling

New

  • tools/audit_runner.py provides dual-path audit execution with optional external CLI.
  • tools/run_precommit.py adds verbose pre-commit runner with timeout and cache cleanup.
  • tools/run_tests.py wraps pytest with optional coverage fallback.
  • tools/codex_workflow.py and tools/codex_workflow.sh orchestrate audit, hooks, and tests locally.

2025-08-29 – Misc bug fixes and utilities

  • Added shebang and docs to tools/label_policy_lint.py.
  • Ensured git_tag.current_commit decodes byte output.
  • Added setter for SentencePieceAdapter model_prefix.
  • Guarded MLflow run initialization by checking MLFLOW_TRACKING_URI.
  • Corrected EarlyStopping patience comparison.
  • Introduced importlib-based CLI viewer module.
  • Exposed RNG state helpers and best-k retention tests.
  • Implemented placeholder keyword risk scoring.
  • Added seed-controlled shuffling to data loaders.
  • Warned on duplicate registry registrations.

2025-08-29 – Utilities and test cleanup

  • Added standalone utils.training_callbacks with EarlyStopping.
      • WHY: share training callback outside codex_ml package.
      • RISK: low; new module.
      • ROLLBACK: revert src/utils/training_callbacks.py.
  • Improved git tag decoding to try locale and latin-1 fallbacks.
      • WHY: handle non-UTF-8 git outputs gracefully.
      • RISK: minimal; affects only metadata helpers.
      • ROLLBACK: revert changes in src/codex_ml/tracking/git_tag.py.
  • Fixed missing imports in label_policy_lint tests.
      • WHY: ensure lint helper tests run.
      • RISK: none; tests only.
      • ROLLBACK: revert tests/test_label_policy_lint.py.
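
The decoding fallback chain described above (UTF-8, then the locale encoding, then latin-1) can be sketched as follows. The function name is hypothetical and the exact order used in git_tag.py is assumed.

```python
import locale


def decode_git_output(raw: bytes) -> str:
    """Decode subprocess output, trying progressively more lenient encodings."""
    for encoding in ("utf-8", locale.getpreferredencoding(False)):
        try:
            return raw.decode(encoding).strip()
        except (UnicodeDecodeError, LookupError):
            continue
    # latin-1 maps every byte to a code point, so this step cannot fail.
    return raw.decode("latin-1").strip()


tag = decode_git_output(b"v1.2.3\n")
odd = decode_git_output(b"caf\xe9")  # invalid as UTF-8
```

Ending on latin-1 guarantees a string is always returned, at the cost of possibly mis-rendering bytes from other legacy encodings.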

2025-08-30 – Tokenizer, MLflow, and ingestion utilities

WHY

  • Introduce a canonical HFTokenizer adapter with batching and decode/pad APIs to unify tokenization usage across the codebase.
  • Add/standardize ingestion helpers including encoding detection (read_text(..., encoding="auto")) and deterministic shuffling for reproducible data splits.
  • Provide MLflow tracking utilities with configurable system-metrics toggle and safe no-op behavior when tracking is disabled or the dependency is missing.
  • Consolidate ingestion utilities and tooling fixes for consistent execution and packaging.
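
One way the encoding="auto" mode could be structured is sketched below: try strict UTF-8 first, then consult charset-normalizer if it is installed, then fall back to latin-1. This ordering and the name read_text_auto are assumptions; the real read_text in src/ingestion may differ.

```python
from pathlib import Path
from typing import Union

try:  # optional dependency; detection is skipped when it is missing
    from charset_normalizer import from_bytes
except ImportError:
    from_bytes = None


def read_text_auto(path: Union[str, Path], encoding: str = "utf-8") -> str:
    """Read a text file; with encoding='auto', guess the encoding."""
    data = Path(path).read_bytes()
    if encoding != "auto":
        return data.decode(encoding)
    try:
        # utf-8-sig also accepts plain UTF-8 and strips a leading BOM.
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        pass
    if from_bytes is not None:
        match = from_bytes(data).best()
        if match is not None:
            return str(match)  # str() yields the decoded payload
    # Last resort: latin-1 never raises, though it may mis-render text.
    return data.decode("latin-1")
```

Trying UTF-8 before detection keeps the common case deterministic and independent of whether the optional detector is installed.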

RISK

  • Low: modules are thin wrappers with safe fallbacks and preserve prior interfaces where practical. Optional dependencies (transformers, peft, mlflow, charset-normalizer, pytest-cov, etc.) degrade gracefully.

ROLLBACK

  • Remove newly added modules (tokenizer adapter, ingestion helpers, tracking helpers) and revert any changed imports to restore previous behavior.

TEST

  • Recommended checks:
      • pre-commit run --files src/codex_ml/interfaces/tokenizer.py src/codex_ml/tracking/mlflow_utils.py src/ingestion/encoding_detect.py src/ingestion/io_text.py src/ingestion/utils.py tests/interfaces/test_tokenizer_hf.py tests/tracking/test_mlflow_utils.py tests/ingestion/test_io_text.py
      • pytest (note: during initial integration this run reported several collection/compatibility issues, so expect follow-up fixes; some branches historically reported ~13 collection errors).

2025-08-30 – Tokenizer unification and ingestion consolidation (alternate notes)

WHY

  • Consolidate tokenization into a single HFTokenizer adapter with batch helpers.
  • Merge ingestion utilities with encoding detection and deterministic shuffling.
  • Restore MLflow helpers with safe no-op fallbacks and environment-variable-aware handling.
  • Add package markers and tooling fixes to improve consistent execution across environments.

RISK

  • Low: changes preserve existing interfaces and degrade gracefully when optional dependencies are missing.

ROLLBACK

  • Revert this commit to restore the previous module layout.

2025-08-29 – Restore no-op MLflow context manager

WHY

  • Maintain previous behaviour where disabled MLflow tracking yields a no-op context manager so with start_run(cfg) remains safe when tracking is off or mlflow is unavailable.
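
The no-op behaviour can be sketched with contextlib.nullcontext. The shape of cfg (a mapping with "enabled" and "run_name" keys) is an assumption made for this example and is not taken from the actual helper.

```python
import contextlib
from typing import Any, ContextManager, Mapping


def start_run(cfg: Mapping[str, Any]) -> ContextManager[Any]:
    """Return an MLflow run context, or a no-op context when tracking is off."""
    if not cfg.get("enabled", False):
        return contextlib.nullcontext()
    try:
        import mlflow  # optional dependency, imported lazily
    except ImportError:
        return contextlib.nullcontext()
    return mlflow.start_run(run_name=cfg.get("run_name"))


# Callers can use the same `with` block whether or not tracking is enabled.
with start_run({"enabled": False}) as run:
    run_is_noop = run is None
```

Because nullcontext yields None, callers can test `if run:` inside the block before logging anything.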

RISK

  • Low: ensures backward compatibility for callers that expect a falsy/no-op context manager rather than an exception.

ROLLBACK

  • Revert this commit to return None instead (if required by alternate compatibility concerns).

2025-08-29 – Local orchestration scripts

WHY

  • Add local tooling to run the sequential Codex workflow:
      • tools/codex_exec.py and tools/codex_exec.sh to run the end-to-end local workflow (preparation, scan, suggest patches, capture errors, finalize).
      • Local task runner utilities for running pytest-selected tasks and capturing failure outputs for later inspection.

RISK

  • Low: scripts are optional and operate only on the local repository.

ROLLBACK

  • Remove the newly added scripts.

REPRO

  • bash tools/codex_exec.sh to generate local artifacts and reports.

2025-08-29 – Tokenizer & training wiring

WHY

  • Thread gradient-accumulation and bf16/fp16 flags into the HF Trainer wrapper.
  • Provide optional LoRA integration points and deterministic ingestion helpers for training reproducibility.
  • Add utilities to capture failing task output into commit-comment artifacts to help maintainers triage issues.

RISK

  • Low: trainer wiring is additive and optional; defaults preserve prior behaviour if optional dependencies are absent.

ROLLBACK

  • Revert this commit to remove tokenizer, training, and tracking utilities.

2025-08-29 – Tokenizer and tracking utilities (detailed)

WHY

  • Introduce HFTokenizer interface with padding/truncation controls and batch encode/decode helpers.
  • Add deterministic ingestion helpers (seeded_shuffle / deterministic_shuffle), and robust read_text with optional automatic encoding detection.
  • Simplify MLflow tracking via optional start_run helper with safe no-op behavior and configurable environment flags.
  • Expose Trainer precision and LoRA-related options in the HF trainer wrapper.

RISK

  • Low: new utilities are optional and default to existing behaviour when optional dependencies are missing.

ROLLBACK

  • Revert this commit to restore previous behaviour.

REPRO CHECKLIST

  • Set PYTHONHASHSEED=0 for deterministic ordering where needed.
  • Use seeded_shuffle (or deterministic_shuffle) for reproducible splits.
  • Use read_text(..., encoding="auto") where encoding may vary.
  • pre-commit run --all-files --verbose.
  • pytest --cov=src/codex_ml --cov-report=term.

2025-08-29 – Phase 3 integrations

WHY

  • Guard MLflow run initialization behind CODEX_ENABLE_MLFLOW and related flags to avoid accidental tracking.
  • Resolve CLI viewer resources using importlib.resources for packaging safety.
  • Provide a lightweight checkpoint manager with RNG persistence and best-K pruning when available.
  • Add local task runner utilities (tools/codex_run_tasks.py) and optional venv bootstrap helpers.
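
Best-K pruning can be illustrated with a small selection helper. The function name, the score mapping, and the lower_is_better flag are hypothetical; the checkpoint manager's real retention API is not described in this entry.

```python
import heapq
from typing import List, Mapping


def select_checkpoints_to_prune(
    scores: Mapping[str, float], keep: int, lower_is_better: bool = True
) -> List[str]:
    """Return the checkpoint names that fall outside the best `keep` scores."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    pick = heapq.nsmallest if lower_is_better else heapq.nlargest
    kept = set(pick(keep, scores, key=scores.__getitem__))
    # Sort for a stable, reproducible deletion order.
    return sorted(name for name in scores if name not in kept)


val_loss = {"ckpt-a": 0.9, "ckpt-b": 0.3, "ckpt-c": 0.5, "ckpt-d": 0.7}
to_delete = select_checkpoints_to_prune(val_loss, keep=2)
```

Separating the selection from the actual file deletion keeps the policy easy to test without touching disk.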

RISK

  • Low: features are optional and protected by environment flags.

ROLLBACK

  • Revert this commit to restore previous behavior.

Notes and Next Steps

  • Multiple entries on 2025-08-29/30 reflect iterative integration steps across tokenizer, ingestion, tracking, and trainer wiring. The changes are intentionally additive and guarded; follow-up PRs should address the remaining test collection errors and tighten compatibility layers (e.g., signature normalization for legacy read_text implementations, optional dependency detection, and test environment setup).
  • When backporting or cherry-picking these changes to other branches, ensure environment flags (e.g., CODEX_ENABLE_MLFLOW, CODEX_POST_COMMIT_COMMENT) and optional-dependency handling remain consistent to avoid surprising behavior in CI.

2025-08-28 – Codex Run

  • Enforced self-hosted-only gates via make codex-gates and scripts/codex_local_gates.sh
  • Added doctor script and workflow executor
  • Updated README and docs to recommend self-hosted runners and MLflow tracking

$(date -u +%Y-%m-%d) – Codex Run

  • Added pad_id and eos_id accessors to HFTokenizerAdapter.
  • Surfaced monitoring exceptions in functional_training via stderr logging.
  • Introduced --grad-accum support and metric logging in train_loop.
  • CheckpointManager now writes system.json for hardware metadata.
  • Registered codex-ml-cli entry point in pyproject.
  • Updated README with offline CI instructions and codex-ml-cli usage.
  • Added tests for tokenizer IDs, grad accumulation, and checkpoint system metadata.
  • Added codex_seq_runner and run_codex_sequence utilities.

2025-09-02 – Codex Run

  • Enabled optional padding/truncation in HFTokenizerAdapter.encode.
  • Wired apply_lora into run_hf_trainer with dtype/device placement.
  • Added checkpoint resume support via resume_from parameter.

[Unreleased] - 2025-09-02

  • Format src/codex_ml/safety/risk_score.py with Black.
  • Correct README typos and complete environment description.
  • Pin peft dependency to ensure nox -s tests passes.
  • Applied fallback_patch_4.1-4.8 with safe sanitize→apply fallbacks; preserved intended functionality.
  • Normalized line endings/BOM; stripped markdown/email artifacts from patch.
  • Conformed to local gates (pre-commit/Black/isort/tests), Codex-only (no GitHub Actions).
  • Ensure test dependencies (including langchain) are installed so nox -s tests passes.
  • Add runbook for offline wheelhouse usage at docs/runbooks/offline_wheelhouse.md.
  • Add smoke test proving nox -s tests delegates to coverage.
  • Add wheelhouse alias in Makefile for bootstrap script.
  • Expand configs/development/noxfile.py with tests_sys and tests_ssp sessions, optional uv|virtualenv backend, PIP_CACHE_DIR default.
  • feat(gates): Add black/isort/bandit/detect-secrets/safety hooks; nox sec_scan; Make sys-tests/ssp-tests.
  • feat(deps): Introduce tools/uv_lock_refresh.sh to generate uv.lock and compiled requirements.
  • feat(trainer): Early stopping + scheduler selection wired into Trainer.
  • feat(logging): Rotating file handler with sane defaults.
  • feat(tokenization): SentencePiece adapter padding/truncation shims + __all__.
  • tests(tokenization): Edge-case test gated by SPM_TINY_MODEL.

  • Introduced general TokenizerAdapter with HuggingFace and whitespace implementations; added basic round-trip tests.
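
A whitespace implementation with a round-trip check might look like the sketch below. The class name and method set are assumptions and do not reproduce the TokenizerAdapter interface.

```python
from typing import Dict, Iterable, List


class WhitespaceTokenizerSketch:
    """Toy tokenizer: splits on whitespace and grows its vocabulary lazily."""

    def __init__(self) -> None:
        self._token_to_id: Dict[str, int] = {}
        self._id_to_token: List[str] = []

    def encode(self, text: str) -> List[int]:
        ids: List[int] = []
        for token in text.split():
            if token not in self._token_to_id:
                # Assign the next free id to each previously unseen token.
                self._token_to_id[token] = len(self._id_to_token)
                self._id_to_token.append(token)
            ids.append(self._token_to_id[token])
        return ids

    def decode(self, ids: Iterable[int]) -> str:
        return " ".join(self._id_to_token[i] for i in ids)


tokenizer = WhitespaceTokenizerSketch()
encoded = tokenizer.encode("to be or not to be")
decoded = tokenizer.decode(encoded)
```

The round trip is exact only for text whose tokens are separated by single spaces, since runs of whitespace collapse on encode.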

  • Added simple dataset loader supporting text/NDJSON/CSV with caching and safety hooks, plus deterministic split utilities.

Unreleased - 2025-09-07

  • Updated README references to current configuration structure.
  • Integrate safety filter with data loader and add license checker script.
  • Generated gaps report for TODOs and stubs.
  • Executed pre-commit and nox quality gate sessions.
  • Add dataset loader supporting TXT/NDJSON/CSV with caching and safety filtering.
  • feat(model): introduce model registry with optional LoRA configuration and tests.
  • docs: document tokenizer adapter, Hydra configuration groups, and data loading utilities.

2025-09-26 – Codex-ready task sequence foundation

  • Added codex_ready_task_sequence.yaml to document the reproducible workflow specification.
  • Replaced codex_task_sequence.py with offline CLI orchestrating preparation, mapping, construction, pruning, and finalization phases.
  • Introduced evaluate_dataloader helper and improved validation logging in training/functional_training.py with gradient accumulation checks.
  • Added configs/base_config.py and regression tests covering evaluation loop, gradient accumulation, config loading, and MLflow fallback.
  • Documented deferred modules and guarded pytest via a new deferred marker and skip logic.