Change Log

2025-10-16

Added src/modeling.py, src/logging_utils.py, src/data/datasets.py, and src/training/trainer.py to implement the modular training stack.
Updated Hydra configs (configs/base/default.yaml, configs/model/base.yaml, configs/training/base.yaml, configs/training/data/tiny.yaml) and added a sample TSV dataset at data/tiny_text_classification.tsv.
Documented the trainer stack and reproducibility checklist in README.md; refreshed src/data/__init__.py exports and error logging notes.

Created src/modeling.py to centralise Hugging Face model/tokenizer loading with optional LoRA/PEFT hooks driven by Hydra config values.
Added src/training/trainer.py and exported its classes to provide a mixed-precision-aware trainer with evaluation, gradient accumulation, logging integration, and best-k checkpoint retention.
Introduced src/logging_utils.py to initialise TensorBoard/MLflow sessions and emit metrics from the trainer.
Added src/data/datasets.py for TSV-based text classification datasets and DataLoader construction.
Restructured Hydra configs (configs/base/default.yaml, configs/model/base.yaml, configs/training/base.yaml, configs/training/data/tiny.yaml) to compose defaults while preserving legacy aliases.
Documented the modular training stack and config layout updates in README.md and refreshed the model registry / reproducibility docs.
Added targeted unit tests for modeling, datasets, trainer, and logging utilities under tests/ to guard the new functionality.
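
For readers unfamiliar with the term, the best-k checkpoint retention mentioned in the trainer entry can be illustrated roughly as follows. This is a minimal sketch and not the code in src/training/trainer.py: the class name `BestKCheckpoints`, its `register` and `kept` methods, and the assumption that a higher metric is better are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class BestKCheckpoints:
    """Hypothetical helper: keep only the k best checkpoints (higher metric wins)."""

    k: int
    # Min-heap of (metric, step, path); the worst retained checkpoint sits at index 0.
    _heap: list = field(init=False, default_factory=list)

    def register(self, metric: float, step: int, path: Path) -> list:
        """Record a checkpoint, delete any that fall out of the top k, return their paths."""
        heapq.heappush(self._heap, (metric, step, str(path)))
        evicted = []
        while len(self._heap) > self.k:
            _, _, worst = heapq.heappop(self._heap)
            # missing_ok avoids an error if the file was already removed.
            Path(worst).unlink(missing_ok=True)
            evicted.append(worst)
        return evicted

    def kept(self) -> list:
        """Return retained checkpoint paths, best first."""
        return [p for _, _, p in sorted(self._heap, reverse=True)]
```

A trainer would typically call `register` after each evaluation with the validation metric and the path it just saved to. For a metric where lower is better (such as loss), one option under the same assumptions is to pass the negated value.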