DOCUMENTATION COVERAGE BY PACKAGE - PRIORITIZATION GUIDE
Phase 5 Package-Level Priorities
HIGH PRIORITY (Phases 1-4)
1. codex_ml/ - CRITICAL
- Undocumented Items: 1,940
- Files: 439
- Coverage: 51.5%
- Priority: P0
- Rationale: Largest package, core ML functionality
- Effort: ~161 pre-commits (48% of total)
- Key Areas:
  - data/ - Data loading and preprocessing
  - modeling/ - Model architectures
  - training/ - Training loops
  - checkpointing/ - Model persistence
  - reward_models/ - RL reward models
2. codex/ - HIGH
- Undocumented Items: 551
- Files: 257
- Coverage: 75.5%
- Priority: P0
- Rationale: Core application package, user-facing
- Effort: ~46 pre-commits (14% of total)
- Key Areas:
  - zendesk/ - CRM integrations
  - evidence/ - Evidence tracking
  - plans/ - Planning system
  - rag/ - RAG pipeline
3. training/ - HIGH
- Undocumented Items: 130
- Files: 17
- Coverage: 36.0%
- Priority: P0
- Rationale: Critical for ML operations, poor coverage
- Effort: ~11 pre-commits (3% of total)
4. mcp/ - MEDIUM-HIGH
- Undocumented Items: 123
- Files: 60
- Coverage: 62.6%
- Priority: P1
- Rationale: MCP protocol implementation
- Effort: ~10 pre-commits (3% of total)
MEDIUM PRIORITY (Phases 5-6)
5. hhg_logistics/ - MEDIUM
- Undocumented Items: 57
- Files: 26
- Coverage: 29.6%
- Priority: P1
- Rationale: Low coverage, specific domain
- Effort: ~5 pre-commits
6. common/ - MEDIUM
- Undocumented Items: 48
- Files: 9
- Coverage: 21.3%
- Priority: P1
- Rationale: Shared utilities, low coverage
- Effort: ~4 pre-commits
7. tokenization/ - MEDIUM
- Undocumented Items: 31
- Files: 7
- Coverage: 34.0%
- Priority: P1
- Rationale: Text processing core
- Effort: ~3 pre-commits
8. codex_audit/ - CRITICAL (Zero Coverage)
- Undocumented Items: 31
- Files: 6
- Coverage: 0.0%
- Priority: P0
- Rationale: Zero documentation is unacceptable
- Effort: ~3 pre-commits
9. codex_harness/ - HIGH
- Undocumented Items: 30
- Files: 4
- Coverage: 6.2%
- Priority: P0
- Rationale: Nearly zero coverage
- Effort: ~3 pre-commits
10. codex_cli/ - HIGH
- Undocumented Items: 24
- Files: 2
- Coverage: 7.7%
- Priority: P0
- Rationale: User-facing CLI, poor coverage
- Effort: ~2 pre-commits
LOW PRIORITY (Phases 7-8 or Phase 6)
Well-Documented Packages (>80% coverage):
- cognitive_brain/ - 97.7% ✅
- context_management/ - 99.6% ✅
- zendesk/ - 93.6% ✅
- ingestion/ - 88.2% ✅
- agent/ - 89.3% ✅
- security/ - 84.0% ✅
These packages serve as examples of good documentation and can be used as templates.
FOCUSED STRATEGY: 80/20 RULE
Top 5 Packages Account for ~85% of Undocumented Items
| Package | Undocumented | % of Total | Cumulative |
|---|---|---|---|
| codex_ml | 1,940 | 58.7% | 58.7% |
| codex | 551 | 16.7% | 75.4% |
| training | 130 | 3.9% | 79.3% |
| mcp | 123 | 3.7% | 83.0% |
| hhg_logistics | 57 | 1.7% | 84.7% |
Recommendation: Focus 80% of Phase 5 effort on these 5 packages.
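The shares in the table above can be re-derived from the raw counts. The sketch below is illustrative only: the repository-wide total of 3,305 undocumented items is not stated in this guide and is an assumption back-calculated from the table (1,940 / 58.7%), and `pareto_rows` is a hypothetical helper name.

```python
from itertools import accumulate

# Undocumented item counts taken from the table above.
UNDOCUMENTED = {
    "codex_ml": 1940,
    "codex": 551,
    "training": 130,
    "mcp": 123,
    "hhg_logistics": 57,
}

# ASSUMPTION: the overall total is not given in this guide;
# it is back-calculated from 1,940 items being 58.7% of the total.
ASSUMED_TOTAL = 3305


def pareto_rows(counts, total):
    """Return (package, count, share %, cumulative share %) rows, largest first."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    running = accumulate(count for _, count in ordered)
    return [
        (name, count, 100 * count / total, 100 * cumulative / total)
        for (name, count), cumulative in zip(ordered, running)
    ]


if __name__ == "__main__":
    for name, count, share, cumulative in pareto_rows(UNDOCUMENTED, ASSUMED_TOTAL):
        print(f"{name:<14} {count:>5} {share:5.1f}% {cumulative:5.1f}%")
```

The table sums already-rounded shares, so its last cumulative figure can differ by about 0.1 percentage point from a value computed on the raw counts.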
PHASE BREAKDOWN
Phases 1-2: Foundation & Quick Wins
- codex_audit/ (3 pre-commits) - Zero to 90%
- codex_harness/ (3 pre-commits) - 6% to 90%
- codex_cli/ (2 pre-commits) - 8% to 95%
- Training package core (11 pre-commits) - 36% to 80%
- Start codex_ml/data/ (20 pre-commits)
Total: 39 pre-commits
Impact: 3 packages to 90%+, training to 80%
Phases 3-4: codex_ml Deep Dive
- codex_ml/modeling/ (40 pre-commits)
- codex_ml/training/ (30 pre-commits)
- codex_ml/data/ (continued, 30 pre-commits)
Total: 100 pre-commits
Impact: Core ML components documented
Phases 5-6: codex Package & MCP
- codex/rag/ (15 pre-commits)
- codex/zendesk/ (10 pre-commits)
- codex/plans/ (10 pre-commits)
- mcp/ (10 pre-commits)
- Tutorial creation (21 pre-commits)
Total: 66 pre-commits
Impact: Main packages to 90%+, tutorials complete
Phases 7-8: Polish & Remaining
- codex_ml/remaining (30 pre-commits)
- tokenization/ (3 pre-commits)
- common/ (4 pre-commits)
- hhg_logistics/ (5 pre-commits)
- Link fixes & API reference (25 pre-commits)
Total: 67 pre-commits
Impact: All P0/P1 complete, docs polished
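The per-phase totals above can be checked mechanically whenever an estimate changes. The following is a minimal sketch that copies the line items from the breakdown; the names `PHASE_PLAN`, `phase_totals`, and `find_mismatches` are hypothetical and not part of any existing tooling.

```python
# Pre-commit estimates per work item, copied from the phase breakdown above.
PHASE_PLAN = {
    "Phases 1-2": {
        "codex_audit/": 3,
        "codex_harness/": 3,
        "codex_cli/": 2,
        "training core": 11,
        "codex_ml/data/ (start)": 20,
    },
    "Phases 3-4": {
        "codex_ml/modeling/": 40,
        "codex_ml/training/": 30,
        "codex_ml/data/ (continued)": 30,
    },
    "Phases 5-6": {
        "codex/rag/": 15,
        "codex/zendesk/": 10,
        "codex/plans/": 10,
        "mcp/": 10,
        "tutorials": 21,
    },
    "Phases 7-8": {
        "codex_ml/remaining": 30,
        "tokenization/": 3,
        "common/": 4,
        "hhg_logistics/": 5,
        "link fixes & API reference": 25,
    },
}

# Totals as stated in the guide.
STATED_TOTALS = {"Phases 1-2": 39, "Phases 3-4": 100, "Phases 5-6": 66, "Phases 7-8": 67}


def phase_totals(plan):
    """Sum the pre-commit estimates of every work item per phase."""
    return {phase: sum(items.values()) for phase, items in plan.items()}


def find_mismatches(plan, stated):
    """Return {phase: (computed, stated)} for phases whose totals disagree."""
    computed = phase_totals(plan)
    return {p: (computed[p], stated[p]) for p in stated if computed[p] != stated[p]}


if __name__ == "__main__":
    totals = phase_totals(PHASE_PLAN)
    print(totals, "overall:", sum(totals.values()))
    print("mismatches:", find_mismatches(PHASE_PLAN, STATED_TOTALS))
```

With the figures listed in this guide, the four phase pairs add up to 272 pre-commits.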
PACKAGE-SPECIFIC DOCUMENTATION TEMPLATES
For ML Packages (codex_ml, training)
    """Machine learning model training utilities.

    This module provides training loops, optimization strategies,
    and checkpoint management for neural network models.

    Key Components:
        - Trainer: Main training orchestration
        - Optimizer: Learning rate scheduling
        - Checkpointer: Model persistence

    Example:
        >>> from codex_ml.training import Trainer
        >>> trainer = Trainer(model, config)
        >>> trainer.train(dataset)

    See Also:
        - codex_ml.modeling: Model architectures
        - codex_ml.data: Data loading
    """
For API/Service Packages (mcp, codex)
    """MCP protocol server implementation.

    Implements the Model Context Protocol for AI model serving,
    including request handling, authentication, and rate limiting.

    Architecture:
        FastAPI server with async request handling, Redis caching,
        and PostgreSQL for persistence.

    Example:
        >>> import uvicorn
        >>> from mcp.server import create_app
        >>> app = create_app(config)
        >>> uvicorn.run(app, host="0.0.0.0", port=8000)

    See Also:
        - docs/MCP_SETUP_GUIDE.md
        - docs/api/mcp_reference.md
    """
MEASUREMENT & TRACKING
Phase Progress Metrics
    phase_1:
      target_coverage: 88.0
      packages_completed:
        - codex_audit
        - codex_harness
        - codex_cli
      pre_commits: 39
    phase_2:
      target_coverage: 89.0
      packages_in_progress:
        - codex_ml
        - training
      pre_commits: 40
    # ... continue tracking
Package Coverage Goals
| Package | Current | Phase 4 | Phase 8 | Final Goal |
|---|---|---|---|---|
| codex_ml | 51.5% | 65% | 80% | 85% |
| codex | 75.5% | 85% | 92% | 95% |
| training | 36.0% | 80% | 90% | 90% |
| mcp | 62.6% | 75% | 90% | 90% |
| codex_audit | 0.0% | 90% | 95% | 95% |
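To track progress against the table above, the goals can be held in a small data structure and compared with measured coverage. This is a sketch under stated assumptions: `CoverageGoal`, `GOALS`, and `gap_to_milestone` are hypothetical names, and measured values would come from whatever coverage report the team actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageGoal:
    """Coverage percentages for one package, copied from the goals table."""
    current: float
    phase_4: float
    phase_8: float
    final: float


GOALS = {
    "codex_ml": CoverageGoal(51.5, 65, 80, 85),
    "codex": CoverageGoal(75.5, 85, 92, 95),
    "training": CoverageGoal(36.0, 80, 90, 90),
    "mcp": CoverageGoal(62.6, 75, 90, 90),
    "codex_audit": CoverageGoal(0.0, 90, 95, 95),
}


def gap_to_milestone(goals, milestone, measured=None):
    """Return the percentage points each package still needs for a milestone.

    milestone is one of "phase_4", "phase_8", or "final". measured optionally
    maps package names to fresh coverage figures; packages without a fresh
    figure fall back to the "current" column of the table.
    """
    measured = measured or {}
    gaps = {}
    for name, goal in goals.items():
        now = measured.get(name, goal.current)
        target = getattr(goal, milestone)
        gaps[name] = max(0.0, round(target - now, 1))
    return gaps


if __name__ == "__main__":
    for package, gap in gap_to_milestone(GOALS, "phase_4").items():
        print(f"{package:<12} needs {gap:.1f} more points for Phase 4")
```

Sorting the result by gap size gives a quick view of which package is furthest from its next milestone.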
AUTOMATION SCRIPTS
Generate Package Report
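    #!/bin/bash
    # check_package_coverage.sh: report docstring coverage for one package.
    set -euo pipefail

    # Require the package name as the first argument.
    PACKAGE="${1:?usage: check_package_coverage.sh <package>}"

    echo "Package: ${PACKAGE}"
    echo "Coverage threshold: 80%"

    # Exits non-zero when coverage falls below the threshold.
    interrogate --fail-under=80 \
        --ignore-init-method \
        --ignore-module \
        --verbose \
        "src/${PACKAGE}/"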
Pre-commit Hook
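    # .pre-commit-config.yaml addition
    - repo: local
      hooks:
        - id: docstring-coverage
          name: Check docstring coverage
          entry: interrogate
          args: ['--fail-under=75', 'src/']
          language: python
          # Installs interrogate into the hook's isolated environment.
          additional_dependencies: ['interrogate']
          pass_filenames: false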
SUCCESS CRITERIA BY PACKAGE
Tier 1 (Must Achieve - P0)
- codex_ml: 80%+ coverage
- codex: 90%+ coverage
- training: 85%+ coverage
- codex_audit: 90%+ coverage (from 0%)
- codex_harness: 90%+ coverage (from 6%)
Tier 2 (Should Achieve - P1)
- mcp: 85%+ coverage
- hhg_logistics: 70%+ coverage
- common: 75%+ coverage
- tokenization: 75%+ coverage
Tier 3 (Nice to Have - P2)
- All other packages: 70%+ coverage
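The tier thresholds above could drive per-package checks instead of a single global threshold. The sketch below only builds and runs the `interrogate` command line; it assumes the `src/<package>/` layout used by the report script in this guide, and `THRESHOLDS`, `build_command`, and `check_package` are hypothetical names.

```python
import subprocess

# Minimum coverage per package, copied from Tiers 1 and 2 above.
THRESHOLDS = {
    "codex_ml": 80,
    "codex": 90,
    "training": 85,
    "codex_audit": 90,
    "codex_harness": 90,
    "mcp": 85,
    "hhg_logistics": 70,
    "common": 75,
    "tokenization": 75,
}

# Tier 3: every other package.
DEFAULT_THRESHOLD = 70


def build_command(package, src_root="src"):
    """Build the interrogate command line for one package.

    ASSUMPTION: packages live under src/<package>/, as in the report script.
    """
    threshold = THRESHOLDS.get(package, DEFAULT_THRESHOLD)
    return [
        "interrogate",
        f"--fail-under={threshold}",
        "--ignore-init-method",
        "--ignore-module",
        f"{src_root}/{package}/",
    ]


def check_package(package):
    """Run interrogate and return True when the package meets its threshold.

    Requires interrogate to be installed and available on PATH.
    """
    return subprocess.run(build_command(package), check=False).returncode == 0


if __name__ == "__main__":
    print(" ".join(build_command("codex_ml")))
```

Keeping the thresholds in one mapping means a tier change is a one-line edit rather than a change to every script invocation.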
RISK MITIGATION
Package-Specific Risks
codex_ml (1,940 items):
- Risk: Too large, may not complete
- Mitigation: Focus on public APIs first (data/, modeling/, training/)
- Fallback: Document top-level only, defer internals

training (36% coverage):
- Risk: Complex ML concepts hard to document
- Mitigation: Pair with ML engineer for technical review
- Fallback: Link to external references (PyTorch docs)

codex_audit (0% coverage):
- Risk: May be deprecated/unused code
- Mitigation: Verify with maintainers before documenting
- Fallback: Mark for removal if obsolete
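The "public APIs first" mitigation for codex_ml needs a list of public names that still lack docstrings. One way to get such a list is the standard-library `ast` module, as in the minimal sketch below. It treats any name without a leading underscore as public, which is an assumption, and it is not a reimplementation of how `interrogate` counts coverage; `undocumented_public_names` is a hypothetical helper.

```python
import ast


def undocumented_public_names(source):
    """Return sorted names of public functions and classes lacking a docstring.

    ASSUMPTION: "public" means the name does not start with an underscore.
    Nested definitions such as methods are included.
    """
    tree = ast.parse(source)
    missing = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if not node.name.startswith("_") and ast.get_docstring(node) is None:
                missing.append(node.name)
    return sorted(missing)


# Small illustrative module, not taken from the codebase.
SAMPLE = '''
def load(path):
    return path

def _helper():
    pass

class Trainer:
    """Documented class."""

    def train(self):
        pass
'''

if __name__ == "__main__":
    print(undocumented_public_names(SAMPLE))
```

Running the helper over each file in data/, modeling/, and training/ would give a worklist ordered by file, leaving private helpers for the fallback phase.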
CONCLUSION
Phase 5 should follow a package-centric approach with clear phase targets:
- Phases 1-2: Quick wins (3 packages from <10% to 90%+)
- Phases 3-4: codex_ml core (51% → 65%)
- Phases 5-6: codex & MCP (75% → 92%, 63% → 85%)
- Phases 7-8: Remaining packages + polish
Expected Final Coverage:
- Overall: 92/100 (from 85.5)
- codex_ml: 80%+ (from 51.5%)
- codex: 92%+ (from 75.5%)
- All other P0 packages: 85%+
This focused approach delivers maximum impact within the 8-phase timeline.