
Codebase Cognitive Map

Generated: 2026-01-23T08:42:00Z | Updated by: doc-freshness-checker agent | PR: #2960 | Branch: copilot/update-html-documentation-standards


🎯 Mission Overview

Objective: Provide a high-level cognitive map of the _codex_ repository including components, flows, dependencies, and operational context for AI agents and human contributors.

Energy Level: ⚡⚡⚡⚡ (4/5 - High Priority Reference Document)

Status: 🟢 Active

Last Updated: 2026-01-23T08:42:00Z | Version: 2.0.0 | Last Reviewed: 2026-01-23T08:42:00Z


Architecture Overview

Type: Modular ML/AI Platform with Agent Orchestration
MLOps Maturity: Level 4 (100/100 Azure MLOps) - Production Ready
Stats: 1500+ tests (100% passing), 72% coverage, 0 vulnerabilities

Repository Structure

_codex_/
├── src/              # Core application code
│   ├── codex/       # Ingestion pipeline (ingest→analyze→transform→verify)
│   ├── rag/         # RAG pipelines & retrieval
│   ├── verification/ # Chain-of-Verification (CoVe)
│   ├── mcp/         # Model Context Protocol adapters
│   └── tools/       # Tool registry
├── agents/          # Autonomous agents (workflow, quantum, physics)
├── scripts/         # Automation & utilities
│   └── mcp/        # ChatGPT Project packaging system
├── tests/           # 1500+ test suite
├── docs/            # Documentation hub
│   ├── mcp/        # MCP packaging docs (93+ KB)
│   ├── system/     # Cognitive brain (this file)
│   └── capabilities/ # Capability guides
└── .github/         # CI/CD workflows & automation

Core Components

1. Codex Ingestion Pipeline (src/codex/)

Purpose: Complete Python code processing system

Commands:

python -m codex.cli ingest <source>      # Ingest code (file/ZIP/Git)
python -m codex.cli analyze <snapshot-id> # Static + runtime analysis
python -m codex.cli transform <snapshot-id> --tier A # Apply transformations
python -m codex.cli verify <snapshot-id> # Behavior verification

Flow: Source → Ingest → Analyze → Transform → Verify → PR
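The four stages can be chained from a script. The sketch below only builds the argument lists for the documented commands and does not run them. `build_pipeline_commands` is a hypothetical helper, and because this document does not say how `ingest` reports the snapshot ID, the ID is passed in explicitly.

```python
import sys


def build_pipeline_commands(source, snapshot_id, tier="A"):
    """Return argv lists for the ingest -> analyze -> transform -> verify stages.

    Hypothetical helper: it mirrors the documented commands but does not run them.
    """
    base = [sys.executable, "-m", "codex.cli"]
    return [
        base + ["ingest", source],
        base + ["analyze", snapshot_id],
        base + ["transform", snapshot_id, "--tier", tier],
        base + ["verify", snapshot_id],
    ]


if __name__ == "__main__":
    # Placeholder arguments; substitute a real source and snapshot ID.
    for argv in build_pipeline_commands("./my_project.zip", "SNAPSHOT_ID"):
        print(" ".join(argv))
```

Passing each list to `subprocess.run(argv, check=True)` would stop the chain at the first failing stage.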

2. Agent System (agents/)

Purpose: Autonomous AI agents with physics-inspired optimization

Key Agents:

  • workflow_navigator.py - Tokenized workflows (AUDIT_EXEC, DOC_GEN)
  • quantum_game_theory.py - Quantum-inspired decisions
  • physics_orchestrator.py - 6 physics paradigms
  • mental_mapping.py - Context tracking

Tokens: audit, decide, docs, organize, review, heal

3. MCP Package System (scripts/mcp/)

Purpose: Package codebase for ChatGPT Projects

Commands:

./scripts/mcp/mcp-package --list              # List 9 topics
./scripts/mcp/mcp-package --topic agents      # Package by topic
./scripts/mcp/mcp-package --custom "patterns" # Custom patterns

Topics: zendesk, agents, quantum, docs, mcp, workflows, python_dev, testing, security

Output: Flat ZIP with manifest.json, README_dataset.md, index.md

Docs: docs/mcp/ - 93+ KB across 8 comprehensive guides

4. RAG & Verification (src/rag/, src/verification/)

  • RAG pipelines: Chunking, embedding, retrieval
  • CoVe: Chain-of-Verification fact-checking
  • MCP adapters: Pinecone, Mock integrations

Data Flows

Code Ingestion

External Source → Ingest → Static Analysis → Runtime Analysis →
LLM Intent Inference → Transformation → Verification → PR Creation

Agent Workflow

Request → WorkflowNavigator → Agent Orchestration →
Task Execution → Verification → State Persistence

MCP Packaging

Human Request → Component Selection → File Flattening →
Manifest Generation → ZIP Creation → ChatGPT Upload

CI/CD

Git Push → Status Validation → Security Gates → Quality Gates →
Test Execution → Cache Management → Artifact Generation

Dependencies & Integrations

External Services

  • OpenAI API: LLM intent inference (OPENAI_API_KEY)
  • GitHub API: PR creation, workflows (GITHUB_TOKEN)
  • Pinecone: Vector embeddings (optional)
  • CodeQL/Semgrep: Security scanning

Python Dependencies

  • Core: numpy, pandas, openai, httpx, pydantic, hydra-core
  • Dev: pytest, black, ruff, mypy, nox, pre-commit
  • ML/AI: torch, transformers, safetensors (optional)

CI/CD Pipeline

Key Workflows (.github/workflows/)

| Workflow | Trigger | Purpose | Cache |
| --- | --- | --- | --- |
| status_validation.yml | push, PR | Repo status | - |
| security_gates.yml | push, PR | Security | - |
| nox_gates.yml | push, PR | Quality (lint/type) | Ruff, MyPy |
| optimized-ci.yml | push, PR | Optimized CI | All tools |
| build-chatgpt-package.yml | dispatch | MCP packaging | - |
| scan-secrets-variables.yml | schedule | Secrets scan | Gitleaks |

Cache Strategy (Phase 3C-Lite)

  • Ruff: ~20-30 MB | MyPy: ~50-80 MB
  • Pytest: ~30-50 MB | pre-commit: ~50-100 MB
  • Total: 7.69 GB / 10 GB limit (23% buffer)
  • Keys: ${{ runner.os }}-${{ github.workflow }}-<tool>-${{ hashFiles(...) }}
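The key pattern above can be approximated locally, for example to predict when a dependency-file change will invalidate a cache. This is a sketch under assumptions: `files_digest` and `cache_key` are hypothetical names, and although GitHub's `hashFiles` also derives one SHA-256 from per-file SHA-256 hashes, the digest computed here is not guaranteed to match what Actions produces.

```python
import hashlib
import platform
from pathlib import Path


def files_digest(paths):
    """SHA-256 over the per-file SHA-256 digests, in sorted path order."""
    outer = hashlib.sha256()
    for path in sorted(Path(p) for p in paths):
        outer.update(hashlib.sha256(path.read_bytes()).digest())
    return outer.hexdigest()


def cache_key(workflow, tool, paths, runner_os=None):
    """Build a key shaped like <os>-<workflow>-<tool>-<hash of input files>."""
    runner_os = runner_os or platform.system()
    return f"{runner_os}-{workflow}-{tool}-{files_digest(paths)}"
```

Because the digest is the last component, any change to the hashed files yields a new key, while the OS, workflow, and tool prefix keeps caches from different jobs apart.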

Security & Secrets

Secrets (GitHub UI injected)

  • OPENAI_API_KEY - OpenAI API
  • PINECONE_API_KEY - Pinecone (optional)
  • CODEX_MASTER_KEY - Genesis Protocol

Security Scanning

  • Gitleaks, Trufflehog - Secret detection
  • Semgrep SAST - Static analysis
  • CodeQL - Code scanning

Anti-/tmp/ Protection

Policy: Use .github/tmp/ instead of /tmp/
Applied: emergency_cache_cleanup.sh, MCP tools
Doc: docs/system/ANTI_TMP_PROTECTION_SYSTEM.md
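In Python code, the policy can be followed by pointing `tempfile` at the repository-local directory. This is a sketch, not the mechanism described in ANTI_TMP_PROTECTION_SYSTEM.md; `make_scratch_dir` and its `codex-` prefix are hypothetical.

```python
import tempfile
from pathlib import Path


def make_scratch_dir(base=".github/tmp", prefix="codex-"):
    """Create a unique scratch directory under `base` instead of the system temp dir."""
    base_path = Path(base)
    base_path.mkdir(parents=True, exist_ok=True)
    return Path(tempfile.mkdtemp(prefix=prefix, dir=base_path))
```

The caller is responsible for removing the directory when done, for example with `shutil.rmtree`.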


MCP & ChatGPT Integration

Packaging Capabilities

  1. 9 Predefined Topics: All major capabilities covered
  2. Custom Patterns: Glob-based file selection
  3. Flat Structure: Optimized for ChatGPT
  4. Metadata: SHA256, sizes, language detection
  5. Navigation: Manifest-driven discovery
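The combination of glob selection, flat structure, and per-file metadata can be sketched as follows. This is not the `mcp-package` implementation: the `__` separator, the manifest schema, and the function names are assumptions, and language detection is omitted.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def flat_name(path, root):
    """Flatten a nested path, e.g. src/rag/chunk.py -> src__rag__chunk.py."""
    return "__".join(Path(path).relative_to(root).parts)


def build_package(root, patterns, out_zip):
    """Write matching files into a flat ZIP plus a manifest.json; return the entries."""
    root = Path(root)
    files = sorted({p for pat in patterns for p in root.glob(pat) if p.is_file()})
    entries = []
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            data = path.read_bytes()
            name = flat_name(path, root)
            zf.writestr(name, data)
            entries.append({
                "name": name,
                "source": path.relative_to(root).as_posix(),
                "size": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
            })
        # The manifest maps each flat name back to its original location.
        zf.writestr("manifest.json", json.dumps({"files": entries}, indent=2))
    return entries
```

Recording the original path next to each flat name is what lets a reader of the ZIP navigate by manifest rather than by directory layout.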

Methodology Transfer (8 Capabilities)

  1. Python script development/deconstruction
  2. Workflow navigation & state management
  3. Quantum game theory application
  4. API integration patterns
  5. CI/CD workflow optimization
  6. Agent-based architecture
  7. TDD methodology
  8. Documentation generation

Documentation (docs/mcp/)

  • QUICK_START.md - 5-minute onboarding
  • PACKAGING_GUIDE.md - Complete workflows
  • PACKAGEABLE_CAPABILITIES.md - Capability transfer
  • ChatGPT_Project_SYSTEM_PROMPT.md - AI prompt
  • GENERIC_NAVIGATION_SYSTEM.md - Universal navigation
  • ADVANCED_FEATURES_PLANSET.md - Roadmap (Future iterations)

Operational Context

GitHub Limits

  • Copilot Pro+: 64K tokens/session
  • GitHub Team: 10 GB cache, limited Actions minutes
  • Current Cache: 7.69 GB (23% buffer)
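The 23% buffer follows directly from the two cache figures above. A small sketch of the arithmetic, with `cache_buffer` as a hypothetical helper name:

```python
def cache_buffer(used_gb, limit_gb):
    """Return (free space in GB, free space as a percentage of the limit)."""
    free_gb = limit_gb - used_gb
    return free_gb, 100 * free_gb / limit_gb


free_gb, pct = cache_buffer(7.69, 10.0)
print(f"{free_gb:.2f} GB free ({pct:.0f}% buffer)")  # 2.31 GB free (23% buffer)
```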

Quality Metrics

  • Tests: 1500+ (100% passing)
  • Coverage: 72% (target: 80%+)
  • Security: 0 vulnerabilities
  • Cache Hit Rate: 90%+ projected

Performance Targets

  • Test execution: <5 min
  • Lint/type: <2 min
  • Package creation: <2 min

Quick Reference

Common Commands

# Codex
python -m codex.cli ingest|analyze|transform|verify

# MCP
./scripts/mcp/mcp-package --list|--topic|--custom

# Testing
make docker-test
pytest tests/ --cov=src/

# Quality
nox -s lint|type|format

# Agent
python -m scripts.space_traversal.audit_runner agent-interface

Entry Points

| System | Entry | Type |
| --- | --- | --- |
| Codex CLI | python -m codex.cli | Module |
| MCP Package | ./scripts/mcp/mcp-package | Script |
| Agent Navigator | agents.workflow_navigator | Class |
| Tests | pytest / make docker-test | Command |

Getting Started

  1. Architecture: This doc → docs/ARCHITECTURE.md
  2. Capabilities: docs/capabilities/*.md
  3. Workflows: agents/TOKENIZED_WORKFLOWS.md
  4. MCP: docs/mcp/QUICK_START.md
  5. Contributing: docs/CONTRIBUTING.md

Finding Things

  • Code: src/ (app), agents/ (agents)
  • Tests: tests/ (mirrors src/)
  • Scripts: scripts/ (automation)
  • Docs: docs/ (organized by topic)
  • CI/CD: .github/workflows/

Common Tasks

  • New capability: docs/capabilities/ template
  • Extend agents: agents/workflow_navigator.py
  • Add CI: .github/workflows/ templates
  • Package code: scripts/mcp/mcp-package
  • Run tests: make docker-test


Owner: DevOps + Agent Development Team
Review: Monthly or after major changes
Last Reviewed: 2026-01-23T08:42:00Z


⚖️ Verification Checklist

Architecture Accuracy

  • Component structure matches current repository layout
  • Data flows reflect actual implementation
  • Dependencies list is up-to-date
  • Integration points correctly documented

Documentation Quality

  • All code examples are valid and tested
  • Links to related documents are functional
  • Tables render correctly in GitHub/browser
  • Commands and paths are accurate

Currency

  • Updated to reflect latest repository state (2026-01-23)
  • Version number incremented (2.0.0)
  • Iteration-based workflow language used throughout

📈 Success Metrics

| Metric | Target | Current | Status |
| --- | --- | --- | --- |
| Documentation freshness | <30 iterations | 0 iterations | ✅ |
| Broken links | 0 | 0 | ✅ |
| Outdated references | 0 | 0 | ✅ |
| Table rendering issues | 0 | 0 | ✅ |

⚛️ Physics Alignment

| Principle | Application | Section |
| --- | --- | --- |
| Path 🛤️ | Clear navigation from overview to detailed components | All sections |
| Fields 🔄 | Data flows show transformation through pipeline | Data Flows |
| Patterns 👁️ | Architecture patterns visible and documented | Components |
| Redundancy 🔀 | Multiple entry points and cross-references | Navigation |
| Balance ⚖️ | Balanced detail across all major components | All sections |

🧠 Redundancy Patterns

Navigation Redundancy:

  • Multiple access paths: By component, by workflow, by role
  • Cross-references between related sections
  • Both top-down and bottom-up navigation supported

Update Strategy:

  • Version-controlled documentation
  • Git history maintains all previous versions
  • Rollback available via commit history


⚡ Energy Distribution

| Section | Energy | Rationale |
| --- | --- | --- |
| Architecture Overview | ⚡⚡⚡⚡ | Critical for understanding system structure |
| Core Components | ⚡⚡⚡⚡⚡ | Essential for development and maintenance |
| Data Flows | ⚡⚡⚡ | Important for troubleshooting and optimization |
| CI/CD Pipeline | ⚡⚡⚡ | Key for deployment and automation |
| Quick Reference | ⚡⚡ | Utility section for common tasks |

Questions? → Dashboard