Deployment Guide

This guide covers building container images, validating Helm releases, and promoting builds through dev → staging → production.

Multi-Stage Docker Images

The primary Dockerfile now uses a three-stage build:

1. builder installs dependencies in a deterministic environment.
2. gpu-runtime extends nvidia/cuda:11.8.0-runtime-ubuntu22.04 for accelerator workloads.
3. cpu-runtime ships a minimal python:3.10-slim image with built-in health checks.

Both runtime stages create a non-root appuser, install curl for health probes, and expose python -m codex_ml.cli.main as the entrypoint.

# CPU build
./scripts/deploy/orchestrate.sh build

# GPU build
./scripts/deploy/orchestrate.sh build --gpu

docker-compose

docker-compose.yml defines a CPU profile with mounted data/artifact volumes and an HTTP healthcheck.

docker compose up --build codex-cpu

Environment variables:

- MODEL_NAME, TOKENIZER_NAME, MAX_NEW_TOKENS control inference defaults.
- API_RATE_LIMIT enforces middleware throttling.
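As an illustration of how a service might consume these variables, the sketch below reads them into a typed settings object. The `InferenceSettings` class, the `load_settings` function, every default value, and the reading of API_RATE_LIMIT as an integer are assumptions made for the example; they are not taken from codex_ml.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class InferenceSettings:
    """Hypothetical container for the variables listed above."""

    model_name: str
    tokenizer_name: str
    max_new_tokens: int
    api_rate_limit: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> InferenceSettings:
    """Read settings from the environment; all defaults are placeholders."""
    env = os.environ if env is None else env
    model_name = env.get("MODEL_NAME", "example-model")
    return InferenceSettings(
        model_name=model_name,
        # Assumption: fall back to the model name when no tokenizer is given.
        tokenizer_name=env.get("TOKENIZER_NAME", model_name),
        # int() raises ValueError on non-numeric input, so a bad value
        # fails at startup instead of at the first request.
        max_new_tokens=int(env.get("MAX_NEW_TOKENS", "128")),
        api_rate_limit=int(env.get("API_RATE_LIMIT", "60")),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to exercise in unit tests, for example `load_settings({"MODEL_NAME": "m", "MAX_NEW_TOKENS": "16"})`.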

Helm Deployment

Updated chart values introduce replicas, resource requests, liveness/readiness probes, and autoscaling.

helm lint deploy/helm
helm template codex deploy/helm

Override values per environment using --values or --set flags. For production, ensure GPU nodes are available to satisfy nvidia.com/gpu limits.
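One way to keep per-environment overrides reproducible is to assemble the Helm arguments in a small helper. The sketch below only builds the argument list and does not invoke Helm. The values file layout (`deploy/helm/values-<env>.yaml`), the environment names, and the function name are hypothetical; `--values` and `--set` are the standard Helm flags mentioned above.

```python
from typing import Dict, List, Optional

# Assumption: short environment names matching the promotion path.
ENVIRONMENTS = ("dev", "staging", "production")


def helm_template_args(
    env: str,
    overrides: Optional[Dict[str, str]] = None,
    chart: str = "deploy/helm",
    release: str = "codex",
) -> List[str]:
    """Build a `helm template` argument list for one environment."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    # Assumption: one values file per environment, stored next to the chart.
    args = [
        "helm", "template", release, chart,
        "--values", f"{chart}/values-{env}.yaml",
    ]
    # Sort the overrides so the generated command is stable between runs.
    for key, value in sorted((overrides or {}).items()):
        args += ["--set", f"{key}={value}"]
    return args
```

For example, `helm_template_args("staging", {"replicaCount": "2"})` yields a list that can be handed to `subprocess.run`; the `replicaCount` key is a placeholder and should be replaced with whatever key the chart actually defines.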

CI/CD Integration

Run pre-commit hooks (black, ruff, mypy, pytest-quick).

Execute targeted deployment tests:

pytest tests/deployment/ -k "health or orchestrate"

Build and push the container image through the orchestrator script; in CI, pass --dry-run to validate the commands.
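The general dry-run pattern can be sketched in a few lines: build the same command list as a real run, but print each command instead of executing it. This illustrates the pattern only and is not the implementation of scripts/deploy/orchestrate.sh; the `run` helper is hypothetical.

```python
import shlex
import subprocess
from typing import List, Sequence


def run(commands: Sequence[Sequence[str]], dry_run: bool = False) -> List[str]:
    """Execute each command in order, or only echo it when dry_run is set.

    Returns the shell-quoted form of every command for logging.
    """
    rendered = []
    for cmd in commands:
        line = shlex.join(cmd)
        rendered.append(line)
        if dry_run:
            print(f"[dry-run] {line}")
        else:
            # check=True raises CalledProcessError on the first failure,
            # which stops the remaining commands from running.
            subprocess.run(list(cmd), check=True)
    return rendered
```

Called as `run([["helm", "lint", "deploy/helm"]], dry_run=True)`, the helper prints the command and returns without touching the cluster or the registry.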

Runbooks & Architecture

DEPLOYMENT_RUNBOOK.md – step-by-step promotion and rollback procedure.
deployment_architecture.md – infrastructure overview and scaling guidance.

Environment Matrix

| Environment | Purpose | Notes |
| --- | --- | --- |
| Development | Local iteration via docker compose | Uses CPU runtime, minimal replicas |
| Staging | Pre-production parity | Enable readiness probes and autoscaling |
| Production | Customer traffic | GPU runtime, SLO monitoring, incident response on-call |

Smoke Tests

After deploying, validate endpoints:

pytest tests/deployment/test_api_integration.py -k health
curl http://localhost:8000/health
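A freshly deployed service may need a few seconds before it answers, so a single request can fail spuriously. The sketch below polls the health endpoint until it returns 200 or a retry budget is exhausted. The function names, retry count, and delay are assumptions, and the example URL assumes the service listens on plain HTTP at port 8000.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code for url, or 0 if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return exc.code
    except OSError:
        # Covers URLError, refused connections, and timeouts.
        return 0


def wait_for_health(
    url: str,
    attempts: int = 10,
    delay: float = 1.0,
    fetch: Callable[[str], int] = fetch_status,
) -> bool:
    """Poll url until it answers 200 or the attempts run out."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A deployment script could call `wait_for_health("http://localhost:8000/health")` and exit non-zero when it returns `False`. The `fetch` parameter exists so the polling logic can be tested without a running server.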

Secrets Management

Inject API keys and model credentials via Kubernetes Secrets. Avoid hard-coding values in values.yaml; reference environment variables instead.
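When a Kubernetes Secret is exposed to the container as an environment variable, the application can read it at startup and fail fast if it is absent. The following is a minimal sketch of that check; the `require_secret` helper, the `MissingSecretError` class, and the `CODEX_API_KEY` variable name are hypothetical.

```python
import os
from typing import Mapping, Optional


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent or empty."""


def require_secret(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the secret stored under name, or raise if it is unset."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        # Report only the variable name so no secret material is logged.
        raise MissingSecretError(f"required secret {name} is not set")
    return value
```

Calling `require_secret("CODEX_API_KEY")` during startup surfaces a misconfigured Secret immediately rather than at the first request that needs the credential.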