Skip to content

Production Deployment Runbook

Prerequisites

  • Kubernetes cluster with GPU node pool available
  • Helm 3.x installed locally
  • Docker registry credentials configured via docker login

Deployment Steps

  1. Build image:
    ./scripts/deploy/orchestrate.sh build --gpu
    
  2. Push to registry:
    ./scripts/deploy/orchestrate.sh push
    
  3. Deploy Helm chart:
    helm upgrade --install codex deploy/helm/
    
  4. Verify health probes:
    kubectl get pods -l app=codex
    
  5. Run smoke tests:
    ./scripts/deploy/orchestrate.sh run --dry-run
    pytest tests/deployment/test_api_integration.py -k health
    

Rollback Procedure

  1. Identify last known good chart version using helm history codex.
  2. Roll back:
    helm rollback codex <revision>
    
  3. Validate /ready endpoint until status returns 200.

Observability

Metric SLO Monitoring
Availability 99.9% Prometheus + Alertmanager
Latency p95 < 200ms Grafana dashboards
Error Rate < 0.1% Sentry alerts

Incident Response

Escalation and communication steps are documented in docs/security/incident_response.md.