Deployment Architecture
High-Level Components
- Ingress Controller – Routes HTTPS traffic to the Codex API service.
- Codex API Deployment – FastAPI application scaled via Kubernetes Deployment.
- Model Artifact Storage – Mounted volume or object store containing model weights.
- Observability Stack – Prometheus + Grafana + Loki for metrics and logs.
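The components above are wired together by the Helm chart. As a minimal sketch of the ingress layer, an Ingress resource might route HTTPS traffic to the API service like this (the resource names, host, and TLS secret are illustrative assumptions, not taken from the chart):

```yaml
# Hypothetical Ingress routing HTTPS traffic to the Codex API service.
# Names, host, and TLS secret name are illustrative.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: codex-api
spec:
  ingressClassName: nginx
  tls:
    - hosts: [api.example.com]
      secretName: codex-api-tls
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: codex-api    # assumed Service name
                port:
                  number: 80
```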
Data Flow
- Client sends request to ingress.
- Request forwarded to Codex API pod where security middleware validates payload.
- API interacts with model runtime and returns response through ingress.
- Logs and metrics emitted to monitoring stack.
Scaling Strategy
- `replicaCount: 3` ensures baseline redundancy.
- Horizontal Pod Autoscaler scales between 2 and 10 pods at 70% CPU utilization.
- GPU workloads scheduled via `nvidia.com/gpu` resource limits defined in `values.yaml`.
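The scaling settings above can be sketched in `values.yaml` terms; the key names below are illustrative and may not match the chart's actual schema:

```yaml
# Hypothetical values.yaml fragment; key names are illustrative.
replicaCount: 3

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

resources:
  limits:
    nvidia.com/gpu: 1   # requests one GPU, so the pod lands on a GPU node
```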
Health & Readiness
- `/health` liveness probe checks the runtime heartbeat.
- `/ready` readiness probe verifies model availability and runs dependency checks.
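In the Deployment's pod spec, these endpoints map to probe definitions along these lines (the container port and timings are assumptions, not values from the chart):

```yaml
# Hypothetical probe configuration; port and timings are illustrative.
livenessProbe:
  httpGet:
    path: /health
    port: 8000            # assumed container port
  initialDelaySeconds: 10
  periodSeconds: 15
readinessProbe:
  httpGet:
    path: /ready
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 10
```

Keeping the readiness probe stricter than the liveness probe lets a pod drop out of the Service during slow model loads without being restarted.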
Deployment Workflow
- Build multi-stage Docker image (CPU or GPU runtime).
- Push to registry with a semantic version tag (e.g., `1.0.0`).
- Deploy Helm chart with environment-specific values.
- Monitor pods via `kubectl` and Prometheus alerts, and run smoke tests.
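The environment-specific values in step 3 are often kept in per-environment files. A hedged sketch, assuming hypothetical file and key names:

```yaml
# Hypothetical values-prod.yaml overriding chart defaults; all names are illustrative.
image:
  repository: registry.example.com/codex-api
  tag: "1.0.0"       # semantic version tag pushed in step 2
  variant: gpu       # selects the GPU multi-stage build target
replicaCount: 3
```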
Secrets & Configuration
- Secrets injected via Kubernetes Secret objects referenced in Helm values.
- Rate limits and API keys configured through environment variables.
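A minimal sketch of this pattern, with the secret name, key names, and environment variables all illustrative assumptions:

```yaml
# Hypothetical Secret plus the env entries that reference it
# in the Deployment's container spec.
apiVersion: v1
kind: Secret
metadata:
  name: codex-api-secrets
type: Opaque
stringData:
  API_KEY: "<redacted>"
---
# Container spec fragment:
env:
  - name: CODEX_API_KEY
    valueFrom:
      secretKeyRef:
        name: codex-api-secrets
        key: API_KEY
  - name: RATE_LIMIT_RPM
    value: "60"        # requests per minute, illustrative default
```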
Diagram (Textual)

```text
Client -> Ingress -> Codex API Service -> Model Runtime
                             \-> Observability Stack
```