ADR 020: EKS Provisioning and FinOps Strategy
Context
Hybrid GitOps in Phase 12 requires an AWS Kubernetes spoke that is:
- Operationally simple to provision and destroy.
- Cost-bounded for a lab environment.
- Safe enough to avoid accidental always-on spend.
This phase also introduces cloud workload continuity requirements (status-api-cloud) and secret management choices for multi-cluster Sealed Secrets.
Decision
EKS Provisioning Baseline
- Use a managed EKS control plane.
- Use one managed node group with `t3.medium` and scaling bounds `min=1`, `max=2`.
- Enable IAM OIDC provider for IRSA prerequisites.
- Expose cluster metadata outputs (endpoint, CA data, kubeconfig context alias) for ArgoCD registration.
FinOps Guardrails
- Define a monthly AWS Budget in Terraform with SNS email notifications.
- Trigger alerts at 80% (warning) and 100% (critical) of the configured monthly cap.
- Keep EKS apply as a manual operator action (no CI auto-apply).
- Use lifecycle operations (`up`/`down`) to support weekend-only runtime and reduce idle spend.
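The lifecycle operations could be wrapped in a small shell helper. The sketch below is illustrative only: the function names, the mapping of `up`/`down` to `tofu apply`/`tofu destroy`, and the weekend check are assumptions, not the project's actual tooling.

```shell
#!/usr/bin/env bash
# Hypothetical lifecycle helper; names and command mapping are assumptions.

# Map a lifecycle action to the OpenTofu command an operator would run.
lifecycle_command() {
  case "$1" in
    up)   echo "tofu apply" ;;
    down) echo "tofu destroy" ;;
    *)    echo "usage: up|down" >&2; return 2 ;;
  esac
}

# True for Saturday (6) and Sunday (7), using ISO weekday numbers (date +%u).
is_weekend_day() {
  [ "$1" -ge 6 ] && [ "$1" -le 7 ]
}

# Print the command instead of running it, so apply stays a manual operator step.
lifecycle() {
  local action="$1" today
  today="$(date +%u)"
  if [ "$action" = "up" ] && ! is_weekend_day "$today"; then
    echo "warning: bringing the cluster up outside the weekend window" >&2
  fi
  lifecycle_command "$action"
}
```

Printing the command rather than executing it keeps the helper consistent with the manual-apply guardrail: the operator still runs the final command by hand.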
Budget Verification Gate (Mandatory)
Before the first EKS apply in any environment:
- Confirm the SNS subscription email is in `Confirmed` state.
- Confirm both budget notifications are active (80% and 100%).
- Only then proceed with `tofu apply` for EKS resources.
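The gate above could be scripted so that it fails closed. This is a minimal sketch, not the project's actual check: the function names and the `TOPIC_ARN`, `ACCOUNT_ID`, and `BUDGET_NAME` variables are assumptions, it inspects only the first subscription on the topic, and it verifies that the 80% and 100% thresholds exist rather than every attribute of each notification.

```shell
#!/usr/bin/env bash
# Hypothetical pre-apply gate; variable and function names are assumptions.

# SNS lists an unconfirmed subscription with the placeholder ARN
# "PendingConfirmation"; a confirmed one has a real arn:aws:sns:... value.
subscription_confirmed() {
  case "$1" in
    arn:aws:sns:*) return 0 ;;
    *)             return 1 ;;
  esac
}

# Succeeds only if both 80 and 100 appear in the given threshold list.
# Accepts values such as "80", "80.0", "100", "100.0".
has_required_thresholds() {
  local t seen80=1 seen100=1
  for t in "$@"; do
    case "${t%.0}" in
      80)  seen80=0 ;;
      100) seen100=0 ;;
    esac
  done
  [ "$seen80" -eq 0 ] && [ "$seen100" -eq 0 ]
}

# Query AWS and fail closed. TOPIC_ARN, ACCOUNT_ID and BUDGET_NAME
# must be exported by the operator beforehand.
budget_gate() {
  local sub_arn thresholds
  sub_arn="$(aws sns list-subscriptions-by-topic --topic-arn "$TOPIC_ARN" \
    --query 'Subscriptions[0].SubscriptionArn' --output text)" || return 1
  thresholds="$(aws budgets describe-notifications-for-budget \
    --account-id "$ACCOUNT_ID" --budget-name "$BUDGET_NAME" \
    --query 'Notifications[].Threshold' --output text)" || return 1
  subscription_confirmed "$sub_arn" \
    || { echo "SNS subscription not confirmed" >&2; return 1; }
  # Word splitting of the threshold list is intended here.
  has_required_thresholds $thresholds \
    || { echo "80%/100% notifications missing" >&2; return 1; }
  echo "budget gate passed"
}
```

An operator would then run `budget_gate && tofu apply`, so the apply is never reached when either check fails.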
Cost Model
The target operating profile is approximately 2.50 EUR per active day for a minimal footprint, with explicit caveats:
- Prices vary by region, control-plane billing, and network traffic.
- Weekend-only lifecycle is part of the model; always-on operation will exceed the target.
- Budget alerting is detection, not enforcement.
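To make the weekend-only assumption concrete, the arithmetic can be sketched as follows. The 2.50 EUR figure is the target stated above; the active-day counts and the simplification that every active day costs exactly the target rate are assumptions for illustration, not measured billing data.

```shell
#!/usr/bin/env bash
# Illustrative arithmetic only. The 2.50 EUR/day target comes from this ADR;
# the active-day counts below are assumptions.

DAILY_COST_CENTS=250   # 2.50 EUR per active day, in cents to stay in integer math

# Print the projected monthly cost in EUR for a given number of active days.
monthly_cost_eur() {
  local days="$1" cents
  cents=$(( days * DAILY_COST_CENTS ))
  printf '%d.%02d\n' $(( cents / 100 )) $(( cents % 100 ))
}

monthly_cost_eur 8    # weekend-only, four weekends: 20.00
monthly_cost_eur 30   # always-on, 30-day month:     75.00
```

Under these assumptions an always-on month costs almost four times a weekend-only month, which is why the lifecycle discipline is part of the cost model rather than an optional optimization.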
Sealed Secrets Key-Sharing Strategy
| Option | Description | Pros | Cons |
|---|---|---|---|
| A (Chosen) | Reuse on-prem Sealed Secrets master key in EKS | Fast migration, no immediate reseal campaign | Shared blast radius for compromise and rotation |
| B | Dedicated EKS key and reseal all secrets | Stronger isolation per cluster | Higher operational overhead and migration complexity |
We choose Option A for Phase 12 speed and simplicity, and document Option B as a future hardening path.
Operational key-sharing sequence for Option A:
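```shell
# Export the sealing key(s) from the on-prem cluster (assumed to be the current kubectl context).
kubectl -n kube-system get secret \
  -l sealedsecrets.bitnami.com/sealed-secrets-key \
  -o yaml > /tmp/sealed-secrets-master-key.yaml
# Import the same key(s) into the EKS cluster.
kubectl --context=aws-eks-prod -n kube-system apply -f /tmp/sealed-secrets-master-key.yaml
# The export contains private key material; remove it once the import succeeds.
rm -f /tmp/sealed-secrets-master-key.yaml
```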
Why EKS Apply Stays Manual
- Cost-sensitive infrastructure changes need explicit human confirmation after reviewing budget state.
- Cloud apply can create irreversible billing/resource side effects if misconfigured.
- Plan-only CI provides review visibility without accidental drift-triggered provisioning.
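A plan-only CI step could look like the sketch below. It relies on the `-detailed-exitcode` flag of `tofu plan` (0 = no changes, 1 = error, 2 = changes present); the function names and the `tfplan` output file are assumptions, and the job never calls `tofu apply`.

```shell
#!/usr/bin/env bash
# Hypothetical CI step; the function names are assumptions.

# Translate the exit status of `tofu plan -detailed-exitcode`:
#   0 = no changes, 1 = error, 2 = changes present.
classify_plan_exit() {
  case "$1" in
    0) echo "no-changes" ;;
    2) echo "changes-pending-manual-apply" ;;
    *) echo "plan-error" ;;
  esac
}

# Run a plan, never an apply, and report the outcome for reviewers.
ci_plan_only() {
  local rc=0
  tofu plan -input=false -detailed-exitcode -out=tfplan || rc=$?
  classify_plan_exit "$rc"
  # Pending changes do not fail the job; they wait for an operator.
  [ "$rc" -eq 0 ] || [ "$rc" -eq 2 ]
}
```

Treating exit status 2 as success keeps drift visible in the CI log without letting the pipeline provision anything on its own.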
Consequences
Positive
- Cloud cluster provisioning is reproducible and reviewable.
- Budget notifications provide early warning before runaway spend.
- Manual apply preserves a human control gate for high-cost changes.
- Option A enables rapid cross-cluster secret portability in this phase.
Negative
- Budget alerts do not block spend automatically.
- Manual apply increases operational dependency on runbook discipline.
- Option A increases shared secret-key blast radius across environments.
- Weekend-only lifecycle can delay troubleshooting outside active windows.
Status
Accepted and implemented in Phase 12.