Runbook: Migrate EKS Control Plane Log Group to CloudWatch Logs Infrequent Access
This runbook covers migrating the EKS control plane log group to the CloudWatch Logs INFREQUENT_ACCESS class, reducing ingestion cost by ~50% ($0.50/GB → $0.25/GB).
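As a rough illustration of the saving, the arithmetic can be sketched as a small helper. The per-GB rates are the ones quoted above; the function name and the monthly volume are hypothetical inputs, not measured values from any environment.

```shell
# Estimate monthly ingestion savings (USD) from moving to INFREQUENT_ACCESS.
# Rates are the per-GB figures quoted in this runbook; the volume is a
# hypothetical input supplied by the caller.
ingestion_savings() {
  local gb_per_month="$1"
  awk -v gb="$gb_per_month" 'BEGIN { printf "%.2f\n", gb * (0.50 - 0.25) }'
}

# Hypothetical volume of 400 GB/month
ingestion_savings 400   # prints 100.00
```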
log_group_class is immutable — the existing log group must be deleted and recreated. The AWS provider upgrade to v6.41.0 (included in this PR) fixes a bug where log_group_class was silently ignored on creation, so Terraform will now correctly create the log group with INFREQUENT_ACCESS.
Scope: prod2 is the only environment with logging enabled (api, audit, controllerManager) and requires an S3 export before deletion. All other environments (prod1, staging, preprod, dev) have logging disabled, so no export is needed.
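The scope can be double-checked against the live clusters before deleting anything. The sketch below assumes the cluster name is the middle segment of the log group name (/aws/eks/<cluster-name>/cluster); the helper name enabled_log_types is hypothetical.

```shell
# Print the control plane log types currently enabled on a cluster.
# No log types in the output suggests logging is disabled.
# Assumption: the cluster name matches the <cluster-name> segment of the
# log group name /aws/eks/<cluster-name>/cluster.
enabled_log_types() {
  local cluster="$1"
  aws eks describe-cluster \
    --name "$cluster" \
    --region us-east-1 \
    --query 'cluster.logging.clusterLogging[?enabled==`true`].types[]' \
    --output text
}

# Example: enabled_log_types cell2-MultitenantEKSCluster
```

For prod2 the output should list api, audit and controllerManager; for the other environments it should list no types.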
Prerequisites
- devadmin AWS credentials
- OpenTofu installed (brew install opentofu)
- This PR merged
Steps
1. Migrate preprod and staging
Both environments have logging disabled so their log groups are empty — safe to delete and let Terraform recreate with INFREQUENT_ACCESS.
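Before deleting, the "empty" assumption can be checked rather than trusted. This is a minimal sketch using the storedBytes field returned by describe-log-groups; the helper names are hypothetical, and storedBytes may lag behind recent ingestion, so treat a zero as a strong hint rather than proof.

```shell
# Print storedBytes for an exact log group name ("None" if it does not exist).
log_group_stored_bytes() {
  local name="$1"
  aws logs describe-log-groups \
    --log-group-name-prefix "$name" \
    --region us-east-1 \
    --query "logGroups[?logGroupName=='${name}'].storedBytes | [0]" \
    --output text
}

# Return non-zero unless the group reports exactly zero stored bytes.
assert_log_group_empty() {
  local bytes
  bytes=$(log_group_stored_bytes "$1")
  if [ "$bytes" != "0" ]; then
    echo "log group $1 reports storedBytes=$bytes (expected 0)" >&2
    return 1
  fi
}

# Example:
# assert_log_group_empty /aws/eks/cell1-preprod-MultitenantEKSCluster/cluster
```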
preprod
aws logs delete-log-group \
--log-group-name /aws/eks/cell1-preprod-MultitenantEKSCluster/cluster \
--region us-east-1
cd infra/multitenant_eks_cluster
./scripts/deploy.sh --env preprod --target aws_cloudwatch_log_group.eks_cluster
# Verify
aws logs describe-log-groups \
--log-group-name-prefix /aws/eks/cell1-preprod-MultitenantEKSCluster/cluster \
--region us-east-1 \
--query 'logGroups[0].logGroupClass'
# expected: "INFREQUENT_ACCESS"
staging
aws logs delete-log-group \
--log-group-name /aws/eks/cell1-staging-MultitenantEKSCluster/cluster \
--region us-east-1
./scripts/deploy.sh --env staging --target aws_cloudwatch_log_group.eks_cluster
# Verify
aws logs describe-log-groups \
--log-group-name-prefix /aws/eks/cell1-staging-MultitenantEKSCluster/cluster \
--region us-east-1 \
--query 'logGroups[0].logGroupClass'
# expected: "INFREQUENT_ACCESS"
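The verify command is repeated for every environment, and logGroups[0] on a prefix search could in principle pick up a different group that shares the prefix. One possible refinement is a helper that filters on the exact name; the function names here are hypothetical.

```shell
# Print the class of the log group whose name matches exactly.
log_group_class() {
  local name="$1"
  aws logs describe-log-groups \
    --log-group-name-prefix "$name" \
    --region us-east-1 \
    --query "logGroups[?logGroupName=='${name}'].logGroupClass | [0]" \
    --output text
}

# Succeed only if the group exists and is INFREQUENT_ACCESS.
verify_infrequent_access() {
  [ "$(log_group_class "$1")" = "INFREQUENT_ACCESS" ]
}

# Example:
# verify_infrequent_access /aws/eks/cell1-staging-MultitenantEKSCluster/cluster && echo ok
```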
2. Create the export S3 bucket (prod2)
Only needed once. The bucket requires a policy allowing CloudWatch Logs to write to it.
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
BUCKET="cell2-eks-logs-migration"
aws s3api create-bucket --bucket $BUCKET --region us-east-1
aws s3api put-bucket-policy --bucket $BUCKET --policy "{
\"Version\": \"2012-10-17\",
\"Statement\": [
{
\"Effect\": \"Allow\",
\"Principal\": { \"Service\": \"logs.us-east-1.amazonaws.com\" },
\"Action\": \"s3:GetBucketAcl\",
\"Resource\": \"arn:aws:s3:::${BUCKET}\",
\"Condition\": { \"StringEquals\": { \"aws:SourceAccount\": \"${ACCOUNT_ID}\" } }
},
{
\"Effect\": \"Allow\",
\"Principal\": { \"Service\": \"logs.us-east-1.amazonaws.com\" },
\"Action\": \"s3:PutObject\",
\"Resource\": \"arn:aws:s3:::${BUCKET}/eks-control-plane-logs/*\",
\"Condition\": { \"StringEquals\": { \"s3:x-amz-acl\": \"bucket-owner-full-control\", \"aws:SourceAccount\": \"${ACCOUNT_ID}\" } }
}
]
}"
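A quick read-back can confirm the policy was stored before the export is submitted. The sketch below only checks that both actions appear in the stored policy text; it does not validate principals or conditions. The helper name check_export_policy is hypothetical.

```shell
# Fetch the bucket policy and confirm it mentions both actions the export needs.
check_export_policy() {
  local bucket="$1" policy action
  policy=$(aws s3api get-bucket-policy \
    --bucket "$bucket" \
    --query Policy \
    --output text) || return 1
  for action in s3:GetBucketAcl s3:PutObject; do
    case "$policy" in
      *"$action"*) ;;  # action present, keep checking
      *) echo "missing $action in policy for $bucket" >&2; return 1 ;;
    esac
  done
}

# Example: check_export_policy "$BUCKET" && echo "policy looks complete"
```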
3. Submit the export task
prod2 has 60-day log retention. Submit the task and exit — AWS processes it asynchronously (can take up to 12h). Only one export task can run at a time per AWS account.
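# Epoch milliseconds via shell arithmetic; works with both GNU and BSD (macOS) date.
NOW_S=$(date +%s)
FROM_MS=$(( (NOW_S - 60 * 24 * 60 * 60) * 1000 ))
TO_MS=$(( NOW_S * 1000 ))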
aws logs create-export-task \
--log-group-name /aws/eks/cell2-MultitenantEKSCluster/cluster \
--from $FROM_MS \
--to $TO_MS \
--destination $BUCKET \
--destination-prefix eks-control-plane-logs \
--region us-east-1
Note the taskId from the output.
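If the submission is rejected because another export is already active in the account, the pending and running tasks can be listed to see what is blocking it. This is a sketch with a hypothetical helper name; it uses the --status-code filter of describe-export-tasks.

```shell
# Print the task IDs of export tasks that are still PENDING or RUNNING.
# No task IDs printed suggests the account is free for a new export.
active_export_tasks() {
  local status
  for status in PENDING RUNNING; do
    aws logs describe-export-tasks \
      --status-code "$status" \
      --region us-east-1 \
      --query 'exportTasks[].taskId' \
      --output text
  done
}

# Example: active_export_tasks
```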
4. Check export status
Come back later and check — no need to wait:
aws logs describe-export-tasks --task-id <taskId> --region us-east-1 \
--query 'exportTasks[0].status.code'
Do not proceed until the status is COMPLETED.
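For anyone who would rather leave a terminal polling than check back by hand, the status check can be wrapped in a loop. A minimal sketch, assuming a default interval of 5 minutes; the function name wait_for_export and the interval are illustrative choices.

```shell
# Poll an export task until it reaches a terminal state.
# Returns 0 on COMPLETED, 1 on FAILED/CANCELLED, 2 if the CLI call itself fails.
wait_for_export() {
  local task_id="$1" interval="${2:-300}" code
  while true; do
    code=$(aws logs describe-export-tasks \
      --task-id "$task_id" \
      --region us-east-1 \
      --query 'exportTasks[0].status.code' \
      --output text) || return 2
    case "$code" in
      COMPLETED) echo "export $task_id completed"; return 0 ;;
      FAILED|CANCELLED) echo "export $task_id ended with status $code" >&2; return 1 ;;
      *) sleep "$interval" ;;  # PENDING, RUNNING, etc.: keep waiting
    esac
  done
}

# Example: wait_for_export <taskId> 600
```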
5. Migrate prod2
⚠️ Expected log loss: ~1-3 minutes. There is an unavoidable gap between deleting the log group and Terraform recreating it. EKS control plane logs written during this window will be dropped. This is acceptable since these logs are only accessed for audits/incidents, not real-time monitoring.
aws logs delete-log-group \
--log-group-name /aws/eks/cell2-MultitenantEKSCluster/cluster \
--region us-east-1
cd infra/multitenant_eks_cluster
./scripts/deploy.sh --env prod2 --target aws_cloudwatch_log_group.eks_cluster
6. Verify
aws logs describe-log-groups \
--log-group-name-prefix /aws/eks/cell2-MultitenantEKSCluster/cluster \
--region us-east-1 \
--query 'logGroups[0].logGroupClass'
# expected: "INFREQUENT_ACCESS"
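The class check confirms the group was recreated correctly, but not that EKS is writing to it again. A freshly recreated group starts with no log streams, so a non-zero stream count is a reasonable hint that control plane logs have resumed. The helper names below are hypothetical.

```shell
# Print the number of log streams in a log group.
log_stream_count() {
  aws logs describe-log-streams \
    --log-group-name "$1" \
    --region us-east-1 \
    --query 'length(logStreams)' \
    --output json
}

# Succeed once at least one stream exists in the recreated group.
logs_resumed() {
  local count
  count=$(log_stream_count "$1") || return 1
  [ "$count" -gt 0 ] 2>/dev/null
}

# Example:
# logs_resumed /aws/eks/cell2-MultitenantEKSCluster/cluster && echo "logs are flowing"
```

Streams may take a few minutes to appear after the apply, so rerun the check before treating an empty group as a failure.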