Runbook: auth_keys_refresh_failure
This runbook describes how to investigate and remediate auth key refresh failures. The alert fires when a single index records 4 or more refresh failures within 3 minutes.
Impact
If auth keys cannot be refreshed, the reverse proxy sidecar may serve stale or missing API keys, potentially causing authentication failures for end users.
Steps
1. Get admin permissions via Escalator
Request admin access through Escalator:
https://escalator.marqo-staging.com/
2. Copy admin credentials to local terminal
Copy the admin credentials from Escalator and export them in your terminal.
3. Get EKS cluster credentials
aws eks update-kubeconfig --region us-east-1 --name cell2-MultitenantEKSCluster
4. Identify the affected index
The alert labels include index_name and account_id. Use the index_name value to find the reverse proxy pod for the affected index:
kubectl get pods -A | grep <index-name>
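A plain grep matches the index name anywhere on the line, including the namespace column. When names overlap, a narrower filter may help. The sketch below is one way to do it: the helper name find_index_pods is hypothetical, and it assumes the index name appears in the pod name, which this runbook implies but does not guarantee.

```shell
# Hypothetical helper: reads `kubectl get pods -A --no-headers` output on stdin
# and prints "<namespace> <pod-name>" for pods whose NAME column contains
# the given index name.
find_index_pods() {
  awk -v idx="$1" 'index($2, idx) > 0 { print $1, $2 }'
}

# Usage (requires cluster access):
#   kubectl get pods -A --no-headers | find_index_pods "my-index"
```

The two printed fields can be used as the namespace and pod name in the steps that follow.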
5. Check reverse proxy sidecar logs
kubectl logs <pod-name> -n <namespace> -c reverse-proxy --tail=200
Look for DynamoDB permission errors or connectivity issues related to auth key refresh.
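To cut 200 lines of output down to the likely culprits, the logs can be piped through a keyword filter. This is a minimal sketch: filter_refresh_errors is a hypothetical helper, and the patterns are guesses at what permission and connectivity errors look like, not the sidecar's actual log format.

```shell
# Hypothetical helper: keeps log lines that look like DynamoDB permission or
# connectivity problems. The patterns are assumptions; adjust them to match
# the real sidecar log format.
filter_refresh_errors() {
  grep -iE 'accessdenied|not authorized|dynamodb|timed? ?out|connection refused'
}

# Usage (requires cluster access):
#   kubectl logs "$POD_NAME" -n "$NAMESPACE" -c reverse-proxy --tail=200 | filter_refresh_errors
```

An empty result does not prove the logs are clean; it may only mean the patterns do not match.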
6. Verify IAM permissions on the reverse proxy sidecar
Check whether the IAM role used by the reverse proxy sidecar has the permissions needed to read from DynamoDB. Start by finding the pod's service account:
kubectl get pod <pod-name> -n <namespace> -o yaml | grep -A 5 serviceAccount
Verify the associated IAM role has DynamoDB read permissions (e.g., dynamodb:GetItem, dynamodb:Query) for the relevant auth keys table.
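The check above can also be done from the terminal. The sketch below assumes the role is bound to the service account through the eks.amazonaws.com/role-arn annotation (IAM Roles for Service Accounts); if the cluster uses another mechanism, the annotation will be empty. The helper names and the table ARN argument are hypothetical, and the simulation call needs the iam:SimulatePrincipalPolicy permission.

```shell
# Print the pod's service account and the IAM role ARN annotated on it.
# Assumes IRSA; the second line is empty if the annotation is absent.
show_pod_iam_role() {
  pod="$1"; ns="$2"
  sa=$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.serviceAccountName}')
  echo "serviceAccount: $sa"
  kubectl get serviceaccount "$sa" -n "$ns" \
    -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}{"\n"}'
}

# Ask IAM whether the role's policies allow the DynamoDB read actions
# on the given table ARN. Prints one decision per action.
check_dynamodb_read() {
  role_arn="$1"; table_arn="$2"
  aws iam simulate-principal-policy \
    --policy-source-arn "$role_arn" \
    --action-names dynamodb:GetItem dynamodb:Query \
    --resource-arns "$table_arn" \
    --query 'EvaluationResults[].[EvalActionName,EvalDecision]' \
    --output table
}

# Usage (requires cluster and AWS access):
#   show_pod_iam_role "$POD_NAME" "$NAMESPACE"
#   check_dynamodb_read "<role-arn>" "<auth-keys-table-arn>"
```

A decision other than "allowed" for either action points to the missing permission described in the next step.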
7. Check the ack-iam CRD
If the IAM role is missing permissions, inspect the ack-iam CRD that manages the role:
kubectl get roles.iam.services.k8s.aws -A
kubectl get policies.iam.services.k8s.aws -A
Find the relevant role/policy for the reverse proxy sidecar and check its policy document.
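Once the resource name is known, its contents can be printed directly. This sketch assumes the field layout of the ACK IAM controller's API (spec.policyDocument on a Policy, status.conditions on a Role); confirm the fields against the resource's full YAML if the output is empty. The helper names are hypothetical, and the final check is a naive text match, not a policy evaluation.

```shell
# Print the policy document held in an ACK IAM Policy resource.
show_ack_policy_document() {
  kubectl get policies.iam.services.k8s.aws "$1" -n "$2" \
    -o jsonpath='{.spec.policyDocument}{"\n"}'
}

# Print the conditions ACK reports for a Role resource, one per line.
show_ack_role_conditions() {
  kubectl get roles.iam.services.k8s.aws "$1" -n "$2" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status} {.message}{"\n"}{end}'
}

# Naive check: succeeds if the text on stdin mentions a DynamoDB read action.
mentions_dynamodb_read() {
  grep -qE 'dynamodb:(GetItem|Query|\*)'
}

# Usage (requires cluster access):
#   show_ack_policy_document "<policy-name>" "<namespace>" | mentions_dynamodb_read \
#     && echo "read action present" || echo "read action missing"
```

The text match ignores Effect, Resource, and Condition fields, so treat a hit as a hint rather than confirmation.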
8. Remediate
If the role is missing DynamoDB read permissions:
- Update the ack-iam CRD to add the required DynamoDB permissions to the role.
- Apply the change:
kubectl apply -f <updated-role-or-policy.yaml>
- Verify in the AWS Console that the IAM role now has the correct policy attached.
- The reverse proxy sidecar should pick up the updated permissions automatically. If it does not, restart the pod:
kubectl delete pod <pod-name> -n <namespace>
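After the fix or restart, it is worth confirming that the failures have stopped. The sketch below counts recent error lines in the sidecar logs. The helper name count_recent_refresh_errors is hypothetical and the patterns are guesses at the log format. If the pod was deleted and recreated by its controller, it will likely have a new name, so repeat step 4 first.

```shell
# Hypothetical helper: counts sidecar log lines from the last 5 minutes that
# look like refresh or permission errors. The patterns are assumptions.
count_recent_refresh_errors() {
  kubectl logs "$1" -n "$2" -c reverse-proxy --since=5m \
    | grep -ciE 'accessdenied|not authorized|refresh.*(fail|error)'
}

# Usage (requires cluster access):
#   count_recent_refresh_errors "$POD_NAME" "$NAMESPACE"
```

A count of zero over a window longer than the alert's 3 minutes suggests the refresh is working again; also confirm that the alert itself has resolved.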