Runbook: keda_scaler_errors
This runbook covers steps to investigate and remediate KEDA scaler errors. The alert fires when the scaler error count is greater than 0 for more than 5 minutes.
Impact
Autoscaling metrics collection may be impacted for the affected ScaledObject. The workload may not scale correctly in response to load changes if the scaler cannot fetch metrics.
Steps
1. Get admin permissions via Escalator
Request admin access through Escalator:
https://escalator.marqo-staging.com/
2. Copy admin credentials to local terminal
Copy the admin credentials from Escalator and export them in your terminal.
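A minimal sketch of this step, assuming Escalator issues temporary AWS credentials as an access key, a secret key, and a session token (the runbook does not state the exact format, so treat this as an assumption). The variable names are the standard ones read by the AWS CLI; `check_aws_env` is a hypothetical helper, not part of any existing tooling.

```shell
# Assumption: Escalator hands out temporary AWS credentials.
# Replace the placeholders with the values copied from Escalator.
export AWS_ACCESS_KEY_ID="<access-key-id>"
export AWS_SECRET_ACCESS_KEY="<secret-access-key>"
export AWS_SESSION_TOKEN="<session-token>"

# Hypothetical helper: report any expected variable that is unset or empty.
check_aws_env() {
  missing=0
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: confirm the variables are set, then confirm which identity they map to.
# check_aws_env && aws sts get-caller-identity
```

Running `aws sts get-caller-identity` before continuing should show the admin role rather than your default identity.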
3. Get EKS cluster credentials
aws eks update-kubeconfig --region us-east-1 --name cell2-MultitenantEKSCluster
4. Identify the affected ScaledObject
The alert labels include the scaledObject and namespace. Check the ScaledObject status:
kubectl get scaledobject <scaled-object-name> -n <namespace> -o yaml
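The full YAML can be long. As a sketch, the status conditions alone can be pulled out with `jq`, which usually shows whether the ScaledObject is ready and why not. `scaledobject_conditions` is a hypothetical helper name; it assumes the object exposes the usual `type`, `status`, `reason`, and `message` fields under `.status.conditions`.

```shell
# Hypothetical helper: read a ScaledObject as JSON on stdin and print one
# line per status condition (for example Ready, Active, Fallback).
scaledobject_conditions() {
  jq -r '.status.conditions[]? | "\(.type)=\(.status) reason=\(.reason // "-") message=\(.message // "-")"'
}

# Usage (same placeholders as the command above):
# kubectl get scaledobject <scaled-object-name> -n <namespace> -o json | scaledobject_conditions
```

A `Ready=False` line, together with its reason and message, is often the quickest pointer to the failing trigger.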
5. Check KEDA operator logs for scaler errors
kubectl logs -n keda -l app=keda-operator --tail=300 | grep -iE "scaler.*error|<scaled-object-name>"
Look for:
- Prometheus query errors (connection refused, query syntax)
- Metric not found errors
- Authentication/authorization errors against the metrics source
6. Verify the metrics source
If the scaler uses Prometheus, verify that the Prometheus endpoint is reachable and that the trigger query returns data. Start by checking that the Prometheus pods are running:
# Check that the Prometheus pods are Running and Ready
kubectl get pods -n prometheus
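The pod check alone does not show whether the trigger query returns data. One way to test that, sketched under assumptions, is to port-forward to Prometheus and run the query against its HTTP API. The service name `<prometheus-service>` and local port 9090 are placeholders that are not given in this runbook, and `prom_result_count` is a hypothetical helper.

```shell
# Hypothetical helper: read a Prometheus /api/v1/query response on stdin and
# print how many series it returned. Exits non-zero if the query failed.
prom_result_count() {
  jq -r 'if .status == "success" then (.data.result | length) else error("query failed") end'
}

# Usage sketch. <prometheus-service> and the local port are assumptions;
# <trigger-query> is the query taken from the ScaledObject trigger.
# kubectl port-forward -n prometheus svc/<prometheus-service> 9090:9090 &
# curl -sG 'http://localhost:9090/api/v1/query' \
#   --data-urlencode 'query=<trigger-query>' | prom_result_count
```

A count of 0 means the query is valid but matches no series, which typically surfaces in KEDA as a metric-not-found style error.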
7. Check the ScaledObject trigger configuration
kubectl get scaledobject <scaled-object-name> -n <namespace> -o jsonpath='{.spec.triggers}' | jq .
Verify that:
- The metrics server address is correct
- The query/metric name is valid
- Authentication credentials are present and valid
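To make the three checks above quicker, the relevant fields can be printed side by side. This is a sketch that assumes Prometheus-style triggers, where the address and query live under `metadata.serverAddress` and `metadata.query` and credentials are referenced through `authenticationRef.name`. `trigger_summary` is a hypothetical helper.

```shell
# Hypothetical helper: read a ScaledObject as JSON on stdin and print, per
# trigger, its type, server address, query, and TriggerAuthentication name
# as tab-separated columns ("-" where a field is absent).
trigger_summary() {
  jq -r '.spec.triggers[]?
         | [.type,
            (.metadata.serverAddress // "-"),
            (.metadata.query // "-"),
            (.authenticationRef.name // "-")]
         | @tsv'
}

# Usage:
# kubectl get scaledobject <scaled-object-name> -n <namespace> -o json | trigger_summary
```

A `-` in the last column only means no TriggerAuthentication is referenced; whether one is required depends on the metrics source.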
8. Remediate
Depending on the root cause:
- If Prometheus is down: Refer to the prometheus_server_unhealthy or prometheus_autoscaling_unhealthy runbook.
- If the query is invalid: Fix the ScaledObject trigger query.
- If the cause is a transient connectivity issue: The scaler should recover automatically. Monitor until the errors stop.
- If the errors persist: Restart the KEDA operator:
kubectl rollout restart deployment keda-operator -n keda
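After a restart it can help to wait for the rollout to finish before judging whether the errors have cleared. A minimal sketch follows; the function name `restart_keda_operator` and the 120 second timeout are arbitrary choices, not values taken from this runbook.

```shell
# Restart the KEDA operator and wait for the new pod to become ready.
# The 120s timeout is an arbitrary choice, not a documented requirement.
restart_keda_operator() {
  kubectl rollout restart deployment keda-operator -n keda &&
    kubectl rollout status deployment keda-operator -n keda --timeout=120s
}

# Usage:
# restart_keda_operator
```

Once the rollout completes, re-check the ScaledObject status and the operator logs to confirm the scaler errors have stopped.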