Runbook: kserve_inference_error_rate
This runbook describes how to investigate and remediate high inference error rate alerts for KServe models. The alert fires when the error rate for an inference service exceeds 5%.
Steps
1. Verify error rate metrics
Open the KServe Inference dashboard and check the error rate for the affected inference service.
Compare the nv_inference_request_failure and nv_inference_request_success metrics to understand the scope of the failures.
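To sanity-check the dashboard reading, the error rate can be recomputed by hand from the two metrics. The sketch below assumes you have read the failure and success counts for the same time window. The function names error_rate_pct and exceeds_threshold are illustrative and not part of KServe or the dashboard.

```shell
# error_rate_pct FAILURES SUCCESSES
# Prints failures / (failures + successes) as a percentage with two decimals.
error_rate_pct() {
  awk -v f="$1" -v s="$2" 'BEGIN {
    total = f + s
    if (total == 0) { print "0.00"; exit }   # no traffic: report 0 rather than divide by zero
    printf "%.2f\n", 100 * f / total
  }'
}

# exceeds_threshold FAILURES SUCCESSES [THRESHOLD_PCT]
# Exits 0 when the error rate is strictly above the threshold (default 5, matching the alert).
exceeds_threshold() {
  awk -v p="$(error_rate_pct "$1" "$2")" -v t="${3:-5}" 'BEGIN { exit !((p + 0) > (t + 0)) }'
}
```

For example, error_rate_pct 6 94 prints 6.00, which is above the 5% alert threshold.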
2. Get read-only credentials for prod-cell-1
Log in to IAM Identity Center and copy read-only credentials for the prod-cell-1 account:
https://d-9067a2ad56.awsapps.com/start/#/?tab=accounts
3. Identify the affected predictor pods
Use kubectl to find the kserve-predictor pods for the affected inference service:
kubectl get pods -n kserve-models | grep <inference-service-name>
Check pod logs for errors:
kubectl logs -n kserve-models <pod-name> --tail=200
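When 200 lines of logs are too noisy to scan by eye, a small filter can show how many lines look like failures. This is a minimal sketch: the keyword list (error, exception, traceback) is an assumption about what the predictor logs contain, and count_error_lines is a hypothetical helper name.

```shell
# count_error_lines: reads log text on stdin and prints the number of lines
# that mention a common failure keyword (case-insensitive).
count_error_lines() {
  # grep -c exits non-zero when the count is 0, so mask that exit status.
  grep -ciE 'error|exception|traceback' || true
}

# Usage (requires cluster access):
#   kubectl logs -n kserve-models <pod-name> --tail=200 | count_error_lines
# If the container has restarted, the previous container's logs may be more useful:
#   kubectl logs -n kserve-models <pod-name> --previous --tail=200 | count_error_lines
```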
4. Get admin permissions via Escalator
Request admin access through Escalator:
https://escalator.marqo-staging.com/
5. Copy admin credentials to local terminal
Copy the admin credentials from Escalator and export them in your terminal.
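Before continuing, it can help to confirm that the credentials were actually exported in the current shell. The sketch below assumes Escalator hands out temporary credentials as the three standard AWS CLI environment variables. If it provides something different (for example, a named profile), adjust the list. require_aws_env is a hypothetical helper name.

```shell
# require_aws_env: exits non-zero and names each expected variable
# that is missing from the exported environment.
require_aws_env() {
  local name missing=0
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN; do
    # printenv only sees exported variables, which is what the AWS CLI needs.
    if [ -z "$(printenv "$name")" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   require_aws_env && aws sts get-caller-identity
# The second command prints the account and role the credentials resolve to.
```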
6. Get EKS cluster credentials
aws eks update-kubeconfig --region us-east-1 --name cell2-MultitenantEKSCluster
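After updating the kubeconfig, it is worth confirming that kubectl now points at the intended cluster before running any destructive command. The check below is a sketch that assumes the default context naming used by aws eks update-kubeconfig (the cluster ARN, which ends in /<cluster-name>) and that no --alias was passed. context_is_cluster is a hypothetical helper name.

```shell
# context_is_cluster CONTEXT CLUSTER_NAME
# Exits 0 when CONTEXT is the cluster name itself or ends in "/CLUSTER_NAME".
context_is_cluster() {
  case "$1" in
    "$2"|*"/$2") return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   context_is_cluster "$(kubectl config current-context)" cell2-MultitenantEKSCluster \
#     || echo "kubectl is pointing at a different cluster" >&2
```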
7. Remediate
Depending on the root cause:
- If pods are unhealthy or OOMKilled: restart the affected pods:
kubectl delete pod <pod-name> -n kserve-models
- If the model is overloaded: update the KEDA scaling object to increase replicas:
kubectl get scaledobjects -n kserve-models
kubectl edit scaledobject <scaled-object-name> -n kserve-models
Increase minReplicaCount and/or maxReplicaCount as needed.
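As an alternative to editing the ScaledObject interactively, the replica bounds can be changed with a merge patch, which leaves a reproducible command in your shell history. This is a sketch only: replica_patch is a hypothetical helper, and the values 2 and 10 in the usage line are placeholders, not recommended settings.

```shell
# replica_patch MIN MAX
# Prints a JSON merge patch that sets the KEDA ScaledObject replica bounds.
replica_patch() {
  local n
  for n in "$1" "$2"; do
    case "$n" in
      ''|*[!0-9]*) echo "replica counts must be non-negative integers" >&2; return 2 ;;
    esac
  done
  if [ "$1" -gt "$2" ]; then
    echo "minReplicaCount must not exceed maxReplicaCount" >&2
    return 2
  fi
  printf '{"spec":{"minReplicaCount":%s,"maxReplicaCount":%s}}\n' "$1" "$2"
}

# Usage (placeholder values):
#   kubectl patch scaledobject <scaled-object-name> -n kserve-models \
#     --type merge -p "$(replica_patch 2 10)"
```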