Runbook: kserve_inference_error_rate

This runbook describes how to investigate and remediate high inference error rate alerts for KServe models. The alert fires when an inference service's error rate exceeds 5%.

Steps

1. Verify error rate metrics

Open the KServe Inference dashboard and check the error rate for the affected inference service:

https://g-3d216b3ddc.grafana-workspace.us-east-1.amazonaws.com/d/kserve_inference_dashboard/kserve-inference?orgId=1&refresh=30s

Compare the nv_inference_request_failure and nv_inference_request_success metrics over the same time window to gauge the scope: how many requests are failing relative to those that succeed.
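To sanity-check the dashboard figure against the 5% threshold, the rate can be recomputed by hand. The sketch below assumes the error rate is failures divided by total requests (failures plus successes) over the same window; the alert's exact expression is not given in this runbook, and error_rate is a hypothetical helper name.

```shell
# Compute the error rate (percent) from failure and success request counts.
# Assumption: rate = failures / (failures + successes), both counted over
# the same time window.
# Usage: error_rate <failures> <successes>
error_rate() {
  awk -v f="$1" -v s="$2" 'BEGIN {
    total = f + s
    if (total == 0) { print "0.00"; exit }   # no traffic, no errors
    printf "%.2f\n", 100 * f / total
  }'
}

# Example with hypothetical counts read off the dashboard:
# error_rate 12 188    # 12 failures out of 200 requests
```

With the hypothetical counts above, 12 failures out of 200 requests is 6.00%, which is over the threshold.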

2. Get read-only credentials for prod-cell-1

Sign in to the AWS access portal and obtain read-only credentials for prod-cell-1:

https://d-9067a2ad56.awsapps.com/start/#/?tab=accounts

3. Identify the affected predictor pods

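Use kubectl to find the kserve-predictor pods for the affected inference service. If kubectl is not yet configured for the cluster, run the update-kubeconfig command from step 6 first.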

kubectl get pods -n kserve-models | grep <inference-service-name>

Check pod logs for errors:

kubectl logs -n kserve-models <pod-name> --tail=200
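When the service has many predictor pods, it can help to narrow the listing to the ones that look unhealthy before reading logs. The following is a minimal sketch, assuming the default kubectl get pods columns (NAME READY STATUS RESTARTS AGE); unhealthy_pods is a hypothetical helper, not part of kubectl.

```shell
# Read `kubectl get pods` output on stdin and print the pods that are not
# Running, not fully ready, or have restarted at least once.
# Assumes the default columns: NAME READY STATUS RESTARTS AGE.
unhealthy_pods() {
  awk '$1 != "NAME" {
    split($2, ready, "/")            # READY looks like "1/2"
    if ($3 != "Running" || ready[1] != ready[2] || $4 + 0 > 0)
      print $1, $2, $3, $4
  }'
}

# Usage:
# kubectl get pods -n kserve-models | grep <inference-service-name> | unhealthy_pods
```

Each output line gives the pod name, readiness, status, and restart count, so pods in CrashLoopBackOff or with repeated restarts are the first candidates for kubectl logs.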

4. Get admin permissions via Escalator

Request admin access through Escalator:

https://escalator.marqo-staging.com/

5. Copy admin credentials to local terminal

Copy the admin credentials from Escalator and export them in your terminal.
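As an illustration, the export might look like the sketch below. It assumes Escalator issues temporary AWS credentials in the standard AWS CLI environment variables; the runbook does not specify the format, so adjust to whatever Escalator actually shows. The values are placeholders and check_aws_env is a hypothetical helper.

```shell
# Placeholders only: paste the values Escalator shows you.
# export AWS_ACCESS_KEY_ID="<access-key-id>"
# export AWS_SECRET_ACCESS_KEY="<secret-access-key>"
# export AWS_SESSION_TOKEN="<session-token>"

# Return non-zero and name the first missing variable if the exports
# are incomplete.
check_aws_env() {
  for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN; do
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing: $var" >&2
      return 1
    fi
  done
  echo "AWS credential variables are set"
}
```

Once the variables are set, aws sts get-caller-identity shows which identity the terminal is actually using.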

6. Get EKS cluster credentials

Update your kubeconfig so that kubectl targets the cluster:

aws eks update-kubeconfig --region us-east-1 --name cell2-MultitenantEKSCluster

7. Remediate

Depending on the root cause:

If pods are unhealthy or OOMKilled, restart the affected pods by deleting them; their controller recreates them:
```
kubectl delete pod <pod-name> -n kserve-models
```
If the model is overloaded, raise the replica limits on its KEDA ScaledObject:
```
kubectl get scaledobjects -n kserve-models
kubectl edit scaledobject <scaled-object-name> -n kserve-models
```
Increase minReplicaCount and/or maxReplicaCount as needed.
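As an alternative to the interactive editor, the replica bounds can be set with a JSON merge patch. This is a sketch only: scaling_patch is a hypothetical helper, and the values 2 and 10 are illustrative, not recommended settings.

```shell
# Build a JSON merge patch that sets both replica bounds on a KEDA ScaledObject.
# Usage: scaling_patch <min-replicas> <max-replicas>
scaling_patch() {
  printf '{"spec":{"minReplicaCount":%d,"maxReplicaCount":%d}}\n' "$1" "$2"
}

# Apply it without opening an editor (2 and 10 are hypothetical values):
# kubectl patch scaledobject <scaled-object-name> -n kserve-models \
#   --type merge -p "$(scaling_patch 2 10)"
```

Patching leaves the exact change in your shell history, which is useful when writing up the incident afterwards.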
