
Runbook: kserve_inference_error_rate

This runbook covers steps to investigate and remediate high inference error rate alerts for KServe models. The alert fires when the error rate exceeds 5% for an inference service.

Steps

1. Verify error rate metrics

Open the KServe Inference dashboard and check the error rate for the affected inference service:

https://g-3d216b3ddc.grafana-workspace.us-east-1.amazonaws.com/d/kserve_inference_dashboard/kserve-inference?orgId=1&refresh=30s

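Compare the nv_inference_request_failure and nv_inference_request_success metrics to gauge the scope of the problem: how many requests are failing, since when, and whether other inference services are affected.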
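To sanity-check the dashboard figure by hand, the sketch below computes a failure percentage from the increase in both counters over the same time window. It assumes the alert defines error rate as failures / (failures + successes); the error_rate and over_threshold helpers, the 5m window, and the PromQL shown in the comment are illustrative assumptions, so confirm them against the actual alert rule.

```shell
# Hypothetical helpers: compute an error rate from counter increases observed
# over the same window (e.g. the change in nv_inference_request_failure and
# nv_inference_request_success). Assumes error rate = failures / (failures + successes).
# A comparable PromQL expression (add a label filter for the affected model) might be:
#   sum(rate(nv_inference_request_failure[5m]))
#     / (sum(rate(nv_inference_request_failure[5m])) + sum(rate(nv_inference_request_success[5m])))

# error_rate FAILURES SUCCESSES -> prints the failure percentage with two decimals
error_rate() {
  awk -v f="$1" -v s="$2" 'BEGIN {
    t = f + s
    if (t == 0) { printf "0.00\n"; exit }   # no traffic: report 0 instead of dividing by zero
    printf "%.2f\n", 100 * f / t
  }'
}

# over_threshold FAILURES SUCCESSES [PERCENT] -> exit 0 if the rate exceeds PERCENT (default 5)
over_threshold() {
  awk -v f="$1" -v s="$2" -v p="${3:-5}" 'BEGIN {
    t = f + s
    exit !(t > 0 && 100 * f / t > p)   # short-circuit avoids dividing by zero
  }'
}
```

For example, 12 failures and 188 successes in the window give error_rate 12 188, which prints 6.00 and is above the 5% alert threshold.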

2. Get read-only credentials for prod-cell-1

Log in to IAM Identity Center and copy read-only credentials for the prod-cell-1 account:

https://d-9067a2ad56.awsapps.com/start/#/?tab=accounts
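Before running anything against the cluster, it can help to confirm that the exported credentials belong to the intended account. A minimal sketch built on aws sts get-caller-identity; the expect_account helper is hypothetical, and since the prod-cell-1 account ID is not listed here, you pass it in yourself.

```shell
# Hypothetical helper: exit 0 only if the current AWS credentials belong to the
# expected account. Requires the AWS CLI and exported credentials.
# expect_account ACCOUNT_ID
expect_account() {
  actual_account=$(aws sts get-caller-identity --query Account --output text) || return 1
  if [ "$actual_account" = "$1" ]; then
    echo "OK: using account $actual_account"
  else
    echo "Wrong account: expected $1, got $actual_account" >&2
    return 1
  fi
}
```

Usage: expect_account <prod-cell-1-account-id>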

3. Identify the affected predictor pods

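Use kubectl to find the kserve-predictor pods for the affected inference service. kubectl needs a kubeconfig entry for the cluster; if you do not have one yet, run the aws eks update-kubeconfig command from step 6 with your current credentials first: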

kubectl get pods -n kserve-models | grep <inference-service-name>
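To spot OOMKilled containers without describing each pod, the sketch below prints the pods whose containers were last terminated with reason OOMKilled. It assumes the predictor pods carry KServe's serving.kserve.io/inferenceservice label (fall back to the grep above if they do not); the oom_pods helper name is hypothetical.

```shell
# Hypothetical helper: print the names of an InferenceService's pods that have a
# container whose last termination reason was OOMKilled.
# Assumes pods are labelled serving.kserve.io/inferenceservice=<name>.
# oom_pods INFERENCE_SERVICE [NAMESPACE]
oom_pods() {
  kubectl get pods -n "${2:-kserve-models}" \
    -l "serving.kserve.io/inferenceservice=$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' |
    awk '{
      # field 1 is the pod name; the remaining fields are per-container reasons
      for (i = 2; i <= NF; i++) if ($i == "OOMKilled") { print $1; break }
    }'
}
```

Usage: oom_pods <inference-service-name>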

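Check the model server container's logs for errors. If the container has already restarted (for example after an OOMKill), add --previous to see the logs from before the restart:

kubectl logs -n kserve-models <pod-name> -c kserve-container --tail=200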
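To scan every predictor pod at once instead of one pod at a time, a sketch that pulls recent logs by label and keeps only lines that look like errors. The label selector, the kserve-container container name, and the error pattern are assumptions; adjust them to what the model server actually logs. The isvc_errors helper is hypothetical.

```shell
# Hypothetical helper: show error-looking log lines from all predictor pods of an
# InferenceService, each prefixed with the pod/container it came from.
# Assumes the serving.kserve.io/inferenceservice label and a container named
# kserve-container (KServe's default name for the model server container).
# isvc_errors INFERENCE_SERVICE [NAMESPACE]
isvc_errors() {
  kubectl logs -n "${2:-kserve-models}" \
    -l "serving.kserve.io/inferenceservice=$1" \
    -c kserve-container --tail=200 --prefix |
    grep -iE 'error|exception|traceback|oom'   # case-insensitive error keywords
}
```

Usage: isvc_errors <inference-service-name>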

4. Get admin permissions via Escalator

Request admin access through Escalator:

https://escalator.marqo-staging.com/

5. Copy admin credentials to local terminal

Copy the admin credentials from Escalator and export them in your terminal.
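A minimal sketch of the export step, assuming Escalator hands out standard temporary AWS credentials (an access key ID, secret access key, and session token). The values are placeholders, and missing_aws_vars is a hypothetical helper that lists any variable left unset.

```shell
# Placeholders: replace with the values copied from Escalator.
export AWS_ACCESS_KEY_ID="<access-key-id>"
export AWS_SECRET_ACCESS_KEY="<secret-access-key>"
export AWS_SESSION_TOKEN="<session-token>"

# Hypothetical helper: print the name of each required credential variable that
# is unset or empty; return non-zero if any are missing.
missing_aws_vars() {
  missing_rc=0
  for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN; do
    eval "value=\${$var:-}"   # POSIX-compatible indirect variable lookup
    if [ -z "$value" ]; then
      echo "$var"
      missing_rc=1
    fi
  done
  return "$missing_rc"
}
```

After exporting, aws sts get-caller-identity should report the admin role rather than the read-only one.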

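6. Get EKS cluster credentials

Add the cluster to your kubeconfig. The command below targets cell2-MultitenantEKSCluster; if the alert is for a different cell, substitute that cell's cluster name:

aws eks update-kubeconfig --region us-east-1 --name cell2-MultitenantEKSCluster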
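aws eks update-kubeconfig switches the current kubectl context to the cluster it adds, and unless --alias is given, the context is named after the cluster ARN. A sketch of a guard that confirms kubectl points at the intended cluster before any destructive command; the on_cluster helper is hypothetical.

```shell
# Hypothetical helper: exit 0 only if the current kubectl context refers to the
# named EKS cluster. Default context names look like
# arn:aws:eks:<region>:<account>:cluster/<name>; a bare alias equal to the name also passes.
# on_cluster CLUSTER_NAME
on_cluster() {
  ctx=$(kubectl config current-context) || return 1
  case "$ctx" in
    */"$1" | "$1") echo "OK: context $ctx" ;;
    *) echo "Current context is $ctx, not $1" >&2; return 1 ;;
  esac
}
```

Usage: on_cluster cell2-MultitenantEKSCluster && kubectl delete pod <pod-name> -n kserve-models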

7. Remediate

Depending on the root cause:

  • If pods are unhealthy or OOMKilled, restart them by deleting the pod; its controller creates a replacement:

    kubectl delete pod <pod-name> -n kserve-models
  • If the model is overloaded, raise the replica counts on its KEDA ScaledObject:

    kubectl get scaledobjects -n kserve-models
    kubectl edit scaledobject <scaled-object-name> -n kserve-models

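    Increase minReplicaCount and/or maxReplicaCount as needed: minReplicaCount raises the floor so more replicas run immediately, while maxReplicaCount raises the ceiling KEDA can scale up to.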
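For a non-interactive alternative to kubectl edit, a sketch that sets both bounds with a merge patch. minReplicaCount and maxReplicaCount are KEDA ScaledObject spec fields; the set_replica_bounds helper is hypothetical. If the ScaledObject is managed by another controller or a GitOps tool, a manual change may be reverted, so check how it is deployed.

```shell
# Hypothetical helper: patch a KEDA ScaledObject's replica bounds in place.
# set_replica_bounds SCALED_OBJECT MIN MAX [NAMESPACE]
set_replica_bounds() {
  # Reject empty or non-numeric bounds before touching the cluster.
  for n in "$2" "$3"; do
    case "$n" in
      '' | *[!0-9]*) echo "MIN and MAX must be non-negative integers" >&2; return 1 ;;
    esac
  done
  if [ "$2" -gt "$3" ]; then
    echo "MIN ($2) must not exceed MAX ($3)" >&2
    return 1
  fi
  kubectl patch scaledobject "$1" -n "${4:-kserve-models}" --type merge \
    -p "{\"spec\":{\"minReplicaCount\":$2,\"maxReplicaCount\":$3}}"
}
```

Usage: set_replica_bounds <scaled-object-name> 3 10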