Runbook: kserve_scaling_capacity
This runbook covers steps to investigate and remediate scaling capacity alerts for KServe inference workloads.
Steps
1. Verify scaling metrics
Open the KServe Inference dashboard and check the scaling metrics for the affected models.
2. Get admin permissions via Escalator
Request admin access through Escalator:
https://escalator.marqo-staging.com/
3. Copy admin credentials to local terminal
Copy the admin credentials from Escalator and export them as environment variables in the terminal session you will use for the remaining steps.
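As a rough sketch of the export step, assuming Escalator issues temporary AWS credentials in the standard environment-variable form (the runbook does not state the exact format, so treat the variable set below as an assumption and the values as placeholders):

```shell
# Assumption: Escalator provides temporary AWS credentials (key, secret, session token).
# Replace each placeholder with the value copied from Escalator.
export AWS_ACCESS_KEY_ID="<access-key-id>"
export AWS_SECRET_ACCESS_KEY="<secret-access-key>"
export AWS_SESSION_TOKEN="<session-token>"
# Region taken from the update-kubeconfig command in step 4.
export AWS_DEFAULT_REGION="us-east-1"
```

Running aws sts get-caller-identity afterwards should show the admin role if the credentials were exported correctly.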
4. Get EKS cluster credentials
aws eks update-kubeconfig --region us-east-1 --name cell2-MultitenantEKSCluster
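Before changing anything, it may help to confirm that kubectl now points at the intended cluster. The helper names below (check_context, verify_cluster_access) are hypothetical, and the sketch assumes the default behavior of aws eks update-kubeconfig, which names the context after the cluster ARN:

```shell
# Cluster name from the update-kubeconfig command above.
EXPECTED_CLUSTER="cell2-MultitenantEKSCluster"

# Succeeds only if the given context name mentions the expected cluster.
check_context() {
  case "$1" in
    *"$EXPECTED_CLUSTER"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Reads the current kubectl context, checks it, then lists nodes to prove access.
verify_cluster_access() {
  ctx=$(kubectl config current-context) || return 1
  check_context "$ctx" || { echo "Unexpected context: $ctx" >&2; return 1; }
  kubectl get nodes
}
```

Call verify_cluster_access after step 4. A non-zero exit status suggests the kubeconfig or the credentials need another look.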
5. Scale up the affected models
Update the KEDA ScaledObject for each affected model to raise its minimum and/or maximum replica count:
# List KEDA scaled objects to find the one for the affected model
kubectl get scaledobjects -A
# Edit the scaled object to increase min/max replicas
kubectl edit scaledobject <scaled-object-name> -n <namespace>
In the editor, increase spec.minReplicaCount and/or spec.maxReplicaCount as needed to handle the load, then save and exit to apply the change.
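If an interactive edit is inconvenient, the same change can be sketched as a non-interactive merge patch. The function names (build_replica_patch, scale_model) are hypothetical helpers, not part of KEDA or kubectl, and the replica values you pass are your own judgment call for the incident:

```shell
# Build a JSON merge patch that sets the KEDA replica bounds.
build_replica_patch() {
  # $1: minReplicaCount, $2: maxReplicaCount
  printf '{"spec":{"minReplicaCount":%d,"maxReplicaCount":%d}}' "$1" "$2"
}

# Apply the patch to a ScaledObject, then print the resulting bounds.
scale_model() {
  # $1: ScaledObject name, $2: namespace, $3: min replicas, $4: max replicas
  kubectl patch scaledobject "$1" -n "$2" --type merge \
    -p "$(build_replica_patch "$3" "$4")" || return 1
  kubectl get scaledobject "$1" -n "$2" \
    -o jsonpath='{.spec.minReplicaCount}{" "}{.spec.maxReplicaCount}{"\n"}'
}
```

For example, scale_model <scaled-object-name> <namespace> 2 10 would set the minimum to 2 and the maximum to 10, then print both values so the change can be confirmed.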