2026-06-19-kark-p90
Karkkainen _fork p90 — is the slow p90 the q='*' query? (2026-06-19)
Question (Pandu): "kark p90 is a bit slow… is the tail latency due to the * query?" Panel in question: the Grafana per-index p90 panel (AMP-backed marqo_request_duration histogram), live today.
Diagnosis (one sentence)
The q='*' query is responsible for the dramatic p90 spikes (1.4–1.9 s) you see in the quiet/overnight hours — but only because of a percentile-dilution effect (a roughly constant trickle of expensive * queries ÷ thin off-peak traffic = a big fraction that crosses the top-10% line); during busy Finnish business hours the p90 (~370 ms) is NOT the * tail — there the * queries are diluted to ~1% (they set p99 ≈ 1 s, not p90), and that ~370 ms is the normal query distribution, which has separately drifted up ~2–2.4× since 06-15 (138 → 335 ms) as traffic grew ~37%.
TL;DR — the answer depends on which p90 you're looking at
┌─────────────────────────────────────────────────────────┬──────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ What you're seeing on the panel │ Is it the * │ Why │
│ │ query? │ │
├─────────────────────────────────────────────────────────┼──────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ p90 spiking to 1.4–1.9 s (overnight / │ Yes │ At 20–35 req/min the steady */prefix slow-query rate is 10–15% of all traffic → crosses the 90th-pct line → is p90. Killing the * shape │
│ early-Finnish-morning) │ │ collapses these. │
├─────────────────────────────────────────────────────────┼──────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ p90 ~370 ms, steady, business hours │ No │ At 470–550 req/min the * queries are only ~0.9% → they sit at p99 (~1 s). p90 here is the bulk/"normal heavy" HYBRID-lexical cost. │
├─────────────────────────────────────────────────────────┼──────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ p90 ~140 → 335 ms creeping up over the last few days │ No │ Bulk distribution rising with load (traffic +37%, corpus flat). A capacity/concurrency story, not a query-shape one. │
└─────────────────────────────────────────────────────────┴──────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
So: your instinct is right for the visible spikes (those are *), but the * query is not what sets p90 during the hours that matter for customers (business hours), and it does not explain the gradual creep. Fixing * flattens the overnight spikes and always helps p99; it won't move the business-hours p90 much.
Scope
┌─────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ │ │
├─────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index │ karkkainen-prod-000_fork — ns 7d48ceb19a3cb04196566c0f9b8b3756, cell2 (the live index; prod-000 alias → fork) │
├─────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ State vs 06-15 │ No cutover. Still the fork. Change: marqo API node count 2 → 5 (autoscaled up under load). Corpus flat at 895k docs. │
├─────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ prod-000 itself │ idle — p90 ~50–75 ms, near-zero traffic. Not your concern. │
├─────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Window │ live, 24 h to ~11:16 am AEST (2026-06-19T01:16 UTC). Finnish local = UTC+3 (EEST). │
├─────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Cost │ $0.0012 Athena (1 query, 231 MB) + ~free AMP. │
└─────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Evidence 1 — the p90 panel is inversely correlated with traffic (the tell)
p90 is worst when traffic is lowest — the opposite of a saturation problem:
┌──────────┬─────────┬─────────┬─────┬──────┬──────┬──────┬─────────┐
│ hour UTC │ Finland │ req/min │ p50 │ p90 │ p95 │ p99 │ 5xx/4xx │
├──────────┼─────────┼─────────┼─────┼──────┼──────┼──────┼─────────┤
│ 02:00 │ 05h │ 17 │ 196 │ 1863 │ 2181 │ 2436 │ 0 │
├──────────┼─────────┼─────────┼─────┼──────┼──────┼──────┼─────────┤
│ 03:00 │ 06h │ 37 │ 226 │ 1831 │ 2165 │ 5650 │ 0 │
├──────────┼─────────┼─────────┼─────┼──────┼──────┼──────┼─────────┤
│ 04:00 │ 07h │ 46 │ 218 │ 1891 │ 2196 │ 3675 │ 0 │
├──────────┼─────────┼─────────┼─────┼──────┼──────┼──────┼─────────┤
│ 07:00 │ 10h │ 554 │ 73 │ 360 │ 469 │ 1062 │ 0 │
├──────────┼─────────┼─────────┼─────┼──────┼──────┼──────┼─────────┤
│ 08:00 │ 11h │ 624 │ 76 │ 361 │ 475 │ 1248 │ 0 │
├──────────┼─────────┼─────────┼─────┼──────┼──────┼──────┼─────────┤
│ 10:00 │ 13h │ 599 │ 89 │ 385 │ 470 │ 988 │ 0 │
└──────────┴─────────┴─────────┴─────┴──────┴──────┴──────┴─────────┘
Zero 5xx/504/4xx all day — purely a latency-shape question, no errors/saturation cliff.
Evidence 2 — the mechanism: % of requests over 1 s tracks inversely with volume
The >1 s band (the q='*' slow queries) is a near-constant rate, so its fraction — and thus its grip on p90 — explodes when traffic thins out:
┌──────────┬─────────┬─────────┬─────────┬─────────┬───────┬─────────────────────────────────────────────────┐
│ hour UTC │ Finland │ req/min │ %>250ms │ %>500ms │ %>1s │ resulting p90 │
├──────────┼─────────┼─────────┼─────────┼─────────┼───────┼─────────────────────────────────────────────────┤
│ 03:00 │ 06h │ 21 │ 29.6% │ 13.5% │ 12.4% │ ~1.8 s (>10% are >1s → p90 is in the slow tail) │
├──────────┼─────────┼─────────┼─────────┼─────────┼───────┼─────────────────────────────────────────────────┤
│ 04:00 │ 07h │ 24 │ 29.7% │ 18.5% │ 15.5% │ ~1.9 s │
├──────────┼─────────┼─────────┼─────────┼─────────┼───────┼─────────────────────────────────────────────────┤
│ 08:00 │ 11h │ 539 │ 13.9% │ 3.0% │ 1.0% │ ~360 ms (90th pct lands in the 250–500ms band) │
├──────────┼─────────┼─────────┼─────────┼─────────┼───────┼─────────────────────────────────────────────────┤
│ 10:00 │ 13h │ 550 │ 15.3% │ 2.6% │ 0.8% │ ~385 ms │
└──────────┴─────────┴─────────┴─────────┴─────────┴───────┴─────────────────────────────────────────────────┘
This is the per-index-panel cousin of the documented "worst-host" artifact: a rare expensive query dominates a thin sample window.
The same thing as a latency-band split (where does the 90th-percentile line fall?):
┌───────────────────────────────────┬────────────────────────────────────┬─────────────────────────────────────┐
│ latency band │ BUSY hr (538/min) │ QUIET hr (20/min) │
├───────────────────────────────────┼────────────────────────────────────┼─────────────────────────────────────┤
│ ≤250 ms (cumulative) │ 86.2% │ 70.8% │
├───────────────────────────────────┼────────────────────────────────────┼─────────────────────────────────────┤
│ ≤500 ms (cumulative) │ 96.9% │ 86.3% │
├───────────────────────────────────┼────────────────────────────────────┼─────────────────────────────────────┤
│ 1000–2500 ms band (the * queries) │ 1.0% │ 12.0% │
├───────────────────────────────────┼────────────────────────────────────┼─────────────────────────────────────┤
│ → 90th-pct line lands in… │ the 250–500 ms band → p90 ≈ 360 ms │ the 1000–2500 ms band → p90 ≈ 1.8 s │
└───────────────────────────────────┴────────────────────────────────────┴─────────────────────────────────────┘
Identical absolute query mix; only the denominator (total traffic) changed — which is the whole point.
Evidence 3 — the slow tail is the same q='*' pathology (Athena, day 18, 3,395 slow bodies)
┌───────────────────────────────────┬────────────────────────────────────────────────────────────────────────────────────┐
│ Attribute │ Value │
├───────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ q='*' │ 2,789 / 3,395 = 82% (rest = single-letter prefix autocomplete: "Pa","M","T","Ja"…) │
├───────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ trackTotalHits=True │ 3,395 / 3,395 (100%) │
├───────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ has collapse │ 3,395 / 3,395 (100%) │
├───────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ limit:1 (pure count-only queries) │ 2,067 — pay a full-corpus count + collapse to return 1 doc │
├───────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ slow-query latency │ p50 1.46 s, p90 1.72 s, p99 3.54 s, max 5.1 s │
├───────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ pod spread │ across all 5 active marqo pods — not a single bad pod │
└───────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────┘
Identical shape to 06-15 and the Kogan pathology: q='*' HYBRID-lexical + scoreModifiersLexical + 12 searchable attrs + collapse + trackTotalHits. The limit:1 count-only queries (61% of the slow set) are the most wasteful — they exist only to populate a category count in the UI.
Evidence 4 — no new infra problem; the bulk distribution drifted up with load
5-day business-hours-only p90 (same AMP metric; 06-14 & 06-16 excluded — only ~2–4 req/min, unreliable):
┌────────────┬───────────────┬─────┬────────────────────────────────────────────────────────────────────┐
│ date │ req/min (med) │ p50 │ p90 (med) │
├────────────┼───────────────┼─────┼────────────────────────────────────────────────────────────────────┤
│ 2026-06-15 │ 312 │ 47 │ 138 ← matches the 06-15 HAProxy Tt p90 of 142 ms (cross-validates) │
├────────────┼───────────────┼─────┼────────────────────────────────────────────────────────────────────┤
│ 2026-06-17 │ 380 │ 45 │ 192 │
├────────────┼───────────────┼─────┼────────────────────────────────────────────────────────────────────┤
│ 2026-06-18 │ 428 │ 70 │ 335 │
└────────────┴───────────────┴─────┴────────────────────────────────────────────────────────────────────┘
- Corpus flat (893k → 895k, +0.2%) → the rise is not data growth.
- Live drivers healthy (quiet-time): content CPU 28%, marqo→Vespa p95 75 ms, inflight 2, Vespa's own query_latency_99percentile ~290 ms.
- So the business-hours p90 doubling tracks rising load (+37% traffic; API tier autoscaled 2→5). 3-point trend — worth confirming it holds — but it's the part most likely behind "a bit slow," and it is not the * tail.
Recommended actions
- Kill / cap the q='*' + trackTotalHits pathology (frontend). Especially the 2,067 limit:1 count-only queries — drop or cache trackTotalHits on count queries; paginate the limit:60/100 dumps; trim the 12 searchableAttributesLexical. Effect: overnight p90 spikes (1.4–1.9 s) collapse toward the busy-hour baseline, and p99 drops everywhere. (Same rec as 06-15 — still un-actioned; 3,395 slow bodies on day 18 confirms it's live.)
- For the business-hours p90 creep (138→335 ms), treat it as capacity, not query shape. Watch marqo_vespa_semaphore_wait + content CPU on the 1-min Max during peak; the API tier already scaled 2→5. The * fix won't move it.
- Set the panel expectation: the per-index p90 panel is genuinely misleading off-peak (thin-traffic artifact). For a number that reflects customer pain, read p90 weighted to business hours (or just watch p99, where the * shape lives consistently), not the 24 h panel max.
Caveats
- Assumes the "p90 panel" = histogram_quantile(0.9, marqo_request_duration…{path="/indexes/
/search"}) — the only metric that can yield a per-index p90 (raw Vespa exposes only query_latency_95/99percentile). - 06-15's "142 ms" was HAProxy Tt; the apples-to-apples AMP p90 for 06-15 is 138 ms, so the comparison holds.
- Business-hours trend is 3 reliable days; low-traffic days excluded (same percentile artifact).
- marqo only logs slow (>~1 s) bodies, so the shape census is the >1 s tail; sub-1 s * queries are covered by the histogram bands.