2026-06-19-kark-p90

Karkkainen _fork p90 — is the slow p90 the q='*' query? (2026-06-19)

Question (Pandu): "kark p90 is a bit slow… is the tail latency due to the * query?" Panel in question: the Grafana per-index p90 panel (AMP-backed marqo_request_duration histogram), live today.

Diagnosis (one sentence)

The q='*' query is responsible for the dramatic p90 spikes (1.4–1.9 s) you see in the quiet/overnight hours — but only because of a percentile-dilution effect (a roughly constant trickle of expensive * queries ÷ thin off-peak traffic = a big fraction that crosses the top-10% line); during busy Finnish business hours the p90 (~370 ms) is NOT the * tail — there the * queries are diluted to ~1% (they set p99 ≈ 1 s, not p90), and that ~370 ms is the normal query distribution, which has separately drifted up ~2–2.4× since 06-15 (138 → 335 ms) as traffic grew ~37%.

TL;DR — the answer depends on which p90 you're looking at

┌─────────────────────────────────────────────────────────┬──────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│             What you're seeing on the panel             │   Is it the *    │                                                                   Why                                                                    │
│                                                         │      query?      │                                                                                                                                          │
├─────────────────────────────────────────────────────────┼──────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ p90 spiking to 1.4–1.9 s (overnight /                   │ Yes              │ At 20–35 req/min the steady */prefix slow-query rate is 10–15% of all traffic → crosses the 90th-pct line → is p90. Killing the * shape  │
│ early-Finnish-morning)                                  │                  │ collapses these.                                                                                                                         │
├─────────────────────────────────────────────────────────┼──────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ p90 ~370 ms, steady, business hours                     │ No               │ At 470–550 req/min the * queries are only ~0.9% → they sit at p99 (~1 s). p90 here is the bulk/"normal heavy" HYBRID-lexical cost.       │
├─────────────────────────────────────────────────────────┼──────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ p90 ~140 → 335 ms creeping up over the last few days    │ No               │ Bulk distribution rising with load (traffic +37%, corpus flat). A capacity/concurrency story, not a query-shape one.                     │
└─────────────────────────────────────────────────────────┴──────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

So: your instinct is right for the visible spikes (those are *), but the * query is not what sets p90 during the hours that matter for customers (business hours), and it does not explain the gradual creep. Fixing * flattens the overnight spikes and always helps p99; it won't move the business-hours p90 much.

Scope

┌─────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                 │                                                                                                                      │
├─────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index           │ karkkainen-prod-000_fork — ns 7d48ceb19a3cb04196566c0f9b8b3756, cell2 (the live index; prod-000 alias → fork)        │
├─────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ State vs 06-15  │ No cutover. Still the fork. Change: marqo API node count 2 → 5 (autoscaled up under load). Corpus flat at 895k docs. │
├─────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ prod-000 itself │ idle — p90 ~50–75 ms, near-zero traffic. Not your concern.                                                           │
├─────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Window          │ live, 24 h to ~11:16 am AEST (2026-06-19T01:16 UTC). Finnish local = UTC+3 (EEST).                                   │
├─────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Cost            │ $0.0012 Athena (1 query, 231 MB) + ~free AMP.                                                                        │
└─────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Evidence 1 — the p90 panel is inversely correlated with traffic (the tell)

p90 is worst when traffic is lowest — the opposite of a saturation problem:

┌──────────┬─────────┬─────────┬─────┬──────┬──────┬──────┬─────────┐
│ hour UTC │ Finland │ req/min │ p50 │ p90  │ p95  │ p99  │ 5xx/4xx │
├──────────┼─────────┼─────────┼─────┼──────┼──────┼──────┼─────────┤
│ 02:00    │ 05h     │      17 │ 196 │ 1863 │ 2181 │ 2436 │       0 │
├──────────┼─────────┼─────────┼─────┼──────┼──────┼──────┼─────────┤
│ 03:00    │ 06h     │      37 │ 226 │ 1831 │ 2165 │ 5650 │       0 │
├──────────┼─────────┼─────────┼─────┼──────┼──────┼──────┼─────────┤
│ 04:00    │ 07h     │      46 │ 218 │ 1891 │ 2196 │ 3675 │       0 │
├──────────┼─────────┼─────────┼─────┼──────┼──────┼──────┼─────────┤
│ 07:00    │ 10h     │     554 │  73 │  360 │  469 │ 1062 │       0 │
├──────────┼─────────┼─────────┼─────┼──────┼──────┼──────┼─────────┤
│ 08:00    │ 11h     │     624 │  76 │  361 │  475 │ 1248 │       0 │
├──────────┼─────────┼─────────┼─────┼──────┼──────┼──────┼─────────┤
│ 10:00    │ 13h     │     599 │  89 │  385 │  470 │  988 │       0 │
└──────────┴─────────┴─────────┴─────┴──────┴──────┴──────┴─────────┘

Zero 5xx/504/4xx all day — purely a latency-shape question, no errors/saturation cliff.

Evidence 2 — the mechanism: % of requests over 1 s tracks inversely with volume

The >1 s band (the q='*' slow queries) is a near-constant rate, so its fraction — and thus its grip on p90 — explodes when traffic thins out:

┌──────────┬─────────┬─────────┬─────────┬─────────┬───────┬─────────────────────────────────────────────────┐
│ hour UTC │ Finland │ req/min │ %>250ms │ %>500ms │ %>1s  │                  resulting p90                  │
├──────────┼─────────┼─────────┼─────────┼─────────┼───────┼─────────────────────────────────────────────────┤
│ 03:00    │ 06h     │      21 │   29.6% │   13.5% │ 12.4% │ ~1.8 s (>10% are >1s → p90 is in the slow tail) │
├──────────┼─────────┼─────────┼─────────┼─────────┼───────┼─────────────────────────────────────────────────┤
│ 04:00    │ 07h     │      24 │   29.7% │   18.5% │ 15.5% │                                          ~1.9 s │
├──────────┼─────────┼─────────┼─────────┼─────────┼───────┼─────────────────────────────────────────────────┤
│ 08:00    │ 11h     │     539 │   13.9% │    3.0% │  1.0% │  ~360 ms (90th pct lands in the 250–500ms band) │
├──────────┼─────────┼─────────┼─────────┼─────────┼───────┼─────────────────────────────────────────────────┤
│ 10:00    │ 13h     │     550 │   15.3% │    2.6% │  0.8% │                                         ~385 ms │
└──────────┴─────────┴─────────┴─────────┴─────────┴───────┴─────────────────────────────────────────────────┘

This is the per-index-panel cousin of the documented "worst-host" artifact: a rare expensive query dominates a thin sample window.

The same thing as a latency-band split (where does the 90th-percentile line fall?):

┌───────────────────────────────────┬────────────────────────────────────┬─────────────────────────────────────┐
│           latency band            │         BUSY hr (538/min)          │          QUIET hr (20/min)          │
├───────────────────────────────────┼────────────────────────────────────┼─────────────────────────────────────┤
│ ≤250 ms (cumulative)              │                              86.2% │                               70.8% │
├───────────────────────────────────┼────────────────────────────────────┼─────────────────────────────────────┤
│ ≤500 ms (cumulative)              │                              96.9% │                               86.3% │
├───────────────────────────────────┼────────────────────────────────────┼─────────────────────────────────────┤
│ 1000–2500 ms band (the * queries) │                               1.0% │                               12.0% │
├───────────────────────────────────┼────────────────────────────────────┼─────────────────────────────────────┤
│ → 90th-pct line lands in…         │ the 250–500 ms band → p90 ≈ 360 ms │ the 1000–2500 ms band → p90 ≈ 1.8 s │
└───────────────────────────────────┴────────────────────────────────────┴─────────────────────────────────────┘

Identical absolute query mix; only the denominator (total traffic) changed — which is the whole point.

Evidence 3 — the slow tail is the same q='*' pathology (Athena, day 18, 3,395 slow bodies)

┌───────────────────────────────────┬────────────────────────────────────────────────────────────────────────────────────┐
│             Attribute             │                                       Value                                        │
├───────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ q='*'                             │ 2,789 / 3,395 = 82% (rest = single-letter prefix autocomplete: "Pa","M","T","Ja"…) │
├───────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ trackTotalHits=True               │ 3,395 / 3,395 (100%)                                                               │
├───────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ has collapse                      │ 3,395 / 3,395 (100%)                                                               │
├───────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ limit:1 (pure count-only queries) │ 2,067 — pay a full-corpus count + collapse to return 1 doc                         │
├───────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ slow-query latency                │ p50 1.46 s, p90 1.72 s, p99 3.54 s, max 5.1 s                                      │
├───────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ pod spread                        │ across all 5 active marqo pods — not a single bad pod                              │
└───────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────┘

Identical shape to 06-15 and the Kogan pathology: q='*' HYBRID-lexical + scoreModifiersLexical + 12 searchable attrs + collapse + trackTotalHits. The limit:1 count-only queries (61% of the slow set) are the most wasteful — they exist only to populate a category count in the UI.

Evidence 4 — no new infra problem; the bulk distribution drifted up with load

5-day business-hours-only p90 (same AMP metric; 06-14 & 06-16 excluded — only ~2–4 req/min, unreliable):

┌────────────┬───────────────┬─────┬────────────────────────────────────────────────────────────────────┐
│    date    │ req/min (med) │ p50 │                             p90 (med)                              │
├────────────┼───────────────┼─────┼────────────────────────────────────────────────────────────────────┤
│ 2026-06-15 │           312 │  47 │ 138 ← matches the 06-15 HAProxy Tt p90 of 142 ms (cross-validates) │
├────────────┼───────────────┼─────┼────────────────────────────────────────────────────────────────────┤
│ 2026-06-17 │           380 │  45 │                                                                192 │
├────────────┼───────────────┼─────┼────────────────────────────────────────────────────────────────────┤
│ 2026-06-18 │           428 │  70 │                                                                335 │
└────────────┴───────────────┴─────┴────────────────────────────────────────────────────────────────────┘

Corpus flat (893k → 895k, +0.2%) → the rise is not data growth.
Live drivers healthy (quiet-time): content CPU 28%, marqo→Vespa p95 75 ms, inflight 2, Vespa's own query_latency_99percentile ~290 ms.
So the business-hours p90 doubling tracks rising load (+37% traffic; API tier autoscaled 2→5). 3-point trend — worth confirming it holds — but it's the part most likely behind "a bit slow," and it is not the * tail.

Recommended actions

Kill / cap the q='*' + trackTotalHits pathology (frontend). Especially the 2,067 limit:1 count-only queries — drop or cache trackTotalHits on count queries; paginate the limit:60/100 dumps; trim the 12 searchableAttributesLexical. Effect: overnight p90 spikes (1.4–1.9 s) collapse toward the busy-hour baseline, and p99 drops everywhere. (Same rec as 06-15 — still un-actioned; 3,395 slow bodies on day 18 confirms it's live.)
For the business-hours p90 creep (138→335 ms), treat it as capacity, not query shape. Watch marqo_vespa_semaphore_wait + content CPU on the 1-min Max during peak; the API tier already scaled 2→5. The * fix won't move it.
Set the panel expectation: the per-index p90 panel is genuinely misleading off-peak (thin-traffic artifact). For a number that reflects customer pain, read p90 weighted to business hours (or just watch p99, where the * shape lives consistently), not the 24 h panel max.

Caveats

Assumes the "p90 panel" = histogram_quantile(0.9, marqo_request_duration…{path="/indexes//search"}) — the only metric that can yield a per-index p90 (raw Vespa exposes only query_latency_95/99percentile).
06-15's "142 ms" was HAProxy Tt; the apples-to-apples AMP p90 for 06-15 is 138 ms, so the comparison holds.
Business-hours trend is 3 reliable days; low-traffic days excluded (same percentile artifact).
marqo only logs slow (>~1 s) bodies, so the shape census is the >1 s tail; sub-1 s * queries are covered by the histogram bands.