Configure pagination stability for an index
Context
Marqo's disjunction (RRF / hybrid) search has buggy native pagination: because each
page re-ranks a differently-sized candidate pool — and native offset is itself
unreliable — customers can see duplicate documents and missing documents as
they page through results.
The search proxy ships a client-side workaround that fetches a larger window and
slices it locally. It is tuned per-index through a paginationConfig block inside the
index's search_config. This runbook is for solution architects who need to tune those
values for a customer themselves.
This is a temporary workaround. It will be removed once Marqo provides stable RRF pagination natively. The implementation lives in
components/search_proxy/src/search.ts (performMarqoSearch).
How it works
On every search request the proxy computes the 1-based page index from the request's
offset/limit:
pageIndex = floor(offset / limit) + 1
The workaround only engages when the retrieval method is disjunction (the default
when hybridParameters.retrievalMethod is unset). Two knobs control it:
| Knob | Default | Behaviour |
|---|---|---|
| strongStablePageSize | 1 | Pages 1..strong share one candidate pool of strong * limit docs. The proxy sends offset=0, limit=strong*limit to Marqo and slices the page out locally. Guarantees zero duplicate/missing docs across these pages. Cost: every page in this range pays to retrieve the full pool. |
| weakStablePageSize | 5 | Pages strong+1..weak each retrieve their own page * limit pool and discard the leading (page-1) * limit docs locally. Cheaper (only the current page's pool is fetched), but duplicates can still leak between adjacent pages. |
Pages beyond weakStablePageSize fall through to Marqo's native pagination (no
workaround).
The defaults (1, 5) reproduce historical behaviour and are the right starting point.
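The page-to-request mapping described above can be sketched as follows. This is a minimal illustration, not the code in performMarqoSearch: the names planFetch, FetchPlan, and PaginationConfig are hypothetical, and the sketch assumes that weak-range requests also start at offset 0 and that offset is a multiple of limit.

```typescript
// Hypothetical types and names, for illustration only.
interface PaginationConfig {
  strongStablePageSize: number;
  weakStablePageSize: number;
}

interface FetchPlan {
  mode: "strong" | "weak" | "native";
  marqoOffset: number; // offset sent to Marqo
  marqoLimit: number;  // limit sent to Marqo
  sliceStart: number;  // index of the first doc kept from Marqo's response
}

function planFetch(offset: number, limit: number, cfg: PaginationConfig): FetchPlan {
  const pageIndex = Math.floor(offset / limit) + 1; // 1-based page index
  const sliceStart = (pageIndex - 1) * limit;       // leading docs discarded locally

  if (pageIndex <= cfg.strongStablePageSize) {
    // Strong range: every page fetches the same shared pool.
    return { mode: "strong", marqoOffset: 0, marqoLimit: cfg.strongStablePageSize * limit, sliceStart };
  }
  if (pageIndex <= cfg.weakStablePageSize) {
    // Weak range: each page fetches its own pool of page * limit docs
    // (offset 0 is an assumption of this sketch).
    return { mode: "weak", marqoOffset: 0, marqoLimit: pageIndex * limit, sliceStart };
  }
  // Beyond the weak range: pass the request through to native pagination.
  return { mode: "native", marqoOffset: offset, marqoLimit: limit, sliceStart: 0 };
}
```

With the defaults (1, 5) and limit=20, page 1 behaves like a native request, page 3 fetches 60 docs and keeps the last 20, and page 6 is passed through unchanged.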
Tuning guidance
- If a customer reports duplicate docs across pages and their page_size is small, nudge strongStablePageSize up modestly (e.g. 3–5). This makes the first N pages duplicate-free.
- Avoid raising strongStablePageSize for customers using large page sizes: the retrieved pool is strong * limit docs on every page in the strong range, so a large limit multiplied by a large strong retrieves a very large pool on each request and hurts latency.
- Keep the total retrieved docs modest. weakStablePageSize can be set somewhat larger than strong because it only fetches the current page's pool, but it does not eliminate duplicates.
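To make the cost trade-off concrete, the sketch below computes how many docs are requested from Marqo for a given page, using only the pool sizes stated in the table above. The function name docsRetrieved is hypothetical, and the figures say nothing about actual latency, which would need to be measured.

```typescript
// Docs requested from Marqo for one page, per the pool sizes in the knob table.
function docsRetrieved(page: number, limit: number, strong: number, weak: number): number {
  if (page <= strong) return strong * limit; // shared pool: same cost on every strong page
  if (page <= weak) return page * limit;     // per-page pool: grows with the page number
  return limit;                              // native pagination
}

// Compare a small and a large page size with strong=5, weak=10.
for (const limit of [20, 100]) {
  const costs = [1, 5, 6, 10, 11].map((p) => docsRetrieved(p, limit, 5, 10));
  console.log(`limit=${limit}:`, costs.join(", "));
}
// limit=20: 100, 100, 120, 200, 20
// limit=100: 500, 500, 600, 1000, 100
```

The same (5, 10) setting that requests 100 docs per strong page at limit=20 requests 500 at limit=100, which is why large page sizes call for smaller values.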
Scope: one index-level setting governs all search profiles
paginationConfig is read only from the index's top-level search_config. It is
not read from search profiles or per-query config overrides.
- A paginationConfig set on the index default search_config applies to every search on that index, including requests that carry a profileId. ✅
- A paginationConfig placed inside a search profile is ignored. ❌
So there is no per-profile pagination override: configure it once on the index's
search_config and it applies universally.
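The scoping rule can be pictured with a small resolver. This is a sketch under stated assumptions rather than the proxy's actual code: resolvePaginationConfig and the SearchConfig shape are hypothetical, and the defaults are the documented (1, 5).

```typescript
// Hypothetical shapes, for illustration only.
interface PaginationConfig {
  strongStablePageSize: number;
  weakStablePageSize: number;
}

interface SearchConfig {
  paginationConfig?: Partial<PaginationConfig>;
  [key: string]: unknown;
}

const DEFAULTS: PaginationConfig = { strongStablePageSize: 1, weakStablePageSize: 5 };

// Only the index-level search_config is consulted. The profile config is
// accepted here purely to show that it is deliberately not read.
function resolvePaginationConfig(
  indexSearchConfig: SearchConfig,
  _profileConfig?: SearchConfig,
): PaginationConfig {
  const fromIndex = indexSearchConfig.paginationConfig ?? {};
  return {
    strongStablePageSize: fromIndex.strongStablePageSize ?? DEFAULTS.strongStablePageSize,
    weakStablePageSize: fromIndex.weakStablePageSize ?? DEFAULTS.weakStablePageSize,
  };
}
```

A profile that carries its own paginationConfig yields the same result as one without it; only a change to the index's search_config alters the outcome.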
Operation
The source of truth is the DynamoDB table prod-EcomIndexSettingsTable. Records are
keyed by pk=<system_account_id> and sk=INDEX#<index_name>. The paginationConfig
block is added under the search_config field. Changes propagate automatically to
Cloudflare KV (where the search proxy reads them) via the settings exporter Lambda — see
Settings Sync flow.
1. Get prod admin access
Use Escalator: Self-Service Admin to gain admin permission to the production Controller account ("Prod Controller Admin"), then sign into the prod Controller account as admin.
2. Find the index record
If you don't already know the index's system_account_id, look it up in
Ecommerce Customer Details.
Open the prod-EcomIndexSettingsTable item explorer and query with:
- pk=<system_account_id>
- sk=INDEX#<index_name>
Or from the CLI:
aws dynamodb get-item --table-name prod-EcomIndexSettingsTable \
--key '{"pk": {"S": "<system_account_id>"}, "sk": {"S": "INDEX#<index_name>"}}'
3. Add paginationConfig to search_config
Click the record to open it for editing. Inside the search_config field, add a
paginationConfig object. For example, to make the first 5 pages duplicate-free:
{
  "search_config": {
    "...existing keys...": "...",
    "paginationConfig": {
      "strongStablePageSize": 5,
      "weakStablePageSize": 10
    }
  }
}
Notes:
- Add paginationConfig alongside the existing search_config keys; do not replace the whole object. To avoid mistakes on a large config, copy the existing JSON into a diff tool (e.g. diffchecker, "JSON view" with "View DynamoDB JSON" disabled), add the block, and confirm the only change is the new paginationConfig key before pasting back.
- Both values must be positive integers. Omitting the block (or either key) falls back to the defaults strongStablePageSize=1, weakStablePageSize=5.
- paginationConfig is a proxy-internal field. It is stripped from the request before it reaches Marqo, so it cannot interfere with the Marqo query.
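Before pasting the block back, the positive-integer rule can be checked with a few lines of code. The helper below is a hypothetical pre-save check, not part of the proxy; how the proxy itself reacts to invalid values is not covered here.

```typescript
// Hypothetical pre-save check: returns a list of problems, empty when the block is valid.
function validatePaginationConfig(value: unknown): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["paginationConfig must be an object"];
  }
  const cfg = value as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of ["strongStablePageSize", "weakStablePageSize"]) {
    if (!(key in cfg)) continue; // an omitted key falls back to its default
    const v = cfg[key];
    if (typeof v !== "number" || !Number.isInteger(v) || v <= 0) {
      errors.push(`${key} must be a positive integer, got ${JSON.stringify(v)}`);
    }
  }
  return errors;
}
```

For instance, a value entered as the string "5" or as 0 is reported, while a block that sets only strongStablePageSize passes.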
Click Save to commit.
4. Verify the change reached prod
The settings exporter Lambda picks up the DDB stream change (up to a ~10s batch window) and writes it to Cloudflare KV. To confirm:
- Check the value in the prod-search-proxy-kv KV store (key <system_account_id>-<index_name>, or the shop_id): the search_config.paginationConfig block should be present.
- Run a paged search (offset > 0) for the customer and confirm duplicates/missing docs are gone for the pages within strongStablePageSize.
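When checking paged results by hand becomes tedious, a small helper can flag duplicates across pages. The sketch below assumes the document IDs of each page have already been collected into arrays (how they are fetched is left out), and findDuplicateIds is a hypothetical name. It detects duplicates only; missing docs cannot be detected without a known full result list.

```typescript
// Given the doc IDs returned for pages 1..N, report every ID that appears
// more than once, together with the 1-based page numbers it appeared on.
function findDuplicateIds(pages: string[][]): Map<string, number[]> {
  const seen = new Map<string, number[]>();
  pages.forEach((ids, i) => {
    for (const id of ids) {
      const pageNumbers = seen.get(id) ?? [];
      pageNumbers.push(i + 1);
      seen.set(id, pageNumbers);
    }
  });

  const duplicates = new Map<string, number[]>();
  seen.forEach((pageNumbers, id) => {
    if (pageNumbers.length > 1) duplicates.set(id, pageNumbers);
  });
  return duplicates;
}
```

An empty map for the pages within strongStablePageSize is the expected outcome after the change has propagated.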
If the change is correct in DDB but not in KV, re-export manually by invoking the
exporter Lambda with {"system_account_id": "...", "index_name": "..."} — see the
Settings Sync flow for details.
Related docs
- Edit ecommerce index settings — general procedure for editing this table
- Settings Sync flow — how DDB changes reach the search proxy
- Search Proxy component