Configure pagination stability for an index

Context

Marqo's disjunction (RRF / hybrid) search has buggy native pagination: because each page re-ranks a differently-sized candidate pool — and native offset is itself unreliable — customers can see duplicate documents and missing documents as they page through results.

The search proxy ships a client-side workaround that fetches a larger window and slices it locally. It is tuned per index through a paginationConfig block inside the index's search_config. This runbook is for solution architects who need to tune those settings for a customer themselves.

This is a temporary workaround. It will be removed once Marqo provides stable RRF pagination natively. The implementation lives in components/search_proxy/src/search.ts (performMarqoSearch).

How it works

On every search request the proxy computes the 1-based page index from the request's offset/limit:

pageIndex = floor(offset / limit) + 1

The workaround only engages when the retrieval method is disjunction (the default when hybridParameters.retrievalMethod is unset). Two knobs control it:

| Knob | Default | Behaviour |
| --- | --- | --- |
| strongStablePageSize | 1 | Pages 1..strong share one candidate pool of strong * limit docs. The proxy sends offset=0, limit=strong*limit to Marqo and slices the page out locally. Guarantees zero duplicate/missing docs across these pages. Cost: every page in this range pays to retrieve the full pool. |
| weakStablePageSize | 5 | Pages strong+1..weak each retrieve their own page * limit pool and discard the leading (page-1) * limit docs locally. Cheaper (only the current page's pool is fetched), but duplicates can still leak between adjacent pages. |

Pages beyond weakStablePageSize fall through to Marqo's native pagination (no workaround).
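
The three ranges above can be sketched as a small planning function. This is a minimal illustration, not the code in performMarqoSearch: the names planFetch, FetchPlan and PaginationConfig are hypothetical, and the sketch assumes that offset is a multiple of limit and that weak-range requests are also sent to Marqo with offset=0.

```typescript
// Hypothetical sketch of the three pagination ranges; not the real proxy code.
interface PaginationConfig {
  strongStablePageSize: number;
  weakStablePageSize: number;
}

interface FetchPlan {
  mode: "strong" | "weak" | "native";
  marqoOffset: number; // offset sent to Marqo
  marqoLimit: number;  // limit sent to Marqo
  sliceStart: number;  // number of leading hits discarded locally
}

function planFetch(offset: number, limit: number, cfg: PaginationConfig): FetchPlan {
  const pageIndex = Math.floor(offset / limit) + 1; // 1-based page index
  const sliceStart = (pageIndex - 1) * limit;

  if (pageIndex <= cfg.strongStablePageSize) {
    // Strong range: every page fetches the same strong * limit pool.
    return {
      mode: "strong",
      marqoOffset: 0,
      marqoLimit: cfg.strongStablePageSize * limit,
      sliceStart,
    };
  }
  if (pageIndex <= cfg.weakStablePageSize) {
    // Weak range: each page fetches its own page * limit pool (offset 0 is an assumption).
    return { mode: "weak", marqoOffset: 0, marqoLimit: pageIndex * limit, sliceStart };
  }
  // Beyond the weak range: the request falls through to native pagination unchanged.
  return { mode: "native", marqoOffset: offset, marqoLimit: limit, sliceStart: 0 };
}
```

With strong=3, weak=5 and limit=20, page 2 (offset 20) plans a 60-doc fetch and skips the first 20 hits, page 4 plans an 80-doc fetch and skips 60, and page 6 is passed through as offset=100, limit=20. With the defaults (1, 5), page 1 plans exactly the request it was given.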

The defaults (1, 5) reproduce historical behaviour and are the right starting point.

Tuning guidance

  • If a customer reports duplicate docs across pages and their page size (limit) is small, nudge strongStablePageSize up modestly (e.g. 3 to 5). This makes the first N pages duplicate-free, where N is the new strongStablePageSize.
  • Avoid raising strongStablePageSize for customers using large page sizes — the retrieved pool is strong * limit docs on every page in the strong range, so a large limit multiplied by a large strong retrieves a very large pool on each request and hurts latency.
  • Keep the total retrieved docs modest. weakStablePageSize can be set somewhat larger than strong because it only fetches the current page's pool, but it does not eliminate duplicates.
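
To put numbers on the guidance above, the following sketch estimates how many docs are requested from Marqo for each page under a given setting. The helper name docsFetchedPerPage is hypothetical and the arithmetic simply follows the knob descriptions in this runbook; it is not a measurement of real latency.

```typescript
// Hypothetical helper: docs requested from Marqo for each of the first `pages` pages.
function docsFetchedPerPage(
  limit: number,
  strong: number,
  weak: number,
  pages: number,
): number[] {
  const fetched: number[] = [];
  for (let page = 1; page <= pages; page++) {
    if (page <= strong) {
      fetched.push(strong * limit); // strong range: full shared pool on every page
    } else if (page <= weak) {
      fetched.push(page * limit); // weak range: pool grows with the page index
    } else {
      fetched.push(limit); // native pagination: just the requested page
    }
  }
  return fetched;
}

// Small page size: strong=5 costs 100 docs on each of pages 1-5, then 120 on page 6.
console.log(docsFetchedPerPage(20, 5, 10, 6));
// Large page size: the same setting costs 500 docs on each of pages 1-5, then 600.
console.log(docsFetchedPerPage(100, 5, 10, 6));
```

The second call shows why raising strongStablePageSize is discouraged for large page sizes: even the first page pays for the whole 500-doc pool.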

Scope: one index-level setting governs all search profiles

paginationConfig is read only from the index's top-level search_config. It is not read from search profiles or per-query config overrides.

  • A paginationConfig set on the index default search_config applies to every search on that index — including requests that carry a profileId. ✅
  • A paginationConfig placed inside a search profile is ignored. ❌

So there is no per-profile pagination override: configure it once on the index's search_config and it applies universally.

Operation

The source of truth is the DynamoDB table prod-EcomIndexSettingsTable. Records are keyed by pk=<system_account_id> and sk=INDEX#<index_name>. The paginationConfig block is added under the search_config field. Changes propagate automatically to Cloudflare KV (where the search proxy reads them) via the settings exporter Lambda — see Settings Sync flow.

1. Get prod admin access

Use Escalator: Self-Service Admin to gain admin permission to the production Controller account ("Prod Controller Admin"), then sign into the prod Controller account as admin.

2. Find the index record

If you don't already know the index's system_account_id, look it up in Ecommerce Customer Details.

Open the prod-EcomIndexSettingsTable item explorer and query with:

  • pk = <system_account_id>
  • sk = INDEX#<index_name>

Or from the CLI:

aws dynamodb get-item --table-name prod-EcomIndexSettingsTable \
--key '{"pk": {"S": "<system_account_id>"}, "sk": {"S": "INDEX#<index_name>"}}'

3. Add paginationConfig to search_config

Click the record to open it for editing. Inside the search_config field, add a paginationConfig object. For example, to make the first 5 pages duplicate-free:

{
  "search_config": {
    "...existing keys...": "...",
    "paginationConfig": {
      "strongStablePageSize": 5,
      "weakStablePageSize": 10
    }
  }
}

Notes:

  • Add paginationConfig alongside the existing search_config keys; do not replace the whole object. To avoid mistakes on a large config, switch the item editor to "JSON view" with "View DynamoDB JSON" disabled, copy the existing JSON into a diff tool (e.g. diffchecker), add the block, and confirm the only change is the new paginationConfig key before pasting back.
  • Both values must be positive integers. Omitting the block (or either key) falls back to the defaults strongStablePageSize=1, weakStablePageSize=5.
  • paginationConfig is a proxy-internal field. It is stripped from the request before it reaches Marqo, so it cannot interfere with the Marqo query.
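
As an optional pre-save check, the notes above can be expressed as a small helper that validates the two values and adds the block without touching any other key. This is a sketch only: withPaginationConfig and isPositiveInt are hypothetical names, and throwing on invalid input is a choice made for this example, not a description of how the proxy treats bad values.

```typescript
// Hypothetical pre-save helper; not part of the proxy or the settings exporter.
type SearchConfig = Record<string, unknown>;

const DEFAULTS = { strongStablePageSize: 1, weakStablePageSize: 5 };

function isPositiveInt(v: unknown): v is number {
  return typeof v === "number" && isFinite(v) && Math.floor(v) === v && v > 0;
}

function withPaginationConfig(
  existing: SearchConfig,
  overrides: { strongStablePageSize?: number; weakStablePageSize?: number },
): SearchConfig {
  // A missing key falls back to the documented default.
  const strong = overrides.strongStablePageSize ?? DEFAULTS.strongStablePageSize;
  const weak = overrides.weakStablePageSize ?? DEFAULTS.weakStablePageSize;

  if (!isPositiveInt(strong)) {
    throw new Error("strongStablePageSize must be a positive integer, got " + String(strong));
  }
  if (!isPositiveInt(weak)) {
    throw new Error("weakStablePageSize must be a positive integer, got " + String(weak));
  }

  // The spread keeps every existing search_config key; only paginationConfig is added or replaced.
  return {
    ...existing,
    paginationConfig: { strongStablePageSize: strong, weakStablePageSize: weak },
  };
}
```

Calling withPaginationConfig(existingSearchConfig, { strongStablePageSize: 5, weakStablePageSize: 10 }) returns a copy of the config with the new block added, which can then be diffed against the original before it is pasted back.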

Click Save to commit.

4. Verify the change reached prod

The settings exporter Lambda picks up the DDB stream change (up to a ~10s batch window) and writes it to Cloudflare KV. To confirm:

  • Check the value in the prod-search-proxy-kv KV store (key <system_account_id>-<index_name>, or the shop_id) — the search_config.paginationConfig block should be present.
  • Run a paged search (offset > 0) for the customer and confirm duplicates/missing docs are gone for the pages within strongStablePageSize.
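
For the paged-search check, one simple approach is to collect the document IDs returned for each page and look for repeats. The sketch below assumes the IDs have already been extracted from the responses; findCrossPageDuplicates is a hypothetical helper, and it detects duplicates only, since missing docs cannot be spotted without a known-good result list to compare against.

```typescript
// Hypothetical check: pages[i] holds the document IDs returned for page i + 1.
function findCrossPageDuplicates(pages: string[][]): string[] {
  const seen: string[] = [];
  const duplicates: string[] = [];
  for (let p = 0; p < pages.length; p++) {
    for (let i = 0; i < pages[p].length; i++) {
      const id = pages[p][i];
      if (seen.indexOf(id) !== -1) {
        // Already returned earlier; report each repeated ID once.
        if (duplicates.indexOf(id) === -1) duplicates.push(id);
      } else {
        seen.push(id);
      }
    }
  }
  return duplicates;
}
```

An empty result for the pages within strongStablePageSize is the expected outcome after the change has propagated.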

If the change is correct in DDB but not in KV, re-export manually by invoking the exporter Lambda with {"system_account_id": "...", "index_name": "..."} — see the Settings Sync flow for details.