
Summary

The Admin Lambda provides a REST API for forking indexes. Forking is a mechanism for making a copy of a customer's index and, optionally, transparently switching to serve from the copy. This can be used for development and testing against a customer's index, for blue-green deployments that change otherwise immutable index settings (e.g. model, account, cluster), or for blue-green deployments of infrastructure changes.

Concepts

  • Index name: the name of a Marqo Cloud index. This is unique per account, and in the Marqo Classic API, it is immutable per index and thus a unique identifier.
  • Index ID: a general unique identifier for an index (e.g. {system_account_id}-{index_name}). In the ecom world, a request for an index by ID (via the x-marqo-index-id header) does not necessarily map to an index with the specified name.
  • Index settings: the immutable settings of a Marqo index, sent in the Classic API create index request body.

Components

Fork will interact with the following components:

  • Data:
    • AccountsTable / UsersAccountsTable
    • CustomerIndexConfigTable
    • env-EcomIndexSettingsTable
    • Index query configs table
    • Merchandising table
    • Feature flags JSON
  • Infra:
    • kops clusters
    • Multitenant clusters
    • Workflows
  • Tests:
    • Canary tests

Steps

Describe

Get all the necessary details about the source and target.

The set of details is mostly covered by Ops/Tactical/Validation, which checks general index readiness, and by Ecom Canary Testing, which validates changes to configs.

  • Source and target account and cluster details
    • IDs
    • Feature flags
  • Source index details
    • Name
    • Settings
    • Infrastructure
    • Bespoke infra config (e.g. scaled out API nodes)
    • URLs
  • Target index
    • Exists?
  • Source ecom index settings
    • Configs (add_docs_config, collections_config, search_config)
    • Infrastructure (especially the queue ARN)
  • Query configs
    • Configs
  • Merchandising
    • Config
    • Rules
  • Pixel
    • Mappings for automatic doc updates
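One way to carry the output of the Describe step through the later steps is a single typed record. The sketch below mirrors the checklist above; every class and field name is a hypothetical choice, not an existing type in the Admin Lambda.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class AccountDetails:
    # Hypothetical shape for "source and target account and cluster details".
    system_account_id: str
    cell_id: str
    feature_flags: dict[str, Any] = field(default_factory=dict)


@dataclass
class SourceIndexDetails:
    name: str
    settings: dict[str, Any]        # immutable Classic API settings
    infrastructure: dict[str, Any]  # incl. bespoke config, e.g. scaled-out API nodes
    urls: list[str] = field(default_factory=list)


@dataclass
class ForkDescription:
    source_account: AccountDetails
    target_account: AccountDetails
    source_index: SourceIndexDetails
    target_index_exists: bool
    # Ecom configs keyed by name: add_docs_config, collections_config, search_config.
    ecom_configs: dict[str, Any] = field(default_factory=dict)
    ecom_queue_arn: Optional[str] = None
    query_configs: list[dict[str, Any]] = field(default_factory=list)
    merchandising_config: dict[str, Any] = field(default_factory=dict)
    merchandising_rules: list[dict[str, Any]] = field(default_factory=list)
    pixel_mappings: dict[str, Any] = field(default_factory=dict)

    def is_cross_account(self) -> bool:
        """True when the fork moves the index to a different system account."""
        return self.source_account.system_account_id != self.target_account.system_account_id

    def is_cross_cell(self) -> bool:
        """True when source and target live in different cells."""
        return self.source_account.cell_id != self.target_account.cell_id
```

Keeping everything in one record makes it easy to persist the full fork context alongside the first status record.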

TODO: Define the general behaviour when the target resources/config already exist.

Create

Create a new index with the desired immutable configuration.

Once the queue is created, also deactivate the trigger for the ecom indexer so the queue isn't consumed until we're ready.
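Assuming the ecom indexer trigger is an AWS Lambda event source mapping on the target queue (the workflow table below refers to it as the "target SQS ESM"), pausing it could look like the sketch below. The client is injected so that a boto3 Lambda client (`boto3.client("lambda")`) can be passed in production; `set_indexer_trigger` is a hypothetical helper name.

```python
from typing import Any, Protocol


class LambdaClient(Protocol):
    """The two boto3 Lambda client methods this sketch relies on."""

    def list_event_source_mappings(self, **kwargs: Any) -> dict: ...

    def update_event_source_mapping(self, **kwargs: Any) -> dict: ...


def set_indexer_trigger(client: LambdaClient, queue_arn: str, enabled: bool) -> list[str]:
    """Enable or disable every event source mapping consuming `queue_arn`.

    Returns the UUIDs of mappings that were changed. Mappings already in the
    requested state are skipped, so repeated calls are harmless.
    """
    changed: list[str] = []
    kwargs: dict[str, Any] = {"EventSourceArn": queue_arn}
    while True:
        resp = client.list_event_source_mappings(**kwargs)
        for mapping in resp.get("EventSourceMappings", []):
            # "Enabling" counts as on; any other state counts as off.
            is_on = mapping.get("State") in ("Enabled", "Enabling")
            if is_on != enabled:
                client.update_event_source_mapping(UUID=mapping["UUID"], Enabled=enabled)
                changed.append(mapping["UUID"])
        # Follow pagination until the API stops returning a marker.
        marker = resp.get("NextMarker")
        if not marker:
            return changed
        kwargs["Marker"] = marker
```

Disabling the mapping leaves messages in the queue, so they are consumed once the trigger is re-enabled.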

Configure

Create or update any mutable configuration (most of the things in "Describe"). Each value defaults to a copy of the source index's configuration and can be overridden when the fork is created.
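"Default to a copy of the source, allow overrides" can be expressed as a recursive merge. This is a sketch under the assumption that configs are JSON-like dicts; the rule that lists are replaced wholesale rather than merged is a design choice made here, not something the design specifies.

```python
import copy
from typing import Any


def merge_config(source: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `source` with `overrides` applied recursively.

    Nested dicts are merged key by key; any other override value (including
    lists) replaces the source value wholesale. `source` is never mutated.
    """
    merged = copy.deepcopy(source)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

Replacing lists avoids guessing whether, say, two collection lists should be concatenated or deduplicated.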

Transfer

Once the target index is ready, update the source index's add_docs_config.index_write_aliases so that all subsequent writes to the source are also applied to the target index (dual-write).

In parallel, begin the transfer of the existing docs (either a manual snapshot to be restored, or the reindexing pipeline).
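The alias update should be idempotent so the step can be retried. The sketch assumes `index_write_aliases` is a list of index names; the real field may hold IDs or richer objects, and the helper names are hypothetical.

```python
import copy
from typing import Any


def add_write_alias(add_docs_config: dict[str, Any], target_index: str) -> dict[str, Any]:
    """Return a copy of add_docs_config with target_index in index_write_aliases.

    Adding an alias that is already present is a no-op.
    """
    updated = copy.deepcopy(add_docs_config)
    aliases = list(updated.get("index_write_aliases") or [])
    if target_index not in aliases:
        aliases.append(target_index)
    updated["index_write_aliases"] = aliases
    return updated


def remove_write_alias(add_docs_config: dict[str, Any], target_index: str) -> dict[str, Any]:
    """Return a copy of add_docs_config without target_index (used by abort)."""
    updated = copy.deepcopy(add_docs_config)
    aliases = updated.get("index_write_aliases") or []
    updated["index_write_aliases"] = [a for a in aliases if a != target_index]
    return updated
```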

Persistence

Fork details are stored in a DynamoDB table called {env}-IndexForksTable. A new record is created for each state change of each fork.

| Column | Description |
|---|---|
| pk | Source system account ID |
| sk | (Source index name)#(System timestamp, ISO format) |
| fork_id | Fork ID |
| status | pending, in_progress, ready, failed, rolled_back, aborted, complete, cancelled |
| source_cell_id | Source cell ID |
| source_system_account_id | Source system account ID |
| source_index_name | Source index name |
| target_cell_id | Target cell ID |
| target_system_account_id | Target system account ID |
| target_index_name | Target index name |
| created_at | Timestamp when this record was created (i.e. when this status was reached) |
| updated_at | Timestamp of the last update to this record |
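A sketch of building one status record for this table. Timestamps are normalised to UTC with a fixed precision so that lexicographic order of `sk` matches chronological order; the helper name and the exact timestamp format are assumptions. Note that under this key design two forks of the same source index writing in the same microsecond would collide on `sk`.

```python
from datetime import datetime, timezone
from typing import Optional

STATUSES = {"pending", "in_progress", "ready", "failed", "rolled_back",
            "aborted", "complete", "cancelled"}
FORK_COLUMNS = ("fork_id", "source_cell_id", "source_system_account_id",
                "source_index_name", "target_cell_id",
                "target_system_account_id", "target_index_name")


def build_status_record(fork: dict[str, str], status: str,
                        now: Optional[datetime] = None) -> dict[str, str]:
    """Build a new {env}-IndexForksTable item recording `status` for `fork`.

    `now` must be timezone-aware; it defaults to the current UTC time.
    """
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    now = now or datetime.now(timezone.utc)
    ts = now.astimezone(timezone.utc).isoformat(timespec="microseconds")
    return {
        "pk": fork["source_system_account_id"],
        "sk": f"{fork['source_index_name']}#{ts}",
        **{col: fork[col] for col in FORK_COLUMNS},
        "status": status,
        "created_at": ts,
        "updated_at": ts,
    }
```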

For each fork, we store:

  • One record with all the context with which the fork was created.
  • One record for each status change with timestamps.

Access patterns:

  • List all forks for a given index ID (account ID + index name) and their latest status
  • Get the latest status for a given fork ID
  • List the history of a given fork ID
  • Create a new fork record
  • Update the status of a fork (by creating a new record with the same fork ID and a new timestamp)
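With `pk` = source account and `sk` beginning with `{index_name}#`, a single DynamoDB Query (key condition `pk = :acc AND begins_with(sk, :prefix)`) returns every record for one source index. The helpers below sketch reducing those items to the access patterns above. Looking up a fork by fork ID alone would need an additional index keyed on `fork_id`, which is an assumption not covered by the schema. The sketch relies on `created_at` strings sharing one UTC ISO format so they compare correctly.

```python
from typing import Iterable


def latest_status_by_fork(records: Iterable[dict[str, str]]) -> dict[str, str]:
    """Map each fork_id to the status of its most recent record."""
    latest: dict[str, tuple[str, str]] = {}
    for r in records:
        prev = latest.get(r["fork_id"])
        if prev is None or r["created_at"] > prev[0]:
            latest[r["fork_id"]] = (r["created_at"], r["status"])
    return {fork_id: status for fork_id, (_, status) in latest.items()}


def fork_history(records: Iterable[dict[str, str]], fork_id: str) -> list[dict[str, str]]:
    """All records for one fork, oldest first."""
    return sorted((r for r in records if r["fork_id"] == fork_id),
                  key=lambda r: r["created_at"])
```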

API

POST /api/v1/accounts/{account_id}/indexes/{index_name}/forks

Create a new fork.

GET /api/v1/accounts/{account_id}/indexes/{index_name}/forks

List all forks for an index (as source or target).

GET /accounts/:acc/indexes/:index/forks
=> 200 {
  "forks": [
    {"forkId": "...", "status": "...", "source": {...}, "target": {...}},
    ...
  ]
}

POST /api/v1/accounts/{account_id}/indexes/{index_name}/forks/{fork_id}/verify

Run an on-demand comparison between the source and target indexes.

POST /accounts/:acc/indexes/:index/forks/:fork/verify
{
  "numQueries": 10,   // number of test queries
  "resultLimit": 10,  // results per query
  "tolerance": 0.9    // minimum overlap (0.0-1.0)
}
=> 200 {
  "forkId": "...",
  "queriesRun": 10,
  "queriesPassed": 9,
  "queriesFailed": 1,
  "overallPassed": false,
  "sourceDocCount": 1000000,
  "targetDocCount": 999998,
  "comparisons": [
    {"query": "...", "overlapPercent": 0.95, "passed": true},
    ...
  ]
}
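A sketch of how the verify response could be computed. It assumes `overlapPercent` is the fraction of the source's top-`resultLimit` document IDs that also appear in the target's top `resultLimit`, and that `overallPassed` requires every query to pass, which is consistent with the example above (9 of 10 passed, `overallPassed: false`). Neither definition is confirmed by the design.

```python
from typing import Any


def overlap_percent(source_ids: list[str], target_ids: list[str], result_limit: int) -> float:
    """Fraction of the source's top hits that also appear in the target's top hits."""
    src = source_ids[:result_limit]
    tgt = set(target_ids[:result_limit])
    if not src:
        # Both empty counts as full agreement; target-only hits count as none.
        return 1.0 if not tgt else 0.0
    return sum(1 for doc_id in src if doc_id in tgt) / len(src)


def summarize(fork_id: str, comparisons: list[tuple[str, float]], tolerance: float,
              source_doc_count: int, target_doc_count: int) -> dict[str, Any]:
    """Assemble a verify response from (query, overlap) pairs."""
    results = [
        {"query": query, "overlapPercent": overlap, "passed": overlap >= tolerance}
        for query, overlap in comparisons
    ]
    passed = sum(1 for r in results if r["passed"])
    return {
        "forkId": fork_id,
        "queriesRun": len(results),
        "queriesPassed": passed,
        "queriesFailed": len(results) - passed,
        "overallPassed": passed == len(results),
        "sourceDocCount": source_doc_count,
        "targetDocCount": target_doc_count,
        "comparisons": results,
    }
```

Order-insensitive overlap tolerates small ranking differences; a rank-aware metric would be stricter if ordering matters for a customer.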

Test Query Sources:

  1. Customer-specific test suites: Each customer has a dedicated test suite that validates the specific features they use, in the way they use them. These are the primary verification mechanism.
  2. Generic smoke tests: A standard set of tests that can be fired at any index to verify basic functionality (e.g., simple searches, filter queries, pagination).
  3. AI-generated tests (future): An AI agent could analyze document contents to generate representative queries, or sample production traffic patterns to create realistic test scenarios.

POST /api/v1/accounts/{account_id}/indexes/{index_name}/forks/{fork_id}/cutover

Shift production traffic from source to target by adding a read alias on the source index that routes search queries to the target.

Current implementation: Instant cutover only (adds read alias, marks fork COMPLETE). No request body required.

POST /accounts/:acc/indexes/:index/forks/:fork/cutover
=> 200 {"fork_id": "...", "status": "complete"}

POST /api/v1/accounts/{account_id}/indexes/{index_name}/forks/{fork_id}/rollback

Revert read traffic to the source index by switching the read alias from the target back to the source. The read alias stays active (pointing at the source) and the write alias (dual-write) continues. Only valid from complete status.

POST /accounts/:acc/indexes/:index/forks/:fork/rollback
=> 200 {"fork_id": "...", "status": "rolled_back"}

POST /api/v1/accounts/{account_id}/indexes/{index_name}/forks/{fork_id}/cleanup

Clean up source read routing after cutover is confirmed working: removes the source's self-referencing read alias and makes the target read alias visible, while retaining the source-to-target write alias so writes addressed to the source continue reaching the target. The fork stays in complete status. Idempotent — safe to call multiple times.

POST /accounts/:acc/indexes/:index/forks/:fork/cleanup
=> 200 {"fork_id": "...", "status": "complete", "cleaned_up": true}

Future: chain collapse. When sequential forks create a chain (A→B→C), cleanup of the B→C fork should collapse the chain by updating A's aliases to point directly to C. This allows intermediary index B to be decommissioned. Without collapse, all intermediary indexes must stay alive as alias relay points and reads traverse multiple hops.
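The chain collapse can be sketched as path compression over read aliases. This assumes a simplified model where each index has at most one read alias (index name to index name) and a self-referencing alias counts as the end of the chain; the function names are hypothetical.

```python
def resolve_final_target(read_alias: dict[str, str], index: str) -> str:
    """Follow read aliases from `index` to the index that actually serves reads."""
    seen = {index}
    current = index
    while True:
        nxt = read_alias.get(current)
        if nxt is None or nxt == current:  # no alias, or self-referencing alias
            return current
        if nxt in seen:
            raise ValueError(f"read alias cycle detected at {nxt!r}")
        seen.add(nxt)
        current = nxt


def collapse_chain(read_alias: dict[str, str]) -> dict[str, str]:
    """Point every aliased index directly at the end of its chain."""
    return {src: resolve_final_target(read_alias, src) for src in read_alias}
```

After collapsing A→B→C, A points straight at C, so once nothing routes through B it can be decommissioned.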

POST /api/v1/accounts/{account_id}/indexes/{index_name}/forks/{fork_id}/cancel

Cancel a fork that is still executing. Stops the Step Functions workflow and marks the fork as cancelled. Only valid from in_progress status.

POST /accounts/:acc/indexes/:index/forks/:fork/cancel
=> 200 {"fork_id": "...", "status": "cancelled"}

Note: Cancel is a best-effort operation — the workflow step currently executing will complete, but no further steps will run. Any write alias already added by prepare_transfer will remain (use abort to clean it up).

DELETE /api/v1/accounts/{account_id}/indexes/{index_name}/forks/{fork_id}

Abort a fork. Removes the write alias from the source index (best-effort) and marks the fork as aborted. Accepts forks in pending, in_progress, or ready status.

DELETE /accounts/:acc/indexes/:index/forks/:fork
=> 200 {"fork_id": "...", "status": "aborted"}
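The status preconditions stated for each endpoint can be collected into one transition table. In this sketch the rules are taken from the endpoint descriptions (cutover requiring ready comes from the implementation plan); the enum and exception names are hypothetical, and an HTTP layer might map `InvalidTransition` to a 409.

```python
from enum import Enum


class ForkStatus(str, Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    READY = "ready"
    FAILED = "failed"
    COMPLETE = "complete"
    ROLLED_BACK = "rolled_back"
    CANCELLED = "cancelled"
    ABORTED = "aborted"


# operation -> (statuses it is accepted from, status it produces)
OPERATIONS = {
    "cutover": ({ForkStatus.READY}, ForkStatus.COMPLETE),
    "rollback": ({ForkStatus.COMPLETE}, ForkStatus.ROLLED_BACK),
    "cleanup": ({ForkStatus.COMPLETE}, ForkStatus.COMPLETE),
    "cancel": ({ForkStatus.IN_PROGRESS}, ForkStatus.CANCELLED),
    "abort": ({ForkStatus.PENDING, ForkStatus.IN_PROGRESS, ForkStatus.READY},
              ForkStatus.ABORTED),
}


class InvalidTransition(Exception):
    pass


def apply_operation(current: ForkStatus, operation: str) -> ForkStatus:
    """Return the new status, or raise if `operation` is not allowed now."""
    allowed_from, result = OPERATIONS[operation]
    if current not in allowed_from:
        raise InvalidTransition(f"cannot {operation} a fork in status {current.value}")
    return result
```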

Implementation Plan

1. Persistence Layer

  • Schema Design: Define a DynamoDB table schema for ForksTable to store fork ID, status, source/target details, and step progress.
  • Service: Create a ForkService to handle CRUD operations for fork records.
  • Deployment: Deploy the new table to the production environment via admin_stack in CDK.

2. Core Fork Logic (Orchestrator)

  • Describe & Validation: Implement logic to fetch source index details and validate target index parameters.
  • Resource Creation: Integrate with IndexSettingsService to create the target index (immutable settings).
  • Configuration Sync: Implement logic to copy and merge mutable settings (Ecom, Query Configs, Merchandising, Pixel) from source to target.
  • Write Aliasing: Implement the update of add_docs_config on the source index to alias writes to the target.

3. API Implementation

  • POST /forks:
    • Generate Fork ID.
    • Create initial record in ForksTable.
    • Trigger the asynchronous fork workflow (likely via Step Functions or async Lambda invocation).
    • Return Fork ID and pending status.
  • POST /cutover:
    • Retrieve fork record.
    • Verify fork is in ready state.
    • Update routing configuration (Index Registry/DNS/Gateway) to point search traffic to target.
    • Update status to complete.
  • POST /rollback:
    • Revert write aliases on source index.
    • Revert search routing if cutover was attempted.
    • Update status to rolled_back.
  • POST /cleanup:
    • Remove the source's self-referencing read alias and make the target read alias visible.
    • Retain the source-to-target write alias so writes keep fanning out to the target.
  • DELETE /forks/{fork_id} (abort):
    • Revert any changes to source (aliases).
    • Delete target index resources.
    • Mark fork as aborted.
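The POST /forks steps above can be sketched with the storage and workflow calls injected. `put_record` and `start_workflow` are hypothetical callables: in the Lambda they might wrap a DynamoDB put and a Step Functions `start_execution` call. For standard workflows, execution names must be unique per state machine, so using the fork ID as the execution name would guard against starting the same fork twice.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Callable


def create_fork(
    source: dict[str, str],
    target: dict[str, str],
    put_record: Callable[[dict[str, Any]], None],
    start_workflow: Callable[[str, dict[str, Any]], None],
) -> dict[str, str]:
    """Create the initial pending record, start the workflow, return the fork ID."""
    fork_id = str(uuid.uuid4())
    now = datetime.now(timezone.utc).isoformat(timespec="microseconds")
    # Persist first so the workflow always finds a record for its fork ID.
    put_record({
        "pk": source["system_account_id"],
        "sk": f"{source['index_name']}#{now}",
        "fork_id": fork_id,
        "status": "pending",
        "source_cell_id": source["cell_id"],
        "source_system_account_id": source["system_account_id"],
        "source_index_name": source["index_name"],
        "target_cell_id": target["cell_id"],
        "target_system_account_id": target["system_account_id"],
        "target_index_name": target["index_name"],
        "created_at": now,
        "updated_at": now,
    })
    start_workflow(fork_id, {"fork_id": fork_id, "source": source, "target": target})
    return {"fork_id": fork_id, "status": "pending"}
```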

4. Asynchronous Workflow

The fork workflow is orchestrated by a Step Functions state machine (AdminIndexForkWorkflow). Each step invokes the Admin Lambda with a specific action:

fork.ensure_target → fork.configure_target → fork.prepare_transfer → snapshot → restore → fork.activate_target → fork.verify → succeed
| Step | Lambda Action | Description |
|---|---|---|
| Ensure Target | fork.ensure_target | Create target index if missing, single readiness check (SFN retries on TargetIndexNotReadyError), validate infra compatibility |
| Configure Target | fork.configure_target | Export source config, import into target, validate that the post-import export matches |
| Prepare Transfer | fork.prepare_transfer | Disable target SQS ESM, add write alias (source → target) |
| Snapshot | (cross-account SFN) | Snapshot source index documents |
| Restore | (cross-account SFN) | Restore snapshot onto target index |
| Activate Target | fork.activate_target | Re-enable target SQS ESM so queued writes drain |
| Verify | fork.verify | Compare search results between source and target, mark READY or FAILED |
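The step sequence maps naturally onto an Amazon States Language definition. The generator below is a sketch: it uses the standard `lambda:invoke` and `states:startExecution.sync:2` service integrations, assumes the execution input carries `fork_id`, and uses placeholder retry numbers. Cross-account role configuration for the snapshot and restore state machines is omitted.

```python
import json
from typing import Any

# (state name, kind, Admin Lambda action or sub-workflow key)
STEPS = [
    ("EnsureTarget", "lambda", "fork.ensure_target"),
    ("ConfigureTarget", "lambda", "fork.configure_target"),
    ("PrepareTransfer", "lambda", "fork.prepare_transfer"),
    ("Snapshot", "sfn", "snapshot"),
    ("Restore", "sfn", "restore"),
    ("ActivateTarget", "lambda", "fork.activate_target"),
    ("Verify", "lambda", "fork.verify"),
]


def build_definition(admin_lambda_arn: str, snapshot_sm_arn: str,
                     restore_sm_arn: str) -> dict[str, Any]:
    sub_machines = {"snapshot": snapshot_sm_arn, "restore": restore_sm_arn}
    states: dict[str, Any] = {}
    for i, (name, kind, action) in enumerate(STEPS):
        next_state = STEPS[i + 1][0] if i + 1 < len(STEPS) else "Succeed"
        if kind == "lambda":
            state = {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": admin_lambda_arn,
                               "Payload": {"action": action, "fork_id.$": "$.fork_id"}},
            }
        else:
            state = {
                "Type": "Task",
                "Resource": "arn:aws:states:::states:startExecution.sync:2",
                "Parameters": {"StateMachineArn": sub_machines[action],
                               "Input": {"fork_id.$": "$.fork_id"}},
            }
        state["ResultPath"] = None  # discard task output, pass input through
        state["Next"] = next_state
        states[name] = state
    # Readiness polling is done by retrying the step (placeholder values).
    states["EnsureTarget"]["Retry"] = [{
        "ErrorEquals": ["TargetIndexNotReadyError"],
        "IntervalSeconds": 30, "MaxAttempts": 40, "BackoffRate": 1.0,
    }]
    states["Succeed"] = {"Type": "Succeed"}
    return {"StartAt": STEPS[0][0], "States": states}
```

`json.dumps(build_definition(...))` yields the definition string a CDK or console deployment would take.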

All steps are idempotent for safe Step Functions retries. Failures mark the fork as FAILED with a descriptive message.
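A sketch of an idempotent `fork.ensure_target` plus an action dispatcher. `IndexApi`, its `"READY"` status value and `dispatch` are hypothetical; a real Lambda entry point `handler(event, context)` would construct the API client and call `dispatch`. When a Lambda raises an exception, Step Functions sees the exception class name as the error name, which is what a `Retry` on `TargetIndexNotReadyError` matches.

```python
from typing import Any, Callable, Optional, Protocol


class TargetIndexNotReadyError(Exception):
    """Raised so Step Functions retries the step after its retry interval."""


class IndexApi(Protocol):
    """Hypothetical wrapper over the index management calls."""

    def get_status(self, index_name: str) -> Optional[str]: ...

    def create(self, index_name: str, settings: dict[str, Any]) -> None: ...


def ensure_target(event: dict[str, Any], index_api: IndexApi) -> dict[str, Any]:
    name = event["target_index_name"]
    status = index_api.get_status(name)
    if status is None:
        # Create only when missing, so a retried invocation never creates twice.
        index_api.create(name, event["target_settings"])
        status = index_api.get_status(name)
    # A single readiness check per invocation; waiting is left to SFN retries.
    if status != "READY":
        raise TargetIndexNotReadyError(f"{name} status is {status}")
    return {"target_index_name": name, "ready": True}


ACTIONS: dict[str, Callable[[dict[str, Any], IndexApi], dict[str, Any]]] = {
    "fork.ensure_target": ensure_target,
}


def dispatch(event: dict[str, Any], index_api: IndexApi) -> dict[str, Any]:
    action = event["action"]
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return ACTIONS[action](event, index_api)
```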

5. Testing

  • Unit Tests: Test individual components (Service, API models, Logic).
  • Integration Tests: Test the full flow with mocked infrastructure calls.