Summary
The Admin Lambda provides a REST API for forking indexes. Forking is a mechanism for making a copy of a customer's index and (optionally) transparently switching to serve from the copy. This can be used for development and testing on a customer's index, for blue-green deployments that change otherwise immutable settings on an index (e.g. model, account, cluster), or for blue-green deployment of infrastructure changes.
Concepts
- Index name: the name of a Marqo Cloud index. This is unique per account, and in the Marqo Classic API, it is immutable per index and thus a unique identifier.
- Index ID: a general unique identifier for an index (e.g. {system_account_id}-{index_name}). In the ecom world, a request for an index by ID (via the x-marqo-index-id header) does not necessarily map to an index with the specified name.
- Index settings: the immutable settings of a Marqo index, sent in the Classic API create index request body.
Components
Fork will interact with the following components:
- Data:
  - AccountsTable / UsersAccountsTable
  - CustomerIndexConfigTable
  - env-EcomIndexSettingsTable
  - Index query configs table
  - Merchandising table
  - Feature flags JSON
- Infra:
  - kops clusters
  - Multitenant clusters
  - Workflows
- Tests:
  - Canary tests
Steps
Describe
Get all the necessary details about the source and target.
The set of details is mostly covered by Ops/Tactical/Validation, which checks general index readiness, and by Ecom Canary Testing, which validates changes to configs.
- Source and target account and cluster details
  - IDs
  - Feature flags
- Source index details
  - Name
  - Settings
  - Infrastructure
    - Bespoke infra config (e.g. scaled out API nodes)
    - URLs
- Target index
  - Exists?
- Source ecom index settings
  - Configs (add_docs_config, collections_config, search_config)
  - Infrastructure (especially the queue ARN)
- Query configs
  - Configs
- Merchandising
  - Config
  - Rules
- Pixel
  - Mappings for automatic doc updates
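The Describe output can be pictured as one typed record that later steps read from. This is a minimal sketch using Python dataclasses. Every class and field name here is hypothetical and simply mirrors the list above. It does not describe the Admin Lambda's actual models.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical: the ecom configs Describe expects to find on the source index.
ECOM_CONFIG_KEYS = ("add_docs_config", "collections_config", "search_config")


@dataclass
class AccountDetails:
    # Hypothetical: identifiers and feature flags for a source or target account/cluster.
    system_account_id: str
    cell_id: str
    feature_flags: dict[str, Any] = field(default_factory=dict)


@dataclass
class ForkDescription:
    # Hypothetical container for everything gathered in the Describe step.
    source_account: AccountDetails
    target_account: AccountDetails
    source_index_name: str
    source_index_settings: dict[str, Any]  # immutable index settings
    target_index_exists: bool
    source_infra: dict[str, Any] = field(default_factory=dict)  # bespoke infra config
    source_urls: list[str] = field(default_factory=list)
    ecom_configs: dict[str, Any] = field(default_factory=dict)
    ecom_queue_arn: Optional[str] = None
    query_configs: dict[str, Any] = field(default_factory=dict)
    merchandising: dict[str, Any] = field(default_factory=dict)  # config and rules
    pixel_mappings: dict[str, Any] = field(default_factory=dict)

    def missing_ecom_configs(self) -> list[str]:
        """Ecom config keys that Describe expected but did not find on the source."""
        return [key for key in ECOM_CONFIG_KEYS if key not in self.ecom_configs]
```

A single record like this makes the TODO above concrete. For example, `target_index_exists` is the flag later steps would branch on.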
TODO: In general, how to behave if the target resources/config already exist.
Create
Create a new index with the desired immutable configuration.
Once the target's ecom queue is created, also deactivate the ecom indexer's trigger so the queue isn't consumed until the existing documents have been transferred.
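The workflow table below describes this as disabling the target's SQS event source mapping (ESM). The following is a minimal sketch of that step. It assumes a boto3-style Lambda client, and the function name is hypothetical. `list_event_source_mappings` and `update_event_source_mapping` are real boto3 Lambda client methods.

```python
from typing import Any


def disable_queue_triggers(lambda_client: Any, queue_arn: str) -> list[str]:
    """Disable every Lambda event source mapping (ESM) that consumes from queue_arn.

    lambda_client is expected to behave like boto3.client("lambda").
    Returns the UUIDs of the mappings this call disabled.
    """
    disabled: list[str] = []
    marker = None
    while True:
        kwargs = {"EventSourceArn": queue_arn}
        if marker:
            kwargs["Marker"] = marker
        page = lambda_client.list_event_source_mappings(**kwargs)
        for mapping in page.get("EventSourceMappings", []):
            # Skip mappings that are already off, so a retried step is a no-op.
            if mapping.get("State") in ("Disabled", "Disabling"):
                continue
            lambda_client.update_event_source_mapping(UUID=mapping["UUID"], Enabled=False)
            disabled.append(mapping["UUID"])
        marker = page.get("NextMarker")
        if not marker:
            return disabled
```

While the mapping is disabled, messages stay in the queue. Keep in mind that SQS retains messages for 4 days by default (14 days at most), so the transfer should finish and the trigger should be re-enabled well within that window.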
Configure
Create or update any mutable configuration (most of the items in "Describe"). Each value defaults to a copy of the source index's configuration and can be overridden at fork time.
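"Default to the source, allow overrides" amounts to a recursive merge. The sketch below uses a hypothetical helper name. The rule that an override list replaces the source list wholesale, rather than being appended to it, is an assumption about the intended semantics.

```python
import copy
from typing import Any


def merge_config(source: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of source with overrides applied recursively.

    Nested dicts are merged key by key; any other override value (including
    lists) replaces the source value outright. Neither input is mutated.
    """
    merged = copy.deepcopy(source)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

Because the source is deep-copied first, the copied target configuration can be edited later without touching the source's record.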
Transfer
Once the target index is ready, update the source index's add_docs_config.index_write_aliases to start forking all subsequent writes to the target index.
In parallel, begin the transfer operation for the existing docs (either a manual snapshot to be restored, or the reindexing pipeline).
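Adding the target to add_docs_config.index_write_aliases has to be safe to repeat, because the workflow step that performs it can be retried. This is a minimal sketch. It assumes index_write_aliases is a list of index names; the real field may hold a different shape, such as index IDs. The helper name is hypothetical.

```python
import copy
from typing import Any


def add_write_alias(add_docs_config: dict[str, Any], target_index: str) -> dict[str, Any]:
    """Return a copy of add_docs_config with target_index in index_write_aliases.

    Idempotent: adding an alias that is already present leaves the config
    unchanged, so a retried step does not duplicate writes.
    """
    updated = copy.deepcopy(add_docs_config)
    aliases = updated.setdefault("index_write_aliases", [])
    if target_index not in aliases:
        aliases.append(target_index)
    return updated
```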
Persistence
Fork details are stored in a DynamoDB table called {env}-IndexForksTable. A new record is created for each state change of each fork.
| Column | Description |
|---|---|
| pk | Source system account ID |
| sk | (Source index name)#(System timestamp, ISO format) |
| fork_id | Fork ID |
| status | pending, in_progress, ready, failed, rolled_back, aborted, cancelled, complete |
| source_cell_id | Source cell ID |
| source_system_account_id | Source system account ID |
| source_index_name | Source index name |
| target_cell_id | Target cell ID |
| target_system_account_id | Target system account ID |
| target_index_name | Target index name |
| created_at | Timestamp of creation of this record (particular status reached) |
| updated_at | Timestamp of last update of this record |
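The sort key only orders records chronologically if every timestamp has the same format. The sketch below builds a record for the table above. It fixes the timestamp to UTC with microsecond precision, so `isoformat()` never drops the fractional part. The shape of the `source`/`target` dicts and the helper names are hypothetical. Setting `created_at == updated_at` assumes records are append-only, as described below.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def iso_ts(ts: datetime) -> str:
    # Fixed-width ISO format so string comparison matches time order.
    return ts.astimezone(timezone.utc).isoformat(timespec="microseconds")


def make_sort_key(source_index_name: str, ts: datetime) -> str:
    """sk = (source index name)#(ISO timestamp)."""
    return f"{source_index_name}#{iso_ts(ts)}"


def make_fork_record(fork_id: str, status: str, source: dict[str, str],
                     target: dict[str, str], now: Optional[datetime] = None) -> dict[str, Any]:
    """Build one status record; source/target need system_account_id, cell_id, index_name."""
    now = now or datetime.now(timezone.utc)
    ts = iso_ts(now)
    return {
        "pk": source["system_account_id"],
        "sk": make_sort_key(source["index_name"], now),
        "fork_id": fork_id,
        "status": status,
        "source_cell_id": source["cell_id"],
        "source_system_account_id": source["system_account_id"],
        "source_index_name": source["index_name"],
        "target_cell_id": target["cell_id"],
        "target_system_account_id": target["system_account_id"],
        "target_index_name": target["index_name"],
        "created_at": ts,
        "updated_at": ts,
    }
```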
For each fork, we store:
- One record with all the context with which the fork was created.
- One record for each status change with timestamps.
Access patterns:
- List all forks for a given index ID (account ID + index name) and their latest status
- Get the latest status for a given fork ID
- List the history of a given fork ID
- Create a new fork record
- Update the status of a fork (by creating a new record with the same fork ID and a new timestamp)
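The first access pattern can be served by a single partition query followed by an in-memory reduction. This is a minimal sketch under the pk/sk layout above. The query kwargs target boto3's DynamoDB `Table.query`. The helper names are hypothetical. Looking up a fork by fork_id alone, or listing forks where the index is the target, would need a secondary index, which the table above does not define.

```python
from typing import Any, Iterable


def list_index_forks_query(system_account_id: str, index_name: str) -> dict[str, Any]:
    """Keyword arguments for a boto3 DynamoDB Table.query over one source index."""
    return {
        "KeyConditionExpression": "pk = :pk AND begins_with(sk, :prefix)",
        # The trailing "#" stops "p" from also matching "products".
        "ExpressionAttributeValues": {":pk": system_account_id, ":prefix": f"{index_name}#"},
        "ScanIndexForward": False,  # newest records first
    }


def latest_status_by_fork(records: Iterable[dict[str, Any]]) -> dict[str, str]:
    """Reduce an index's fork history to {fork_id: latest status}.

    Relies on sk values sharing a fixed-width ISO timestamp, so string
    comparison orders records chronologically.
    """
    latest: dict[str, dict[str, Any]] = {}
    for rec in records:
        current = latest.get(rec["fork_id"])
        if current is None or rec["sk"] > current["sk"]:
            latest[rec["fork_id"]] = rec
    return {fork_id: rec["status"] for fork_id, rec in latest.items()}
```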
API
POST /api/v1/accounts/{account_id}/indexes/{index_name}/forks
Create a new fork.
GET /api/v1/accounts/{account_id}/indexes/{index_name}/forks
List all forks for an index (as source or target).
GET /accounts/:acc/indexes/:index/forks
=> 200 {
"forks": [
{"forkId": "...", "status": "...", "source": {...}, "target": {...}},
...
]
}
POST /api/v1/accounts/{account_id}/indexes/{index_name}/forks/{fork_id}/verify
Run an on-demand comparison between the source and target indexes.
POST /accounts/:acc/indexes/:index/forks/:fork/verify
{
"numQueries": 10, // number of test queries
"resultLimit": 10, // results per query
"tolerance": 0.9 // minimum overlap (0.0-1.0)
}
=> 200 {
"forkId": "...",
"queriesRun": 10,
"queriesPassed": 9,
"queriesFailed": 1,
"overallPassed": false,
"sourceDocCount": 1000000,
"targetDocCount": 999998,
"comparisons": [
{"query": "...", "overlapPercent": 0.95, "passed": true},
...
]
}
Test Query Sources:
- Customer-specific test suites: Each customer has a dedicated test suite that validates the specific features they use, in the way they use them. These are the primary verification mechanism.
- Generic smoke tests: A standard set of tests that can be fired at any index to verify basic functionality (e.g., simple searches, filter queries, pagination).
- AI-generated tests (future): An AI agent could analyze document contents to generate representative queries, or sample production traffic patterns to create realistic test scenarios.
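The verify response reports a per-query overlapPercent checked against tolerance, but this document does not define how overlap is computed. The sketch below assumes overlap is the set intersection of the top resultLimit document IDs, divided by the size of the larger result set. It also assumes overallPassed is true only when every query passes, which is consistent with the example response, where 9 of 10 queries passed and overallPassed is false. Function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueryComparison:
    query: str
    overlap_percent: float
    passed: bool


def compare_results(query: str, source_ids: list[str], target_ids: list[str],
                    result_limit: int, tolerance: float) -> QueryComparison:
    """Compare the top result_limit document IDs from source and target (order ignored)."""
    src = set(source_ids[:result_limit])
    tgt = set(target_ids[:result_limit])
    denominator = max(len(src), len(tgt))
    # Two empty result sets count as full agreement.
    overlap = len(src & tgt) / denominator if denominator else 1.0
    return QueryComparison(query, overlap, overlap >= tolerance)


def summarize(fork_id: str, comparisons: list[QueryComparison]) -> dict:
    """Shape the comparisons like the verify response body."""
    passed = sum(1 for c in comparisons if c.passed)
    return {
        "forkId": fork_id,
        "queriesRun": len(comparisons),
        "queriesPassed": passed,
        "queriesFailed": len(comparisons) - passed,
        "overallPassed": passed == len(comparisons),
        "comparisons": [
            {"query": c.query, "overlapPercent": c.overlap_percent, "passed": c.passed}
            for c in comparisons
        ],
    }
```

Ignoring rank order keeps the check tolerant of small scoring differences between clusters. A rank-aware metric would be stricter.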
POST /api/v1/accounts/{account_id}/indexes/{index_name}/forks/{fork_id}/cutover
Shift production traffic from source to target by adding a read alias on the source index that routes search queries to the target.
Current implementation: Instant cutover only (adds read alias, marks fork COMPLETE). No request body required.
POST /accounts/:acc/indexes/:index/forks/:fork/cutover
=> 200 {"fork_id": "...", "status": "complete"}
POST /api/v1/accounts/{account_id}/indexes/{index_name}/forks/{fork_id}/rollback
Revert read traffic to the source index by switching the read alias from the target back to the source. The read alias stays active (pointing at the source) and the write alias (dual-write) continues. Only valid from complete status.
POST /accounts/:acc/indexes/:index/forks/:fork/rollback
=> 200 {"fork_id": "...", "status": "rolled_back"}
POST /api/v1/accounts/{account_id}/indexes/{index_name}/forks/{fork_id}/cleanup
Clean up source read routing after cutover is confirmed working: removes the source's self-referencing read alias and makes the target read alias visible, while retaining the source-to-target write alias so writes addressed to the source continue reaching the target. The fork stays in complete status. Idempotent — safe to call multiple times.
POST /accounts/:acc/indexes/:index/forks/:fork/cleanup
=> 200 {"fork_id": "...", "status": "complete", "cleaned_up": true}
Future: chain collapse. When sequential forks create a chain (A→B→C), cleanup of the B→C fork should collapse the chain by updating A's aliases to point directly to C. This allows intermediary index B to be decommissioned. Without collapse, all intermediary indexes must stay alive as alias relay points and reads traverse multiple hops.
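Chain collapse comes down to resolving each alias chain to its final hop and repointing to that hop directly. The sketch below models read aliases as a hypothetical in-memory map from an index to the index that serves its reads. A missing or self-referencing entry means the index serves itself, as after a rollback. Write aliases would need the same treatment, and the real alias storage and names will differ.

```python
def resolve_read_target(read_aliases: dict[str, str], index: str, max_hops: int = 16) -> str:
    """Follow read aliases from index to the index that finally serves its reads.

    An index with no entry, or with a self-referencing entry, serves its own reads.
    """
    seen = {index}
    current = index
    for _ in range(max_hops + 1):
        nxt = read_aliases.get(current, current)
        if nxt == current:
            return current
        if nxt in seen:
            raise ValueError(f"read alias cycle through {nxt!r}")
        seen.add(nxt)
        current = nxt
    raise ValueError(f"read alias chain from {index!r} exceeds {max_hops} hops")


def collapse_aliases(read_aliases: dict[str, str]) -> dict[str, str]:
    """Point every aliased index directly at the end of its chain (A->B->C becomes A->C)."""
    return {index: resolve_read_target(read_aliases, index) for index in read_aliases}
```

After collapsing, no other index routes through B, so B can be decommissioned without breaking reads addressed to A.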
POST /api/v1/accounts/{account_id}/indexes/{index_name}/forks/{fork_id}/cancel
Cancel a fork that is still executing. Stops the Step Functions workflow and marks the fork as cancelled. Only valid from in_progress status.
POST /accounts/:acc/indexes/:index/forks/:fork/cancel
=> 200 {"fork_id": "...", "status": "cancelled"}
Note: Cancel is a best-effort operation — the workflow step currently executing will complete, but no further steps will run. Any write alias already added by prepare_transfer will remain (use abort to clean it up).
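Stopping the workflow maps onto the Step Functions StopExecution call. An in-flight Lambda invocation keeps running after the execution is stopped, which matches the best-effort note above. This is a minimal sketch. It assumes a boto3-style Step Functions client (`stop_execution` is a real boto3 `sfn` method) and a hypothetical `execution_arn` field on the fork record, which the table above does not include. `record_status` stands in for writing a new status record.

```python
from typing import Any, Callable


class InvalidForkState(Exception):
    """Hypothetical: the operation is not allowed from the fork's current status."""


def cancel_fork(sfn_client: Any, fork: dict[str, Any],
                record_status: Callable[[str, str], None]) -> dict[str, str]:
    """Stop the fork's workflow and record the cancelled status."""
    if fork["status"] != "in_progress":
        raise InvalidForkState(f"cannot cancel fork in status {fork['status']!r}")
    sfn_client.stop_execution(
        executionArn=fork["execution_arn"],
        error="ForkCancelled",
        cause="Cancelled via the fork cancel API",
    )
    # Any write alias added by prepare_transfer is left in place; abort removes it.
    record_status(fork["fork_id"], "cancelled")
    return {"fork_id": fork["fork_id"], "status": "cancelled"}
```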
DELETE /api/v1/accounts/{account_id}/indexes/{index_name}/forks/{fork_id}
Abort a fork. Removes write alias from source (best-effort) and marks the fork as aborted. Accepts forks in pending, in_progress, or ready status.
DELETE /accounts/:acc/indexes/:index/forks/:fork
=> 200 {"fork_id": "...", "status": "aborted"}
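Each endpoint above states which statuses it accepts. Collecting those rules in one table keeps the handlers consistent. The sketch below encodes them. The cutover entry follows the implementation plan's "Verify fork is in ready state". Whether a rolled_back fork may be cut over again is not specified here, so that transition is deliberately left out. Names are hypothetical.

```python
# Statuses each operation may start from, as stated for each endpoint.
ALLOWED_FROM: dict[str, frozenset[str]] = {
    "cutover": frozenset({"ready"}),
    "rollback": frozenset({"complete"}),
    "cleanup": frozenset({"complete"}),  # idempotent; fork stays complete
    "cancel": frozenset({"in_progress"}),
    "abort": frozenset({"pending", "in_progress", "ready"}),
}

# Status recorded after each operation succeeds.
RESULTING_STATUS: dict[str, str] = {
    "cutover": "complete",
    "rollback": "rolled_back",
    "cleanup": "complete",
    "cancel": "cancelled",
    "abort": "aborted",
}


def next_status(operation: str, current: str) -> str:
    """Return the status to record, or raise if the operation is not allowed."""
    allowed = ALLOWED_FROM.get(operation)
    if allowed is None:
        raise KeyError(f"unknown operation {operation!r}")
    if current not in allowed:
        raise ValueError(f"{operation} is not valid from status {current!r}")
    return RESULTING_STATUS[operation]
```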
Implementation Plan
1. Persistence Layer
- Schema Design: Define a DynamoDB table schema for ForksTable to store fork ID, status, source/target details, and step progress.
- Service: Create a ForkService to handle CRUD operations for fork records.
- Deployment: Deploy the new table to the production environment via admin_stack in CDK.
2. Core Fork Logic (Orchestrator)
- Describe & Validation: Implement logic to fetch source index details and validate target index parameters.
- Resource Creation: Integrate with IndexSettingsService to create the target index (immutable settings).
- Configuration Sync: Implement logic to copy and merge mutable settings (Ecom, Query Configs, Merchandising, Pixel) from source to target.
- Write Aliasing: Implement the update of add_docs_config on the source index to alias writes to the target.
3. API Implementation
- POST /forks:
  - Generate Fork ID.
  - Create initial record in ForksTable.
  - Trigger the asynchronous fork workflow (likely via Step Functions or async Lambda invocation).
  - Return Fork ID and pending status.
- POST /cutover:
  - Retrieve fork record.
  - Verify fork is in ready state.
  - Update routing configuration (Index Registry/DNS/Gateway) to point search traffic to target.
  - Update status to complete.
- POST /rollback:
  - Revert write aliases on source index.
  - Revert search routing if cutover was attempted.
  - Update status to rolled_back.
- POST /cleanup:
  - Remove the source's self-referencing read alias and make the target read alias visible.
  - Retain the source-to-target write alias so writes keep fanning out to the target.
- DELETE /forks/{fork_id} (abort):
  - Revert any changes to source (aliases).
  - Delete target index resources.
  - Mark fork as aborted.
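The POST /forks steps can be sketched as a single handler body. This version assumes Step Functions is the trigger; the plan leaves async Lambda invocation open as an alternative. `start_execution` is a real boto3 `sfn` method. The workflow input shape, the `put_record` callback, and the helper name are hypothetical.

```python
import json
import uuid
from typing import Any, Callable


def create_fork(sfn_client: Any, state_machine_arn: str,
                put_record: Callable[[dict[str, Any]], None],
                source: dict[str, str], target: dict[str, str]) -> dict[str, str]:
    """Record a pending fork, start its workflow, and return the ID immediately."""
    fork_id = str(uuid.uuid4())
    # Write the record first so the fork is visible even if starting the workflow fails.
    put_record({"fork_id": fork_id, "status": "pending", "source": source, "target": target})
    sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        name=fork_id,  # execution names are unique per state machine, tying one run to one fork
        input=json.dumps({"fork_id": fork_id, "source": source, "target": target}),
    )
    return {"fork_id": fork_id, "status": "pending"}
```

Returning right after the workflow starts keeps the API call short. The caller then polls the list endpoint for status changes.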
4. Asynchronous Workflow
The fork workflow is orchestrated by a Step Functions state machine (AdminIndexForkWorkflow). Each step invokes the Admin Lambda with a specific action:
fork.ensure_target → fork.configure_target → fork.prepare_transfer → snapshot → restore → fork.activate_target → fork.verify → succeed
| Step | Lambda Action | Description |
|---|---|---|
| Ensure Target | fork.ensure_target | Create target index if missing, single readiness check (SFN retries on TargetIndexNotReadyError), validate infra compatibility |
| Configure Target | fork.configure_target | Export source config, import into target, validate post-import export matches |
| Prepare Transfer | fork.prepare_transfer | Disable target SQS ESM, add write alias (source → target) |
| Snapshot | (cross-account SFN) | Snapshot source index documents |
| Restore | (cross-account SFN) | Restore snapshot onto target index |
| Activate Target | fork.activate_target | Re-enable target SQS ESM so queued writes drain |
| Verify | fork.verify | Compare search results between source and target, mark READY or FAILED |
All steps are idempotent for safe Step Functions retries. Failures mark the fork as FAILED with a descriptive message.
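The Ensure Target row relies on Step Functions retrying when TargetIndexNotReadyError is raised. For Lambda tasks, the state machine matches ErrorEquals against the function's error type, which for a Python handler is the exception class name. This is a minimal sketch of that pairing. The retry interval and attempt count, the "READY" status string, and the injected `get_status`/`create_index` callables are all hypothetical. The infra-compatibility check is omitted.

```python
from typing import Any, Callable, Optional


class TargetIndexNotReadyError(Exception):
    """Raised so the state machine's Retry policy re-runs ensure_target later."""


# Hypothetical Amazon States Language Retry block for the Ensure Target task state.
ENSURE_TARGET_RETRY = [
    {
        "ErrorEquals": ["TargetIndexNotReadyError"],
        "IntervalSeconds": 30,
        "MaxAttempts": 40,
        "BackoffRate": 1.0,
    }
]


def ensure_target(event: dict[str, Any],
                  get_status: Callable[[str], Optional[str]],
                  create_index: Callable[[str], None]) -> dict[str, Any]:
    """Create the target index if it is missing, then perform a single readiness check.

    Idempotent: a retry finds the index already present and only re-checks readiness.
    """
    name = event["target_index_name"]
    status = get_status(name)
    if status is None:
        create_index(name)
        status = get_status(name)
    if status != "READY":
        raise TargetIndexNotReadyError(f"{name} is not ready (status={status})")
    return {**event, "target_ready": True}
```

Doing one check per invocation and leaving the waiting to the state machine keeps each Lambda call short. It also makes the retry budget visible in the workflow definition.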
5. Testing
- Unit Tests: Test individual components (Service, API models, Logic).
- Integration Tests: Test the full flow with mocked infrastructure calls.