Ecommerce Platform
- Infra code: infra/ecom/
- Component code: components/shopify/, components/ecom_indexer/, components/ecom_settings_exporter/, components/ecom_metrics_consumer/, components/ecom_monitoring_service/
Product search for ecommerce platforms (primarily Shopify). Includes indexing pipeline, search proxy, and admin tools.
Architecture
AWS Resources
| Resource | Name Pattern | How to Inspect |
|---|---|---|
| DynamoDB | {env}-EcomIndexSettingsTable | DynamoDB |
| DynamoDB | {env}-EcomIndexerJobsTable | DynamoDB |
| DynamoDB | {env}-EcomIndexQueryConfigsTable | DynamoDB |
| DynamoDB | {env}-EcomCollectionsTable | DynamoDB |
| DynamoDB | {env}-AgenticCachedQueriesTable | DynamoDB |
| Lambda | {env}-EcomIndexerFunction | Lambda |
| Lambda | {env}-EcomSettingsExporterLambda | Lambda |
| Lambda | {env}-EcomMetricsWorker | Lambda |
| Lambda | {env}-EcomMonitoringServiceLambda | Lambda |
| Lambda | {env}-ShopifyAppAdminFunction | Lambda |
| Lambda | {env}-ShopifyWebhookWorker | Lambda |
| SQS | {env}-EcomMetricsQueue | SQS |
| SQS | {env}-EcomMetricsQueueDLQ | SQS |
| S3 | {env}-ecom-product-data-bucket | S3 |
| S3 | {env}-shopify-app-assets | S3 |
| API Gateway | {env}-EcomApi (HTTP v2) | API Gateway |
DynamoDB Table Schemas
Before hand-rolling a query, read the DynamoDB access-patterns cheat sheet, and never scan a prod table to find one item. The cheat sheet has copy-pasteable read-only CLI for lookup-by-job-id, list-by-status, and the index-name-vs-shop-domain pk trap.
EcomIndexSettingsTable
- pk (S): {system_account_id}
- sk (S): INDEX#{index_name}
- GSI_IndexRootsByName: pk=pk, sk=index_name
- Stream: NEW_IMAGE (triggers settings exporter)
- Stores Marqo index configs: create_index settings, search settings, collection configs, aliases.
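Given the key layout above, a targeted Query can replace a table scan when looking up one account's settings. The following is a minimal sketch, not the team's tooling: it only builds the keyword arguments for a DynamoDB Query, the function name is hypothetical, and it assumes the attributes are literally named pk and sk as listed.

```python
from typing import Optional


def settings_query_kwargs(env: str, system_account_id: str,
                          index_name: Optional[str] = None) -> dict:
    """Build DynamoDB Query kwargs for one account in the settings table.

    Without index_name: every item whose sk starts with "INDEX#".
    With index_name: only the item whose sk is exactly "INDEX#{index_name}".
    """
    if index_name is None:
        condition = "pk = :pk AND begins_with(sk, :sk)"
        sk_value = "INDEX#"
    else:
        condition = "pk = :pk AND sk = :sk"
        sk_value = f"INDEX#{index_name}"
    return {
        "TableName": f"{env}-EcomIndexSettingsTable",
        "KeyConditionExpression": condition,
        "ExpressionAttributeValues": {
            ":pk": {"S": system_account_id},
            ":sk": {"S": sk_value},
        },
    }
```

If boto3 is available, the result can be passed as `boto3.client("dynamodb").query(**kwargs)`. Exact match is used for a named index because a begins_with on INDEX#shoes would also return INDEX#shoes-v2.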
EcomIndexerJobsTable
- pk (S): PLATFORM#{platform}#SHOP#{shop_id}
- sk (S): JOB#{created_at}#{job_id}
- GSI_JobLookup_v2: pk=job_id, sk=created_at (KEYS_ONLY)
- GSI_JobsByStatus_v2: pk=shop_id, sk=status_created_at (INCLUDE: job_type, job_status, total_items, etc.)
- TTL: ttl (30-day auto-cleanup)
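Because GSI_JobLookup_v2 is KEYS_ONLY, a lookup by job ID returns only key attributes (the index keys plus the table's pk and sk), so reading the full job record takes a second GetItem on the base table. A minimal sketch of that two-step pattern follows. The function names are hypothetical, and it assumes job_id is stored as a string attribute.

```python
def job_lookup_kwargs(env: str, job_id: str) -> dict:
    """Query kwargs for GSI_JobLookup_v2 (index pk = job_id)."""
    return {
        "TableName": f"{env}-EcomIndexerJobsTable",
        "IndexName": "GSI_JobLookup_v2",
        "KeyConditionExpression": "job_id = :job_id",
        # Assumption: job_id is a string (S) attribute.
        "ExpressionAttributeValues": {":job_id": {"S": job_id}},
    }


def get_item_kwargs(env: str, index_hit: dict) -> dict:
    """GetItem kwargs for the full record, built from one KEYS_ONLY index hit."""
    return {
        "TableName": f"{env}-EcomIndexerJobsTable",
        # The index hit carries the base table's pk and sk in typed form.
        "Key": {"pk": index_hit["pk"], "sk": index_hit["sk"]},
    }
```

With a boto3 DynamoDB client, the first dict goes to `client.query(**...)` and each returned item to `client.get_item(**get_item_kwargs(env, item))`.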
EcomIndexQueryConfigsTable
- pk (S): {system_account_id}
- sk (S): INDEX#{index_name}#QUERY#{normalized_query}
AgenticCachedQueriesTable
- pk (S): {account_id}#{index_name}#{normalized_query}
- GSI (accountId-indexName-index): pk=gsi_pk ({account_id}#{index_name})
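The composite keys above are plain string concatenations, which a short helper can make explicit. This is a sketch under assumptions: the helper names are hypothetical, the query is taken as already normalized (the normalization rule is not described here), and gsi_pk is assumed to be a string attribute.

```python
def agentic_cache_keys(account_id: str, index_name: str,
                       normalized_query: str) -> dict:
    """Compose the table pk and the GSI partition key for one cached query."""
    gsi_pk = f"{account_id}#{index_name}"
    return {"pk": f"{gsi_pk}#{normalized_query}", "gsi_pk": gsi_pk}


def cached_queries_kwargs(env: str, account_id: str, index_name: str) -> dict:
    """Query kwargs listing all cached queries for one account and index."""
    return {
        "TableName": f"{env}-AgenticCachedQueriesTable",
        "IndexName": "accountId-indexName-index",
        "KeyConditionExpression": "gsi_pk = :g",
        "ExpressionAttributeValues": {
            ":g": {"S": f"{account_id}#{index_name}"},
        },
    }
```

Note that a '#' inside any component would make the composed key ambiguous, so it is worth checking inputs before trusting a miss.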
Data Flow: Settings Sync
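Settings changes in EcomIndexSettingsTable reach Cloudflare KV through the table's DynamoDB stream and the settings exporter Lambda. If KV data is stale, check each hop: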
- DDB stream status on the table
- Settings exporter Lambda logs
- Cloudflare KV namespace content
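The first check, stream status, can be read from a DescribeTable response. The sketch below inspects such a response (as returned by boto3's `describe_table`) and lists anything that would stop stream records from being produced. The function name is hypothetical, and it does not cover the Lambda's event source mapping, which needs a separate check.

```python
def stream_problems(describe_table_response: dict) -> list:
    """Return human-readable problems with the table's stream configuration."""
    table = describe_table_response.get("Table", {})
    spec = table.get("StreamSpecification") or {}
    problems = []
    if not spec.get("StreamEnabled"):
        problems.append("stream is disabled")
    elif spec.get("StreamViewType") != "NEW_IMAGE":
        # The settings table is documented as using NEW_IMAGE.
        problems.append(f"unexpected view type: {spec.get('StreamViewType')}")
    if not table.get("LatestStreamArn"):
        problems.append("no LatestStreamArn")
    return problems
```

An empty list means the table side looks healthy, and the next hop to examine is the settings exporter Lambda.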
Typical Investigation Paths
Search returning wrong results:
- Check index settings in DDB: query EcomIndexSettingsTable for the account
- Check KV cache: npx wrangler kv key get --namespace-id {id} "{account_id}"
- Check search proxy logs: npx wrangler tail {env}-ecom-api
Indexing not working:
- Check indexer jobs: query EcomIndexerJobsTable GSI_JobsByStatus_v2 for the shop
- Check indexer Lambda logs: aws logs tail /aws/lambda/{env}-EcomIndexerFunction
- Check SQS queue for the shop (dynamically created)
Metrics missing:
- Check metrics queue depth: SQS
- Check DLQ for failed messages
- Check metrics worker Lambda logs
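The queue and DLQ checks above can be combined into one quick triage. This sketch reads the Attributes map of an SQS GetQueueAttributes response (for example from boto3's `get_queue_attributes` with AttributeNames set to All). The function names and the wording of the triage hints are illustrative assumptions, not an established procedure.

```python
def queue_depth(attrs_response: dict) -> dict:
    """Extract approximate visible and in-flight counts from the response."""
    attrs = attrs_response.get("Attributes", {})
    return {
        # SQS returns attribute values as strings.
        "visible": int(attrs.get("ApproximateNumberOfMessages", 0)),
        "in_flight": int(attrs.get("ApproximateNumberOfMessagesNotVisible", 0)),
    }


def triage(main: dict, dlq: dict) -> str:
    """Suggest where to look next, based on main queue and DLQ depth."""
    if dlq["visible"] > 0:
        return "messages in DLQ: look for failures in the metrics worker logs"
    if main["visible"] > 0:
        return "backlog on main queue: worker may be stalled or throttled"
    return "queues empty: check whether producers are sending metrics"
```

The counts are approximate by design in SQS, so a small nonzero value on a busy queue is not by itself a sign of trouble.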
CLI Recipes
Finding Dev Environment Resources
Dev environment resources are prefixed with the branch name:
aws dynamodb list-tables --profile staging | grep "dev-<branch>"
aws logs describe-log-groups --profile staging \
--log-group-name-prefix "/aws/lambda/dev-<branch>"
Checking Sync Job Status
aws dynamodb scan \
--table-name {env}-EcomIndexerJobsTable \
--profile <profile>
Look for job_status (COMPLETED/FAILED/IN_PROGRESS), total_items, processed_items, failed_items, error_summary.
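To read those fields at a glance, a job item in DynamoDB's typed JSON form (as printed by the AWS CLI) can be reduced to a short summary. This is a sketch: the function names are hypothetical, and it assumes the counters are stored as numbers (N) and the status and error summary as strings (S).

```python
def _num(item: dict, name: str):
    """Read a DynamoDB N attribute as an int, or None if absent."""
    raw = item.get(name, {}).get("N")
    return int(float(raw)) if raw is not None else None


def summarize_job(item: dict) -> dict:
    """Summarize status and progress for one indexer job item."""
    total = _num(item, "total_items")
    processed = _num(item, "processed_items")
    progress = None
    if total and processed is not None:
        progress = processed / total  # fraction between 0 and 1
    return {
        "status": item.get("job_status", {}).get("S"),
        "progress": progress,
        "failed": _num(item, "failed_items"),
        "error_summary": item.get("error_summary", {}).get("S"),
    }
```

A job that reports COMPLETED with a progress of None or a total of 0 matches the "0 documents" symptom listed under Common Issues.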
Querying Logs with CloudWatch Insights
# Start a query
aws logs start-query \
--log-group-name "/aws/lambda/{env}-ShopifyWebhookWorker/ia" \
--start-time $(python3 -c "import time; print(int(time.time()) - 3600)") \
--end-time $(python3 -c "import time; print(int(time.time()))") \
--query-string 'fields @timestamp, @message | filter @message like /error|ERROR/ | sort @timestamp desc | limit 30' \
--profile <profile>
# Get results (use the queryId from the start-query response)
aws logs get-query-results --query-id "<query-id>" --profile <profile>
Useful filter patterns:
- filter @message like /<job-id>/ (all logs for a specific job)
- filter @message like /error|ERROR|exception/ (errors only)
- filter @message like /chunk|streaming|Document-aware/ (bulk sync progress)
- filter @message like /finish|bulk_operations/ (bulk export completion webhook)
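The query string and time window can also be assembled in Python rather than with inline shell substitutions. The sketch below builds the arguments in the shape boto3's CloudWatch Logs `start_query` call expects (logGroupName, startTime, endTime, queryString). The function names are hypothetical, and the pattern is inserted verbatim, so a pattern containing '/' would need escaping first.

```python
import time


def insights_query(pattern: str, limit: int = 30) -> str:
    """Build the Insights query used above with a custom filter pattern."""
    return (
        "fields @timestamp, @message"
        f" | filter @message like /{pattern}/"
        " | sort @timestamp desc"
        f" | limit {limit}"
    )


def start_query_kwargs(log_group: str, pattern: str, now=None,
                       window_seconds: int = 3600, limit: int = 30) -> dict:
    """Build start_query kwargs covering the last window_seconds seconds."""
    end = int(time.time() if now is None else now)
    return {
        "logGroupName": log_group,
        "startTime": end - window_seconds,  # epoch seconds
        "endTime": end,
        "queryString": insights_query(pattern, limit),
    }
```

Passing `now` explicitly makes the window reproducible when re-running a query for an incident report.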
Inspecting S3 Documents
Check what the transformer produced before it went to Marqo:
aws s3 cp "s3://{env}-ecom-product-data-bucket/<shop-id>//bulk/<job-id>/chunk_0000.jsonl" - \
--profile <profile> | python3 -c "
import sys, json
for line in sys.stdin:
    doc = json.loads(line)
    print(f'{doc.get(\"_id\")}: {doc.get(\"variantTitle\", \"?\")}')
"
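For larger chunks, the same inspection is easier to reuse as a standalone function than as an inline one-liner. A minimal sketch, assuming each line of the chunk is one JSON document with an _id and an optional variantTitle as in the command above. The function name is hypothetical.

```python
import json


def summarize_chunk(lines):
    """Yield (_id, variantTitle) for each JSON document in a JSONL stream."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines, e.g. a trailing newline
        doc = json.loads(raw)
        yield doc.get("_id"), doc.get("variantTitle", "?")
```

It accepts any iterable of lines, so it works equally on `sys.stdin` when piped from `aws s3 cp ... -` or on an open local file.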
Querying the Search Proxy
curl -s -X POST \
'https://{ecom-domain}/api/v1/indexes/<index-name>/search' \
-H 'x-marqo-index-id: <index-id>' \
-H 'x-marqo-debug: 6F188718-DC08-4076-9E05-434CC7E728C3' \
-H 'Content-Type: application/json' \
-d '{"q": "*", "limit": 5, "collapseFields": []}'
- Dev environments: https://dev-<branch>-ecom.dev-marqo.org
- The x-marqo-debug header bypasses API key auth for dev environments
- collapseFields: [] overrides the default collapse if not configured on the index
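The curl call above can be expressed as a reusable request builder. This sketch only assembles the URL, headers, and body, and sends nothing. The function name and parameter names are hypothetical, and the debug token is passed in rather than hard-coded.

```python
import json


def search_request(domain: str, index_name: str, index_id: str,
                   debug_token=None, query: str = "*", limit: int = 5):
    """Return (url, headers, body) for a search proxy POST request."""
    headers = {
        "x-marqo-index-id": index_id,
        "Content-Type": "application/json",
    }
    if debug_token:
        # Per the notes above, this header bypasses API key auth in dev.
        headers["x-marqo-debug"] = debug_token
    body = {"q": query, "limit": limit, "collapseFields": []}
    url = f"https://{domain}/api/v1/indexes/{index_name}/search"
    return url, headers, json.dumps(body).encode("utf-8")
```

The result can be sent with the standard library via `urllib.request.Request(url, data=body, headers=headers, method="POST")` followed by `urllib.request.urlopen`.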
Checking Index Settings
aws dynamodb scan \
--table-name {env}-EcomIndexSettingsTable \
--profile <profile>
Key fields: split_products_by, group_variants_by, aggregation_mode, aggregation_fields, tensor_fields, mappings.
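To pull just those fields out of a large settings item, a small recursive picker can help. This is a sketch under assumptions: it expects the item as plain JSON (already converted from DynamoDB's typed format), it makes no claim about where in the item each field lives, and the function name is hypothetical.

```python
KEY_FIELDS = (
    "split_products_by", "group_variants_by", "aggregation_mode",
    "aggregation_fields", "tensor_fields", "mappings",
)


def pick_key_fields(obj, found=None) -> dict:
    """Collect the first occurrence of each key field, at any nesting depth."""
    if found is None:
        found = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key in KEY_FIELDS:
                found.setdefault(key, value)  # keep the first occurrence
            else:
                pick_key_fields(value, found)
    elif isinstance(obj, list):
        for value in obj:
            pick_key_fields(value, found)
    return found
```

Fields that do not appear anywhere in the item are simply absent from the result, which is itself a useful signal when debugging.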
Common Issues
| Symptom | Likely Cause |
|---|---|
| Job COMPLETED but 0 documents | Products have no variants, or all filtered by status/published |
| parentProductId collapse field error | Index schema doesn't have collapse field configured; pass "collapseFields": [] to search without it |
| Marqo settings not found on search | Using wrong base URL (staging vs dev-prefixed) |
| Enrichment silently skipped | shopify_domain or access_token is None — check the caller passes them |
Shopify-Specific Diagnostics
See Shopify Diagnostics for Shopify-specific recipes (GraphQL queries, metafield inspection, variant metafield debugging).
Accounts
| Env | Account | Ecom API Domain |
|---|---|---|
| Staging | 468036072962 | staging-ecom.dev-marqo.org |
| Preprod | 010928202142 | ecom.preprod-marqo.org |
| Prod | 023568249301 | ecom.marqo-ep.ai |
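Since a wrong base URL is one of the common issues listed earlier, it can help to resolve domains from a single mapping. The sketch below copies the domains from the table above and the dev pattern from the search proxy notes. The function names are hypothetical.

```python
ECOM_DOMAINS = {
    "staging": "staging-ecom.dev-marqo.org",
    "preprod": "ecom.preprod-marqo.org",
    "prod": "ecom.marqo-ep.ai",
}


def ecom_domain(env: str) -> str:
    """Return the Ecom API domain for a shared environment."""
    try:
        return ECOM_DOMAINS[env.lower()]
    except KeyError:
        raise ValueError(f"unknown environment: {env!r}") from None


def dev_ecom_domain(branch: str) -> str:
    """Return the Ecom API domain for a branch-prefixed dev environment."""
    return f"dev-{branch}-ecom.dev-marqo.org"
```

Raising on an unknown environment avoids silently falling back to staging when a dev-prefixed domain was intended.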