Ecommerce Platform

  • Infra code: infra/ecom/
  • Component code: components/shopify/, components/ecom_indexer/, components/ecom_settings_exporter/, components/ecom_metrics_consumer/, components/ecom_monitoring_service/

Product search for ecommerce platforms (primarily Shopify). Includes indexing pipeline, search proxy, and admin tools.

Architecture

AWS Resources

Resource | Name Pattern | How to Inspect
DynamoDB | {env}-EcomIndexSettingsTable | DynamoDB
DynamoDB | {env}-EcomIndexerJobsTable | DynamoDB
DynamoDB | {env}-EcomIndexQueryConfigsTable | DynamoDB
DynamoDB | {env}-EcomCollectionsTable | DynamoDB
DynamoDB | {env}-AgenticCachedQueriesTable | DynamoDB
Lambda | {env}-EcomIndexerFunction | Lambda
Lambda | {env}-EcomSettingsExporterLambda | Lambda
Lambda | {env}-EcomMetricsWorker | Lambda
Lambda | {env}-EcomMonitoringServiceLambda | Lambda
Lambda | {env}-ShopifyAppAdminFunction | Lambda
Lambda | {env}-ShopifyWebhookWorker | Lambda
SQS | {env}-EcomMetricsQueue | SQS
SQS | {env}-EcomMetricsQueueDLQ | SQS
S3 | {env}-ecom-product-data-bucket | S3
S3 | {env}-shopify-app-assets | S3
API Gateway | {env}-EcomApi (HTTP v2) | API Gateway

DynamoDB Table Schemas

Before hand-rolling a query, read the DynamoDB access-patterns cheat sheet: it has copy-pasteable read-only CLI for lookup-by-job-id and list-by-status, and explains the index-name-vs-shop-domain pk trap. Never scan a prod table to find one item.

EcomIndexSettingsTable

  • pk (S): {system_account_id}
  • sk (S): INDEX#{index_name}
  • GSI_IndexRootsByName: pk=pk, sk=index_name
  • Stream: NEW_IMAGE (triggers settings exporter)
  • Stores Marqo index configs: create_index settings, search settings, collection configs, aliases.

EcomIndexerJobsTable

  • pk (S): PLATFORM#{platform}#SHOP#{shop_id}
  • sk (S): JOB#{created_at}#{job_id}
  • GSI_JobLookup_v2: pk=job_id, sk=created_at (KEYS_ONLY)
  • GSI_JobsByStatus_v2: pk=shop_id, sk=status_created_at (INCLUDE: job_type, job_status, total_items, etc.)
  • TTL: ttl (30-day auto-cleanup)
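
The job-id lookup implied by GSI_JobLookup_v2 could be sketched as below. This is a hedged example, not the cheat sheet's recipe: the helper name job_keys_by_id is hypothetical, and it assumes the GSI partition-key attribute is literally named job_id.

```shell
# Sketch: find a job's table keys by job id without scanning.
# Assumption: the GSI partition-key attribute is named "job_id".
job_keys_by_id() {
  # $1 = table name, $2 = job id, $3 = AWS profile
  aws dynamodb query \
    --table-name "$1" \
    --index-name GSI_JobLookup_v2 \
    --key-condition-expression "job_id = :j" \
    --expression-attribute-values "{\":j\": {\"S\": \"$2\"}}" \
    --profile "$3"
}
```

Because the index is KEYS_ONLY, the result carries only key attributes; use the returned pk and sk with aws dynamodb get-item to read the full job record.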

EcomIndexQueryConfigsTable

  • pk (S): {system_account_id}
  • sk (S): INDEX#{index_name}#QUERY#{normalized_query}

AgenticCachedQueriesTable

  • pk (S): {account_id}#{index_name}#{normalized_query}
  • GSI (accountId-indexName-index): pk=gsi_pk ({account_id}#{index_name})

Data Flow: Settings Sync

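Changes to EcomIndexSettingsTable are delivered through its DDB stream to the settings exporter Lambda, and the exported settings end up in Cloudflare KV. If KV data is stale, check each hop: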

  1. DDB stream status on the table
  2. Settings exporter Lambda logs
  3. Cloudflare KV namespace content
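
The first check above can be done from the CLI. A minimal sketch, assuming the exporter function is named {env}-EcomSettingsExporterLambda as in the resource table; the helper name settings_stream_status is hypothetical.

```shell
# Sketch: confirm the table stream is enabled and the Lambda trigger is active.
settings_stream_status() {
  # $1 = env prefix (e.g. staging), $2 = AWS profile
  # Prints StreamEnabled and StreamViewType for the settings table.
  aws dynamodb describe-table \
    --table-name "$1-EcomIndexSettingsTable" \
    --query 'Table.StreamSpecification' \
    --profile "$2"
  # State should be "Enabled"; LastProcessingResult hints at recent failures.
  aws lambda list-event-source-mappings \
    --function-name "$1-EcomSettingsExporterLambda" \
    --query 'EventSourceMappings[].[State,LastProcessingResult]' \
    --profile "$2"
}
```

For example: settings_stream_status staging <profile>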

Typical Investigation Paths

Search returning wrong results:

  1. Check index settings in DDB: query EcomIndexSettingsTable for the account
  2. Check KV cache: npx wrangler kv key get --namespace-id {id} "{account_id}"
  3. Check search proxy logs: npx wrangler tail {env}-ecom-api

Indexing not working:

  1. Check indexer jobs: query EcomIndexerJobsTable GSI_JobsByStatus_v2 for the shop
  2. Check indexer Lambda logs: aws logs tail /aws/lambda/{env}-EcomIndexerFunction
  3. Check SQS queue for the shop (dynamically created)

Metrics missing:

  1. Check metrics queue depth on {env}-EcomMetricsQueue (SQS)
  2. Check DLQ for failed messages
  3. Check metrics worker Lambda logs
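
Queue depth for the metrics queue and its DLQ can be read with the SQS CLI. This is a sketch; the helper name queue_depth is hypothetical, and the queue names are taken from the resource table above.

```shell
# Sketch: print approximate visible and in-flight message counts for a queue.
queue_depth() {
  # $1 = queue name, $2 = AWS profile
  queue_url=$(aws sqs get-queue-url --queue-name "$1" \
    --query QueueUrl --output text --profile "$2") || return 1
  aws sqs get-queue-attributes \
    --queue-url "$queue_url" \
    --attribute-names ApproximateNumberOfMessages ApproximateNumberOfMessagesNotVisible \
    --profile "$2"
}
```

Run it once for {env}-EcomMetricsQueue and once for {env}-EcomMetricsQueueDLQ; a non-zero DLQ count means messages failed processing.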

CLI Recipes

Finding Dev Environment Resources

Dev environment resources are prefixed with the branch name:

aws dynamodb list-tables --profile staging | grep "dev-<branch>"
aws logs describe-log-groups --profile staging \
--log-group-name-prefix "/aws/lambda/dev-<branch>"

Checking Sync Job Status

aws dynamodb scan \
--table-name {env}-EcomIndexerJobsTable \
--profile <profile>

Look for job_status (COMPLETED/FAILED/IN_PROGRESS), total_items, processed_items, failed_items, error_summary.
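
On large or prod tables, a query against GSI_JobsByStatus_v2 avoids the full scan. A hedged sketch: the helper name jobs_for_shop is hypothetical, and it assumes the GSI partition-key attribute is literally named shop_id and holds the bare shop id.

```shell
# Sketch: list a shop's jobs through the status GSI instead of scanning.
# Assumption: the GSI partition-key attribute is named "shop_id".
jobs_for_shop() {
  # $1 = table name, $2 = shop id, $3 = AWS profile
  aws dynamodb query \
    --table-name "$1" \
    --index-name GSI_JobsByStatus_v2 \
    --key-condition-expression "shop_id = :s" \
    --expression-attribute-values "{\":s\": {\"S\": \"$2\"}}" \
    --limit 20 \
    --profile "$3"
}
```

For example: jobs_for_shop {env}-EcomIndexerJobsTable <shop-id> <profile>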

Querying Logs with CloudWatch Insights

# Start a query
aws logs start-query \
--log-group-name "/aws/lambda/{env}-ShopifyWebhookWorker/ia" \
--start-time $(python3 -c "import time; print(int(time.time()) - 3600)") \
--end-time $(python3 -c "import time; print(int(time.time()))") \
--query-string 'fields @timestamp, @message | filter @message like /error|ERROR/ | sort @timestamp desc | limit 30' \
--profile <profile>

# Get results (use the queryId from the start-query response)
aws logs get-query-results --query-id "<query-id>" --profile <profile>

Useful filter patterns:

  • filter @message like /<job-id>/ — all logs for a specific job
  • filter @message like /error|ERROR|exception/ — errors only
  • filter @message like /chunk|streaming|Document-aware/ — bulk sync progress
  • filter @message like /finish|bulk_operations/ — bulk export completion webhook
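
Insights queries run asynchronously, so get-query-results may return an empty result while the query is still Scheduled or Running. A small polling wrapper, as a sketch (the helper name wait_for_query is hypothetical):

```shell
# Sketch: poll until the query leaves Scheduled/Running, then print results.
wait_for_query() {
  # $1 = query id from start-query, $2 = AWS profile
  while :; do
    status=$(aws logs get-query-results --query-id "$1" \
      --query status --output text --profile "$2") || return 1
    case "$status" in
      Scheduled|Running) sleep 2 ;;
      *) break ;;  # Complete, Failed, Cancelled, Timeout, Unknown
    esac
  done
  aws logs get-query-results --query-id "$1" --profile "$2"
}
```

For example: wait_for_query "<query-id>" <profile>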

Inspecting S3 Documents

Check what the transformer produced before it went to Marqo:

aws s3 cp "s3://{env}-ecom-product-data-bucket/<shop-id>//bulk/<job-id>/chunk_0000.jsonl" - \
  --profile <profile> | python3 -c "
import sys, json
# Print the _id and variantTitle of each JSONL document
for line in sys.stdin:
    doc = json.loads(line)
    print(f'{doc.get(\"_id\")}: {doc.get(\"variantTitle\", \"?\")}')
"

Querying the Search Proxy

curl -s -X POST \
'https://{ecom-domain}/api/v1/indexes/<index-name>/search' \
-H 'x-marqo-index-id: <index-id>' \
-H 'x-marqo-debug: 6F188718-DC08-4076-9E05-434CC7E728C3' \
-H 'Content-Type: application/json' \
-d '{"q": "*", "limit": 5, "collapseFields": []}'

  • Dev environments: https://dev-<branch>-ecom.dev-marqo.org
  • The x-marqo-debug header bypasses API key auth for dev environments
  • "collapseFields": [] overrides the default collapse; pass it when no collapse field is configured on the index

Checking Index Settings

aws dynamodb scan \
--table-name {env}-EcomIndexSettingsTable \
--profile <profile>

Key fields: split_products_by, group_variants_by, aggregation_mode, aggregation_fields, tensor_fields, mappings.
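
When the account is known, a key query returns just that account's index settings instead of the whole table. A sketch based on the pk/sk layout in the schema section; the helper name index_settings_for_account is hypothetical.

```shell
# Sketch: fetch all INDEX# items for one account by key, not by scan.
index_settings_for_account() {
  # $1 = table name, $2 = system account id, $3 = AWS profile
  aws dynamodb query \
    --table-name "$1" \
    --key-condition-expression "pk = :pk AND begins_with(sk, :prefix)" \
    --expression-attribute-values "{\":pk\": {\"S\": \"$2\"}, \":prefix\": {\"S\": \"INDEX#\"}}" \
    --profile "$3"
}
```

For example: index_settings_for_account {env}-EcomIndexSettingsTable <system-account-id> <profile>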

Common Issues

Symptom | Likely Cause
Job COMPLETED but 0 documents | Products have no variants, or all filtered by status/published
parentProductId collapse field error | Index schema doesn't have collapse field configured; pass "collapseFields": [] to search without it
Marqo settings not found on search | Using wrong base URL (staging vs dev-prefixed)
Enrichment silently skipped | shopify_domain or access_token is None; check that the caller passes them

Shopify-Specific Diagnostics

See Shopify Diagnostics for Shopify-specific recipes (GraphQL queries, metafield inspection, variant metafield debugging).

Accounts

Env | Account | Ecom API Domain
Staging | 468036072962 | staging-ecom.dev-marqo.org
Preprod | 010928202142 | ecom.preprod-marqo.org
Prod | 023568249301 | ecom.marqo-ep.ai