Skip to main content

Shopify Collection Membership Reconciliation

Problem

Shopify automated (smart) collections define membership via rules (e.g., "tag equals New HH AND inventory > 0"). When a product's attributes change and it no longer matches the rules, Shopify silently removes it from the collection. Two platform limitations cause our search index to go stale:

  1. No webhook for membership changes. Shopify does not send collections/update webhooks when products enter or leave automated collections due to rule re-evaluation. The webhook only fires when the collection itself is edited (title, rules, sort order). This is confirmed by Shopify staff as expected behaviour with no plans to change.

  2. Eventually consistent collections field. When a product's tag changes, the products/update webhook fires immediately. Our handler fetches the product via GraphQL, including collections(first: 100). But Shopify's automated collection rule re-evaluation is asynchronous — the collections field still returns the old membership for seconds to minutes after the attribute change. So we re-index the product with stale collection data.

The combination means: stale collection data is written, and no follow-up event ever corrects it. Drift persists indefinitely.

Observed Impact

Confirmed on Muji US, collection home-essentials-new-arrivals (automated, rule: tag = "New HH" AND inventory > 0):

MetricCount
Products in Shopify collection328
Products in Marqo collection332
Stale (in Marqo, not in Shopify)34
Missing (in Shopify, not in Marqo)30

All 34 stale products had the "New HH" tag removed in a batch operation on 2025-05-28. The products/update webhook processed each within ~3 seconds, but the collections field hadn't updated yet.

Solution

shopify_reconciliation_service — a scheduled Lambda that periodically triggers our existing collection sync logic for every indexed collection, compensating for the missing webhook.

Architecture

EventBridge (every 5 min)
|
v
Reconciliation Lambda
|
v
For each active Shopify shop:
|
+-- Marqo facet query on productCollectionIds
| (returns all unique collection IDs in the index)
|
+-- For each collection ID (sequential):
|
+-- Create sync job record (DynamoDB)
+-- Call sync_service.process_collection_webhook() directly
|
+-- Fetch current members from Shopify (GraphQL)
+-- Search Marqo for stale references (filter by collection ID)
+-- If drift detected:
| Build partial update docs, enqueue to indexer via SQS
+-- If no drift:
Complete job with 0 items (no-op)

Key Design Decisions

  • Zero code duplication. The Lambda calls process_collection_webhook() directly — the same function the webhook worker uses. All drift detection, partial update building, S3 chunking, and SQS enqueuing is reused.
  • No Lambda fan-out. Collections are processed sequentially within a single Lambda invocation per shop. This avoids exhausting the AWS account's Lambda concurrency pool.
  • Handles only membership drift. Collection creates and deletes are already handled reliably by Shopify webhooks. This service only compensates for the collections/update gap on automated collections.
  • Graceful error isolation. If one shop or collection fails, the service logs the error and continues to the next. A single failure doesn't block reconciliation for other shops/collections.

Resource Usage

Shopify API

Shopify GraphQL uses a per-shop leaky bucket. The bucket size and restore rate depend on the shop's plan:

Shopify PlanBucket sizeRestore rate
Standard2,000 points50 pts/sec
Advanced4,000 points100 pts/sec
Shopify Plus20,000 points1,000 pts/sec

Each collection products query costs 1 point. Per-shop usage per cycle:

Collections per shopPoints usedStandard (2,000 bucket)Plus (20,000 bucket)
50502.5% of bucket0.25% of bucket
1001005% of bucket0.5% of bucket
20020010% of bucket1% of bucket

Because the bucket refills continuously (50–1,000 pts/sec depending on plan) and we process collections sequentially (~100ms+ network round-trip each), the bucket restores faster than we drain it. A Standard plan shop with 100 collections uses 100 points total; at 50 pts/sec restore that refills in 2 seconds. In practice, the reconciliation never comes close to throttling.

Batch product queries (only if drift is detected) cost ~2 additional points per drifted collection, but drift is rare in steady state.

Marqo

ResourcePer collectionNotes
Stale product search1 searchfilter: productCollectionIds:{id}, limit 1000
Facet discovery query1 search per shopSingle query returns all collection IDs

Searches are lightweight (filter-only, no vector computation). The facet discovery query runs once per shop per cycle.

AWS

ResourceUsage
Lambda invocations1 per 5-minute cycle (single invocation processes all shops)
Lambda duration~30s–5min depending on number of shops/collections
Lambda memory512 MB
DynamoDB writes1 job record per collection per cycle (most are no-op completions)
SQS messagesOnly when drift is detected (rare in steady state)
S3 writesOnly when drift is detected (partial update doc chunks)

In steady state (no drift), the cost is: 1 Lambda invocation + N Marqo searches + N Shopify collection queries + N DynamoDB job records, where N = total collections across all shops.

Cost Estimate

For a typical deployment with 10 shops averaging 50 collections each:

  • Per cycle: 1 Lambda invocation (30s @ 512MB), 500 Marqo searches, 500 Shopify API calls, 500 DynamoDB writes
  • Per day: 288 cycles (every 5 min), ~144,000 Marqo searches, ~144,000 Shopify calls, ~144,000 DDB writes
  • Monthly Lambda cost: ~$0.50 (288 invocations/day * 30 days * 30s avg * 512MB)
  • Monthly DDB cost: ~$1.50 (144K writes/day * 30 days at $1.25/million WCU)

The dominant cost is Marqo search requests, which are internal and don't have a direct dollar cost.

Component Structure

components/shopify_reconciliation_service/
BUILD # Pants build target (ARM64 Lambda)
shopify_reconciliation_service/
BUILD # python_sources + python_tests
lambda_function.py # EventBridge handler entry point
collection_membership_sync.py # Core reconciliation logic
test_collection_membership_sync.py # Unit tests (13 tests)

Infrastructure

Defined in infra/ecom/stacks/shopify_admin_stack.py:

  • Lambda: ShopifyReconciliation (Python 3.11, ARM64, 512MB, 15min timeout)
  • EventBridge rule: ShopifyReconciliationSchedule (rate: 5 minutes)
  • IAM: Same permissions as webhook worker (DynamoDB read/write, S3 read/write, SQS send)
  • Environment variables: Same as webhook worker (uses the same DependencyContainer)

Future Extensions

The shopify_reconciliation_service component is designed to host additional reconciliation checks beyond collection membership:

  • Product data reconciliation — detect products in Marqo that have been deleted or unpublished in Shopify
  • Bulk sync verification — after a bulk sync completes, verify that the final document count matches expectations
  • Index config drift — detect mismatches between DynamoDB index settings and actual Marqo index state