Shopify Collection Membership Reconciliation
Problem
Shopify automated (smart) collections define membership via rules (e.g., "tag equals New HH AND inventory > 0"). When a product's attributes change and it no longer matches the rules, Shopify silently removes it from the collection. Two platform limitations cause our search index to go stale:
- No webhook for membership changes. Shopify does not send `collections/update` webhooks when products enter or leave automated collections due to rule re-evaluation. The webhook only fires when the collection itself is edited (title, rules, sort order). This is confirmed by Shopify staff as expected behaviour with no plans to change.
- Eventually consistent `collections` field. When a product's tag changes, the `products/update` webhook fires immediately. Our handler fetches the product via GraphQL, including `collections(first: 100)`. But Shopify's automated collection rule re-evaluation is asynchronous: the `collections` field still returns the old membership for seconds to minutes after the attribute change. So we re-index the product with stale collection data.
The combination means: stale collection data is written, and no follow-up event ever corrects it. Drift persists indefinitely.
Observed Impact
Confirmed on Muji US, collection home-essentials-new-arrivals (automated, rule: tag = "New HH" AND inventory > 0):
| Metric | Count |
|---|---|
| Products in Shopify collection | 328 |
| Products in Marqo collection | 332 |
| Stale (in Marqo, not in Shopify) | 34 |
| Missing (in Shopify, not in Marqo) | 30 |
All 34 stale products had the "New HH" tag removed in a batch operation on 2025-05-28. The products/update webhook processed each within ~3 seconds, but the collections field hadn't updated yet.
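The stale and missing counts are the two set differences between the Shopify and Marqo membership lists. A minimal sketch, assuming product IDs are available as plain strings; `diff_membership` and the synthetic IDs are illustrative names, not part of the service:

```python
def diff_membership(shopify_ids: set[str], marqo_ids: set[str]) -> tuple[set[str], set[str]]:
    """Return (stale, missing) product IDs for one collection."""
    stale = marqo_ids - shopify_ids    # indexed under the collection, no longer a member
    missing = shopify_ids - marqo_ids  # a member in Shopify, not indexed under the collection
    return stale, missing


# Synthetic IDs sized to match the counts in the table above.
shared = {f"p{i}" for i in range(298)}
shopify_ids = shared | {f"new{i}" for i in range(30)}
marqo_ids = shared | {f"old{i}" for i in range(34)}

stale, missing = diff_membership(shopify_ids, marqo_ids)
print(len(shopify_ids), len(marqo_ids), len(stale), len(missing))  # 328 332 34 30
```

The counts are self-consistent: 332 indexed, minus 34 stale, plus 30 missing, gives the 328 products Shopify reports.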
Solution
shopify_reconciliation_service — a scheduled Lambda that periodically triggers our existing collection sync logic for every indexed collection, compensating for the missing webhook.
Architecture
    EventBridge (every 5 min)
            |
            v
    Reconciliation Lambda
            |
            v
    For each active Shopify shop:
      |
      +-- Marqo facet query on productCollectionIds
      |   (returns all unique collection IDs in the index)
      |
      +-- For each collection ID (sequential):
            |
            +-- Create sync job record (DynamoDB)
            +-- Call sync_service.process_collection_webhook() directly
                  |
                  +-- Fetch current members from Shopify (GraphQL)
                  +-- Search Marqo for stale references (filter by collection ID)
                  +-- If drift detected:
                  |     Build partial update docs, enqueue to indexer via SQS
                  +-- If no drift:
                        Complete job with 0 items (no-op)
Key Design Decisions
- Zero code duplication. The Lambda calls `process_collection_webhook()` directly, the same function the webhook worker uses. All drift detection, partial update building, S3 chunking, and SQS enqueuing are reused.
- No Lambda fan-out. Collections are processed sequentially within a single Lambda invocation that covers all shops. This avoids exhausting the AWS account's Lambda concurrency pool.
- Handles only membership drift. Collection creates and deletes are already handled reliably by Shopify webhooks. This service only compensates for the `collections/update` gap on automated collections.
- Graceful error isolation. If one shop or collection fails, the service logs the error and continues to the next. A single failure doesn't block reconciliation for other shops/collections.
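The sequential processing and error isolation decisions can be sketched as a pair of nested loops. This is an illustration under assumptions, not the service's actual code: `reconcile_all`, `list_collection_ids`, and `sync_collection` are hypothetical names, and in the real service the sync callable would wrap `process_collection_webhook()`.

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger("shopify_reconciliation")


def reconcile_all(
    shops: Iterable[str],
    list_collection_ids: Callable[[str], Iterable[str]],  # hypothetical: facet discovery
    sync_collection: Callable[[str, str], None],          # hypothetical: per-collection sync
) -> dict[str, int]:
    """Reconcile every collection of every shop, one at a time.

    A failure is logged and counted, then the loop moves on, so one bad
    shop or collection cannot block the rest.
    """
    stats = {"synced": 0, "failed": 0}
    for shop in shops:
        try:
            collection_ids = list(list_collection_ids(shop))
        except Exception:
            logger.exception("collection discovery failed for shop %s", shop)
            stats["failed"] += 1
            continue
        for collection_id in collection_ids:  # sequential: no Lambda fan-out
            try:
                sync_collection(shop, collection_id)
                stats["synced"] += 1
            except Exception:
                logger.exception("sync failed for %s / %s", shop, collection_id)
                stats["failed"] += 1
    return stats
```

Catching broadly at each level is deliberate here: the goal is to keep the cycle going, and the next scheduled run retries whatever failed.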
Resource Usage
Shopify API
Shopify GraphQL uses a per-shop leaky bucket. The bucket size and restore rate depend on the shop's plan:
| Shopify Plan | Bucket size | Restore rate |
|---|---|---|
| Standard | 2,000 points | 50 pts/sec |
| Advanced | 4,000 points | 100 pts/sec |
| Shopify Plus | 20,000 points | 1,000 pts/sec |
Each collection products query costs 1 point. Per-shop usage per cycle:
| Collections per shop | Points used | Standard (2,000 bucket) | Plus (20,000 bucket) |
|---|---|---|---|
| 50 | 50 | 2.5% of bucket | 0.25% of bucket |
| 100 | 100 | 5% of bucket | 0.5% of bucket |
| 200 | 200 | 10% of bucket | 1% of bucket |
Because the bucket refills continuously (50–1,000 pts/sec depending on plan) and we process collections sequentially (~100ms+ network round-trip each), the bucket restores faster than we drain it. A Standard plan shop with 100 collections uses 100 points total; at 50 pts/sec restore that refills in 2 seconds. In practice, the reconciliation never comes close to throttling.
Batch product queries (only if drift is detected) cost ~2 additional points per drifted collection, but drift is rare in steady state.
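The claim that the bucket restores faster than it drains can be checked with a small simulation. A rough sketch, using the figures stated above (1 point per query, about 100 ms per round trip, Standard plan values from the table); `simulate_bucket` is an illustrative helper and models the bucket as refilling during each round trip:

```python
def simulate_bucket(
    bucket_size: float,
    restore_rate: float,  # points restored per second
    n_queries: int,
    cost_per_query: float = 1.0,
    seconds_per_query: float = 0.1,
) -> float:
    """Return the lowest bucket level seen while issuing queries sequentially."""
    level = bucket_size
    lowest = level
    for _ in range(n_queries):
        level -= cost_per_query  # the query spends points
        lowest = min(lowest, level)
        # The bucket refills during the round trip, capped at its size.
        level = min(bucket_size, level + restore_rate * seconds_per_query)
    return lowest


# Standard plan, 100 collections: each 1-point query is refilled before the next one.
print(simulate_bucket(bucket_size=2000, restore_rate=50, n_queries=100))  # 1999.0

# Pessimistic bound with no refill at all: still 95% of the bucket left.
print(simulate_bucket(bucket_size=2000, restore_rate=0, n_queries=100))  # 1900.0
```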
Marqo
| Resource | Searches | Notes |
|---|---|---|
| Stale product search | 1 per collection | filter: productCollectionIds:{id}, limit 1000 |
| Facet discovery query | 1 per shop | Single query returns all collection IDs |
Searches are lightweight (filter-only, no vector computation). The facet discovery query runs once per shop per cycle.
AWS
| Resource | Usage |
|---|---|
| Lambda invocations | 1 per 5-minute cycle (single invocation processes all shops) |
| Lambda duration | ~30s–5min depending on number of shops/collections |
| Lambda memory | 512 MB |
| DynamoDB writes | 1 job record per collection per cycle (most are no-op completions) |
| SQS messages | Only when drift is detected (rare in steady state) |
| S3 writes | Only when drift is detected (partial update doc chunks) |
In steady state (no drift), the cost is: 1 Lambda invocation + N Marqo searches + N Shopify collection queries + N DynamoDB job records, where N = total collections across all shops.
Cost Estimate
For a typical deployment with 10 shops averaging 50 collections each:
- Per cycle: 1 Lambda invocation (30s @ 512MB), 500 Marqo searches, 500 Shopify API calls, 500 DynamoDB writes
- Per day: 288 cycles (every 5 min), ~144,000 Marqo searches, ~144,000 Shopify calls, ~144,000 DDB writes
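- Monthly Lambda cost: ~$1.75 (288 invocations/day * 30 days * 30s avg * 0.5 GB = 129,600 GB-seconds, at roughly $0.0000133 per GB-second for ARM64, before any free tier)
- Monthly DDB cost: ~$5.40 (144K writes/day * 30 days = 4.32M writes, at $1.25 per million on-demand write request units)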
The dominant cost is Marqo search requests, which are internal and don't have a direct dollar cost.
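The volume figures in this estimate reduce to a few multiplications. The sketch below reproduces them so they can be re-run for a different number of shops or a different schedule; `estimate_volumes` and its parameter names are illustrative. Dollar amounts then follow by multiplying the monthly volumes by current AWS prices for the region in use.

```python
def estimate_volumes(
    shops: int = 10,
    collections_per_shop: int = 50,
    cycle_minutes: int = 5,
    lambda_seconds: float = 30.0,  # average duration per invocation
    lambda_memory_mb: int = 512,
    days_per_month: int = 30,
) -> dict[str, float]:
    """Return per-cycle, per-day, and per-month volumes for the reconciliation job."""
    collections = shops * collections_per_shop
    cycles_per_day = 24 * 60 // cycle_minutes
    # One Marqo search, one Shopify query, and one DynamoDB write per collection per cycle.
    ops_per_day = collections * cycles_per_day
    invocations_per_month = cycles_per_day * days_per_month
    gb_seconds_per_month = invocations_per_month * lambda_seconds * lambda_memory_mb / 1024
    return {
        "collections": collections,
        "cycles_per_day": cycles_per_day,
        "ops_per_day": ops_per_day,
        "invocations_per_month": invocations_per_month,
        "gb_seconds_per_month": gb_seconds_per_month,
        "writes_per_month": ops_per_day * days_per_month,
    }


for name, value in estimate_volumes().items():
    print(f"{name}: {value:,.0f}")
```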
Component Structure
    components/shopify_reconciliation_service/
      BUILD                                  # Pants build target (ARM64 Lambda)
      shopify_reconciliation_service/
        BUILD                                # python_sources + python_tests
        lambda_function.py                   # EventBridge handler entry point
        collection_membership_sync.py        # Core reconciliation logic
        test_collection_membership_sync.py   # Unit tests (13 tests)
Infrastructure
Defined in infra/ecom/stacks/shopify_admin_stack.py:
- Lambda: `ShopifyReconciliation` (Python 3.11, ARM64, 512MB, 15min timeout)
- EventBridge rule: `ShopifyReconciliationSchedule` (rate: 5 minutes)
- IAM: Same permissions as webhook worker (DynamoDB read/write, S3 read/write, SQS send)
- Environment variables: Same as webhook worker (uses the same `DependencyContainer`)
Future Extensions
The shopify_reconciliation_service component is designed to host additional reconciliation checks beyond collection membership:
- Product data reconciliation — detect products in Marqo that have been deleted or unpublished in Shopify
- Bulk sync verification — after a bulk sync completes, verify that the final document count matches expectations
- Index config drift — detect mismatches between DynamoDB index settings and actual Marqo index state