Shopify Metafield Indexing
How Shopify metafields (product-level and variant-level) flow through the indexing pipeline and end up as searchable/filterable fields in Marqo.
Adding a computed/derived field? Read Indexing Pipeline Stages (what data is available where) first. Variant metafields (and per-location inventory and market/country data) are enrichment-deferred in the bulk path: they do not exist when transform_product_result runs, so any field that depends on them must be computed in or after the enrichment merge, not in the transformer.
Architecture
Shopify metafields are key-value data attached to products or variants. They have a namespace, key, value, and type. Example: namespace="custom", key="web_filters_gender", value='["Men\'s"]', type="list.single_line_text_field".
Field Naming
Metafields are transformed into camelCase Marqo field names by ProductTransformer.build_camelcase_metafield_name():
prefix + namespace + key → camelCase
product_metafield + custom + season → productMetafieldCustomSeason
variant_metafield + custom + web_filters_gender → variantMetafieldCustomWebFiltersGender
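The naming rule above can be sketched in a few lines. This is a hypothetical stand-in, not the actual body of ProductTransformer.build_camelcase_metafield_name(); it assumes underscores are the only word separators, so namespaces or keys containing hyphens or dots would need extra handling.

```python
def camelcase_metafield_name(prefix: str, namespace: str, key: str) -> str:
    """Hypothetical sketch: join prefix, namespace, and key into one camelCase name."""
    # Split every part on underscores so "web_filters_gender" yields three words.
    words = [w for part in (prefix, namespace, key) for w in part.split("_") if w]
    if not words:
        return ""
    # The first word stays lowercase; each later word gets an uppercase first letter.
    return words[0].lower() + "".join(w[:1].upper() + w[1:] for w in words[1:])


print(camelcase_metafield_name("product_metafield", "custom", "season"))
print(camelcase_metafield_name("variant_metafield", "custom", "web_filters_gender"))
```

The two calls reproduce the examples listed above: productMetafieldCustomSeason and variantMetafieldCustomWebFiltersGender.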
Value Conversion
ProductTransformer.convert_metafield_value() converts values based on Shopify type:
| Shopify Type | Python Output |
|---|---|
| single_line_text_field | str |
| integer / number_integer | int |
| decimal / number_decimal | float |
| boolean | bool |
| list.single_line_text_field | list[str] (JSON-parsed) |
| json | str (raw, for Marqo compatibility) |
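A minimal sketch of the mapping in the table, assuming metafield values arrive as strings (as they do from the Shopify API). The function name is hypothetical, and the error handling of the real convert_metafield_value() is not documented here, so this version simply lets malformed values raise.

```python
import json
from typing import Any


def convert_metafield_value_sketch(value: str, metafield_type: str) -> Any:
    """Hypothetical sketch of the type mapping; not the real implementation."""
    if metafield_type in ("integer", "number_integer"):
        return int(value)
    if metafield_type in ("decimal", "number_decimal"):
        return float(value)
    if metafield_type == "boolean":
        # Assumption: booleans are serialized as the strings "true" / "false".
        return value.strip().lower() == "true"
    if metafield_type == "list.single_line_text_field":
        # The list arrives as a JSON-encoded string, e.g. '["Men\'s"]'.
        return [str(item) for item in json.loads(value)]
    # single_line_text_field and json both stay as plain strings.
    return value


print(convert_metafield_value_sketch('["Men\'s"]', "list.single_line_text_field"))
```

Note that the json type is deliberately left unparsed, matching the "raw, for Marqo compatibility" row above.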
Data Flow
Product Metafields
Product metafields are fetched directly in both paths and processed by ProductTransformer.extract_product_metafields():
- Bulk export: metafields(first: 250) on the products node in BULK_EXPORT_ALL_PRODUCTS
- Webhook: metafields(first: 250) on the product node in ENRICH_PRODUCT_DATA
Product metafields are duplicated across all result documents for that product.
Variant Metafields
Variant metafields use a two-path approach because of the Shopify bulk export 5-connection limit:
Bulk Sync Path
Shopify Bulk Export (5-conn limit, no variant metafields)
→ S3 → Stream JSONL → ProductAccumulator → Transformer
→ write unenriched chunk to S3, then per batch:
→ _fetch_enrichment_for_batch() → _apply_enrichment_to_docs()
→ GET_VARIANT_ENRICHMENT_DATA query:
nodes(ids: [...]) {
... on ProductVariant {
id
inventoryItem { inventoryLevels { ... } }
metafields(first: 250) { ... }
}
}
→ Sets variantMetafield* fields on docs via setattr()
→ S3 + SQS → ecom_indexer → Marqo
The enrichment query bundles variant metafields with per-location inventory into a single nodes(ids: [...]) API call (batched at 250 variants). This avoids the bulk export connection limit.
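The batching and the setattr() step can be illustrated as follows. Both helpers are hypothetical stand-ins for what _fetch_enrichment_for_batch() and _apply_enrichment_to_docs() do around the API call; the document type is assumed to accept arbitrary attributes (a SimpleNamespace here), which may not hold for the real document class.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterator, List, Sequence

BATCH_SIZE = 250  # variants per nodes(ids: [...]) call, as stated above


def chunk_variant_ids(variant_ids: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` variant IDs."""
    for start in range(0, len(variant_ids), size):
        yield list(variant_ids[start:start + size])


def apply_variant_metafields(doc: Any, fields: Dict[str, Any]) -> None:
    """Set each already-named variantMetafield* field on the document."""
    for name, value in fields.items():
        setattr(doc, name, value)


ids = [f"gid://shopify/ProductVariant/{i}" for i in range(600)]
print([len(batch) for batch in chunk_variant_ids(ids)])

doc = SimpleNamespace(title="Tee")
apply_variant_metafields(doc, {"variantMetafieldCustomSeason": "SS24"})
print(doc.variantMetafieldCustomSeason)
```

With 600 variants the sketch produces three batches (250, 250, 100), so three enrichment calls would cover the chunk.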
Webhook Path
Shopify Webhook → ENRICH_PRODUCT_DATA query
(includes metafields on both product and variant nodes)
→ _build_full_product_from_graphql()
→ extract_variant_metafields() from GraphQL edges
→ attaches to variant dicts
→ ProductTransformer.transform_product_result()
→ extract_variant_metafields() per variant
→ aggregates across variants in split group
→ variantMetafield* fields on result doc
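For orientation, the "GraphQL edges" shape that the webhook path parses looks like the sample below. The helper is a hypothetical sketch of the flattening step only, not the handler's extract_variant_metafields(); the node fields (namespace, key, value, type) are the four metafield attributes described under Architecture.

```python
from typing import Any, Dict, List


def flatten_metafield_edges(variant_node: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn {"metafields": {"edges": [{"node": {...}}]}} into a flat list of nodes."""
    edges = (variant_node.get("metafields") or {}).get("edges") or []
    return [edge["node"] for edge in edges if edge.get("node")]


sample_variant = {
    "id": "gid://shopify/ProductVariant/1",
    "metafields": {
        "edges": [
            {
                "node": {
                    "namespace": "custom",
                    "key": "web_filters_gender",
                    "value": '["Men\'s"]',
                    "type": "list.single_line_text_field",
                }
            }
        ]
    },
}

for metafield in flatten_metafield_edges(sample_variant):
    print(metafield["namespace"], metafield["key"], metafield["type"])
```

A variant with no metafields connection, or an empty one, yields an empty list rather than an error.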
Aggregation
When a result document aggregates multiple variants (the default behavior), variant metafield values are collected:
- Same value across all variants → scalar: "Men's"
- Different values → flat list of unique values: ["Men's", "Women's", "Unisex"]
- List-type metafields (e.g., list.single_line_text_field) are flattened before aggregation to avoid nested lists
This works correctly with all aggregation modes:
- split_products_by: each split doc only gets metafields from its own variants
- group_variants_by: does not affect metafield aggregation (grouping only affects numeric fields like price/stock)
- aggregation_mode="none": one doc per variant, each has only its own metafields
- aggregation_mode="specific": metafields are always extracted regardless of which options get array aggregation
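The collection rules above (flatten, deduplicate, collapse to a scalar when only one value remains) can be sketched like this. The function is hypothetical; preserving first-seen order and skipping None values are assumptions, not documented behavior of the transformer.

```python
from typing import Any, Iterable, List


def aggregate_metafield_values(values: Iterable[Any]) -> Any:
    """Hypothetical sketch: one value per variant in, scalar or flat unique list out."""
    unique: List[Any] = []
    for value in values:
        # Flatten list-type metafields so the result never contains nested lists.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if item is not None and item not in unique:
                unique.append(item)
    if not unique:
        return None
    # A single distinct value collapses to a scalar; otherwise keep the flat list.
    return unique[0] if len(unique) == 1 else unique


print(aggregate_metafield_values(["Men's", "Men's"]))
print(aggregate_metafield_values(["Men's", "Women's", "Unisex"]))
print(aggregate_metafield_values([["Men's"], ["Women's", "Men's"]]))
```

One consequence of flattening first is that a list-type metafield holding the same single element on every variant also collapses to a scalar in this sketch; whether the real aggregation does the same should be checked against the transformer.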
Shopify Bulk Export 5-Connection Limit
Shopify's bulkOperationRunQuery mutation limits queries to 5 connections (fields with edges/node). The current bulk export uses all 5:
- products (top-level)
- variants
- media
- collections
- metafields (product-level only)
This is why variant metafields cannot be added to the bulk export query — there's no room for a 6th connection (variants.metafields). The enrichment step (GET_VARIANT_ENRICHMENT_DATA) works around this by fetching variant data via a separate regular GraphQL query that has no connection limit.
Key Files
| File | Role |
|---|---|
| admin_server/graphql/queries/product_queries.py | GraphQL queries: ENRICH_PRODUCT_DATA, GET_VARIANT_ENRICHMENT_DATA |
| admin_server/graphql/mutations/bulk_operations_mutations.py | BULK_EXPORT_ALL_PRODUCTS (5-conn limit) |
| admin_server/services/shopify_service.py | get_variant_enrichment_data(): batch fetches inventory + metafields |
| admin_server/handlers/bulk_sync_handler.py | _fetch_enrichment_for_batch() fetches, _apply_enrichment_to_docs() applies inventory + metafield fields to docs |
| admin_server/transformers/product_transformer.py | extract_variant_metafields(), extract_product_metafields(), build_camelcase_metafield_name(), convert_metafield_value() |
| admin_server/handlers/product_webhook_handler.py | extract_variant_metafields(): parses GraphQL edges format for webhook path |
Testing Against a Dev Store
- Check variant metafield definitions exist: Shopify Admin → Settings → Custom data → Variants
- Verify metafields are populated: query variant metafields via GraphQL:
  curl -X POST \
    -H 'Content-Type: application/json' \
    -H 'X-Shopify-Access-Token: <token>' \
    -d '{"query": "{ products(first: 1) { edges { node { title variants(first: 2) { edges { node { id metafields(first: 20) { edges { node { namespace key value type } } } } } } } } } }"}' \
    'https://<store>.myshopify.com/admin/api/2024-10/graphql.json'
- Trigger a bulk sync from the admin UI
- Check S3 output for variantMetafield* fields in the JSONL
- Search the index and verify fields appear on hits, and filtering works:
  "filter": "variantMetafieldCustomWebFiltersGender:Men's"
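For the S3 output check, a small script can count which variantMetafield* fields appear in a downloaded JSONL file. This is a hypothetical helper that assumes each line is one flat JSON document; the inline sample stands in for the real file contents.

```python
import json
from collections import Counter
from typing import Iterable


def count_variant_metafield_fields(lines: Iterable[str]) -> Counter:
    """Count how many documents carry each variantMetafield* field."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL
        doc = json.loads(line)
        for field in doc:
            if field.startswith("variantMetafield"):
                counts[field] += 1
    return counts


# Illustrative sample; with a real file, pass open("chunk.jsonl") instead.
sample_lines = [
    '{"title": "Tee", "variantMetafieldCustomWebFiltersGender": "Men\'s"}',
    '{"title": "Cap"}',
    '{"title": "Hoodie", "variantMetafieldCustomWebFiltersGender": ["Men\'s", "Women\'s"]}',
]
print(count_variant_metafield_fields(sample_lines))
```

An empty counter on a post-enrichment chunk suggests the enrichment step did not run or returned no metafields.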
Debugging
- "Processed 0 variant metafields across 0 variants" in webhook worker logs is expected for bulk sync: the bulk export JSONL doesn't contain variant metafields. They come from the enrichment step.
- If variant metafield fields are missing, check:
  - The enrichment step ran (look for warning logs about enrichment failures)
  - shopify_domain and access_token are passed to process_bulk_file_streaming
  - The metafield definitions exist on the Shopify store and have values populated on variants