
Shopify Metafield Indexing

How Shopify metafields (product-level and variant-level) flow through the indexing pipeline and end up as searchable/filterable fields in Marqo.

Adding a computed/derived field? First read "Indexing Pipeline Stages: what data is available where". Variant metafields (along with per-location inventory and market/country data) are enrichment-deferred in the bulk path: they do not exist when transform_product_result runs, so any field that depends on them must be computed in or after the enrichment merge, not in the transformer.

Architecture

Shopify metafields are key-value data attached to products or variants. They have a namespace, key, value, and type. Example: namespace="custom", key="web_filters_gender", value='["Men\'s"]', type="list.single_line_text_field".

Field Naming

Metafields are transformed into camelCase Marqo field names by ProductTransformer.build_camelcase_metafield_name():

prefix + namespace + key → camelCase

product_metafield + custom + season → productMetafieldCustomSeason
variant_metafield + custom + web_filters_gender → variantMetafieldCustomWebFiltersGender
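The naming rule above can be sketched in a few lines. This is a minimal illustration, not the actual body of ProductTransformer.build_camelcase_metafield_name(): the function name camelcase_metafield_name and the set of separators it splits on (underscores, hyphens, dots, whitespace) are assumptions made for the example.

```python
import re


def camelcase_metafield_name(prefix: str, namespace: str, key: str) -> str:
    """Hypothetical stand-in: join prefix, namespace, and key, then camelCase."""
    # Assumption: underscores, hyphens, dots, and whitespace all act as word breaks.
    words = [w for w in re.split(r"[_\-.\s]+", f"{prefix}_{namespace}_{key}") if w]
    if not words:
        return ""
    first, rest = words[0], words[1:]
    # First word starts lowercase; every following word is capitalized.
    return first[0].lower() + first[1:] + "".join(w[0].upper() + w[1:] for w in rest)


print(camelcase_metafield_name("product_metafield", "custom", "season"))
print(camelcase_metafield_name("variant_metafield", "custom", "web_filters_gender"))
```

Under these assumptions the two calls reproduce the names listed above, productMetafieldCustomSeason and variantMetafieldCustomWebFiltersGender.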

Value Conversion

ProductTransformer.convert_metafield_value() converts values based on Shopify type:

Shopify Type | Python Output
single_line_text_field | str
integer / number_integer | int
decimal / number_decimal | float
boolean | bool
list.single_line_text_field | list[str] (JSON-parsed)
json | str (raw, for Marqo compatibility)
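The type mapping in the table can be sketched as follows. This is an illustration only and not the real convert_metafield_value(): the name convert_value_sketch is hypothetical, and the handling of types the table does not list (returned unchanged as strings) and of malformed input (exceptions propagate) are assumptions.

```python
import json
from typing import Any


def convert_value_sketch(value: str, shopify_type: str) -> Any:
    """Hypothetical sketch of the Shopify type -> Python value mapping."""
    if shopify_type in ("integer", "number_integer"):
        return int(value)
    if shopify_type in ("decimal", "number_decimal"):
        return float(value)
    if shopify_type == "boolean":
        # Shopify serializes booleans as the strings "true" / "false".
        return value.strip().lower() == "true"
    if shopify_type == "list.single_line_text_field":
        # The raw value is a JSON array encoded as a string.
        return [str(item) for item in json.loads(value)]
    # single_line_text_field and json stay as raw strings; treating every
    # other type the same way is an assumption of this sketch.
    return value


print(convert_value_sketch('["Men\'s"]', "list.single_line_text_field"))
```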

Data Flow

Product Metafields

Product metafields are fetched directly in both paths and processed by ProductTransformer.extract_product_metafields():

  • Bulk export: metafields(first: 250) on the products node in BULK_EXPORT_ALL_PRODUCTS
  • Webhook: metafields(first: 250) on the product node in ENRICH_PRODUCT_DATA

Product metafields are duplicated across all result documents for that product.

Variant Metafields

Variant metafields use a two-path approach because of the Shopify bulk export 5-connection limit:

Bulk Sync Path

Shopify Bulk Export (5-conn limit, no variant metafields)
  → S3 → Stream JSONL → ProductAccumulator → Transformer
  → write unenriched chunk to S3, then per batch:
  → _fetch_enrichment_for_batch() → _apply_enrichment_to_docs()
      → GET_VARIANT_ENRICHMENT_DATA query:
          nodes(ids: [...]) {
            ... on ProductVariant {
              id
              inventoryItem { inventoryLevels { ... } }
              metafields(first: 250) { ... }
            }
          }
      → Sets variantMetafield* fields on docs via setattr()
  → S3 + SQS → ecom_indexer → Marqo

The enrichment query bundles variant metafields with per-location inventory into a single nodes(ids: [...]) API call (batched at 250 variants). This avoids the bulk export connection limit.
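The batching described above can be sketched as a simple chunking helper. The names batch_variant_ids and build_enrichment_variables are hypothetical and do not come from the codebase. Only the batch size of 250 and the nodes(ids: [...]) query shape are taken from the text; the GraphQL variable name "ids" is an assumption.

```python
from typing import Dict, Iterator, List, Sequence

ENRICHMENT_BATCH_SIZE = 250  # batch size stated in the text


def batch_variant_ids(
    variant_ids: Sequence[str], size: int = ENRICHMENT_BATCH_SIZE
) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` variant IDs."""
    for start in range(0, len(variant_ids), size):
        yield list(variant_ids[start:start + size])


def build_enrichment_variables(batch: List[str]) -> Dict[str, List[str]]:
    """Hypothetical GraphQL variables for one nodes(ids: [...]) call."""
    return {"ids": batch}


ids = [f"gid://shopify/ProductVariant/{i}" for i in range(600)]
batches = list(batch_variant_ids(ids))
print([len(b) for b in batches])
```

With 600 variant IDs this produces three calls of 250, 250, and 100 IDs, each carrying both the inventory and metafield selections.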

Webhook Path

Shopify Webhook → ENRICH_PRODUCT_DATA query
    (includes metafields on both product and variant nodes)
  → _build_full_product_from_graphql()
      → extract_variant_metafields() from GraphQL edges
      → attaches to variant dicts
  → ProductTransformer.transform_product_result()
      → extract_variant_metafields() per variant
      → aggregates across variants in split group
      → variantMetafield* fields on result doc
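The "from GraphQL edges" step in the webhook path amounts to unwrapping the standard edges/node connection shape. A minimal sketch, assuming the variant's metafields connection arrives as {"edges": [{"node": {...}}]}; the helper name metafields_from_edges is hypothetical, and skipping nodes that lack a namespace or key is an assumption rather than documented behavior.

```python
from typing import Any, Dict, List, Optional


def metafields_from_edges(connection: Optional[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: flatten a GraphQL connection into a list of node dicts."""
    if not connection:
        return []
    nodes = []
    for edge in connection.get("edges") or []:
        node = (edge or {}).get("node")
        # Assumption: a metafield without namespace/key cannot be named, so skip it.
        if node and node.get("namespace") and node.get("key"):
            nodes.append(node)
    return nodes


variant = {
    "id": "gid://shopify/ProductVariant/1",
    "metafields": {
        "edges": [
            {"node": {"namespace": "custom", "key": "web_filters_gender",
                      "value": '["Men\'s"]', "type": "list.single_line_text_field"}},
            {"node": None},
        ]
    },
}
print(metafields_from_edges(variant["metafields"]))
```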

Aggregation

When a result document aggregates multiple variants (the default behavior), variant metafield values are collected:

  • Same value across all variants → scalar: "Men's"
  • Different values → flat list of unique values: ["Men's", "Women's", "Unisex"]
  • List-type metafields (e.g., list.single_line_text_field) are flattened before aggregation to avoid nested lists

This works correctly with all aggregation modes:

  • split_products_by: each split doc only gets metafields from its own variants
  • group_variants_by: does not affect metafield aggregation (grouping only affects numeric fields like price/stock)
  • aggregation_mode="none": one doc per variant, each has only its own metafields
  • aggregation_mode="specific": metafields are always extracted regardless of which options get array aggregation
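The collection rules for a single metafield across the variants of one result document can be sketched as below. This is an illustration, not the transformer's code: the function name aggregate_metafield_values is hypothetical, and three behaviors are assumptions of the sketch: first-seen order is preserved, None values are ignored, and a list-type metafield that yields one unique value collapses to a scalar.

```python
from typing import Any, Iterable, List


def aggregate_metafield_values(values: Iterable[Any]) -> Any:
    """Hypothetical sketch: combine one metafield's values across variants."""
    unique: List[Any] = []
    for value in values:
        # Flatten list-type metafields first so the result never nests lists.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if item is not None and item not in unique:
                unique.append(item)
    if not unique:
        return None
    # Same value on every variant -> scalar; otherwise a flat list of uniques.
    return unique[0] if len(unique) == 1 else unique


print(aggregate_metafield_values(["Men's", "Men's"]))
print(aggregate_metafield_values(["Men's", "Women's", "Unisex"]))
print(aggregate_metafield_values([["Men's"], ["Women's", "Men's"]]))
```

Because the function only sees the values it is handed, the same logic covers split_products_by and aggregation_mode="none": the caller passes in only the variants that belong to that document.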

Shopify Bulk Export 5-Connection Limit

Shopify's bulkOperationRunQuery mutation limits queries to 5 connections (fields with edges/node). The current bulk export uses all 5:

  1. products (top-level)
  2. variants
  3. media
  4. collections
  5. metafields (product-level only)

This is why variant metafields cannot be added to the bulk export query — there's no room for a 6th connection (variants.metafields). The enrichment step (GET_VARIANT_ENRICHMENT_DATA) works around this by fetching variant data via a separate regular GraphQL query that has no connection limit.

Key Files

File | Role
admin_server/graphql/queries/product_queries.py | GraphQL queries: ENRICH_PRODUCT_DATA, GET_VARIANT_ENRICHMENT_DATA
admin_server/graphql/mutations/bulk_operations_mutations.py | BULK_EXPORT_ALL_PRODUCTS (5-conn limit)
admin_server/services/shopify_service.py | get_variant_enrichment_data(): batch fetches inventory + metafields
admin_server/handlers/bulk_sync_handler.py | _fetch_enrichment_for_batch() fetches, _apply_enrichment_to_docs() applies inventory + metafield fields to docs
admin_server/transformers/product_transformer.py | extract_variant_metafields(), extract_product_metafields(), build_camelcase_metafield_name(), convert_metafield_value()
admin_server/handlers/product_webhook_handler.py | extract_variant_metafields(): parses GraphQL edges format for webhook path

Testing Against a Dev Store

  1. Check variant metafield definitions exist: Shopify Admin → Settings → Custom data → Variants
  2. Verify metafields are populated: Query variant metafields via GraphQL:
    curl -X POST \
    -H 'Content-Type: application/json' \
    -H 'X-Shopify-Access-Token: <token>' \
    -d '{"query": "{ products(first: 1) { edges { node { title variants(first: 2) { edges { node { id metafields(first: 20) { edges { node { namespace key value type } } } } } } } } } }"}' \
    'https://<store>.myshopify.com/admin/api/2024-10/graphql.json'
  3. Trigger a bulk sync from the admin UI
  4. Check S3 output for variantMetafield* fields in the JSONL
  5. Search the index and verify fields appear on hits, and filtering works:
    "filter": "variantMetafieldCustomWebFiltersGender:Men's"

Debugging

  • The message "Processed 0 variant metafields across 0 variants" in webhook worker logs is expected during a bulk sync: the bulk export JSONL doesn't contain variant metafields, which are added later by the enrichment step.
  • If variant metafield fields are missing, check:
    1. The enrichment step ran (look for warning logs about enrichment failures)
    2. shopify_domain and access_token are passed to process_bulk_file_streaming
    3. The metafield definitions exist on the Shopify store and have values populated on variants
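When checking whether enrichment populated anything, it can help to count which variantMetafield* fields appear in a JSONL output file. The sketch below assumes each line is one flat JSON document with the metafield fields at the top level; the helper name summarize_variant_metafields is hypothetical, and the sample lines are made up for illustration.

```python
import json
from typing import Dict, Iterable


def summarize_variant_metafields(
    lines: Iterable[str], prefix: str = "variantMetafield"
) -> Dict[str, int]:
    """Hypothetical helper: count docs per field whose name starts with `prefix`."""
    counts: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        doc = json.loads(line)
        for field in doc:
            if field.startswith(prefix):
                counts[field] = counts.get(field, 0) + 1
    return counts


sample = [
    '{"title": "Tee", "variantMetafieldCustomWebFiltersGender": "Men\'s"}',
    '{"title": "Cap"}',
]
print(summarize_variant_metafields(sample))
```

An empty result for a file downloaded after the enrichment step points back to the three checks above.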