
DynamoDB Access-Patterns Cheat Sheet

Read the access pattern before hand-rolling a query. Never scan a prod table to find one item — every load-bearing lookup below already has a key or GSI path, and the repository code is the authoritative source for the key format. Querying-by-guess is how a recent revco resync burned four wrong aws dynamodb invocations and one full-table scan before someone read get_shop_job_by_id and found the GSI_JobLookup_v2 path that had existed all along.

This page covers only the load-bearing tables (the long tail goes stale and nobody trusts it). For the generic CLI primitives (list-tables, describe-table, table naming by env) see resources/dynamodb.md. For the ecommerce data-flow context see components/ecommerce.md.

All snippets are read-only (query / get-item, never put / update / delete) and use --profile controller --region us-east-1 against prod. Swap the prod- prefix for staging- / dev-<branch>- per resources/dynamodb.md.
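For scripts that target more than one environment, a tiny name builder keeps the prefix swap in one place. This is a hypothetical helper (`table_name` is not from the repo) that assumes the `prod-` / `staging-` / `dev-<branch>-` convention exactly as stated above; resources/dynamodb.md remains the authority on naming.

```python
from typing import Optional


# Hypothetical helper (not from the repo): applies the env-prefix convention
# described above to a bare table name such as "EcomIndexerJobsTable".
def table_name(base: str, env: str = "prod", branch: Optional[str] = None) -> str:
    if env in ("prod", "staging"):
        return f"{env}-{base}"
    if env == "dev":
        if not branch:
            raise ValueError("dev tables are per-branch: dev-<branch>-<Table>")
        return f"dev-{branch}-{base}"
    raise ValueError(f"unknown env: {env!r}")


# Usage: any --table-name below can come from here.
print(table_name("EcomIndexerJobsTable"))                     # prod-EcomIndexerJobsTable
print(table_name("ShopifyEntitiesTable", "dev", "my-branch"))  # dev-my-branch-ShopifyEntitiesTable
```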

Deserializing output without boto3

The controller box's system python3 has no boto3, so you cannot rely on TypeDeserializer. Either run through pants, or pipe the raw DynamoDB-JSON through a small hand-rolled deserializer that unwraps the single-key type descriptors ({"S": ...}, {"N": ...}, {"M": ...}, {"L": ...}):

aws dynamodb query ... --profile controller --region us-east-1 --output json \
| python3 -c '
import sys, json

def num(s):
    return float(s) if ("." in s or "e" in s.lower()) else int(s)

def und(v):
    (t, x), = v.items()
    if t == "S": return x
    if t == "N": return num(x)
    if t == "BOOL": return x
    if t == "NULL": return None
    if t == "M": return {k: und(w) for k, w in x.items()}
    if t == "L": return [und(w) for w in x]
    if t == "NS": return [num(n) for n in x]
    return x  # SS, B, BS: already plain JSON values

data = json.load(sys.stdin)
for it in data.get("Items", []):
    print(json.dumps({k: und(v) for k, v in it.items()}))
'

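For a get-item response, keep und() but replace the loop: get-item returns a single data["Item"] map rather than a data["Items"] list, so print json.dumps({k: und(v) for k, v in data["Item"].items()}). The "Item" key is absent when no row matches the key.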
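When the one-liner grows unwieldy, the same unwrapping can live in a standalone, stdlib-only module that accepts either response shape. A minimal sketch: `items_from_response` and `to_json_lines` are hypothetical names, and numbers use the same int/float heuristic as above (swap in `decimal.Decimal` if you need exact fractional values).

```python
import json


def _num(s):
    # DynamoDB sends numbers as strings; keep integers exact.
    return float(s) if ("." in s or "e" in s.lower()) else int(s)


def und(v):
    """Unwrap one DynamoDB-JSON type descriptor, e.g. {"S": "x"} -> "x"."""
    (t, x), = v.items()
    if t in ("S", "BOOL", "B"):
        return x
    if t == "N":
        return _num(x)
    if t == "NULL":
        return None
    if t == "M":
        return {k: und(w) for k, w in x.items()}
    if t == "L":
        return [und(w) for w in x]
    if t == "NS":
        return [_num(n) for n in x]
    if t in ("SS", "BS"):
        return list(x)
    raise ValueError(f"unknown DynamoDB type descriptor: {t!r}")


def items_from_response(data):
    """Plain dicts from a query/scan ("Items") or get-item ("Item") response."""
    raw = [data["Item"]] if "Item" in data else data.get("Items", [])
    return [{k: und(v) for k, v in it.items()} for it in raw]


def to_json_lines(data):
    """One JSON string per row, matching the one-liner's output."""
    return [json.dumps(row) for row in items_from_response(data)]
```

Saved as, say, ddb_unwrap.py (file name hypothetical), it can sit at the end of any pipe: `... --output json | python3 -c 'import json, sys, ddb_unwrap; print(*ddb_unwrap.to_json_lines(json.load(sys.stdin)), sep="\n")'`.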


1. Indexer / sync jobs — prod-EcomIndexerJobsTable

Tracks every bulk, webhook, and incremental sync job.

Source of truth: sync_job_repository.py (GSI names sync_job_repository.py:23-24, pk/sk construction sync_job_repository.py:104-117), sync_job_core_model.py (schema sync_job_core_model.py:35-51), and the table/GSI definition in ecom_stack.py (setup_indexer_jobs_table, ecom_stack.py:341-406).

Key schema

| Key | Format | Notes |
| --- | --- | --- |
| pk (S) | PLATFORM#{platform}#SHOP#{shop_id} | platform is the enum value (shopify, ecom, …). |
| sk (S) | JOB#{created_at}#{job_id} | created_at is an ISO-8601 timestamp and the first component after JOB#, so a time window maps onto a sort-key range. |
| TTL | ttl (attribute) | Unix timestamp, 30-day auto-cleanup. |

⚠️ The pk trap. shop_id is the composite index name {system_account_id}-{index_name} (e.g. kl7a9h55-shopify-e513a8-4), not the shop domain e513a8-4.myshopify.com — even though the admin console shows the shop domain. See the shop_id field description, sync_job_core_model.py:49-51. If your pk-keyed query returns nothing, check that you used the index name, not the domain.
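A cheap guard against the pk trap is to build the pk in one place and reject anything that looks like a domain. `jobs_pk` is a hypothetical helper, not the repository's construction (that lives in sync_job_repository.py:104-117), and it assumes index names never contain a dot.

```python
# Hypothetical helper, not from sync_job_repository.py: builds the jobs-table
# pk from the composite index name and refuses shop-domain-looking input.
def jobs_pk(platform: str, system_account_id: str, index_name: str) -> str:
    shop_id = f"{system_account_id}-{index_name}"
    if "." in shop_id:
        # e.g. "e513a8-4.myshopify.com" was passed where the index name belongs
        raise ValueError(f"{shop_id!r} looks like a domain; use the index name")
    return f"PLATFORM#{platform}#SHOP#{shop_id}"


print(jobs_pk("shopify", "kl7a9h55", "shopify-e513a8-4"))
# PLATFORM#shopify#SHOP#kl7a9h55-shopify-e513a8-4
```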

GSIs

| GSI | pk | sk | Projection | Use |
| --- | --- | --- | --- | --- |
| GSI_JobLookup_v2 | job_id (bare) | created_at | KEYS_ONLY | "I have a job id, find the row." Returns only pk/sk/job_id/created_at, so you must then get-item the base row. |
| GSI_JobsByStatus_v2 | shop_id | status_created_at ({status}#{created_at}) | INCLUDE (listing attrs) | "Find active/pending jobs for a shop." |

GSI_JobsByStatus_v2 partitions on shop_id alone, so a shop on multiple platforms can return jobs from all of them — add a platform = :platform filter when it matters (see list_shop_jobs_by_status, sync_job_repository.py:890-1049).
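The platform filter is easiest to get right when the argument list is generated rather than hand-typed. A sketch that builds the argv for the status query, with an optional filter. It assumes the attribute is named `platform`, holds the enum string, and is among the INCLUDE-projected listing attributes (if it is not projected, the filter will not behave as intended). The name is aliased through `--expression-attribute-names` as a defensive habit against reserved-word clashes. `status_query_argv` is hypothetical.

```python
import json
from typing import List, Optional


def status_query_argv(shop_id: str, status: str, platform: Optional[str] = None,
                      env_prefix: str = "prod-") -> List[str]:
    """argv for the GSI_JobsByStatus_v2 query, newest first, optional platform filter."""
    values = {":sid": {"S": shop_id}, ":sp": {"S": f"{status}#"}}
    argv = [
        "aws", "dynamodb", "query",
        "--table-name", f"{env_prefix}EcomIndexerJobsTable",
        "--index-name", "GSI_JobsByStatus_v2",
        "--key-condition-expression",
        "shop_id = :sid AND begins_with(status_created_at, :sp)",
        "--no-scan-index-forward",
    ]
    if platform is not None:
        # Filter runs after the key condition; it narrows results, not read cost.
        argv += ["--filter-expression", "#plat = :plat",
                 "--expression-attribute-names", json.dumps({"#plat": "platform"})]
        values[":plat"] = {"S": platform}
    argv += ["--expression-attribute-values", json.dumps(values),
             "--profile", "controller", "--region", "us-east-1"]
    return argv
```

Run it with `subprocess.run(status_query_argv("kl7a9h55-shopify-e513a8-4", "IN_PROGRESS", "shopify"), check=True, capture_output=True, text=True)` and feed stdout to the und() unwrap above.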

How to look up X

  • I have a job id → the row: query GSI_JobLookup_v2 on job_id, then get-item the base row by the returned pk/sk. (Mirrors get_shop_job_by_id, sync_job_repository.py:831-888.)
  • Active / pending jobs for a shop: query GSI_JobsByStatus_v2 on shop_id + begins_with(status_created_at, "PENDING#") (or "IN_PROGRESS#").
  • Latest N jobs for an index: query the base table on the pk with ScanIndexForward=false, Limit=N.
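Because created_at leads the sort key, a time window can be a key condition rather than a filter. A minimal sketch of the bounds, assuming `start_iso` / `end_iso` use exactly the same ISO-8601 format (precision and offset suffix) as the stored created_at values; mismatched formats break lexicographic ordering at the edges. `window_query_values` and `KEY_CONDITION` are hypothetical names, and the dates are placeholders.

```python
import json

# Key condition for a created_at window on the base table.
KEY_CONDITION = "pk = :pk AND sk BETWEEN :lo AND :hi"


def window_query_values(pk: str, start_iso: str, end_iso: str) -> dict:
    """Expression values for KEY_CONDITION covering created_at in [start, end).

    "JOB#<start>" sorts at or before every "JOB#<start>#<job_id>", and
    "JOB#<end>" sorts before "JOB#<end>#<job_id>", so rows created exactly
    at end_iso fall outside the range.
    """
    if start_iso > end_iso:
        raise ValueError("start_iso must not be after end_iso")
    return {
        ":pk": {"S": pk},
        ":lo": {"S": f"JOB#{start_iso}"},
        ":hi": {"S": f"JOB#{end_iso}"},
    }


values = window_query_values(
    "PLATFORM#shopify#SHOP#kl7a9h55-shopify-e513a8-4",
    "2024-05-01T00:00:00", "2024-05-02T00:00:00",
)
# Pass KEY_CONDITION to --key-condition-expression and this JSON to
# --expression-attribute-values.
print(json.dumps(values))
```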

Canonical CLI

Lookup by job id (two steps — the GSI is KEYS_ONLY):

# 1) Find the base-table key for the job id.
aws dynamodb query \
--table-name prod-EcomIndexerJobsTable \
--index-name GSI_JobLookup_v2 \
--key-condition-expression "job_id = :jid" \
--expression-attribute-values '{":jid": {"S": "<job-id>"}}' \
--profile controller --region us-east-1

# 2) Hydrate the full row using the pk/sk from step 1.
aws dynamodb get-item \
--table-name prod-EcomIndexerJobsTable \
--key '{"pk": {"S": "PLATFORM#shopify#SHOP#kl7a9h55-shopify-e513a8-4"}, "sk": {"S": "JOB#<created_at>#<job-id>"}}' \
--profile controller --region us-east-1
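Step 2 only needs pk and sk from step 1, and both already arrive in DynamoDB-JSON, so they can be passed straight through to --key. A sketch, assuming job ids are unique so the KEYS_ONLY lookup returns exactly one row; `base_key_from_lookup` is a hypothetical name.

```python
import json


def base_key_from_lookup(resp: dict) -> str:
    """Turn a GSI_JobLookup_v2 query response into a get-item --key JSON string."""
    items = resp.get("Items", [])
    if len(items) != 1:
        # Zero rows: wrong job id or env. Several: job ids are not unique.
        raise LookupError(f"expected exactly one index row, got {len(items)}")
    row = items[0]
    # Keep the {"S": ...} descriptors: get-item --key expects DynamoDB-JSON.
    return json.dumps({"pk": row["pk"], "sk": row["sk"]})
```

Feed it the step 1 output (`json.load` of `--output json`) and pass the returned string as the get-item --key argument.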

List by status for a shop (newest first):

aws dynamodb query \
--table-name prod-EcomIndexerJobsTable \
--index-name GSI_JobsByStatus_v2 \
--key-condition-expression "shop_id = :sid AND begins_with(status_created_at, :sp)" \
--expression-attribute-values '{":sid": {"S": "kl7a9h55-shopify-e513a8-4"}, ":sp": {"S": "IN_PROGRESS#"}}' \
--no-scan-index-forward \
--profile controller --region us-east-1

Latest N jobs for an index (base table, newest first):

aws dynamodb query \
--table-name prod-EcomIndexerJobsTable \
--key-condition-expression "pk = :pk AND begins_with(sk, :skp)" \
--expression-attribute-values '{":pk": {"S": "PLATFORM#shopify#SHOP#kl7a9h55-shopify-e513a8-4"}, ":skp": {"S": "JOB#"}}' \
--no-scan-index-forward --max-items 10 \
--profile controller --region us-east-1

2. Index settings — prod-EcomIndexSettingsTable

One unified config record per index (create_index / search / collections / add_docs settings), plus sub-records (saved queries, search profiles, agentic config) that share the INDEX# prefix.

Source of truth: index_settings_repository.py (pk/sk + sub-record SK shapes, index_settings_repository.py:39-50, 106-114, 163-164, 509-534; DEFAULT_CONFIGS key index_settings_repository.py:47-50), index_settings_model.py (schema index_settings_model.py:928-945), and ecom_stack.py (setup_index_settings_table, ecom_stack.py:408-460).

Key schema

| Key | Format | Example |
| --- | --- | --- |
| pk (S) | {system_account_id} | kl7a9h55 |
| sk (S) | INDEX#{index_name} (root record, exactly one #) | INDEX#shopify-e513a8-4 |
| sk (S) | INDEX#DEFAULT_CONFIGS | (account-level default config record) |
| sk (S) | INDEX#{index_name}#PROFILE#SEARCH#{profile} | INDEX#products#PROFILE#SEARCH#default |

Root records carry exactly one # in the SK; every sub-record shape (#QUERY#…, #PROFILE#SEARCH#…, #AGENTIC_CONFIG, …) appends further # segments. See _exclude_non_index_settings_items, index_settings_repository.py:509-534.
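The "exactly one #" rule is easy to apply to begins_with(sk, "INDEX#") output. A sketch of a classifier; `classify_settings_sk` is hypothetical, not the repository's _exclude_non_index_settings_items. INDEX#DEFAULT_CONFIGS also has exactly one #, and this page does not say how the repository treats it, so the sketch reports it separately rather than as a root. It assumes index names contain no #.

```python
from typing import Optional, Tuple


def classify_settings_sk(sk: str) -> Tuple[str, Optional[str]]:
    """Return (kind, index_name) for an EcomIndexSettingsTable sort key."""
    parts = sk.split("#")
    if parts[0] != "INDEX" or len(parts) < 2:
        return "other", None
    if sk == "INDEX#DEFAULT_CONFIGS":
        return "defaults", None        # account-level record, not an index
    if len(parts) == 2:
        return "root", parts[1]        # INDEX#{index_name}
    return "sub-record", parts[1]      # INDEX#{index_name}#...


print(classify_settings_sk("INDEX#products#PROFILE#SEARCH#default"))
# ('sub-record', 'products')
```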

GSIs

| GSI | pk | sk | Use |
| --- | --- | --- | --- |
| GSI_IndexRootsByName | pk | index_name | Sparse: only root records have index_name. List index roots without scanning sub-records. |
| GSI_ShopifyDomain | shopify_domain | sk | Resolve a Shopify domain → index settings (webhook routing). Sparse: only rows with shopify_domain set. |

How to look up X

  • An index's full config: get-item by pk = system_account_id, sk = INDEX#{index_name}. (Mirrors get_index_config, index_settings_repository.py:150-173.)
  • Just the add_docs_config: same get-item, with a projection on add_docs_config.
  • The account-level defaults: get-item sk = INDEX#DEFAULT_CONFIGS.

Canonical CLI

Get an index's add_docs_config:

aws dynamodb get-item \
--table-name prod-EcomIndexSettingsTable \
--key '{"pk": {"S": "kl7a9h55"}, "sk": {"S": "INDEX#shopify-e513a8-4"}}' \
--projection-expression "add_docs_config" \
--profile controller --region us-east-1

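Drop --projection-expression for the whole record. To list every config record for an account, query the pk with begins_with(sk, "INDEX#"); this returns sub-records and INDEX#DEFAULT_CONFIGS alongside the roots, so use GSI_IndexRootsByName when you only want root records.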


3. Shopify entities — prod-ShopifyEntitiesTable

Single-table store for per-shop OAuth sessions, API keys, and settings. This is where scripts retrieve the Shopify Admin API access token.

Source of truth: shopify_entities.py (entity models + SK shapes, shopify_entities.py:15-50), database.py (key prefixes, database.py:8-42), shopify_graphql.py (token retrieval, get_shopify_access_token, shopify_graphql.py:57-113), and shopify_admin_stack.py (setup_shopify_entities_table, shopify_admin_stack.py:828-876).

Key schema

| Key | Format | entity_type |
| --- | --- | --- |
| pk (S) | SHOP#{shop_domain} | n/a (e.g. SHOP#cool-store.myshopify.com) |
| sk (S) | USER#{user_id} | SESSION (OAuth session, has access_token) |
| sk (S) | API_KEY | API_KEY |
| sk (S) | SETTINGS | SETTINGS (UI components, active_index, system_account_id) |

Here the pk is the shop domain (contrast with the jobs table, which keys on the composite index name). Sessions support multiple users per shop, so there can be several USER#… rows.

GSI

| GSI | pk | sk | Use |
| --- | --- | --- | --- |
| GSI_SystemAccountId | system_account_id | sk | Sparse: only SETTINGS rows that carry system_account_id. List all shops for a Marqo account. |

How to look up X

  • The Shopify access token for a shop: query the pk with begins_with(sk, "USER#") and entity_type = SESSION, then pick the session with the latest last_updated and read access_token. (This is exactly get_shopify_access_token, shopify_graphql.py:57-113 — prefer running that script over hand-rolling, since it handles AWS SSO and session selection.)
  • A shop's settings / active index: get-item sk = SETTINGS.
  • All shops for an account: query GSI_SystemAccountId on system_account_id.
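If you do hand-roll the token lookup, the selection step is small. A sketch operating on rows already unwrapped from DynamoDB-JSON (e.g. with the und() helper above); `latest_access_token` is hypothetical and is not get_shopify_access_token. It assumes last_updated has one comparable type across rows (all ISO strings or all epoch numbers) and skips sessions that lack it, which the real script may handle differently.

```python
from typing import Iterable, Optional


def latest_access_token(rows: Iterable[dict]) -> Optional[str]:
    """access_token of the SESSION row with the newest last_updated, or None."""
    sessions = [
        r for r in rows
        if r.get("entity_type") == "SESSION" and r.get("last_updated") is not None
    ]
    if not sessions:
        return None
    newest = max(sessions, key=lambda r: r["last_updated"])
    return newest.get("access_token")
```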

Canonical CLI

Find sessions (and their access tokens) for a shop:

aws dynamodb query \
--table-name prod-ShopifyEntitiesTable \
--key-condition-expression "pk = :pk AND begins_with(sk, :skp)" \
--filter-expression "entity_type = :et" \
--expression-attribute-values '{":pk": {"S": "SHOP#cool-store.myshopify.com"}, ":skp": {"S": "USER#"}, ":et": {"S": "SESSION"}}' \
--profile controller --region us-east-1

Pick the row with the latest last_updated and read access_token. For the full flow (SSO + selection) prefer:

python scripts/ecom/shopify_graphql.py \
--shop cool-store.myshopify.com --env prod --profile controller \
--query '{ shop { name } }'

Get a shop's settings record:

aws dynamodb get-item \
--table-name prod-ShopifyEntitiesTable \
--key '{"pk": {"S": "SHOP#cool-store.myshopify.com"}, "sk": {"S": "SETTINGS"}}' \
--profile controller --region us-east-1
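The SETTINGS record can also bridge back to the jobs-table pk trap, since it carries system_account_id and active_index. A sketch that composes the jobs pk from an unwrapped SETTINGS row, under a loud ASSUMPTION: active_index holds the bare index name, so shop_id = {system_account_id}-{active_index} as in the section 1 example. Verify against sync_job_core_model.py:49-51 before relying on it. `jobs_pk_from_settings` is hypothetical, and platform defaults to shopify only because this is the Shopify table.

```python
from typing import Optional


def jobs_pk_from_settings(settings: dict, platform: str = "shopify") -> Optional[str]:
    """Compose a prod-EcomIndexerJobsTable pk from a deserialized SETTINGS row.

    ASSUMPTION (unverified): active_index is the bare index name, e.g.
    "shopify-e513a8-4", not the shop domain.
    """
    acct = settings.get("system_account_id")
    index = settings.get("active_index")
    if not acct or not index:
        return None  # row lacks the fields needed to build the pk
    return f"PLATFORM#{platform}#SHOP#{acct}-{index}"
```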