
Flow: Agentic Search (AI-Powered Conversational Search)

AI-powered search using Google Gemini to categorize queries, generate summaries, and support multi-turn conversations.

Request Path

Step-by-Step

1. Search Proxy Routes to Agentic Worker

Where: components/search_proxy/src/worker.ts

The search proxy receives the agentic search request and calls AGENTIC_SEARCH_WORKER via Cloudflare service binding RPC.

Inspect: Tail both workers — see Cloudflare Workers.

npx wrangler tail {env}-ecom-api
npx wrangler tail {env}-agentic-search

2. Payload Validation & Feature Check

Where: components/agentic_search/src/index.ts

The Base64-encoded payload is decoded and validated against a Zod schema. The worker then checks that feature_flags.agentic_search is enabled.

Failure: returns 400 if the feature is not enabled or the payload is invalid.
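The decode-then-validate flow above can be sketched as follows. This is a minimal illustration, not the worker's code: the payload fields (q, indexName) are hypothetical, and a hand-written type check stands in for the Zod schema so the sketch has no dependencies.

```typescript
// Hypothetical payload shape; the real Zod schema is not shown in this document.
interface AgenticSearchPayload {
  q: string;
  indexName: string;
}

type ValidationResult =
  | { ok: true; payload: AgenticSearchPayload }
  | { ok: false; status: number; reason: string };

// Decode a Base64 string, parse it as JSON, check its shape, then check the flag.
// A hand-written check stands in for the Zod schema here.
function decodeAndValidate(encoded: string, featureEnabled: boolean): ValidationResult {
  let parsed: unknown;
  try {
    // atob yields a binary string; non-ASCII payloads would need a TextDecoder step.
    parsed = JSON.parse(atob(encoded));
  } catch {
    return { ok: false, status: 400, reason: "payload is not Base64-encoded JSON" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { ok: false, status: 400, reason: "payload is not an object" };
  }
  const candidate = parsed as Record<string, unknown>;
  if (typeof candidate.q !== "string" || typeof candidate.indexName !== "string") {
    return { ok: false, status: 400, reason: "payload failed schema validation" };
  }
  if (!featureEnabled) {
    return { ok: false, status: 400, reason: "agentic_search is not enabled" };
  }
  return { ok: true, payload: { q: candidate.q, indexName: candidate.indexName } };
}
```

Returning a result object rather than throwing keeps both failure cases on the same 400 path, which matches the single failure mode described above.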

3. Three-Tier Cache Check

Where: components/agentic_search/src/ddb-cached-lookup.ts

  1. Cloudflare Cache API (2hr TTL, with 60s stale-while-revalidate). Key: https://internal/agentic-cached-query/{accountId}/{indexName}/{normalized_query}. The staleness check uses a 1hr threshold; the Cloudflare max-age is set to twice that (7200s) so entries outlive the staleness threshold and can be served under the stale-while-revalidate pattern.
  2. XOR filter (a probabilistic membership filter, similar in purpose to a Bloom filter), used as a negative filter: if the query is not in the filter, DDB is skipped entirely.
  3. DynamoDB ({env}-AgenticCachedQueriesTable): direct key lookup.
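A rough sketch of the three tiers, under stated assumptions: normalizeQuery is a hypothetical normalization (the real rules are not described here), the tiers are injected as plain functions instead of the Cloudflare and AWS clients, and the same key is passed to every tier for brevity even though the DynamoDB key uses the # separated form shown in the inspect command.

```typescript
const STALE_AFTER_SECONDS = 3600; // 1hr staleness threshold
const CF_MAX_AGE_SECONDS = 2 * STALE_AFTER_SECONDS; // 7200s Cloudflare max-age

// Hypothetical normalization: trim, lowercase, collapse whitespace.
function normalizeQuery(query: string): string {
  return query.trim().toLowerCase().replace(/\s+/g, " ");
}

function buildCacheKey(accountId: string, indexName: string, query: string): string {
  const normalized = encodeURIComponent(normalizeQuery(query));
  return `https://internal/agentic-cached-query/${accountId}/${indexName}/${normalized}`;
}

type Freshness = "fresh" | "stale" | "expired";

// Entries between 1hr and 2hr old are still usable but due for a refresh.
function classifyAge(ageSeconds: number): Freshness {
  if (ageSeconds < STALE_AFTER_SECONDS) return "fresh";
  if (ageSeconds < CF_MAX_AGE_SECONDS) return "stale";
  return "expired";
}

// Injected tiers keep the sketch independent of any SDK.
interface CacheTiers<T> {
  edgeGet(key: string): Promise<{ value: T; ageSeconds: number } | null>;
  filterMayContain(key: string): boolean;
  ddbGet(key: string): Promise<T | null>;
}

async function lookupCachedQuery<T>(tiers: CacheTiers<T>, key: string): Promise<T | null> {
  const edge = await tiers.edgeGet(key);
  if (edge && classifyAge(edge.ageSeconds) !== "expired") {
    return edge.value; // a "stale" hit would also schedule a background refresh
  }
  if (!tiers.filterMayContain(key)) return null; // definite miss: skip DynamoDB
  return tiers.ddbGet(key);
}
```

The filter can only produce false positives, never false negatives, so a "not present" answer is safe to trust and saves a DynamoDB read on every uncached query.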

If the query is cached and the request is eligible (based on the cached_query_percentage config and session bucketing), the worker:

  • Sends the cached summary and executes the category searches
  • Skips the LLM entirely
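Session bucketing is not specified in detail here. One common approach, shown purely as an assumption, hashes the session ID into one of 100 buckets and compares the bucket to cached_query_percentage. The hash choice (FNV-1a) and all function names below are hypothetical.

```typescript
// FNV-1a 32-bit hash over UTF-16 code units: small and deterministic.
// An illustrative choice, not necessarily what the worker uses.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit int
  }
  return hash >>> 0;
}

// Map a session to a stable bucket in the range 0..99.
function sessionBucket(sessionId: string): number {
  return fnv1a32(sessionId) % 100;
}

// A session is eligible when its bucket falls below the configured percentage.
function isEligibleForCachedResponse(sessionId: string, cachedQueryPercentage: number): boolean {
  return sessionBucket(sessionId) < cachedQueryPercentage;
}
```

Because the bucket depends only on the session ID, a given session consistently sees either cached or LLM-generated responses as long as the percentage is unchanged.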

Inspect: Check cached queries — see DynamoDB.

aws dynamodb get-item --table-name {env}-AgenticCachedQueriesTable \
--key '{"pk": {"S": "{accountId}#{indexName}#{normalized_query}"}}'

4. Short Query Bypass

Queries with fewer than 4 words skip the LLM entirely and return basic Marqo search results.
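A minimal sketch of the bypass check, assuming words are separated by whitespace. The function and constant names are illustrative; the actual tokenization is not described here.

```typescript
const MIN_WORDS_FOR_LLM = 4;

// Count whitespace-separated words; an empty or blank query has zero words.
function countWords(query: string): number {
  const trimmed = query.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// True when the query is short enough to skip the LLM.
function shouldBypassLlm(query: string): boolean {
  return countWords(query) < MIN_WORDS_FOR_LLM;
}
```

Under this assumption, "red shoes" bypasses the LLM, while "red running shoes men" (exactly 4 words) does not.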

5. Initial Marqo Search (Parallel)

Calls SEARCH_PROXY_WORKER.handleSearch() via service binding RPC. This search runs in parallel with the LLM call.

6. LLM Phase 1: Function Calling

Where: components/agentic_search/src/search/agentic-search.ts

Calls Google Gemini (gemini-2.5-flash) with:

  • System instructions (base prompt or per-account custom prompt)
  • Conversation history (if continuing)
  • execute_search function tool

LLM generates JSON with:

  • query_expansions: array of [query, category_label, confidence]
  • When agentic_config.filter_facets.enabled is true and facet context is available, each expansion may include an optional 4th element: a Marqo filter DSL string
  • query_complements: related search terms
  • summary: text with angle-bracket wrapped product links
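The output shape listed above could be typed roughly as follows. The field names come from the list; the element types (for example, confidence as a number) and the converter function are assumptions made for illustration.

```typescript
// One expansion as emitted by the LLM: a positional tuple with an optional filter.
type RawExpansion = [query: string, categoryLabel: string, confidence: number, filter?: string];

interface LlmSearchPlan {
  query_expansions: RawExpansion[];
  query_complements: string[];
  summary: string;
}

// A named-field view of the tuple, easier to pass around than positions.
interface QueryExpansion {
  query: string;
  categoryLabel: string;
  confidence: number;
  filter?: string;
}

// Convert the positional tuple into an object, omitting filter when it is absent.
function toQueryExpansion(raw: RawExpansion): QueryExpansion {
  const [query, categoryLabel, confidence, filter] = raw;
  return filter === undefined
    ? { query, categoryLabel, confidence }
    : { query, categoryLabel, confidence, filter };
}
```

Converting the tuple early keeps the optional 4th element from leaking positional indexing into the category search code.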

Streaming: Uses the stream-json library to parse the LLM's JSON output incrementally. Each category search fires as soon as its expansion is parsed, without waiting for the full response.

Inspect: Check Gemini API key — see Secrets Manager. Key stored as GOOGLE_API_KEY env var.

7. Category Searches (Parallel)

For each LLM-generated query expansion, executes a Marqo search via SEARCH_PROXY_WORKER.handleSearch(). Up to 6 parallel searches.

If an agent-constructed filter is present, it is merged with any client-provided filter. If the merged filter yields 0 hits, the worker retries without the agent filter and marks the streamed category payload with filterDropped + originalFilter.

The filter that was actually applied is surfaced as appliedFilter on each categoryHits[] item.
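The merge-and-retry behavior can be sketched as below. Several details are assumptions: that filters are combined with a parenthesized AND, that originalFilter carries the merged filter that produced 0 hits, and the SearchFn signature, which stands in for SEARCH_PROXY_WORKER.handleSearch().

```typescript
// Combine the client and agent filters; assumes the filter DSL accepts (a) AND (b).
function mergeFilters(clientFilter?: string, agentFilter?: string): string | undefined {
  const parts = [clientFilter, agentFilter].filter(
    (f): f is string => typeof f === "string" && f.trim() !== "",
  );
  if (parts.length === 0) return undefined;
  if (parts.length === 1) return parts[0];
  return parts.map((p) => `(${p})`).join(" AND ");
}

interface CategoryResult {
  hits: unknown[];
  appliedFilter: string | undefined;
  filterDropped?: boolean;
  originalFilter?: string;
}

// Hypothetical stand-in for the search proxy RPC call.
type SearchFn = (query: string, filter?: string) => Promise<unknown[]>;

async function searchCategory(
  search: SearchFn,
  query: string,
  clientFilter?: string,
  agentFilter?: string,
): Promise<CategoryResult> {
  const merged = mergeFilters(clientFilter, agentFilter);
  const hits = await search(query, merged);
  if (hits.length > 0 || !agentFilter) {
    return { hits, appliedFilter: merged };
  }
  // Zero hits with an agent filter: retry with only the client filter.
  const fallbackFilter = mergeFilters(clientFilter, undefined);
  const fallbackHits = await search(query, fallbackFilter);
  return {
    hits: fallbackHits,
    appliedFilter: fallbackFilter,
    filterDropped: true,
    originalFilter: merged, // assumption: may instead be the agent filter alone
  };
}
```

The client-provided filter is never dropped in this sketch, since it reflects an explicit user constraint rather than an LLM guess.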

8. LLM Phase 2: Summary Generation (Optional)

If agentic_config.enable_summary is true:

  • Second Gemini call with function calling DISABLED
  • Synthesizes summary from search results
  • Streamed as delta SSE events with text chunks

9. Conversation State (Optional)

Where: components/agentic_search/src/conversation-do.ts

If enableConversation is true:

  • Durable Object (ConversationSqlDO) stores conversation context in SQLite
  • History is trimmed to 10 interactions for the LLM and 100 for storage
  • Auto-cleanup after 30 days of inactivity (DO alarm)
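The two trimming limits can be illustrated with a small helper. This sketch assumes the most recent interactions are the ones kept, which is not stated above, and the Interaction shape is hypothetical.

```typescript
const MAX_INTERACTIONS_FOR_LLM = 10;
const MAX_INTERACTIONS_STORED = 100;

// Hypothetical shape of one stored turn.
interface Interaction {
  userMessage: string;
  assistantMessage: string;
}

// Keep only the last `limit` items (assumes oldest-first ordering).
function keepMostRecent<T>(items: readonly T[], limit: number): T[] {
  return items.slice(Math.max(0, items.length - limit));
}

// History as persisted in the Durable Object.
function trimForStorage(history: readonly Interaction[]): Interaction[] {
  return keepMostRecent(history, MAX_INTERACTIONS_STORED);
}

// History as sent to the LLM on the next turn.
function trimForLlm(history: readonly Interaction[]): Interaction[] {
  return keepMostRecent(history, MAX_INTERACTIONS_FOR_LLM);
}
```

Keeping a longer stored history than the LLM window leaves room to change the prompt window later without losing older turns.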

Inspect: Durable Objects visible in Cloudflare dashboard.

10. SSE Response Stream

Event types sent to client (agentic search path):

init → { categories: boolean, summary: boolean }
delta → { summary?, categoryHits?, hits?, facets?, status?, error? }
stream-end → {}
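For debugging a captured stream, a client-side parser along these lines may help. It assumes standard SSE framing (event: and data: fields, frames separated by a blank line) and that every data payload is JSON; neither is confirmed by this document.

```typescript
interface SseEvent {
  event: string;
  data: unknown;
}

// Parse one SSE frame (the text between blank lines) into a name and JSON payload.
function parseSseFrame(frame: string): SseEvent | null {
  let event = "message"; // SSE default when no "event:" field is present
  const dataLines: string[] = [];
  for (const line of frame.split(/\r?\n/)) {
    if (line.startsWith("event:")) {
      event = line.slice("event:".length).trim();
    } else if (line.startsWith("data:")) {
      // The SSE format strips a single leading space after the colon.
      dataLines.push(line.slice("data:".length).replace(/^ /, ""));
    }
  }
  if (dataLines.length === 0) return null; // comment-only or empty frame
  return { event, data: JSON.parse(dataLines.join("\n")) };
}

// Split a captured stream into frames and parse each one.
function parseSseStream(text: string): SseEvent[] {
  return text
    .split(/\r?\n\r?\n/)
    .map((frame) => parseSseFrame(frame))
    .filter((e): e is SseEvent => e !== null);
}
```

A healthy agentic search capture should start with init and finish with stream-end; a stream that stops without stream-end suggests the worker failed mid-response.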

The converse path (handleConverse) uses different event types:

message → text delta chunks
category-hits → search category results
conversation-id → { conversationId }
error → error details
stream-end → {}

Converse (Multi-Turn Chat)

Similar to agentic search but:

  • Always uses function calling (no separate summary phase)
  • Supports document context injection (fetches docs by ID)
  • Supports image analysis (up to 5 total via imageUrls + imageIds)
  • imageIds are uploaded via POST /api/v1/indexes/:index/agentic-search/images and resolved from R2
  • Conversation history preserved in Durable Object

Chat Suggestions

handleChatSuggestions() — generates 3-5 follow-up questions for a document:

  1. Fetch document via search proxy RPC
  2. Call Gemini Lite (gemini-2.5-flash-lite)
  3. Cache result in Cloudflare Cache API (30-day TTL)

Performance Metrics

Where: components/agentic_search/src/timer.ts

Logged at completion as agentic_search_completed:

{
  "source": "llm|cached|short_query_bypass",
  "timings": {
    "E2E": 2500,
    "INITIAL_MARQO_SEARCH": 200,
    "LLM_FIRST_CALL": 1800,
    "LLM_SECOND_CALL": 800,
    "CATEGORY_SEARCH_TOTAL": 400,
    "CACHED_QUERY_LOOKUP": 15
  }
}
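When triaging slow responses, a small helper like the following can point at the dominant stage in a completion log. It assumes the timings are in milliseconds and treats E2E as the total, not as a stage; the helper name is illustrative.

```typescript
interface CompletionLog {
  source: "llm" | "cached" | "short_query_bypass";
  timings: Record<string, number>;
}

// Return the stage with the largest timing, ignoring the E2E total.
function slowestStage(log: CompletionLog): { stage: string; ms: number } | null {
  let slowest: { stage: string; ms: number } | null = null;
  for (const [stage, ms] of Object.entries(log.timings)) {
    if (stage === "E2E") continue;
    if (slowest === null || ms > slowest.ms) {
      slowest = { stage, ms };
    }
  }
  return slowest;
}

const example: CompletionLog = {
  source: "llm",
  timings: {
    E2E: 2500,
    INITIAL_MARQO_SEARCH: 200,
    LLM_FIRST_CALL: 1800,
    LLM_SECOND_CALL: 800,
    CATEGORY_SEARCH_TOTAL: 400,
    CACHED_QUERY_LOOKUP: 15,
  },
};

console.log(slowestStage(example)); // the LLM_FIRST_CALL entry (1800) for this sample
```

Stage timings are not expected to sum to E2E, because the initial Marqo search runs in parallel with the first LLM call and category searches start while the LLM is still streaming.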

What to Look For

Symptom | Where to Check
--- | ---
Agentic search not available | feature_flags.agentic_search in settings. Check Settings Sync.
LLM errors | Gemini API key missing/invalid. Check GOOGLE_API_KEY in Secrets Manager.
Slow responses | Check timings in completion log. LLM latency? Marqo latency?
Cache not working | Check DDB table. Check XOR filter rebuild. Check CF cache headers (2hr max-age, 1hr staleness threshold). Check cached_query_percentage config and session bucketing.
Conversation context lost | Durable Object cleanup (30-day inactivity). Check DO console.
5xx from agentic worker | Tail {env}-agentic-search worker. Check Gemini API status.
Empty categories | LLM returned no query expansions. Check system prompt. Short query bypass.
Streaming errors | Check SSE error events in delta. Partial results may still be valid.