Grill Retrieval

POST /grill/search is Grill's main retrieval endpoint. It runs hybrid (lexical + semantic) search across the project namespace and returns a RetrievalContext — a prompt-ready XML + Markdown block, not a list of ranked hits.

This page covers the request shape, how the four tuning parameters interact, and when to use searchInDoc instead.

Request

http
POST /v3/grill/search HTTP/1.1
Authorization: Bearer <project-api-key>
Content-Type: application/json

{
  "query": "How did operating margin change year over year?",
  "min_relevance": 0.3,
  "max_tokens": 4000,
  "doc_filter": null,
  "exclude_doc_ids": [],
  "return_assets": false,
  "return_page_images": false
}

| Field | Type | Default | Purpose |
| --- | --- | --- | --- |
| query (required) | string | | Natural-language query. Hybrid retrieval splits this into a lexical term set and a vector embedding internally. |
| min_relevance | float 0..1 | unset | Drops hits whose relevance score falls below this threshold. |
| max_tokens | integer | unset | Token budget for the rendered context. Lower-ranked hits are dropped (sandwich-aware) until the block fits. |
| doc_filter | string | unset | Optional document id. When set, restricts retrieval to a single doc. (Use /grill/searchInDoc if you want this required.) |
| exclude_doc_ids | array of string | unset | Doc ids to exclude from results (max 100). Useful in agent loops to avoid re-citing docs already shown. |
| return_assets | boolean | false | When true, asset references (figures, tables) are included alongside the matched chunks. |
| return_page_images | boolean | false | When true, page-image references are included. Useful for visual citations. |

No top_k. Result count is bounded server-side by relevance + token budget. The recall stage isn't a tunable parameter; tune precision via min_relevance and prompt size via max_tokens.
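
As an illustration of the request shape, here is a minimal sketch that builds the call with Python's standard library instead of the SDK. The helper name build_search_request and the example base URL are assumptions made for this sketch; only the path, headers, and field names come from the request above.

```python
import json
import urllib.request


def build_search_request(base_url, api_key, query, **options):
    """Build (but do not send) a POST request for /grill/search.

    base_url is assumed to already include any version prefix.
    Options left as None are omitted so the server treats them as unset.
    """
    payload = {"query": query}
    payload.update({k: v for k, v in options.items() if v is not None})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/grill/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical usage; the host below is a placeholder, not a real endpoint.
# req = build_search_request("https://grill.example.test/v3", "my-key",
#                            "How did operating margin change year over year?",
#                            min_relevance=0.3, max_tokens=4000)
# with urllib.request.urlopen(req) as resp:
#     context = json.load(resp)["context"]
```

Leaving unset options out of the payload, rather than sending nulls, keeps the request body limited to the parameters you actually chose.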

Response

json
{
  "context": "<doc id=\"...\">…</doc>"
}

A single context string. That is intentional — Grill's contract is "ready to drop into a prompt", not "here is JSON for you to format". For the wrapper grammar (<doc>, <gap>, sandwich order, citation attributes), see RetrievalContext format.
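
Because the response is a single string, wiring it into a prompt takes very little code. The sketch below is one possible way to do it; the function name build_prompt and the instruction wording are illustrative assumptions, not part of Grill.

```python
import json


def build_prompt(response_body: str, question: str) -> str:
    """Place the returned context verbatim ahead of the user question."""
    context = json.loads(response_body)["context"]
    return (
        "Answer using only the context below.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )
```

The context is inserted unmodified, so the wrapper tags and citation attributes reach the model exactly as Grill rendered them.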

How the parameters interact

Think of retrieval as a two-stage pipeline:

text
                ┌──── filter ────┐  ┌──── render ────┐
query ─▶ hybrid search ─▶ score ≥ min_relevance ─▶ fit ≤ max_tokens (sandwich) ─▶ context
  • min_relevance controls precision. Anything below the threshold is discarded outright, even if it would have fit the budget. Useful for "I'd rather return less than return junk" queries (e.g. legal Q&A).
  • max_tokens controls the prompt budget. After thresholding, hits are placed in sandwich order; if the block overflows, the lowest-ranked hits are evicted first until it fits. Hits at the very top and very bottom of the order survive longest.
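
The two stages can be approximated client-side to build intuition. This is a simplified model, not Grill's implementation: the Hit type, the per-hit token counts, and the exact sandwich layout (strongest hits at both ends, weakest in the middle) are assumptions for this sketch. The real layout is defined in RetrievalContext format.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    score: float  # relevance in 0..1
    tokens: int   # assumed rendered size of this hit


def select_hits(hits, min_relevance=None, max_tokens=None):
    """Stage 1: threshold on relevance. Stage 2: evict lowest-ranked until the budget fits."""
    kept = [h for h in hits if min_relevance is None or h.score >= min_relevance]
    kept.sort(key=lambda h: h.score, reverse=True)  # best first
    if max_tokens is not None:
        while kept and sum(h.tokens for h in kept) > max_tokens:
            kept.pop()  # drop the current lowest-ranked hit
    return kept


def sandwich(ranked):
    """Assumed layout: alternate ranked items between the front and the back."""
    front, back = [], []
    for i, item in enumerate(ranked):
        (front if i % 2 == 0 else back).append(item)
    return front + back[::-1]
```

In this model a hit scoring below min_relevance never reaches the budget stage, which matches the rule that thresholded hits are discarded even if they would have fit.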

A typical chat-with-knowledge-base setup looks like:

json
{ "query": "...", "min_relevance": 0.3, "max_tokens": 6000 }

A "strict citations" assistant tightens both filters:

json
{ "query": "...", "min_relevance": 0.5, "max_tokens": 2500 }

Searching a single document

When you already know which document the answer must come from ("chat with this PDF", a per-document help bot, a contract Q&A flow), use POST /grill/searchInDoc. It takes the same request shape as /grill/search, except that doc_filter is required: a request without it is rejected with 400.

bash
curl -sS -X POST "$GRILL/grill/searchInDoc" \
  -H "authorization: Bearer $GRILL_KEY" \
  -H "content-type: application/json" \
  -d '{
    "query": "What is the cancellation policy?",
    "doc_filter": "terms_of_service_v3",
    "max_tokens": 3000
  }'

Why a separate endpoint instead of just setting doc_filter on /grill/search? Two reasons:

  • Safety. searchInDoc makes the per-document scope a contract, not an option — you cannot accidentally drop the filter and silently retrieve from the entire namespace.
  • Validation. The server rejects an empty doc_filter up front, so a typo that would otherwise become a "search the world" bug fails fast.

The response is the same RetrievalContext shape.
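
If you call the endpoint without the SDK, you may want the same fail-fast behaviour on the client. The sketch below is one way to mirror the server-side check before a request is sent; search_in_doc_payload is a hypothetical helper, not an SDK function.

```python
def search_in_doc_payload(query, doc_filter, **options):
    """Build a /grill/searchInDoc body, refusing a missing or blank doc_filter."""
    if not isinstance(doc_filter, str) or not doc_filter.strip():
        raise ValueError("doc_filter is required for /grill/searchInDoc")
    return {"query": query, "doc_filter": doc_filter, **options}
```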

From the Python SDK

The SDK mirrors the endpoint split — g.search(...) for namespace-wide retrieval, g.search_in_doc(...) for per-document retrieval. The doc_filter argument is required positionally on search_in_doc so the safety guarantee carries through to Python.

python
from poma import Grill

g = Grill()

# Namespace-wide retrieval
ctx = g.search(
    "How did operating margin change year over year?",
    min_relevance=0.3,
    max_tokens=6000,
)
print(ctx.context)

# Scoped to one document — doc_filter is positional and required
ctx = g.search_in_doc(
    "What is the cancellation policy?",
    "terms_of_service_v3",
    max_tokens=3000,
)

GrillContext.context is the same prompt-ready XML + Markdown block you'd get from curl. Full signature: Grill.search / Grill.search_in_doc.

Errors

| Status | When | Notes |
| --- | --- | --- |
| 400 | Missing query, missing doc_filter (on searchInDoc), or upstream validation error | Body is plain text. |
| 401 | Missing or invalid token | Use a project API key. |
| 403 | Project lacks Grill access (primecut-only project) | Create or switch to a Grill project. |
| 404 | doc_filter references an unknown document | Confirm the doc exists with GET /grill/docs. |
| 502 | Other upstream Grill / proxy errors | Retry with backoff. |
| 503 | Grill backend unreachable | Retry; if persistent, check status page. |

Upstream Grill errors are mapped to these statuses based on the upstream message, so you'll see a consistent contract regardless of which internal subsystem failed.
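
The "retry with backoff" advice for 502 and 503 could be implemented along these lines. This is a sketch under assumptions: send is any zero-argument callable you supply that performs the request and returns a (status, body) pair, and the attempt count and delays are illustrative defaults, not values recommended by Grill.

```python
import time

RETRYABLE = {502, 503}  # statuses the errors list marks as worth retrying


def call_with_retry(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        sleep(base_delay * 2 ** attempt)  # exponential backoff: 0.5s, 1s, 2s, ...
```

Statuses such as 400, 401, 403, and 404 are returned immediately, since repeating the same request would not change the outcome.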

Tuning checklist

When retrieval feels off, work down this list:

  1. No matches at all. Check that the doc is listed in GET /grill/docs. Lower min_relevance (or unset it).
  2. Junk in the context. Raise min_relevance (e.g. from 0.3 to 0.5).
  3. Truncated context. Raise max_tokens, or narrow the query so fewer hits are needed.
  4. High latency / heavy payload. Omit return_assets / return_page_images when you don't need them; lower max_tokens so less context is rendered.
  5. Wrong document picked. If you know the answer is in one doc, switch to /grill/searchInDoc — that removes the cross-document ambiguity entirely. To exclude already-cited docs in an agent loop, pass exclude_doc_ids.
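
For the agent-loop case in step 5, one way to maintain exclude_doc_ids is to collect the ids already present in each returned context. The sketch below assumes that id is the first attribute on every <doc> tag, as in the response example on this page; check RetrievalContext format before relying on it. Both helper names are hypothetical.

```python
import re

# Assumes tags look like <doc id="..."> with id as the first attribute.
DOC_ID_PATTERN = re.compile(r'<doc\s+id="([^"]+)"')
MAX_EXCLUDED = 100  # documented cap on exclude_doc_ids


def cited_doc_ids(context):
    """Return the doc ids found in a context string, in order, without duplicates."""
    return list(dict.fromkeys(DOC_ID_PATTERN.findall(context)))


def extend_exclusions(excluded, context):
    """Merge newly cited ids into the exclusion list, keeping the most recent 100."""
    merged = list(dict.fromkeys([*excluded, *cited_doc_ids(context)]))
    return merged[-MAX_EXCLUDED:]
```

Pass the returned list as exclude_doc_ids on the next call. Once the list would exceed the cap, the oldest ids are dropped first; that is a design choice of this sketch, not documented behaviour.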