Grill Retrieval
POST /grill/search is Grill's main retrieval endpoint. It runs hybrid (lexical + semantic) search across the project namespace and returns a RetrievalContext — a prompt-ready XML + Markdown block, not a list of ranked hits.
This page covers the request shape, how the four tuning parameters interact, and when to use searchInDoc instead.
Request
POST /v3/grill/search HTTP/1.1
Authorization: Bearer <project-api-key>
Content-Type: application/json

{
  "query": "How did operating margin change year over year?",
  "min_relevance": 0.3,
  "max_tokens": 4000,
  "doc_filter": null,
  "exclude_doc_ids": [],
  "return_assets": false,
  "return_page_images": false
}

| Field | Type | Default | Purpose |
|---|---|---|---|
| query (required) | string | — | Natural-language query. Hybrid retrieval splits this into a lexical term set and a vector embedding internally. |
| min_relevance | float 0..1 | unset | Drops hits whose relevance score falls below this threshold. |
| max_tokens | integer | unset | Token budget for the rendered context. Lower-ranked hits are dropped (sandwich-aware) until the block fits. |
| doc_filter | string | unset | Optional document id. When set, restricts retrieval to a single doc. (Use /grill/searchInDoc if you want this required.) |
| exclude_doc_ids | array of string | unset | Doc ids to exclude from results (max 100). Useful in agent loops to avoid re-citing docs already shown. |
| return_assets | boolean | false | When true, asset references (figures, tables) are included alongside the matched chunks. |
| return_page_images | boolean | false | When true, page-image references are included. Useful for visual citations. |
No top_k. Result count is bounded server-side by relevance + token budget. The recall stage isn't a tunable parameter; tune precision via min_relevance and prompt size via max_tokens.
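The field table above can be mirrored by a small client-side builder. This is a minimal sketch, not part of the Grill SDK: build_search_payload is a hypothetical helper, and it assumes that omitting an optional field is equivalent to leaving it unset and that max_tokens must be positive. Only the 0..1 range and the 100-id limit come from the table.

```python
import json

MAX_EXCLUDED_DOCS = 100  # limit stated in the field table


def build_search_payload(query, min_relevance=None, max_tokens=None,
                         doc_filter=None, exclude_doc_ids=None,
                         return_assets=False, return_page_images=False):
    """Assemble a /grill/search body, leaving unset fields out entirely."""
    if not query or not query.strip():
        raise ValueError("query is required")
    if min_relevance is not None and not 0.0 <= min_relevance <= 1.0:
        raise ValueError("min_relevance must be within 0..1")
    # Assumption: a non-positive token budget is never meaningful.
    if max_tokens is not None and max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")
    if exclude_doc_ids and len(exclude_doc_ids) > MAX_EXCLUDED_DOCS:
        raise ValueError("exclude_doc_ids accepts at most 100 ids")

    payload = {"query": query}
    optional = {
        "min_relevance": min_relevance,
        "max_tokens": max_tokens,
        "doc_filter": doc_filter,
        "exclude_doc_ids": list(exclude_doc_ids) if exclude_doc_ids else None,
    }
    # Assumption: omitted fields behave like "unset" on the server.
    payload.update({k: v for k, v in optional.items() if v is not None})
    if return_assets:
        payload["return_assets"] = True
    if return_page_images:
        payload["return_page_images"] = True
    return payload


body = json.dumps(build_search_payload(
    "How did operating margin change year over year?",
    min_relevance=0.3,
    max_tokens=4000,
))
print(body)
```

Validating before sending keeps obvious mistakes (an out-of-range threshold, too many excluded ids) from turning into a 400 round trip.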
Response
{
  "context": "<doc id=\"...\">…</doc>"
}

A single context string. That is intentional: Grill's contract is "ready to drop into a prompt", not "here is JSON for you to format". For the wrapper grammar (<doc>, <gap>, sandwich order, citation attributes), see RetrievalContext format.
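Because the response is one prompt-ready string, consuming it takes very little code. The sketch below shows one way to place it in a prompt. build_prompt and the instruction wording are illustrative choices, not something Grill prescribes, and the sample body stands in for a real response.

```python
import json


def build_prompt(response_body: str, question: str) -> str:
    """Place the returned context string ahead of the user's question."""
    context = json.loads(response_body)["context"]
    return (
        "Answer using only the context below.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


# Stand-in response body; real output comes from POST /grill/search.
sample = json.dumps({"context": '<doc id="example">Example text.</doc>'})
print(build_prompt(sample, "What does the example say?"))
```

The context is passed through untouched, so the wrapper grammar and citation attributes reach the model exactly as Grill rendered them.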
How the parameters interact
Think of retrieval as a two-stage pipeline:
                           ┌──── filter ────┐            ┌──── render ────┐
query ─▶ hybrid search ─▶ score ≥ min_relevance ─▶ fit ≤ max_tokens (sandwich) ─▶ context

- min_relevance controls precision. Anything below the threshold is discarded outright, even if it would have fit the budget. Useful for "I'd rather return less than return junk" queries (e.g. legal Q&A).
- max_tokens controls the prompt budget. After thresholding, hits are placed in sandwich order; if the block overflows, the lowest-ranked hits are evicted first until it fits. Hits at the very top and very bottom of the order survive longest.
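The two stages can be pictured with a toy model. This is a sketch of the behaviour described above, not Grill's implementation. The Hit class, the pre-computed token counts, and the exact sandwich arrangement (strongest hits alternating between the top and the bottom, weakest survivors in the middle) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    text: str
    score: float   # relevance in 0..1
    tokens: int    # assumed pre-computed token count


def select_hits(hits, min_relevance=None, max_tokens=None):
    """Toy model: threshold, evict lowest-ranked until the budget fits, then sandwich."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    if min_relevance is not None:
        # Filter stage: anything below the threshold is gone, budget or not.
        ranked = [h for h in ranked if h.score >= min_relevance]
    if max_tokens is not None:
        # Render stage: drop the current lowest-ranked hit until the block fits.
        while ranked and sum(h.tokens for h in ranked) > max_tokens:
            ranked.pop()
    # Sandwich order (assumed): alternate hits between the top and the bottom
    # of the block so the weakest survivors end up in the middle.
    top, bottom = [], []
    for i, hit in enumerate(ranked):
        (top if i % 2 == 0 else bottom).append(hit)
    return top + bottom[::-1]


hits = [Hit("a", 0.9, 1000), Hit("b", 0.7, 1000), Hit("c", 0.5, 1000),
        Hit("d", 0.4, 1000), Hit("e", 0.2, 1000)]
kept = select_hits(hits, min_relevance=0.3, max_tokens=3000)
print([h.text for h in kept])  # ['a', 'c', 'b']
```

In this run "e" falls to the threshold, "d" is evicted to meet the budget, and the two strongest hits end up at the edges of the block.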
A typical chat-with-knowledge-base setup looks like:
{ "query": "...", "min_relevance": 0.3, "max_tokens": 6000 }

A "strict citations" assistant tightens both filters:

{ "query": "...", "min_relevance": 0.5, "max_tokens": 2500 }

Searching a single document
When you already know which document the answer must come from ("chat with this PDF", a per-document help bot, a contract Q&A flow), use POST /grill/searchInDoc. It takes the same request shape as /grill/search, except that doc_filter is required: a request without it is rejected with 400.
curl -sS -X POST "$GRILL/grill/searchInDoc" \
  -H "authorization: Bearer $GRILL_KEY" \
  -H "content-type: application/json" \
  -d '{
    "query": "What is the cancellation policy?",
    "doc_filter": "terms_of_service_v3",
    "max_tokens": 3000
  }'

Why a separate endpoint instead of just setting doc_filter on /grill/search? Two reasons:
- Safety. searchInDoc makes the per-document scope a contract, not an option: you cannot accidentally drop the filter and silently retrieve from the entire namespace.
- Validation. The server rejects an empty doc_filter up front, so a typo that would otherwise become a "search the world" bug fails fast.
The response is the same RetrievalContext shape.
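If you build request bodies by hand, the same fail-fast idea can be mirrored on the client. This is a minimal sketch under that assumption. build_search_in_doc_payload is a hypothetical helper, not an SDK function, and the server-side check remains the real guarantee.

```python
def build_search_in_doc_payload(query, doc_filter, max_tokens=None):
    """Build a /grill/searchInDoc body; refuse a missing or blank doc_filter."""
    if not isinstance(doc_filter, str) or not doc_filter.strip():
        raise ValueError("doc_filter is required for searchInDoc")
    payload = {"query": query, "doc_filter": doc_filter}
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    return payload


print(build_search_in_doc_payload(
    "What is the cancellation policy?", "terms_of_service_v3", max_tokens=3000
))
```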
From the Python SDK
The SDK mirrors the endpoint split — g.search(...) for namespace-wide retrieval, g.search_in_doc(...) for per-document retrieval. The doc_filter argument is required positionally on search_in_doc so the safety guarantee carries through to Python.
from poma import Grill

g = Grill()

# Namespace-wide retrieval
ctx = g.search(
    "How did operating margin change year over year?",
    min_relevance=0.3,
    max_tokens=6000,
)
print(ctx.context)

# Scoped to one document: doc_filter is positional and required
ctx = g.search_in_doc(
    "What is the cancellation policy?",
    "terms_of_service_v3",
    max_tokens=3000,
)

GrillContext.context is the same prompt-ready XML + Markdown block you'd get from curl. Full signature: Grill.search / Grill.search_in_doc.
Errors
| Status | When | Notes |
|---|---|---|
| 400 | Missing query, missing doc_filter (on searchInDoc), or upstream validation error | Body is plain text. |
| 401 | Missing or invalid token | Use a project API key. |
| 403 | Project lacks Grill access (primecut-only project) | Create or switch to a Grill project. |
| 404 | doc_filter references an unknown document | Confirm the doc exists with GET /grill/docs. |
| 502 | Other upstream Grill / proxy errors | Retry with backoff. |
| 503 | Grill backend unreachable | Retry; if persistent, check status page. |
Upstream Grill errors are mapped to these statuses based on the upstream message, so you'll see a consistent contract regardless of which internal subsystem failed.
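The retry guidance in the table can be sketched as a small wrapper. Everything here is illustrative: call_with_retry and GrillError are hypothetical names, send stands for whatever function performs the HTTP request and returns (status, body), and a 200 status on success is an assumption this page does not state.

```python
import time

RETRYABLE = {502, 503}  # the statuses the table says to retry


class GrillError(Exception):
    def __init__(self, status, body):
        super().__init__(f"Grill returned {status}: {body}")
        self.status = status
        self.body = body


def call_with_retry(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() -> (status, body); retry 502/503 with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status == 200:  # assumed success status
            return body
        # 400/401/403/404 will not change on retry, so fail immediately.
        if status not in RETRYABLE or attempt == max_attempts:
            raise GrillError(status, body)
        sleep(base_delay * 2 ** (attempt - 1))


# Simulated upstream: two transient failures, then success.
responses = iter([(503, "unreachable"), (502, "bad gateway"),
                  (200, '{"context": "..."}')])
delays = []
body = call_with_retry(lambda: next(responses), sleep=delays.append)
print(body, delays)
```

Injecting sleep keeps the example fast to run and makes the backoff schedule easy to inspect. Here it waits 0.5 s and then 1.0 s before the third attempt succeeds.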
Tuning checklist
When retrieval feels off, work down this list:
- No matches at all. Check that the doc appears in GET /grill/docs. Lower min_relevance (or unset it).
- Junk in the context. Raise min_relevance (e.g. 0.3 → 0.5).
- Truncated context. Raise max_tokens, or narrow the query so fewer hits are needed.
- High latency / heavy payload. Turn off return_assets / return_page_images when you don't need them; lower max_tokens so less context is rendered.
- Wrong document picked. If you know the answer is in one doc, switch to /grill/searchInDoc, which removes the cross-document ambiguity entirely. To exclude already-cited docs in an agent loop, pass exclude_doc_ids.
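For the agent-loop case in the last item, one possible way to maintain exclude_doc_ids between turns is sketched below. It assumes the id attribute on each <doc> element carries the document id and that a regular expression is enough to read it. The RetrievalContext format page is the authority on the grammar. cited_doc_ids and update_exclusions are hypothetical helpers, and keeping only the most recent 100 ids is one illustrative way to respect the documented limit.

```python
import re

# Assumption: each cited document appears as <doc id="..."> in the context.
DOC_ID_PATTERN = re.compile(r'<doc\b[^>]*\bid="([^"]*)"')
MAX_EXCLUDED_DOCS = 100  # limit stated for exclude_doc_ids


def cited_doc_ids(context):
    """Return the distinct <doc id="..."> values in a context block, in order."""
    seen = []
    for doc_id in DOC_ID_PATTERN.findall(context):
        if doc_id not in seen:
            seen.append(doc_id)
    return seen


def update_exclusions(excluded, context):
    """Add newly cited ids, keeping only the most recent 100."""
    merged = list(excluded)
    for doc_id in cited_doc_ids(context):
        if doc_id not in merged:
            merged.append(doc_id)
    return merged[-MAX_EXCLUDED_DOCS:]


sample = '<doc id="q3_report">...</doc>\n<doc id="q4_report">...</doc>'
excluded = update_exclusions(["q3_report"], sample)
print(excluded)  # ['q3_report', 'q4_report']
```

Pass the resulting list as exclude_doc_ids on the next search so the loop keeps surfacing documents it has not shown yet.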
Next
- RetrievalContext format — the grammar of the returned block.
- API reference — full request/response schemas.