Grill Retrieval
POST /grill/search is Grill's main retrieval endpoint. It runs hybrid (lexical + semantic) search across the project namespace and returns a RetrievalContext — a prompt-ready XML + Markdown block, not a list of ranked hits.
This page covers the request shape, how the tuning parameters (min_relevance, target_tokens, max_tokens) interact, and when to use searchInDoc instead.
Request
POST /v3/grill/search HTTP/1.1
Authorization: Bearer <project-api-key>
Content-Type: application/json
{
  "query": "How did operating margin change year over year?",
  "min_relevance": 0.3,
  "target_tokens": 6000,
  "max_tokens": 16000,
  "doc_filter": null,
  "exclude_doc_ids": [],
  "return_assets": false,
  "return_page_images": false
}

| Field | Type | Default | Purpose |
|---|---|---|---|
| query (required) | string | — | Natural-language query. Hybrid retrieval splits this into a lexical term set and a vector embedding internally. |
| min_relevance | float 0..1 | 0.3 | Relevance floor: hits scoring below it are dropped. Higher = stricter (return less, drop borderline hits); lower = more permissive. 0.3 is the balanced default. |
| target_tokens | integer | 6000 | Soft token budget: the typical answer size. Hits are admitted best-first (sandwich-aware) up to this. This is the knob to size a response. |
| max_tokens | integer | 16000 | Hard ceiling. Grill only expands past target_tokens toward it for tight multi-document clusters. Raised up to target_tokens if set lower, so it can't shrink a response on its own. |
| doc_filter | string | unset | Optional document id. When set, restricts retrieval to a single doc. (Use /grill/searchInDoc if you want this required.) |
| doc_ids | array of string | unset | Restrict retrieval to this set of document ids. |
| exclude_doc_ids | array of string | unset | Doc ids to exclude from results (max 100). Useful in agent loops to avoid re-citing docs already shown. |
| return_assets | boolean | false | When true, the cited documents' figures (and tables, where available) are returned in an assets field keyed by doc_id. Images are base64 data URIs; resolved per document. |
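As a rough illustration of the defaults and limits in the table, the sketch below assembles a request body on the client. The helper name build_search_body is hypothetical and not part of any SDK; it only mirrors the documented rules (the 0..1 range for min_relevance, the 100-id cap on exclude_doc_ids, and max_tokens being raised to target_tokens when set lower), which the server applies itself anyway.

```python
import json

def build_search_body(query, min_relevance=0.3, target_tokens=6000,
                      max_tokens=16000, exclude_doc_ids=None):
    """Assemble a /grill/search JSON body using the documented defaults."""
    if not query:
        raise ValueError("query is required")
    if not 0.0 <= min_relevance <= 1.0:
        raise ValueError("min_relevance must be within 0..1")
    exclude = list(exclude_doc_ids or [])
    if len(exclude) > 100:
        raise ValueError("exclude_doc_ids accepts at most 100 ids")
    return {
        "query": query,
        "min_relevance": min_relevance,
        "target_tokens": target_tokens,
        # Mirrors the documented rule: a max_tokens below target_tokens is raised to it.
        "max_tokens": max(max_tokens, target_tokens),
        "exclude_doc_ids": exclude,
    }

body = build_search_body(
    "How did operating margin change year over year?",
    target_tokens=8000,
    max_tokens=4000,  # lower than target_tokens, so it is raised to 8000
)
print(json.dumps(body, indent=2))
```

Normalizing max_tokens locally changes nothing on the wire; it just makes the effective ceiling visible in your own logs.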
Metadata filters
Filter on the labels, integers, and wildcard strings you attached at ingest (see Ingestion → Attaching metadata):
| Field | Type | Purpose |
|---|---|---|
| meta_tags_any | array of string | Match documents carrying any of these labels (set via X-Labels). Equality-only. |
| meta_tags_all | array of string | Match documents carrying all of these labels. |
| meta_int_1_gte / meta_int_1_lte | integer | Bounds on the document's X-Meta-Int-1 value. gte = greater than or equal (≥) lower bound; lte = less than or equal (≤) upper bound. |
| meta_int_2_gte / meta_int_2_lte | integer | Same, for X-Meta-Int-2. |
| unencrypted_strings_match | object (string→string) | Match on the plaintext X-Unencrypted-Strings you set at ingest: a key→glob-pattern map. Each entry is a case-insensitive glob (* = any run, ? = one char; a pattern with no wildcard is an exact match). Multiple keys AND together. |
All filters above (and doc_ids / exclude_doc_ids) combine via AND.
Each integer bound is independent and inclusive. Pass just one to make it open-ended (gte alone = "from N upward"; lte alone = "up to N"), or both to bound a closed window. For example, if you stored a publication year in X-Meta-Int-1, {"meta_int_1_gte": 2020, "meta_int_1_lte": 2023} keeps only documents with 2020 ≤ year ≤ 2023.
For wildcard strings, if you stored {"path": "legal/contracts/acme"} at ingest, then {"unencrypted_strings_match": {"path": "legal/contracts/*"}} keeps only documents whose path begins with legal/contracts/. Unlike meta_tags (HMAC'd, equality-only), unencrypted_strings values are stored unencrypted so they can be glob-matched — don't put secrets or PII in them.
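To make the filter semantics concrete, here is a minimal local model of how one document's metadata could be tested against a filter set. It is a sketch, not the server's implementation: the matches_filters function and the shape of the doc dictionary are hypothetical, treating a document with no stored integer as a non-match is an assumption, and Python's fnmatch also accepts [seq] character classes, which this page does not mention.

```python
from fnmatch import fnmatchcase

def matches_filters(doc, filters):
    """Return True if a doc's ingest metadata passes every supplied filter (AND)."""
    labels = set(doc.get("labels", []))
    any_of = filters.get("meta_tags_any")
    if any_of is not None and not labels & set(any_of):
        return False
    all_of = filters.get("meta_tags_all")
    if all_of is not None and not set(all_of) <= labels:
        return False
    # Integer bounds are inclusive; a missing bound leaves that side open.
    value = doc.get("meta_int_1")
    lower, upper = filters.get("meta_int_1_gte"), filters.get("meta_int_1_lte")
    if lower is not None and (value is None or value < lower):
        return False
    if upper is not None and (value is None or value > upper):
        return False
    # Every key in the glob map must match, ignoring case.
    strings = doc.get("unencrypted_strings", {})
    for key, pattern in filters.get("unencrypted_strings_match", {}).items():
        if key not in strings:
            return False
        if not fnmatchcase(strings[key].lower(), pattern.lower()):
            return False
    return True

doc = {
    "labels": ["finance"],
    "meta_int_1": 2021,
    "unencrypted_strings": {"path": "legal/contracts/acme"},
}
window = {
    "meta_int_1_gte": 2020,
    "meta_int_1_lte": 2023,
    "unencrypted_strings_match": {"path": "LEGAL/contracts/*"},
}
print(matches_filters(doc, window))                    # inside the window, glob matches
print(matches_filters(doc, {"meta_int_1_gte": 2022}))  # below the lower bound
```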
Other controls
| Field | Type | Purpose |
|---|---|---|
| min_top_relevance | float 0..1 | Floor for the top hit only. If the best hit scores below this, the whole result set is empty, a guard against returning anything when nothing is a strong match. |
| expand_tightness | float 0..1 | How aggressively the engine expands context around each hit. |
| retrieval_tier | string | standard (fusion only) or advanced (adds a reranker). See Retrieval tiers. |
| premium | boolean | Legacy flag: true equals retrieval_tier: "advanced". Prefer retrieval_tier. |
| format | string | prompt_ready (default) returns the XML+Markdown block; json returns structured ranked hits instead. |
No top_k. Result count is bounded server-side by relevance + token budget. The recall stage isn't a tunable parameter; tune precision via min_relevance and answer size via target_tokens.
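The difference between the two relevance floors is easiest to see on a list of scores. The sketch below is a local model only: apply_relevance_floors is a hypothetical name, the scores are made up, and the order in which the engine applies the two checks is an assumption (the result is the same either way).

```python
def apply_relevance_floors(scores, min_relevance=0.3, min_top_relevance=None):
    """Apply the top-hit guard, then the per-hit floor, to a list of scores."""
    ranked = sorted(scores, reverse=True)
    # Top-hit guard: a weak best hit empties the whole result set.
    if min_top_relevance is not None and (not ranked or ranked[0] < min_top_relevance):
        return []
    # Per-hit floor: drop everything scoring below min_relevance.
    return [score for score in ranked if score >= min_relevance]

scores = [0.41, 0.62, 0.28]
print(apply_relevance_floors(scores))                         # floor only
print(apply_relevance_floors(scores, min_top_relevance=0.7))  # best hit too weak
```

With the default floor, only the 0.28 hit is dropped; adding min_top_relevance=0.7 empties the set because the best hit (0.62) is below it.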
Response
{
  "context": "<doc id=\"...\">…</doc>",
  "result_count": 4,
  "tokens_estimated": 5820,
  "results_dropped": 2,
  "detected_lang": "english",
  "mode": "advanced",
  "search_units": 1
}

context is the field you actually use: Grill's contract is "ready to drop into a prompt", not "here is JSON for you to format". The siblings are metadata: result_count / results_dropped, tokens_estimated (rendered size), detected_lang, mode (the retrieval tier used), and search_units (billing). For the wrapper grammar (<doc>, inline [pN] / […] markers, sandwich order, citation attributes), see RetrievalContext format. To get structured per-hit JSON instead of the rendered block, set format: "json".
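Since context is meant to be used verbatim, consuming a response can be as small as the sketch below. The build_prompt helper and its prompt layout are hypothetical, and returning None on an empty result is one possible policy rather than anything Grill prescribes; the sample body reuses the placeholder context from the response example above.

```python
import json

def build_prompt(response_text, question):
    """Turn a /grill/search response body into a prompt, or None if nothing matched."""
    payload = json.loads(response_text)
    if payload.get("result_count", 0) == 0:
        return None  # nothing was returned; let the caller decide what to do
    # context is already prompt-ready, so it is inserted verbatim.
    return f"{payload['context']}\n\nQuestion: {question}"

sample = json.dumps({
    "context": "<doc id=\"...\">…</doc>",  # placeholder, as in the example response
    "result_count": 4,
    "tokens_estimated": 5820,
})
print(build_prompt(sample, "How did operating margin change year over year?"))
```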
How the parameters interact
Think of retrieval as a two-stage pipeline:
┌──── filter ────┐ ┌──────── render ────────┐
query ─▶ hybrid search ─▶ score ≥ min_relevance ─▶ fill to target_tokens (sandwich, up to max_tokens) ─▶ context

- min_relevance controls precision. Anything below the threshold is discarded outright, even if it would have fit the budget. Useful for "I'd rather return less than return junk" queries (e.g. legal Q&A).
- target_tokens controls the answer size. After thresholding, hits are placed in sandwich order and admitted best-first up to this soft budget; if it overflows, the lowest-ranked hits are evicted first. Hits at the very top and very bottom of the order survive longest.
- max_tokens is a hard ceiling that only matters for tight multi-document clusters (several near-equally-relevant hits expand past the target toward it). Leave it at the default unless you want to allow, or forbid, that expansion.
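The two stages can be modelled in a few lines. This is a simplified sketch under stated assumptions, not Grill's algorithm: select_hits is a hypothetical name, each hit is reduced to a (score, tokens) pair with made-up values, and both sandwich ordering and the max_tokens cluster expansion are left out because this page does not specify them in enough detail to reproduce.

```python
def select_hits(hits, min_relevance=0.3, target_tokens=6000):
    """hits: list of (score, tokens) pairs. Returns (kept, dropped_count)."""
    # Stage 1 (filter): discard anything below the floor, even if it would fit.
    passed = sorted((h for h in hits if h[0] >= min_relevance), reverse=True)
    dropped = len(hits) - len(passed)
    # Stage 2 (render): evict lowest-ranked hits until the soft budget is met.
    kept = list(passed)
    while kept and sum(tokens for _, tokens in kept) > target_tokens:
        kept.pop()
        dropped += 1
    return kept, dropped

hits = [(0.82, 3000), (0.55, 2500), (0.35, 1500), (0.20, 400)]
print(select_hits(hits))                                         # default settings
print(select_hits(hits, min_relevance=0.5, target_tokens=3000))  # stricter and shorter
```

With the defaults, the 0.20 hit fails the floor and the 0.35 hit is evicted to get under 6000 tokens; note that the 0.20 hit is dropped even though its 400 tokens would have fit.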
A typical chat-with-knowledge-base setup looks like:
{ "query": "...", "min_relevance": 0.3, "target_tokens": 6000 }

A "strict citations" assistant returns less, and keeps it short:

{ "query": "...", "min_relevance": 0.5, "target_tokens": 2500 }

Searching a single document
When you already know which document the answer must come from ("chat with this PDF", a per-document help bot, a contract Q&A flow), use POST /grill/searchInDoc. It takes the same request shape as /grill/search, except that doc_filter is required: a request without it is rejected with 400.
curl -sS -X POST "$GRILL/grill/searchInDoc" \
  -H "authorization: Bearer $GRILL_KEY" \
  -H "content-type: application/json" \
  -d '{
    "query": "What is the cancellation policy?",
    "doc_filter": "terms_of_service_v3",
    "target_tokens": 3000
  }'

Why a separate endpoint instead of just setting doc_filter on /grill/search? Two reasons:
- Safety. searchInDoc makes the per-document scope a contract, not an option: you cannot accidentally drop the filter and silently retrieve from the entire namespace.
- Validation. The server rejects an empty doc_filter up front, so a typo that would otherwise become a "search the world" bug fails fast.
The response is the same RetrievalContext shape.
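If you call the endpoint without the SDK, the same fail-fast idea can be applied on the client before anything is sent. The sketch below only builds the request with the standard library; build_search_in_doc_request is a hypothetical helper, the host is a placeholder, and the local check duplicates (rather than replaces) the server's 400.

```python
import json
import urllib.request

def build_search_in_doc_request(base_url, api_key, query, doc_filter, **params):
    """Build (but do not send) a POST /grill/searchInDoc request."""
    # Fail locally on an empty scope instead of waiting for the server's 400.
    if not isinstance(doc_filter, str) or not doc_filter.strip():
        raise ValueError("doc_filter is required for searchInDoc")
    body = {"query": query, "doc_filter": doc_filter, **params}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/grill/searchInDoc",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_search_in_doc_request(
    "https://grill.example.invalid",  # hypothetical host, never contacted here
    "test-key",
    "What is the cancellation policy?",
    "terms_of_service_v3",
    target_tokens=3000,
)
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; that step is left out so the example runs without network access.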
From the Python SDK
The SDK mirrors the endpoint split — g.search(...) for namespace-wide retrieval, g.search_in_doc(...) for per-document retrieval. The doc_filter argument is required positionally on search_in_doc so the safety guarantee carries through to Python.
from poma import Grill

g = Grill()

# Namespace-wide retrieval
ctx = g.search(
    "How did operating margin change year over year?",
    min_relevance=0.3,
    target_tokens=6000,
)
print(ctx.context)

# Scoped to one document: doc_filter is positional and required
ctx = g.search_in_doc(
    "What is the cancellation policy?",
    "terms_of_service_v3",
    target_tokens=3000,
)

GrillContext.context is the same prompt-ready XML + Markdown block you'd get from curl. Full signature: Grill.search / Grill.search_in_doc.
Errors
| Status | When | Notes |
|---|---|---|
| 400 | Missing query, missing doc_filter (on searchInDoc), or upstream validation error | Body is plain text. |
| 401 | Missing or invalid token | Use a project API key. |
| 403 | Project lacks Grill access (primecut-only project) | Create or switch to a Grill project. |
| 404 | doc_filter references an unknown document | Confirm the doc exists with GET /grill/docs. |
| 502 | Other upstream Grill / proxy errors | Retry with backoff. |
| 503 | Grill backend unreachable | Retry; if persistent, check status page. |
Upstream Grill errors are mapped to these statuses based on the upstream message, so you'll see a consistent contract regardless of which internal subsystem failed.
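The table suggests retrying only 502 and 503. One way to encode that, sketched here with hypothetical names and arbitrary delay values, is a small wrapper around whatever function actually sends the request; send is assumed to return a (status, body) pair.

```python
import time

RETRYABLE = {502, 503}  # the only statuses the table marks as worth retrying

def call_with_retry(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() -> (status, body); retry 502/503 with exponential backoff."""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...

# Simulated transport: two transient failures, then success.
responses = iter([(503, "unreachable"), (502, "bad gateway"), (200, "{}")])
delays = []
print(call_with_retry(lambda: next(responses), sleep=delays.append))
print(delays)
```

Statuses such as 400, 401, 403, and 404 are returned immediately, since retrying them without changing the request would not help.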
Tuning checklist
When retrieval feels off, work down this list:
- No matches at all. Check that the doc is listed by GET /grill/docs. Lower min_relevance (e.g. 0.3 → 0.1).
- Junk in the context. Raise min_relevance (e.g. 0.3 → 0.5).
- Truncated context. Raise target_tokens, or narrow the query so fewer hits are needed.
- High latency / heavy payload. Drop return_assets / return_page_images when you don't need them; lower target_tokens so less context is rendered.
- Wrong document picked. If you know the answer is in one doc, switch to /grill/searchInDoc, which removes the cross-document ambiguity entirely. To exclude already-cited docs in an agent loop, pass exclude_doc_ids.
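For the agent-loop case in the last item, the exclusion list has to be built from earlier responses. The sketch below assumes that cited documents appear in context as <doc id="..."> wrappers, as in the example response; the regular expression and the update_exclusions helper are hypothetical, and the authoritative grammar is the one described in RetrievalContext format.

```python
import re

# Assumes the <doc id="..."> wrapper shown in the example response.
DOC_ID = re.compile(r'<doc id="([^"]+)"')

def update_exclusions(seen, context, limit=100):
    """Return seen plus any new doc ids cited in context, keeping the newest `limit`."""
    merged = list(seen)
    for doc_id in DOC_ID.findall(context):
        if doc_id not in merged:
            merged.append(doc_id)
    return merged[-limit:]  # exclude_doc_ids accepts at most 100 ids

seen = update_exclusions([], '<doc id="q3_report">…</doc><doc id="q4_report">…</doc>')
seen = update_exclusions(seen, '<doc id="q4_report">…</doc><doc id="annual">…</doc>')
print(seen)  # pass this as exclude_doc_ids on the next request
```

Keeping only the most recent ids once the cap is reached is a design choice for this sketch; an agent might instead prefer to keep the earliest ones.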
Next
- RetrievalContext format — the grammar of the returned block.
- API reference — full request/response schemas.