Grill Retrieval

POST /grill/search is Grill's main retrieval endpoint. It runs hybrid (lexical + semantic) search across the project namespace and returns a RetrievalContext — a prompt-ready XML + Markdown block, not a list of ranked hits.

This page covers the request shape, how the tuning parameters (min_relevance, target_tokens, max_tokens) interact, and when to use searchInDoc instead.

Request

http
POST /v3/grill/search HTTP/1.1
Authorization: Bearer <project-api-key>
Content-Type: application/json

{
  "query": "How did operating margin change year over year?",
  "min_relevance": 0.3,
  "target_tokens": 6000,
  "max_tokens": 16000,
  "doc_filter": null,
  "exclude_doc_ids": [],
  "return_assets": false,
  "return_page_images": false
}

| Field | Type | Default | Purpose |
|---|---|---|---|
| query (required) | string | | Natural-language query. Hybrid retrieval splits this into a lexical term set and a vector embedding internally. |
| min_relevance | float 0..1 | 0.3 | Relevance floor: hits scoring below it are dropped. Higher = stricter (return less, drop borderline hits); lower = more permissive. 0.3 is the balanced default. |
| target_tokens | integer | 6000 | Soft token budget: the typical answer size. Hits are admitted best-first (sandwich-aware) up to this. This is the knob to size a response. |
| max_tokens | integer | 16000 | Hard ceiling. Grill only expands past target_tokens toward it for tight multi-document clusters. Raised up to target_tokens if set lower, so it can't shrink a response on its own. |
| doc_filter | string | unset | Optional document id. When set, restricts retrieval to a single doc. (Use /grill/searchInDoc if you want this required.) |
| doc_ids | array of string | unset | Restrict retrieval to this set of document ids. |
| exclude_doc_ids | array of string | unset | Doc ids to exclude from results (max 100). Useful in agent loops to avoid re-citing docs already shown. |
| return_assets | boolean | false | When true, the cited documents' figures (and tables, where available) are returned in an assets field keyed by doc_id. Images are base64 data URIs; resolved per document. |
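
As a rough illustration of the request shape, the sketch below builds (but does not send) a search request with the Python standard library. The base URL, the build_search_request helper, and the client-side checks are assumptions made for this example and are not part of Grill; the field names and defaults are the ones documented above.

```python
import json
import urllib.request

# Hypothetical base URL for illustration; substitute your own deployment.
BASE_URL = "https://grill.example.com/v3"


def build_search_request(query, api_key, *, min_relevance=0.3,
                         target_tokens=6000, max_tokens=16000,
                         exclude_doc_ids=()):
    """Build a POST /grill/search request object without sending it."""
    # Client-side checks that mirror the documented constraints.
    if not query:
        raise ValueError("query is required")
    if not 0.0 <= min_relevance <= 1.0:
        raise ValueError("min_relevance must be within 0..1")
    if len(exclude_doc_ids) > 100:
        raise ValueError("exclude_doc_ids accepts at most 100 ids")

    payload = {
        "query": query,
        "min_relevance": min_relevance,
        "target_tokens": target_tokens,
        "max_tokens": max_tokens,
    }
    if exclude_doc_ids:
        payload["exclude_doc_ids"] = list(exclude_doc_ids)

    return urllib.request.Request(
        f"{BASE_URL}/grill/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would perform the call; keeping construction separate from sending makes the payload easy to inspect in tests.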

Metadata filters

Filter on the labels, integers, and wildcard strings you attached at ingest (see Ingestion → Attaching metadata):

| Field | Type | Purpose |
|---|---|---|
| meta_tags_any | array of string | Match documents carrying any of these labels (set via X-Labels). Equality-only. |
| meta_tags_all | array of string | Match documents carrying all of these labels. |
| meta_int_1_gte / meta_int_1_lte | integer | Bounds on the document's X-Meta-Int-1 value. gte = greater than or equal (≥) lower bound; lte = less than or equal (≤) upper bound. |
| meta_int_2_gte / meta_int_2_lte | integer | Same, for X-Meta-Int-2. |
| unencrypted_strings_match | object (string→string) | Match on the plaintext X-Unencrypted-Strings you set at ingest: a key→glob-pattern map. Each entry is a case-insensitive glob (* = any run, ? = one char; a pattern with no wildcard is an exact match). Multiple keys AND together. |

All filters above (and doc_ids / exclude_doc_ids) combine via AND.

Each integer bound is independent and inclusive. Pass just one to make it open-ended (gte alone = "from N upward"; lte alone = "up to N"), or both to bound a closed window. For example, if you stored a publication year in X-Meta-Int-1, {"meta_int_1_gte": 2020, "meta_int_1_lte": 2023} keeps only documents with 2020 ≤ year ≤ 2023.

For wildcard strings, if you stored {"path": "legal/contracts/acme"} at ingest, then {"unencrypted_strings_match": {"path": "legal/contracts/*"}} keeps only documents whose path begins with legal/contracts/. Unlike meta_tags (HMAC'd, equality-only), unencrypted_strings values are stored unencrypted so they can be glob-matched — don't put secrets or PII in them.
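
To make the filter semantics concrete, here is a small local model of how the rules above combine. It is only a sketch for reasoning about filters and is not Grill's server-side implementation: the doc dictionary layout (labels, int_1, int_2, strings) and the function names are invented for this example, and treating a document with no stored integer as failing a bound is an assumption.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob (* = any run, ? = one char) into a case-insensitive regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def matches(doc, filters):
    """Return True if a document's metadata passes every supplied filter (AND)."""
    labels = set(doc.get("labels", []))
    any_tags = filters.get("meta_tags_any")
    if any_tags and not labels & set(any_tags):
        return False
    all_tags = filters.get("meta_tags_all")
    if all_tags and not set(all_tags) <= labels:
        return False

    # Integer bounds are independent and inclusive.
    for slot in (1, 2):
        value = doc.get(f"int_{slot}")
        lower = filters.get(f"meta_int_{slot}_gte")
        upper = filters.get(f"meta_int_{slot}_lte")
        if lower is not None and (value is None or value < lower):
            return False
        if upper is not None and (value is None or value > upper):
            return False

    # Every key in the glob map must match the whole stored value.
    strings = doc.get("strings", {})
    for key, pattern in filters.get("unencrypted_strings_match", {}).items():
        value = strings.get(key)
        if value is None or not glob_to_regex(pattern).fullmatch(value):
            return False
    return True
```

With a document stored as {"int_1": 2021, "strings": {"path": "legal/contracts/acme"}}, the filter {"meta_int_1_gte": 2020, "meta_int_1_lte": 2023} passes, and so does the pattern "legal/contracts/*", while the wildcard-free pattern "legal/contracts" fails because it is an exact match.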

Other controls

| Field | Type | Purpose |
|---|---|---|
| min_top_relevance | float 0..1 | Floor for the top hit only. If the best hit scores below this, the whole result set is empty, a guard against returning anything when nothing is a strong match. |
| expand_tightness | float 0..1 | How aggressively the engine expands context around each hit. |
| retrieval_tier | string | standard (fusion only) or advanced (adds a reranker). See Retrieval tiers. |
| premium | boolean | Legacy flag: true equals retrieval_tier: "advanced". Prefer retrieval_tier. |
| format | string | prompt_ready (default) returns the XML+Markdown block; json returns structured ranked hits instead. |

No top_k. Result count is bounded server-side by relevance + token budget. The recall stage isn't a tunable parameter; tune precision via min_relevance and answer size via target_tokens.

Response

json
{
  "context": "<doc id=\"...\">…</doc>",
  "result_count": 4,
  "tokens_estimated": 5820,
  "results_dropped": 2,
  "detected_lang": "english",
  "mode": "advanced",
  "search_units": 1
}

context is the field you actually use — Grill's contract is "ready to drop into a prompt", not "here is JSON for you to format". The siblings are metadata: result_count / results_dropped, tokens_estimated (rendered size), detected_lang, mode (the retrieval tier used), and search_units (billing). For the wrapper grammar (<doc>, inline [pN] / […] markers, sandwich order, citation attributes), see RetrievalContext format. To get structured per-hit JSON instead of the rendered block, set format: "json".
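
One way to consume the response, sketched under the assumption that you have already parsed the JSON body into a dictionary: drop context straight into the prompt and treat an empty result set as "nothing to ground on". The build_prompt helper and the instruction wording are illustrative only.

```python
def build_prompt(response, question):
    """Return a prompt string that embeds the retrieved context, or None if empty."""
    # result_count tells us whether anything survived the relevance floor.
    if response.get("result_count", 0) == 0:
        return None
    return (
        "Answer using only the context below and cite the markers it contains.\n\n"
        + response["context"]
        + "\n\nQuestion: "
        + question
    )
```

Returning None for an empty result set lets the caller decide whether to answer without retrieval, retry with a lower min_relevance, or tell the user nothing relevant was found.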

How the parameters interact

Think of retrieval as a two-stage pipeline:

text
                ┌──── filter ────┐  ┌──────── render ────────┐
query ─▶ hybrid search ─▶ score ≥ min_relevance ─▶ fill to target_tokens (sandwich, up to max_tokens) ─▶ context
  • min_relevance controls precision. Anything below the threshold is discarded outright, even if it would have fit the budget. Useful for "I'd rather return less than return junk" queries (e.g. legal Q&A).
  • target_tokens controls the answer size. After thresholding, hits are placed in sandwich order and admitted best-first up to this soft budget; if it overflows, the lowest-ranked hits are evicted first. Hits at the very top and very bottom of the order survive longest.
  • max_tokens is a hard ceiling that only matters for tight multi-document clusters (several near-equally-relevant hits expand past the target toward it). Leave it at the default unless you want to allow — or forbid — that expansion.
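
The interaction described above can be approximated with a toy model. This is a simplified sketch, not the engine's algorithm: it leaves out sandwich ordering and the cluster expansion toward max_tokens, and it assumes admission stops at the first hit that would overflow the budget. Hit, select_hits, and effective_max_tokens are names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    score: float   # relevance in 0..1
    tokens: int    # rendered size of this hit


def select_hits(hits, min_relevance=0.3, target_tokens=6000):
    """Drop hits below the floor, then admit best-first up to the soft budget."""
    kept = [h for h in hits if h.score >= min_relevance]
    kept.sort(key=lambda h: h.score, reverse=True)

    admitted, used = [], 0
    for hit in kept:
        # Assumption: stop as soon as the next-best hit no longer fits.
        if used + hit.tokens > target_tokens:
            break
        admitted.append(hit)
        used += hit.tokens
    return admitted, used


def effective_max_tokens(target_tokens, max_tokens):
    """max_tokens is raised to target_tokens when it is set lower."""
    return max(max_tokens, target_tokens)
```

In this model a hit scoring 0.2 is discarded even when the budget has room, which is the precision behaviour min_relevance is meant to provide, while shrinking target_tokens only ever removes the lowest-ranked survivors.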

A typical chat-with-knowledge-base setup looks like:

json
{ "query": "...", "min_relevance": 0.3, "target_tokens": 6000 }

A "strict citations" assistant returns less, and keeps it short:

json
{ "query": "...", "min_relevance": 0.5, "target_tokens": 2500 }

Searching a single document

When you already know which document the answer must come from — "chat with this PDF", a per-document help bot, a contract Q&A flow — use POST /grill/searchInDoc. It takes the same request shape as /grill/search, except that doc_filter is required: a request without it is rejected with 400.

bash
curl -sS -X POST "$GRILL/grill/searchInDoc" \
  -H "authorization: Bearer $GRILL_KEY" \
  -H "content-type: application/json" \
  -d '{
    "query": "What is the cancellation policy?",
    "doc_filter": "terms_of_service_v3",
    "target_tokens": 3000
  }'

Why a separate endpoint instead of just setting doc_filter on /grill/search? Two reasons:

  • Safety. searchInDoc makes the per-document scope a contract, not an option — you cannot accidentally drop the filter and silently retrieve from the entire namespace.
  • Validation. The server rejects an empty doc_filter up front, so a typo that would otherwise become a "search the world" bug fails fast.

The response is the same RetrievalContext shape.
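
If you call the endpoint without the SDK, the same guard can be mirrored client-side so that a missing document id fails before the request leaves your process. A minimal sketch; build_search_in_doc_payload is a hypothetical helper, and the server remains the authority on what counts as a valid doc_filter.

```python
def build_search_in_doc_payload(query, doc_filter, **options):
    """Build a searchInDoc body, refusing an empty query or doc_filter."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query is required")
    # Mirrors the server-side rule: doc_filter must be present and non-empty.
    if not isinstance(doc_filter, str) or not doc_filter.strip():
        raise ValueError("doc_filter is required for searchInDoc")
    return {"query": query, "doc_filter": doc_filter, **options}
```

For example, build_search_in_doc_payload("What is the cancellation policy?", "terms_of_service_v3", target_tokens=3000) yields the body used in the curl call above.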

From the Python SDK

The SDK mirrors the endpoint split — g.search(...) for namespace-wide retrieval, g.search_in_doc(...) for per-document retrieval. The doc_filter argument is required positionally on search_in_doc so the safety guarantee carries through to Python.

python
from poma import Grill

g = Grill()

# Namespace-wide retrieval
ctx = g.search(
    "How did operating margin change year over year?",
    min_relevance=0.3,
    target_tokens=6000,
)
print(ctx.context)

# Scoped to one document — doc_filter is positional and required
ctx = g.search_in_doc(
    "What is the cancellation policy?",
    "terms_of_service_v3",
    target_tokens=3000,
)

GrillContext.context is the same prompt-ready XML + Markdown block you'd get from curl. Full signature: Grill.search / Grill.search_in_doc.

Errors

| Status | When | Notes |
|---|---|---|
| 400 | Missing query, missing doc_filter (on searchInDoc), or upstream validation error | Body is plain text. |
| 401 | Missing or invalid token | Use a project API key. |
| 403 | Project lacks Grill access (primecut-only project) | Create or switch to a Grill project. |
| 404 | doc_filter references an unknown document | Confirm the doc exists with GET /grill/docs. |
| 502 | Other upstream Grill / proxy errors | Retry with backoff. |
| 503 | Grill backend unreachable | Retry; if persistent, check status page. |

Upstream Grill errors are mapped to these statuses based on the upstream message, so you'll see a consistent contract regardless of which internal subsystem failed.
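
For the two retryable statuses, a retry loop with exponential backoff might look like the sketch below. The attempt count, delays, and jitter are illustrative choices rather than values recommended by Grill, and send stands for any callable of yours that performs the request and returns a (status, body) pair.

```python
import random
import time

RETRYABLE_STATUSES = {502, 503}


def call_with_backoff(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        # 4xx responses are caller errors; retrying them would not help.
        if status not in RETRYABLE_STATUSES or attempt == max_attempts:
            return status, body
        # Exponential backoff with a little jitter to avoid synchronized retries.
        delay = base_delay * 2 ** (attempt - 1)
        sleep(delay + random.uniform(0, delay / 2))
```

The sleep parameter is injectable so the loop can be tested without waiting.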

Tuning checklist

When retrieval feels off, work down this list:

  1. No matches at all. Check the doc is in GET /grill/docs. Lower min_relevance (e.g. 0.3 → 0.1).
  2. Junk in the context. Raise min_relevance (e.g. 0.3 → 0.5).
  3. Truncated context. Raise target_tokens, or narrow the query so fewer hits are needed.
  4. Slow latency / heavy payload. Drop return_assets / return_page_images when you don't need them; lower target_tokens so less context is rendered.
  5. Wrong document picked. If you know the answer is in one doc, switch to /grill/searchInDoc — that removes the cross-document ambiguity entirely. To exclude already-cited docs in an agent loop, pass exclude_doc_ids.
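
For the agent-loop case in the last item, one possible way to maintain exclude_doc_ids between turns is to collect the document ids from each rendered context. This sketch assumes the `<doc id="...">` wrapper shown in the sample response; the RetrievalContext format page is the authority on the actual grammar, and both helper names are invented here.

```python
import re

# Assumes each cited document is wrapped as <doc ... id="..."> in the context.
DOC_ID_PATTERN = re.compile(r'<doc\s+[^>]*\bid="([^"]+)"')


def cited_doc_ids(context):
    """Return the distinct doc ids found in a context block, in order of appearance."""
    seen = []
    for doc_id in DOC_ID_PATTERN.findall(context):
        if doc_id not in seen:
            seen.append(doc_id)
    return seen


def next_exclude_list(previous, context, limit=100):
    """Merge newly cited ids into the exclude list, keeping the most recent `limit`."""
    merged = list(previous)
    for doc_id in cited_doc_ids(context):
        if doc_id not in merged:
            merged.append(doc_id)
    # exclude_doc_ids accepts at most 100 ids, so drop the oldest entries first.
    return merged[-limit:]
```

Dropping the oldest ids once the list reaches the limit is a design choice for this sketch; an agent that must never revisit a document would need a different strategy, such as narrowing with doc_ids instead.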
