Grill Retrieval

POST /grill/search is Grill's main retrieval endpoint. It runs hybrid (lexical + semantic) search across the project namespace and returns a RetrievalContext — a prompt-ready XML + Markdown block, not a list of ranked hits.

This page covers the request shape, how the tuning parameters interact, and when to use searchInDoc instead.

Request

http

POST /v3/grill/search HTTP/1.1
Authorization: Bearer <project-api-key>
Content-Type: application/json

{
  "query": "How did operating margin change year over year?",
  "doc_filter": null,
  "exclude_doc_ids": [],
  "return_assets": false,
  "return_page_images": false
}

Only query is required. The tuning knobs (min_relevance, target_tokens, max_tokens) are omitted above on purpose — their defaults are sized for the common case, so set them only when you want to deviate.

Field	Type	Default	Purpose
`query` (required)	string	—	Natural-language query. Hybrid retrieval splits this into a lexical term set and a vector embedding internally.
`min_relevance`	float `0..1`	`0.3`	Relevance floor: hits scoring below it are dropped. Higher = stricter (return less, drop borderline hits); lower = more permissive. `0.3` is the balanced default.
`target_tokens`	integer	`5000`	Soft token budget — the typical answer size. Hits are admitted best-first (sandwich-aware) up to this. This is the knob to size a response. Can never exceed `max_tokens` (clamped down if it would).
`max_tokens`	integer	`15000`	Hard ceiling. Grill only expands past `target_tokens` toward it for tight multi-document clusters. Set it below `target_tokens` and the soft target is clamped down to match — the hard ceiling always wins; it never inflates a response on its own.
`doc_filter`	string	unset	Optional document id. When set, restricts retrieval to a single doc. (Use `/grill/searchInDoc` if you want this required.)
`doc_ids`	array of string	unset	Restrict retrieval to this set of document ids.
`exclude_doc_ids`	array of string	unset	Doc ids to exclude from results (max 100). Useful in agent loops to avoid re-citing docs already shown.
`return_assets`	boolean	`false`	When `true`, the cited documents' figures (and tables, where available) are returned in an `assets` field keyed by `doc_id`. Images are base64 data URIs; resolved per document.

Metadata filters

Filter on the labels, integers, and wildcard strings you attached at ingest (see Ingestion → Attaching metadata):

Field	Type	Purpose
`meta_tags_any`	array of string	Match documents carrying any of these labels (set via `X-Labels`). Equality-only.
`meta_tags_all`	array of string	Match documents carrying all of these labels.
`meta_int_1_gte` / `meta_int_1_lte`	integer	Bounds on the document's `X-Meta-Int-1` value. `gte` = greater than or equal (≥) lower bound; `lte` = less than or equal (≤) upper bound.
`meta_int_2_gte` / `meta_int_2_lte`	integer	Same, for `X-Meta-Int-2`.
`unencrypted_strings_match`	object (string→string)	Match on the plaintext `X-Unencrypted-Strings` you set at ingest: a key→glob-pattern map. Each entry is a case-insensitive glob (`*` = any run, `?` = one char; a pattern with no wildcard is an exact match). Multiple keys AND together.

All filters above (and doc_ids / exclude_doc_ids) combine via AND.

Each integer bound is independent and inclusive. Pass just one to make it open-ended (gte alone = "from N upward"; lte alone = "up to N"), or both to bound a closed window. For example, if you stored a publication year in X-Meta-Int-1, {"meta_int_1_gte": 2020, "meta_int_1_lte": 2023} keeps only documents with 2020 ≤ year ≤ 2023.
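The inclusive, optionally open-ended behaviour described above can be modelled client-side. This is a minimal sketch for reasoning about which documents a bound keeps; `within_bounds` is a hypothetical helper, not part of Grill, and the actual filtering happens on the server.

```python
from typing import Optional


def within_bounds(value: int, gte: Optional[int] = None, lte: Optional[int] = None) -> bool:
    """Illustrative model of an inclusive integer bound; either side may be omitted."""
    if gte is not None and value < gte:
        return False
    if lte is not None and value > lte:
        return False
    return True


years = [2018, 2020, 2022, 2023, 2024]
print([y for y in years if within_bounds(y, gte=2020, lte=2023)])  # [2020, 2022, 2023]
print([y for y in years if within_bounds(y, gte=2022)])            # [2022, 2023, 2024]
print([y for y in years if within_bounds(y, lte=2020)])            # [2018, 2020]
```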

For wildcard strings, if you stored {"path": "legal/contracts/acme"} at ingest, then {"unencrypted_strings_match": {"path": "legal/contracts/*"}} keeps only documents whose path begins with legal/contracts/. Unlike meta_tags (HMAC'd, equality-only), unencrypted_strings values are stored unencrypted so they can be glob-matched — don't put secrets or PII in them.
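To check a pattern before sending it, the glob semantics can be approximated locally with Python's standard `fnmatch` module. This is a sketch under assumptions: `strings_match` is a hypothetical helper, a document missing a requested key is treated as a non-match, and `fnmatch` also understands `[seq]` character classes, which the Grill docs do not mention, so avoid square brackets when using it as a stand-in.

```python
import fnmatch


def strings_match(stored: dict[str, str], patterns: dict[str, str]) -> bool:
    """Approximate the documented semantics: case-insensitive globs, keys ANDed together."""
    for key, pattern in patterns.items():
        value = stored.get(key)
        if value is None:
            # Assumption: a document without the key cannot match a pattern on it.
            return False
        # Lowercase both sides for case-insensitivity; fnmatchcase skips OS-specific normalisation.
        if not fnmatch.fnmatchcase(value.lower(), pattern.lower()):
            return False
    return True


doc = {"path": "legal/contracts/acme", "team": "emea"}
print(strings_match(doc, {"path": "legal/contracts/*"}))           # True
print(strings_match(doc, {"path": "LEGAL/*", "team": "apac"}))     # False
```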

Other controls

Field	Type	Purpose
`min_top_relevance`	float `0..1`	Floor for the top hit only. If the best hit scores below this, the whole result set is empty — a guard against returning anything when nothing is a strong match.
`expand_tightness`	float `0..1`	How aggressively the engine expands context around each hit.
`retrieval_tier`	string	`standard` (fusion only) or `advanced` (adds a reranker). See Retrieval tiers.
`premium`	boolean	Legacy flag — `true` equals `retrieval_tier: "advanced"`. Prefer `retrieval_tier`.
`format`	string	`prompt_ready` (default) returns the XML+Markdown block; `json` returns structured ranked hits instead.

No top_k. Result count is bounded server-side by relevance + token budget. The recall stage isn't a tunable parameter; tune precision via min_relevance and answer size via target_tokens.

Response

json

{
  "context": "<doc id=\"...\">…</doc>",
  "result_count": 4,
  "tokens_estimated": 5820,
  "results_dropped": 2,
  "detected_lang": "english",
  "mode": "advanced",
  "search_units": 1
}

context is the field you actually use — Grill's contract is "ready to drop into a prompt", not "here is JSON for you to format". The siblings are metadata: result_count / results_dropped, tokens_estimated (rendered size), detected_lang, mode (the retrieval tier used), and search_units (billing). For the wrapper grammar (<doc>, inline [pN] / […] markers, sandwich order, citation attributes), see RetrievalContext format. To get structured per-hit JSON instead of the rendered block, set format: "json".
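As an illustration of "ready to drop into a prompt", the sketch below places the context field into a prompt string without reformatting it. The `response` dict is a stand-in copied from the example above, and both `build_prompt` and the instruction wording are hypothetical, not something Grill prescribes.

```python
# Stand-in for a parsed /grill/search response body; values copied from the example above.
response = {
    "context": '<doc id="...">…</doc>',
    "result_count": 4,
    "tokens_estimated": 5820,
    "results_dropped": 2,
    "detected_lang": "english",
    "mode": "advanced",
    "search_units": 1,
}


def build_prompt(question: str, response: dict) -> str:
    """Place the rendered context block ahead of the question, unmodified."""
    # The instruction wording is illustrative only, not taken from the Grill docs.
    return (
        "Answer using only the context below.\n\n"
        f"{response['context']}\n\n"
        f"Question: {question}"
    )


prompt = build_prompt("How did operating margin change year over year?", response)
print(prompt)
```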

How the parameters interact

Think of retrieval as a two-stage pipeline:

text

                          ┌───── filter ──────┐    ┌──────────────────── render ────────────────────┐
query ─▶ hybrid search ─▶ score ≥ min_relevance ─▶ fill to target_tokens (sandwich, up to max_tokens) ─▶ context

min_relevance controls precision. Anything below the threshold is discarded outright, even if it would have fit the budget. Useful for "I'd rather return less than return junk" queries (e.g. legal Q&A).
target_tokens controls the answer size. After thresholding, hits are placed in sandwich order and admitted best-first up to this soft budget; if it overflows, the lowest-ranked hits are evicted first. Hits at the very top and very bottom of the order survive longest.
max_tokens is a hard ceiling that only matters for tight multi-document clusters (several near-equally-relevant hits expand past the target toward it). Leave it at the default unless you want to allow — or forbid — that expansion.
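The three knobs can be pictured with a toy model. This is a simplified sketch, not Grill's implementation: `Hit` and `select_hits` are hypothetical names, the token sizes are made up, and neither sandwich ordering nor the expansion toward max_tokens for tight clusters is modelled. It only shows the relevance floor, the clamp of the soft target to the hard ceiling, and best-first admission.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    score: float  # relevance in 0..1
    tokens: int   # rendered size of this hit (made-up numbers below)


def select_hits(hits, min_relevance=0.3, target_tokens=5000, max_tokens=15000):
    """Toy model: threshold, clamp the soft budget, then admit best-first."""
    # The hard ceiling always wins: the soft target is clamped down to it.
    budget = min(target_tokens, max_tokens)
    # Stage 1 (filter): drop everything under the relevance floor.
    kept = [h for h in hits if h.score >= min_relevance]
    # Stage 2 (render): admit best-first; lower-ranked hits are the ones left out.
    admitted, used = [], 0
    for hit in sorted(kept, key=lambda h: h.score, reverse=True):
        if used + hit.tokens > budget:
            break
        admitted.append(hit)
        used += hit.tokens
    return admitted, used


hits = [Hit("a", 0.9, 2000), Hit("b", 0.6, 1500), Hit("c", 0.4, 1000), Hit("d", 0.2, 200)]

admitted, used = select_hits(hits)
print([h.doc_id for h in admitted], used)  # ['a', 'b', 'c'] 4500

admitted, used = select_hits(hits, min_relevance=0.5, target_tokens=2500)
print([h.doc_id for h in admitted], used)  # ['a'] 2000

admitted, used = select_hits(hits, max_tokens=3500)
print([h.doc_id for h in admitted], used)  # ['a', 'b'] 3500
```

In the second call hit "d" would have fit the budget but is still dropped, because the relevance floor is applied before any budgeting.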

A typical chat-with-knowledge-base setup needs no tuning at all — the defaults are sized for exactly this:

json

{ "query": "..." }

A "strict citations" assistant is where deviating pays off — raise the floor to drop borderline hits, and shrink the budget to keep answers short:

json

{ "query": "...", "min_relevance": 0.5, "target_tokens": 2500 }
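Both request bodies above can be produced by one helper that leaves unset knobs out of the body, so the server defaults apply. This is a minimal sketch; `build_search_payload` is a hypothetical name, not an SDK function, and it covers only the three tuning knobs.

```python
import json
from typing import Any, Optional


def build_search_payload(
    query: str,
    min_relevance: Optional[float] = None,
    target_tokens: Optional[int] = None,
    max_tokens: Optional[int] = None,
) -> dict[str, Any]:
    """Hypothetical helper: include only the knobs the caller explicitly set."""
    if not query:
        raise ValueError("query is required")
    knobs = {
        "min_relevance": min_relevance,
        "target_tokens": target_tokens,
        "max_tokens": max_tokens,
    }
    payload: dict[str, Any] = {"query": query}
    # A knob left out of the body falls back to its documented default.
    payload.update({k: v for k, v in knobs.items() if v is not None})
    return payload


print(json.dumps(build_search_payload("...")))
print(json.dumps(build_search_payload("...", min_relevance=0.5, target_tokens=2500)))
```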

Searching a single document

When you already know which document the answer must come from ("chat with this PDF", a per-document help bot, a contract Q&A flow), use POST /grill/searchInDoc. It takes the same request shape as /grill/search, except that doc_filter is required: a request without it is rejected with 400.

bash

curl -sS -X POST "$GRILL/grill/searchInDoc" \
  -H "authorization: Bearer $GRILL_KEY" \
  -H "content-type: application/json" \
  -d '{
    "query": "What is the cancellation policy?",
    "doc_filter": "terms_of_service_v3"
  }'

Why a separate endpoint instead of just setting doc_filter on /grill/search? Two reasons:

Safety. searchInDoc makes the per-document scope a contract, not an option — you cannot accidentally drop the filter and silently retrieve from the entire namespace.
Validation. The server rejects an empty doc_filter up front, so a typo that would otherwise become a "search the world" bug fails fast.
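If you call the endpoint over raw HTTP rather than through the SDK, the same fail-fast behaviour can be mirrored on the client before a request is sent. This is a sketch of one possible guard; `build_search_in_doc_payload` is a hypothetical helper, and the server-side check remains the source of truth.

```python
def build_search_in_doc_payload(query: str, doc_filter: str) -> dict[str, str]:
    """Hypothetical client-side guard mirroring the server's up-front validation."""
    if not query or not query.strip():
        raise ValueError("query is required")
    if not doc_filter or not doc_filter.strip():
        # Fail before sending, rather than relying on the server's 400.
        raise ValueError("doc_filter is required for searchInDoc")
    return {"query": query, "doc_filter": doc_filter}


payload = build_search_in_doc_payload("What is the cancellation policy?", "terms_of_service_v3")
print(payload)
```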

The response is the same RetrievalContext shape.

From the Python SDK

The SDK mirrors the endpoint split — g.search(...) for namespace-wide retrieval, g.search_in_doc(...) for per-document retrieval. The doc_filter argument is required positionally on search_in_doc so the safety guarantee carries through to Python.

python

from poma import Grill

g = Grill()

# Namespace-wide retrieval — defaults are sized for the common case
ctx = g.search("How did operating margin change year over year?")
print(ctx.context)

# Scoped to one document — doc_filter is positional and required
ctx = g.search_in_doc("What is the cancellation policy?", "terms_of_service_v3")

GrillContext.context is the same prompt-ready XML + Markdown block you'd get from curl. Full signature: Grill.search / Grill.search_in_doc.

Errors

Status	When	Notes
`400`	Missing `query`, missing `doc_filter` (on `searchInDoc`), or upstream validation error	Body is plain text.
`401`	Missing or invalid token	Use a project API key.
`403`	Project lacks Grill access (`primecut`-only project)	Create or switch to a Grill project.
`404`	`doc_filter` references an unknown document	Confirm the doc exists with `GET /grill/docs`.
`502`	Other upstream Grill / proxy errors	Retry with backoff.
`503`	Grill backend unreachable	Retry; if persistent, check status page.

Upstream Grill errors are mapped to these statuses based on the upstream message, so you'll see a consistent contract regardless of which internal subsystem failed.
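The "retry with backoff" advice for 502 and 503 can be sketched as a small wrapper. The code below assumes a caller-supplied `send` function that performs one request and returns a `(status, body)` pair; `call_with_backoff`, the attempt count, and the delays are illustrative choices, not values recommended by Grill.

```python
import time
from typing import Callable, Tuple

RETRYABLE = {502, 503}  # the statuses the table above marks as worth retrying


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Retry `send` on 502/503 with exponential backoff; return the last (status, body)."""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        sleep(base_delay * (2 ** attempt))
    raise ValueError("max_attempts must be at least 1")


# Simulated transport: two transient failures, then success. No real waiting happens
# because the delays are recorded instead of slept.
responses = iter([(503, "unreachable"), (502, "bad gateway"), (200, "{}")])
delays = []
result = call_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, delays)  # (200, '{}') [0.5, 1.0]
```

Statuses such as 400, 401, 403 and 404 are returned immediately, since retrying them without changing the request would not help.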

Tuning checklist

When retrieval feels off, work down this list:

No matches at all. Check the doc is in GET /grill/docs. Lower min_relevance (e.g. 0.3 → 0.1).
Junk in the context. Raise min_relevance (e.g. 0.3 → 0.5).
Truncated context. Raise target_tokens, or narrow the query so fewer hits are needed.
High latency / heavy payload. Drop return_assets / return_page_images when you don't need them; lower target_tokens so less context is rendered.
Wrong document picked. If you know the answer is in one doc, switch to /grill/searchInDoc — that removes the cross-document ambiguity entirely. To exclude already-cited docs in an agent loop, pass exclude_doc_ids.
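For the agent-loop case, one way to maintain exclude_doc_ids between turns is to collect the document ids cited in each returned context. The sketch below assumes the ids appear as `<doc id="...">` attributes, as in the response example on this page, and that those ids are the ones exclude_doc_ids expects; check the RetrievalContext format page before relying on it. `next_excludes` is a hypothetical helper.

```python
import re

# Assumption: cited documents appear as <doc id="..."> wrappers in the context block.
DOC_ID_RE = re.compile(r'<doc\b[^>]*\bid="([^"]+)"')
MAX_EXCLUDED = 100  # documented cap on exclude_doc_ids


def next_excludes(seen: list[str], context: str) -> list[str]:
    """Add doc ids cited in `context` to `seen`, de-duplicated, keeping the newest within the cap."""
    merged = list(dict.fromkeys(seen + DOC_ID_RE.findall(context)))
    return merged[-MAX_EXCLUDED:]


context = '<doc id="report_2023">…</doc><doc id="report_2022">…</doc>'
print(next_excludes(["report_2023"], context))  # ['report_2023', 'report_2022']
```

Pass the returned list as exclude_doc_ids on the next request so already-shown documents are not cited again.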

Next

RetrievalContext format: the grammar of the returned block.
API reference: full request/response schemas.
