The RetrievalContext Format
Grill's search endpoints return a single field — context — containing a prompt-ready block of XML + Markdown. The shape is opinionated on purpose: it encodes everything an LLM needs to use the retrieved passages well, and everything you need to surface citations to the end user, without you assembling a prompt yourself.
This page documents the grammar, the ordering algorithm, and how to consume it on the client side.
The shape, end to end
<query>How did operating margin change year over year?</query>
<doc id="annual-report-2025" title="Annual Report 2025" pages="3-5">
## Operating margin
Operating margin rose from **18.4%** in FY24 to **21.1%** in FY25 …
<gap pages="2" />
Cost-of-goods-sold improvements contributed roughly 1.6 pts to the lift …
</doc>
<doc id="cfo-q4-call" title="CFO Q4 earnings call" pages="14">
> "We expect operating margin to expand again in FY26, supported by …"
</doc>
Three things are happening here:
- <doc> wrappers make per-document boundaries explicit so the model can attribute statements correctly.
- <gap /> markers tell the model that the preceding and following passages are not contiguous in the source.
- Markdown content (headings, lists, blockquotes) is preserved from the original chunking so the model gets the same structural cues a reader would.
Top-level structure
context ::= [ "<query>" QUERY "</query>" ] (DOC)*
DOC ::= "<doc id=\"…\"" [ " title=\"…\"" ] [ " pages=\"…\"" ] ">" CONTENT "</doc>"
CONTENT ::= ( MARKDOWN | "<gap pages=\"N\" />" )+
Attributes you can rely on:
| Attribute | Meaning | Always present? |
|---|---|---|
| id on <doc> | The same doc_id you see in DocInfo.doc_id | Yes |
| title on <doc> | Document title (from PrimeCut metadata) | When known |
| pages on <doc> | The page range covered by the included passages, e.g. "3-5" or "14" | When the source has page numbers |
| pages on <gap /> | How many pages were skipped between adjacent passages | Yes |
<query> only appears when the request set include_query: true.
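If you want a structured view rather than the raw string, the grammar above is small enough to parse with a few regular expressions. The following is a minimal sketch, not an official client: parse_context and the Doc dataclass are hypothetical names, and the code assumes attribute values never contain a literal > or double quote and that passages never contain a literal </doc>.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Patterns derived from the grammar above (assumption: values are escaped).
QUERY_RE = re.compile(r"<query>(.*?)</query>", re.DOTALL)
DOC_RE = re.compile(r"<doc\b([^>]*)>(.*?)</doc>", re.DOTALL)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


@dataclass
class Doc:
    doc_id: str
    title: Optional[str]   # None when the title is not known
    pages: Optional[str]   # None when the source has no page numbers
    content: str           # Markdown plus any <gap /> markers, untouched


def parse_context(context: str) -> tuple[Optional[str], list[Doc]]:
    """Split a context block into its optional query and its <doc> blocks."""
    query_match = QUERY_RE.search(context)
    query = query_match.group(1) if query_match else None

    docs = []
    for match in DOC_RE.finditer(context):
        attrs = dict(ATTR_RE.findall(match.group(1)))
        docs.append(
            Doc(
                doc_id=attrs["id"],  # documented as always present
                title=attrs.get("title"),
                pages=attrs.get("pages"),
                content=match.group(2).strip(),
            )
        )
    return query, docs
```

Documents come back in the order they appear in the block, so the server-side ordering is preserved unless you re-sort the list yourself.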
Sandwich ordering
Inside the context block, passages are placed in sandwich order rather than strict relevance order. Concretely: the highest-relevance passage goes near the top, the second-highest near the bottom, and the rest fill in toward the middle.
position    relevance
─────────   ───────────
top         ★★★★★  ← highest
…           ★★★★
…           ★★★
middle      ★★     ← lowest
…           ★★★
…           ★★★★
bottom      ★★★★★  ← second-highest
Why? Modern LLMs exhibit a U-shaped attention curve over long contexts (the "lost in the middle" effect): tokens at the very start and the very end of the prompt are recalled better than tokens in the middle. Sandwich ordering puts the most important passages where the model is most likely to use them, without you having to think about it.
If you need a different ordering for a downstream tool, you can re-parse the <doc> blocks and reorder them yourself. For a default LLM call, though, sandwich order is the better starting point: it is designed to beat plain relevance order on long-context recall.
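The placement rule described in this section can be sketched in a few lines. This is one plausible reading of it (alternate between the top and the bottom, working inward), not Grill's actual implementation; sandwich_order and unsandwich are hypothetical helper names.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def sandwich_order(ranked: Sequence[T]) -> list[T]:
    """Arrange items given best-first so the best sit at the edges.

    Even ranks (0, 2, 4, ...) fill from the top down, odd ranks
    (1, 3, 5, ...) fill from the bottom up, so the lowest-ranked
    item ends up in the middle.
    """
    front: list[T] = []
    back: list[T] = []
    for rank, item in enumerate(ranked):
        (front if rank % 2 == 0 else back).append(item)
    return front + back[::-1]


def unsandwich(ordered: Sequence[T]) -> list[T]:
    """Invert sandwich_order: read alternately from the top and the bottom.

    Only valid if the input was produced by the alternating scheme above,
    which is an assumption about the server's behavior.
    """
    out: list[T] = []
    lo, hi = 0, len(ordered) - 1
    while lo <= hi:
        out.append(ordered[lo])
        lo += 1
        if lo <= hi:
            out.append(ordered[hi])
            hi -= 1
    return out
```

With seven items ranked 1 (best) to 7, sandwich_order yields 1, 3, 5, 7, 6, 4, 2, which matches the diagram: highest at the top, second-highest at the bottom, lowest in the middle.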
Gap markers
When two retrieved passages from the same document are non-adjacent in the source, Grill inserts a <gap pages="N" /> marker between them. The integer is the number of pages skipped.
<doc id="annual-report-2025" pages="3-12">
## Section 2 — Margin trends
…passage from page 3…
<gap pages="4" />
…passage from page 7…
<gap pages="2" />
…passage from page 10…
</doc>
The marker exists so the model does not assume continuity across passages it would otherwise read as one. Without it, the model can hallucinate transitions ("As stated above, …") that are not in the source.
When two passages are adjacent, no <gap /> is inserted.
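On the client side, gap markers are also a convenient place to split a document's content into separate passages, for example to render each one as its own citation card. A minimal sketch, assuming the marker always has the exact form shown above; split_passages is a hypothetical helper name.

```python
import re

# Matches <gap pages="N" /> and captures N.
GAP_RE = re.compile(r'<gap\s+pages="(\d+)"\s*/>')


def split_passages(doc_content: str) -> list[tuple[int, str]]:
    """Return (pages_skipped_before, passage_text) pairs for one <doc> body.

    The first passage has nothing before it, so its skip count is 0.
    """
    # re.split with one capturing group returns
    # [text, N, text, N, text, ...], always with an odd length.
    parts = GAP_RE.split(doc_content)
    passages = [(0, parts[0].strip())]
    for i in range(1, len(parts), 2):
        passages.append((int(parts[i]), parts[i + 1].strip()))
    return passages
```

Passages that were adjacent in the source carry no marker between them, so they stay together in a single entry.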
Token budgeting
max_tokens in the request bounds the size of the rendered context — not the number of hits. Grill computes the rendered length, including all wrapper tags and gap markers, and drops the lowest-ranked hits first (preserving the sandwich shape) until it fits.
This means:
- The block you receive is always within budget, even when the underlying corpus has many high-relevance passages.
- Hits dropped for budget reasons are silently discarded; they do not appear with a "truncated" marker. Use top_k and min_relevance to influence which hits survive.
- The block can be empty (just a <query> tag) if every hit fell below min_relevance or if the query found nothing in the namespace.
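To make the trimming behavior concrete, here is a rough sketch of "drop the lowest-ranked hit until the rendered block fits". It is an illustration of the rule as described, not Grill's code: count_tokens is a whitespace-based stand-in for a real tokenizer, and render_block and the hit-N ids are hypothetical.

```python
def count_tokens(text: str) -> int:
    """Stand-in tokenizer (assumption): one token per whitespace-separated word."""
    return len(text.split())


def render_block(hits: list[str]) -> str:
    """Hypothetical renderer: wrap each hit in a <doc> tag so that the
    wrapper markup counts toward the budget, as the text describes."""
    return "\n".join(
        f'<doc id="hit-{i}">\n{text}\n</doc>' for i, text in enumerate(hits)
    )


def fit_to_budget(ranked_hits: list[str], max_tokens: int) -> list[str]:
    """Drop hits from the low-relevance end until the rendered block fits.

    ranked_hits must be ordered best-first. The result may be empty.
    """
    kept = list(ranked_hits)
    while kept and count_tokens(render_block(kept)) > max_tokens:
        kept.pop()  # discard the current lowest-ranked hit, with no marker
    return kept
```

Because trimming works on the relevance-ranked list before layout, whatever survives can still be arranged in sandwich order afterwards.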
If you need predictable lengths, set max_tokens to a value comfortably below your model's context window minus your prompt's other content (system prompt, user message, response budget).
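The arithmetic is simple but easy to get wrong under pressure, so it can help to keep it in one place. A small sketch, where context_budget and its parameters are hypothetical names and the 10% safety margin is an arbitrary default rather than a recommendation from Grill:

```python
def context_budget(
    context_window: int,
    system_tokens: int,
    user_tokens: int,
    response_tokens: int,
    margin_percent: int = 10,
) -> int:
    """Return a max_tokens value that leaves room for everything else.

    All counts are in tokens of the target model. margin_percent keeps the
    result "comfortably below" the remaining space (assumed default: 10).
    """
    remaining = context_window - system_tokens - user_tokens - response_tokens
    if remaining <= 0:
        raise ValueError("no room left for retrieved context")
    # Integer math avoids floating-point rounding surprises.
    return remaining * (100 - margin_percent) // 100
```

For a 128,000-token window with an 800-token system prompt, a 200-token user message, and 4,000 tokens reserved for the response, this leaves 123,000 tokens, and the default margin brings the suggested max_tokens to 110,700.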
Surfacing citations
Because every passage is wrapped in <doc id=…>, you can extract citations with a one-pass regex over the response:
import re

def cite(context: str) -> list[dict]:
    # One dict per <doc> opening tag. "title" and "pages" are None when the
    # attribute is absent. Assumes the attribute order id, title, pages.
    return [
        {"doc_id": m.group(1), "title": m.group(2), "pages": m.group(3)}
        for m in re.finditer(
            r'<doc\s+id="([^"]+)"(?:\s+title="([^"]*)")?(?:\s+pages="([^"]*)")?',
            context,
        )
    ]

Pair that with a system prompt like:
"Answer using ONLY the context. After each claim, cite the matching doc.id in square brackets."
…and you have a citation-grounded assistant without writing a reranker, a re-formatter, or a prompt assembler.
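Models occasionally cite an id that was never in the context, so it is worth checking the answer before showing citations to the user. A minimal sketch, assuming the model follows the square-bracket convention from the system prompt above; check_citations is a hypothetical helper name, and any other bracketed text in the answer (such as Markdown link labels) will show up as unknown.

```python
import re


def check_citations(answer: str, context: str) -> dict[str, set[str]]:
    """Compare the ids cited in the answer against the ids in the context."""
    known = set(re.findall(r'<doc\s+id="([^"]+)"', context))
    # Anything inside [...] is treated as a cited doc id (assumption).
    cited = set(re.findall(r"\[([^\[\]]+)\]", answer))
    return {
        "valid": cited & known,     # cited and present in the context
        "unknown": cited - known,   # cited but not retrieved: treat as suspect
        "unused": known - cited,    # retrieved but never cited
    }
```

A non-empty "unknown" set is a reasonable trigger for retrying the call or stripping the offending citation before rendering.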
What is not in the response
- No raw scores. Relevance scores are used server-side for filtering and ordering and then dropped. The set of returned passages is the contract; the underlying scores are not.
- No chunk ids. The granularity you can address is the document, not the chunk. If you need per-chunk addressability, use PrimeCut and own the retrieval layer.
- No JSON of hits. The returned shape is { "context": "<…>" }. If you need a structured list, parse the XML.
This is deliberate: the API contract guarantees prompt-readiness, not the internals of how that prompt was assembled.
Working with assets and page images
When you set return_assets: true or return_page_images: true, Grill includes references to the matched chunks' associated assets/page images inline in the relevant <doc> block. They appear as Markdown image links pointing at signed URLs Grill manages.
Use them when you need to render visual citations (e.g. "show the page that contains the answer"). Skip them otherwise — they add tokens and the URLs need to be fetched server-side, which adds latency.
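If you render the images in your UI but do not want them in the prompt, you can pull the links out of each <doc> body before sending the text to the model. A minimal sketch, assuming plain Markdown image syntax with no title part and no parentheses or whitespace in the URL; split_images is a hypothetical helper name and the exact link format Grill emits is not specified here.

```python
import re

# ![alt text](url) with no optional title and no spaces in the URL (assumption).
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def split_images(doc_content: str) -> tuple[str, list[dict[str, str]]]:
    """Return the content with image links removed, plus the links found."""
    images = [
        {"alt": m.group(1), "url": m.group(2)}
        for m in IMAGE_RE.finditer(doc_content)
    ]
    text = IMAGE_RE.sub("", doc_content)
    return text, images
```

The stripped text can go to the model while the collected URLs drive the visual citations, which keeps the token cost of the images out of the prompt.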
Next
- Retrieval: the request side, including top_k / min_relevance tuning.
- Document management: keeping the namespace tidy.