The RetrievalContext Format

Grill's search endpoints return a single field — context — containing a prompt-ready block of XML + Markdown. The shape is opinionated on purpose: it encodes everything an LLM needs to use the retrieved passages well, and everything you need to surface citations to the end user, without you assembling a prompt yourself.

This page documents the grammar, the ordering algorithm, and how to consume it on the client side.

The shape, end to end

xml
<query>How did operating margin change year over year?</query>

<doc id="annual-report-2025" title="Annual Report 2025" pages="3-5">
  ## Operating margin

  Operating margin rose from **18.4%** in FY24 to **21.1%** in FY25 …

  <gap pages="1" />

  Cost-of-goods-sold improvements contributed roughly 1.6 pts to the lift …
</doc>

<doc id="cfo-q4-call" title="CFO Q4 earnings call" pages="14">
  > "We expect operating margin to expand again in FY26, supported by …"
</doc>

Three things are happening here:

  1. <doc> wrappers make per-document boundaries explicit so the model can attribute statements correctly.
  2. <gap /> markers tell the model that the preceding and following passages are not contiguous in the source.
  3. Markdown content (headings, lists, blockquotes) is preserved from the original chunking so the model gets the same structural cues a reader would.

Top-level structure

text
context  ::= [ "<query>" QUERY "</query>" ] (DOC)*
DOC      ::= "<doc id=\"…\"" [ " title=\"…\"" ] [ " pages=\"…\"" ] ">" CONTENT "</doc>"
CONTENT  ::= ( MARKDOWN | "<gap pages=\"N\" />" )+

Attributes you can rely on:

| Attribute | Meaning | Always present? |
| --- | --- | --- |
| id on <doc> | The same doc_id you see in DocInfo.doc_id | Yes |
| title on <doc> | Document title (from PrimeCut metadata) | When known |
| pages on <doc> | The page range covered by the included passages, e.g. "3-5" or "14" | When the source has page numbers |
| pages on <gap /> | How many pages were skipped between adjacent passages | Yes |

<query> only appears when the request sets include_query: true.
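
If you want the grammar above as a structured value, a small parser is enough. The following is a minimal sketch, not part of the Grill API: the names Doc and parse_context are hypothetical, and it assumes attribute values never contain a double quote or ">" and that passage text never contains a literal </doc>.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

QUERY_RE = re.compile(r"<query>(.*?)</query>", re.DOTALL)
DOC_RE = re.compile(r"<doc\s+([^>]*)>(.*?)</doc>", re.DOTALL)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')
GAP_RE = re.compile(r'<gap\s+pages="(\d+)"\s*/>')


@dataclass
class Doc:
    doc_id: str
    title: Optional[str] = None   # absent when the title is not known
    pages: Optional[str] = None   # absent when the source has no page numbers
    # Passages (str) interleaved with gap sizes (int), in rendered order.
    segments: List[Union[str, int]] = field(default_factory=list)


def parse_context(context: str) -> Tuple[Optional[str], List[Doc]]:
    """Split a context block into its optional query and its <doc> blocks."""
    match = QUERY_RE.search(context)
    query = match.group(1).strip() if match else None

    docs = []
    for attrs_text, body in DOC_RE.findall(context):
        attrs = dict(ATTR_RE.findall(attrs_text))
        doc = Doc(attrs["id"], attrs.get("title"), attrs.get("pages"))
        # Splitting on a pattern with one capture group yields
        # text, gap, text, gap, ... so odd indices are gap sizes.
        for i, piece in enumerate(GAP_RE.split(body)):
            if i % 2 == 1:
                doc.segments.append(int(piece))
            elif piece.strip():
                doc.segments.append(piece.strip())
        docs.append(doc)
    return query, docs
```

An empty context parses to (None, []), which matches the case where no hit survives filtering.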

Sandwich ordering

Inside the context block, passages are placed in sandwich order rather than strict relevance order. Concretely: the highest-relevance passage goes near the top, the second-highest near the bottom, and the rest fill in toward the middle.

text
position    relevance
─────────  ───────────
top        ★★★★★      ← highest
…          ★★★★
…          ★★★
middle     ★★          ← lowest
…          ★★★
…          ★★★★
bottom     ★★★★★       ← second-highest

Why? Many LLMs show a U-shaped recall curve over long contexts (the "lost in the middle" effect): content at the very start and the very end of the prompt is recalled better than content in the middle. Sandwich ordering puts the most important passages where the model is most likely to use them, without you having to think about it.

If you need a different ordering for a downstream tool, you can re-parse the <doc> blocks and reorder them yourself. For a default LLM call, though, sandwich order generally holds up better than plain relevance order on long-context recall.
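
To make the placement rule concrete, here is a minimal sketch of one way to produce a sandwich order from a best-first list. It assumes a simple alternating rule (odd ranks fill from the top, even ranks fill from the bottom); Grill's exact server-side placement is not specified on this page, and sandwich_order is a hypothetical name.

```python
from typing import List, TypeVar

T = TypeVar("T")


def sandwich_order(ranked: List[T]) -> List[T]:
    """Reorder a best-first list so the strongest items sit at both ends.

    ranked[0] ends up first, ranked[1] ends up last, and the weakest
    item ends up in the middle.
    """
    front: List[T] = []
    back: List[T] = []
    for i, item in enumerate(ranked):
        # Alternate: ranks 1, 3, 5, ... go to the front half,
        # ranks 2, 4, 6, ... go to the back half.
        (front if i % 2 == 0 else back).append(item)
    # The back half was collected best-first, so reverse it to put
    # the second-best item at the very end.
    return front + back[::-1]


# Ranks 1 (best) to 7 (worst):
print(sandwich_order([1, 2, 3, 4, 5, 6, 7]))  # [1, 3, 5, 7, 6, 4, 2]
```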

Gap markers

When two retrieved passages from the same document are non-adjacent in the source, Grill inserts a <gap pages="N" /> marker between them. The integer is the number of pages skipped.

xml
<doc id="annual-report-2025" pages="3-12">
  ## Section 2 — Margin trends
  …passage from page 3…

  <gap pages="3" />

  …passage from page 7…

  <gap pages="2" />

  …passage from page 10…
</doc>

The marker exists so the model does not assume continuity across passages it would otherwise read as one. Without it, the model can hallucinate transitions ("As stated above, …") that are not in the source.

When two passages are adjacent, no <gap /> is inserted.
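
The rule can be sketched as follows. This is an illustration under assumptions, not Grill's implementation: it treats adjacency at page granularity, takes each passage as a hypothetical (first_page, last_page, text) tuple already sorted by page, and counts the skipped pages as the whole pages that lie strictly between two passages.

```python
from typing import List, Tuple

Passage = Tuple[int, int, str]  # (first_page, last_page, text); hypothetical shape


def render_passages(passages: List[Passage]) -> str:
    """Join one document's passages, inserting <gap /> where pages are skipped."""
    parts: List[str] = []
    prev_last = None
    for first, last, text in passages:
        if prev_last is not None:
            skipped = first - prev_last - 1  # pages strictly between the two passages
            if skipped > 0:
                parts.append(f'<gap pages="{skipped}" />')
        parts.append(text)
        prev_last = last
    return "\n\n".join(parts)


# Pages 13 and 14 are skipped between the first two passages; the
# last two passages sit on consecutive pages, so no marker is emitted.
print(render_passages([(12, 12, "A"), (15, 15, "B"), (16, 16, "C")]))
```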

Token budgeting

max_tokens in the request bounds the size of the rendered context — not the number of hits. Grill computes the rendered length, including all wrapper tags and gap markers, and drops the lowest-ranked hits first (preserving the sandwich shape) until it fits.

This means:

  • The block you receive is always within budget, even when the underlying corpus has many high-relevance passages.
  • Hits dropped for budget reasons are silently discarded — they do not appear with a "truncated" marker. Use top_k and min_relevance to influence which hits survive.
  • The block can be empty (or contain only a <query> tag, if you set include_query: true) when every hit fell below min_relevance or the query found nothing in the namespace.
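
The trimming behavior described above can be sketched as a loop that re-renders and re-measures after each drop. This is a sketch under assumptions: fit_to_budget, render, and count_tokens are hypothetical names, the whitespace-based counter is only a stand-in for a real tokenizer, and a real renderer would also apply sandwich ordering.

```python
from typing import Callable, List, Tuple

Hit = Tuple[str, str]  # (doc_id, passage text); hypothetical shape


def render(hits: List[Hit]) -> str:
    """Toy renderer: one <doc> block per hit, wrapper tags included."""
    return "\n\n".join(f'<doc id="{doc_id}">\n  {text}\n</doc>' for doc_id, text in hits)


def count_tokens(text: str) -> int:
    """Stand-in tokenizer: counts whitespace-separated pieces."""
    return len(text.split())


def fit_to_budget(
    ranked_hits: List[Hit],
    max_tokens: int,
    render_fn: Callable[[List[Hit]], str] = render,
    count_fn: Callable[[str], int] = count_tokens,
) -> str:
    """Drop the lowest-ranked hits until the rendered block fits the budget."""
    kept = list(ranked_hits)  # best-first
    # Measure the rendered block, not the raw passages, so wrapper
    # tags count against the budget too.
    while kept and count_fn(render_fn(kept)) > max_tokens:
        kept.pop()  # the last element is the lowest-ranked hit
    return render_fn(kept)
```

Note that the result is an empty string when even the single best hit does not fit, which is one more way to end up with an empty block.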

If you need predictable lengths, set max_tokens to a value comfortably below your model's context window minus your prompt's other content (system prompt, user message, response budget).
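
As a worked example of that arithmetic, the sketch below subtracts the other prompt parts from the context window and then holds back a safety margin. The function name and the 10% default margin are assumptions for illustration; the margin is there because your model's tokenizer may not count tokens exactly the way the retrieval side does.

```python
def context_budget(
    context_window: int,
    system_tokens: int,
    user_tokens: int,
    response_tokens: int,
    margin_percent: int = 10,
) -> int:
    """Return a max_tokens value that leaves room for everything else."""
    remaining = context_window - system_tokens - user_tokens - response_tokens
    if remaining <= 0:
        raise ValueError("no room left for retrieved context")
    # Integer arithmetic keeps the result exact and reproducible.
    return remaining - (remaining * margin_percent) // 100


# 128,000-token window, 800-token system prompt, 200-token user
# message, 4,000 tokens reserved for the answer:
print(context_budget(128_000, 800, 200, 4_000))  # 110700
```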

Surfacing citations

Because every passage is wrapped in <doc id=…>, you can extract citations with a one-pass regex over the returned context string:

python
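import re

# Matches the opening <doc …> tag. Assumes the attributes appear in the
# order id, title, pages; title and pages are optional.
DOC_TAG = re.compile(
    r'<doc\s+id="([^"]+)"(?:\s+title="([^"]*)")?(?:\s+pages="([^"]*)")?'
)

def cite(context: str) -> list[dict]:
    """Return one citation dict per <doc> block, in order of appearance."""
    # title and pages are None when the attribute is absent.
    return [
        {"doc_id": m.group(1), "title": m.group(2), "pages": m.group(3)}
        for m in DOC_TAG.finditer(context)
    ]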

Pair that with a system prompt like:

"Answer using ONLY the context. After each claim, cite the matching doc.id in square brackets."

…and you have a citation-grounded assistant without writing a reranker, a re-formatter, or a prompt assembler.
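
Once the model answers, it is worth checking that every bracketed citation refers to a document that was actually in the context. The following is a minimal sketch with a hypothetical check_citations helper; it assumes the model writes exactly one doc id per pair of square brackets and uses brackets for nothing else.

```python
import re
from typing import Dict, List


def check_citations(answer: str, context: str) -> Dict[str, List[str]]:
    """Compare the ids cited in an answer with the ids present in the context."""
    known = set(re.findall(r'<doc\s+id="([^"]+)"', context))
    cited = set(re.findall(r"\[([^\[\]]+)\]", answer))
    return {
        "valid": sorted(cited & known),     # cited and present in the context
        "unknown": sorted(cited - known),   # cited but never retrieved
        "uncited": sorted(known - cited),   # retrieved but never cited
    }


context = '<doc id="annual-report-2025">…</doc>\n<doc id="cfo-q4-call">…</doc>'
answer = "Margin rose to 21.1% [annual-report-2025]. Growth continues [made-up-doc]."
print(check_citations(answer, context))
# {'valid': ['annual-report-2025'], 'unknown': ['made-up-doc'], 'uncited': ['cfo-q4-call']}
```

A non-empty "unknown" list is a useful signal that the answer cites something outside the retrieved passages.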

What is not in the response

  • No raw scores. Relevance scores are used server-side for filtering and ordering and then dropped. The set of returned passages is the contract; the underlying scores are not.
  • No chunk ids. The granularity you can address is the document, not the chunk. If you need per-chunk addressability, use PrimeCut and own the retrieval layer.
  • No JSON of hits. The returned shape is { "context": "<…>" }. If you need a structured list, parse the XML.

This is deliberate: the API contract guarantees prompt-readiness, not the internals of how that prompt was assembled.

Working with assets and page images

When you set return_assets: true or return_page_images: true, Grill includes references to the matched chunks' associated assets/page images inline in the relevant <doc> block. They appear as Markdown image links pointing at signed URLs Grill manages.

Use them when you need to render visual citations (e.g. "show the page that contains the answer"). Skip them otherwise — they add tokens and the URLs need to be fetched server-side, which adds latency.
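
If you request assets but only sometimes need them in the prompt, you can pull the links out on the client. This sketch assumes the references use the standard Markdown image form ![alt](url) with no spaces or closing parentheses in the URL; the exact inline format is not specified here, and both helper names are hypothetical.

```python
import re
from typing import List, Tuple

# Standard Markdown image syntax: ![alt text](url)
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def extract_images(context: str) -> List[Tuple[str, str]]:
    """Return (alt_text, url) pairs for every image link in the context."""
    return IMAGE_RE.findall(context)


def strip_images(context: str) -> str:
    """Remove image links, e.g. before sending the context to a text-only model."""
    return IMAGE_RE.sub("", context)
```

Extracting first and stripping second lets you render the page images in your UI while keeping the prompt itself free of URL tokens.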
