Grill Quickstart

This walkthrough takes you from a blank Grill project to prompt-ready context in three calls. You will:

Ingest a PDF with POST /grill/ingest.
Wait for the job to reach done.
Search with POST /grill/search and inspect the RetrievalContext block.

If you have not created a project yet, do that first — see Create a Grill project. You should have a project API key (prefix poma_prod_gr_…) before starting. The account-level POMA_API_KEY (prefix poma_acc_…) cannot call /grill/*.
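A quick local check can catch the wrong key type before any request is sent. The sketch below relies only on the two prefixes named above; the function name `check_grill_key` and its error messages are illustrative and not part of any POMA library.

```python
# Prefixes as stated in this quickstart; everything else is an illustrative assumption.
PROJECT_KEY_PREFIX = "poma_prod_gr_"
ACCOUNT_KEY_PREFIX = "poma_acc_"


def check_grill_key(key: str) -> str:
    """Return the key unchanged if it looks like a Grill project key."""
    if key.startswith(ACCOUNT_KEY_PREFIX):
        # Account-level keys cannot call /grill/*, so fail early with a clear message.
        raise ValueError("account-level key supplied; /grill/* needs a project key")
    if not key.startswith(PROJECT_KEY_PREFIX):
        raise ValueError(f"expected a key starting with {PROJECT_KEY_PREFIX!r}")
    return key
```

This only inspects the prefix. Whether the key is actually valid is still decided by the server.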

Prefer Python? The whole loop below collapses to four lines with the SDK — jump to Using the Python SDK at the bottom, or read the Grill reference.

bash

export GRILL="https://api.poma-ai.com/v3"
export GRILL_KEY="poma_prod_gr_…"   # project API key; the SDK reads it from POMA_GRILL_API_KEY instead

1. Ingest a document

Grill ingestion uses the same raw-bytes contract as PrimeCut: application/octet-stream body with a Content-Disposition header that carries the filename.

bash

JOB=$(curl -sS -X POST "$GRILL/grill/ingest" \
  -H "authorization: Bearer $GRILL_KEY" \
  -H "content-type: application/octet-stream" \
  -H 'content-disposition: attachment; filename="annual-report.pdf"' \
  --data-binary @annual-report.pdf)

echo "$JOB" | jq .

The response is a PublicJob:

json

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "created_at": "2026-04-30T10:00:00Z",
  "properties": { "file": { "filename": "annual-report.pdf", "size": 1048576 } }
}

Capture the job id:

bash

JOB_ID=$(echo "$JOB" | jq -r .job_id)

Why Content-Disposition? The body is raw bytes, so the filename has to ride in a header. Without it the server cannot infer the file type and rejects the request with 400.
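For readers who are not using curl, the same raw-bytes contract can be sketched with the Python standard library. The helper name `build_ingest_request` is hypothetical. It only builds the request, mirroring the headers from the curl call above, and does not send it.

```python
import urllib.request
from pathlib import Path


def build_ingest_request(
    base_url: str, api_key: str, path: str, data: bytes
) -> urllib.request.Request:
    """Build (but do not send) a raw-bytes ingest request."""
    # Only the bare filename travels in the header, not the local directory.
    # Assumes a plain ASCII filename without quotes; other names need escaping.
    filename = Path(path).name
    return urllib.request.Request(
        f"{base_url}/grill/ingest",
        data=data,  # the file's raw bytes, sent as the request body
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/octet-stream",
            "Content-Disposition": f'attachment; filename="{filename}"',
        },
    )
```

Passing the result to `urllib.request.urlopen(...)` would perform the upload. Keeping construction separate makes the headers easy to inspect or unit-test first.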

2. Wait for the job to finish

Grill reuses the standard POMA job lifecycle: pending → processing → done / failed. Two ways to follow it:

Polling:

bash

while :; do
  STATUS=$(curl -sS "$GRILL/jobs/$JOB_ID/status" \
    -H "authorization: Bearer $GRILL_KEY" | jq -r .status)
  echo "status: $STATUS"
  case "$STATUS" in
    done|failed) break ;;
    *) sleep 2 ;;
  esac
done

SSE stream:

bash

curl -N "https://api.poma-ai.com/status/v1/jobs/$JOB_ID" \
  -H "authorization: Bearer $GRILL_KEY"
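If you read the stream from Python rather than curl, the lines have to be grouped into events. The sketch below follows the generic server-sent events wire format (`data:` fields, blank line ends an event). It makes no assumption about what POMA puts inside each payload, so it yields the raw data strings. The name `iter_sse_data` is illustrative.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete server-sent event."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The format allows one optional space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (": ...") and other fields (event:, id:, retry:) are skipped.
```

An event that is still open when the stream ends is dropped, which matches how the format treats incomplete events.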

When the job reaches done, the document is already indexed in your project namespace. Unlike PrimeCut you do not need to download the .poma archive — Grill has stored chunks, embeddings, and assets server-side and they are immediately searchable.
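The polling loop above translates to Python roughly as follows. This is a sketch rather than SDK code: `wait_for_job` and its parameters are hypothetical, and the status lookup is injected as a callable so the loop stays independent of any HTTP client. The terminal states are the ones named in the lifecycle above.

```python
import time
from typing import Callable

TERMINAL_STATES = {"done", "failed"}  # from the lifecycle described above


def wait_for_job(
    fetch_status: Callable[[], str],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status until it returns a terminal state, then return it."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            # Unlike the bare shell loop, give up instead of polling forever.
            raise TimeoutError(f"job still {status!r} after {timeout}s")
        sleep(interval)
```

In practice `fetch_status` would wrap the status request shown in the polling example and return its `status` field. The timeout is an addition that the shell loop does not have.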

3. Search

bash

curl -sS -X POST "$GRILL/grill/search" \
  -H "authorization: Bearer $GRILL_KEY" \
  -H "content-type: application/json" \
  -d '{
    "query": "How did operating margin change year over year?"
  }' | jq -r .context

You get back a RetrievalContext — a single string field of XML + Markdown, ready to paste into an LLM prompt:

xml

<context>
<doc id="annual-report" title="Annual Report 2025" pages="42">
  [p42]
  ## Operating margin

  Operating margin rose from **18.4%** in FY24 to **21.1%** in FY25, …

  […]

  Cost-of-goods-sold improvements contributed roughly 1.6 pts …
</doc>
</context>

The block is already:

Sandwich-ordered — most relevant passages bracket the middle of the prompt window for the best LLM recall behaviour.
Skip-marked — […] is inserted between non-consecutive chunks of the same document, so the model knows one or more chunks were dropped and won't hallucinate a transition. [pN] tags the page-of-origin within the chunk content.
Token-budgeted — max_tokens is enforced server-side; lower-ranked hits are dropped if the budget is tight.
Citation-ready — the id, title, and pages attributes on <doc> give you everything you need to surface a citation.
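To make the sandwich idea concrete, here is one simple way to arrange a best-first ranking so that the strongest items sit at both ends and the weakest land in the middle. This illustrates the concept only. The server's actual ordering rule is not specified on this page, and `sandwich_order` is a hypothetical name.

```python
def sandwich_order(ranked: list[str]) -> list[str]:
    """Arrange best-first items so the strongest sit at the edges."""
    front: list[str] = []
    back: list[str] = []
    for i, item in enumerate(ranked):
        # Alternate: even ranks fill from the start, odd ranks from the end.
        (front if i % 2 == 0 else back).append(item)
    return front + back[::-1]


print(sandwich_order(["r1", "r2", "r3", "r4", "r5"]))
# → ['r1', 'r3', 'r5', 'r4', 'r2']
```

The top hit opens the block, the runner-up closes it, and the lowest-ranked passage ends up in the center.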

See RetrievalContext format for the full grammar.
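As a sketch of how the citation attributes might be pulled out on the client side, the helpers below scan a context string for `<doc>` tags and `[pN]` page markers. They assume the attribute order shown in the sample above (id, title, pages) and use regular expressions rather than an XML parser, on the assumption that the Markdown body is not guaranteed to be well-formed XML. `DocRef`, `list_docs`, and `cited_pages` are illustrative names, not SDK APIs.

```python
import re
from typing import NamedTuple


class DocRef(NamedTuple):
    id: str
    title: str
    pages: str


# Assumes attributes appear in the order used by the sample block.
_DOC_TAG = re.compile(r'<doc\s+id="([^"]*)"\s+title="([^"]*)"\s+pages="([^"]*)"')
_PAGE_TAG = re.compile(r"\[p(\d+)\]")


def list_docs(context: str) -> list[DocRef]:
    """Return one DocRef per <doc> tag, in order of appearance."""
    return [DocRef(*m.groups()) for m in _DOC_TAG.finditer(context)]


def cited_pages(context: str) -> list[int]:
    """Return the distinct page numbers tagged as [pN], sorted ascending."""
    return sorted({int(p) for p in _PAGE_TAG.findall(context)})
```

For production use, the full grammar referenced above is the authority; a sketch like this would need adjusting if attributes can be reordered or omitted.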

4. Drop the context into an LLM call

python

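import os

import openai
import requests

question = "How did operating margin change year over year?"

# Fetch prompt-ready context from Grill (GRILL_KEY is the project API key).
search = requests.post(
    "https://api.poma-ai.com/v3/grill/search",
    headers={"authorization": f"Bearer {os.environ['GRILL_KEY']}"},
    json={"query": question},
    timeout=30,
)
search.raise_for_status()  # fail on 4xx/5xx instead of parsing an error body
g = search.json()

# The OpenAI client reads OPENAI_API_KEY from the environment.
resp = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer using ONLY the context. Cite document ids."},
        # Send the question along with the context so the model knows what to answer.
        {"role": "user", "content": f"{g['context']}\n\nQuestion: {question}"},
    ],
)
print(resp.choices[0].message.content)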

That is the complete Grill loop: ingest → search → prompt. No vector DB to set up, no reranker to tune, no prompt assembly to hand-write.

Using the Python SDK

The same three steps in idiomatic Python:

python

# pip install poma
import os
os.environ["POMA_GRILL_API_KEY"] = "poma_prod_gr_…"

from poma import Grill

g = Grill()                                     # validates key prefix locally
result = g.ingest("annual-report.pdf")          # submit + wait, returns when done
ctx = g.search("How did operating margin change year over year?")
print(result.job_id, result.status)
print(ctx.context)                              # the same XML+Markdown block

For async code, use AsyncGrill — every method becomes await-able:

python

import asyncio
from poma import AsyncGrill

async def main() -> None:
    async with AsyncGrill() as g:
        await g.ingest("annual-report.pdf")
        ctx = await g.search("operating margin year over year")
        print(ctx.context)

asyncio.run(main())

Full surface: Grill reference, AsyncGrill reference, Grill concepts in the SDK.

Next

Ingestion — file types, async semantics, redoing a doc.
Retrieval — min_relevance, target_tokens vs max_tokens, their defaults, and when to override each.
API reference — every endpoint and field.
