Grill Quickstart
This walkthrough takes you from a blank Grill project to prompt-ready context in three calls. You will:
- Ingest a PDF with POST /grill/ingest.
- Wait for the job to reach done.
- Search with POST /grill/search and inspect the RetrievalContext block.
If you have not created a project yet, do that first — see Create a Grill project. You should have a project API key (prefix poma_prod_gr_…) before starting. The account-level POMA_API_KEY (prefix poma_acc_…) cannot call /grill/*.
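Since the two key types are easy to mix up, a small local check can catch the wrong one before any request is sent. The following is a minimal sketch that assumes the two prefixes quoted above are the only ones that matter; `check_grill_key` is a hypothetical helper, not part of the SDK.

```python
PROJECT_PREFIX = "poma_prod_gr_"  # project key prefix quoted above
ACCOUNT_PREFIX = "poma_acc_"      # account key prefix quoted above


def check_grill_key(key: str) -> str:
    """Return the key unchanged if it looks like a Grill project key."""
    if key.startswith(ACCOUNT_PREFIX):
        raise ValueError("account-level key: /grill/* needs a project key")
    if not key.startswith(PROJECT_PREFIX):
        raise ValueError(f"expected a key starting with {PROJECT_PREFIX!r}")
    return key
```

This only inspects the prefix; whether the key is actually valid is still decided by the server.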
Prefer Python? The whole loop below collapses to four lines with the SDK. Jump to Using the Python SDK at the bottom, or read the Grill reference.
export GRILL="https://api.poma-ai.com/v3"
export GRILL_KEY="poma_prod_gr_…" # the SDK reads the same key from POMA_GRILL_API_KEY
1. Ingest a document
Grill ingestion uses the same raw-bytes contract as PrimeCut: application/octet-stream body with a Content-Disposition header that carries the filename.
JOB=$(curl -sS -X POST "$GRILL/grill/ingest" \
-H "authorization: Bearer $GRILL_KEY" \
-H "content-type: application/octet-stream" \
-H 'content-disposition: attachment; filename="annual-report.pdf"' \
--data-binary @annual-report.pdf)
echo "$JOB" | jq .
The response is a PublicJob:
{
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"created_at": "2026-04-30T10:00:00Z",
"properties": { "file": { "filename": "annual-report.pdf", "size": 1048576 } }
}
Capture the job id:
JOB_ID=$(echo "$JOB" | jq -r .job_id)
Why Content-Disposition? The body is raw bytes, so the filename has to ride in a header. Without it the server cannot infer the file type and rejects the request with 400.
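For readers calling the endpoint from Python without the SDK, the same raw-bytes contract might look like the sketch below. It uses only the standard library and builds the request without sending it; `build_ingest_request` is a hypothetical helper name, and the quoting of unusual filenames is an assumption rather than documented server behaviour.

```python
import os
import urllib.request

GRILL = "https://api.poma-ai.com/v3"  # base URL from the export above


def build_ingest_request(path: str, data: bytes, key: str) -> urllib.request.Request:
    """Build (but do not send) the raw-bytes ingest request."""
    filename = os.path.basename(path)
    # Escape backslashes and double quotes so the quoted filename stays valid.
    quoted = filename.replace("\\", "\\\\").replace('"', '\\"')
    return urllib.request.Request(
        f"{GRILL}/grill/ingest",
        data=data,  # the file bytes, sent as-is
        method="POST",
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/octet-stream",
            "Content-Disposition": f'attachment; filename="{quoted}"',
        },
    )
```

To send it, pass the result to `urllib.request.urlopen` and decode the JSON body to read `job_id`.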
2. Wait for the job to finish
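Grill reuses the standard POMA job lifecycle: pending → processing → done / failed. Two ways to follow it: poll the status endpoint in a loop, or follow updates over a single unbuffered connection (curl -N):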
while :; do
STATUS=$(curl -sS "$GRILL/jobs/$JOB_ID/status" \
-H "authorization: Bearer $GRILL_KEY" | jq -r .status)
echo "status: $STATUS"
case "$STATUS" in
done|failed) break ;;
*) sleep 2 ;;
esac
done
curl -N "https://api.poma-ai.com/status/v1/jobs/$JOB_ID" \
-H "authorization: Bearer $GRILL_KEY"
When the job reaches done, the document is already indexed in your project namespace. Unlike PrimeCut, you do not need to download the .poma archive: Grill has stored chunks, embeddings, and assets server-side, and they are immediately searchable.
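The shell polling loop above never gives up and treats failed the same as done. A Python version with a deadline and an explicit failure path could be sketched as follows. Here `fetch_status` is a hypothetical callable that returns the `status` field from the job status endpoint, and the 300-second default timeout is an arbitrary assumption.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], str],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job is done; raise if it fails or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status == "done":
            return status
        if status == "failed":
            raise RuntimeError("ingest job failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout}s")
        sleep(interval)  # pending or processing: wait and ask again
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.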
3. Search
curl -sS -X POST "$GRILL/grill/search" \
-H "authorization: Bearer $GRILL_KEY" \
-H "content-type: application/json" \
-d '{
"query": "How did operating margin change year over year?",
"top_k": 8,
"min_relevance": 0.35,
"max_tokens": 4000,
"include_query": true
}' | jq -r .context
You get back a RetrievalContext, a single string field of XML + Markdown that is ready to paste into an LLM prompt:
<query>How did operating margin change year over year?</query>
<doc id="annual-report" title="Annual Report 2025" pages="42">
## Operating margin
Operating margin rose from **18.4%** in FY24 to **21.1%** in FY25, …
<gap pages="3" />
Cost-of-goods-sold improvements contributed roughly 1.6 pts …
</doc>
The block is already:
- Sandwich-ordered: the most relevant passages sit at the start and end of the block, with weaker hits in the middle, where LLM recall tends to be poorest.
- Gap-marked: <gap pages="N" /> tags show where surrounding content was skipped, so the model knows the passages are not contiguous.
- Token-budgeted: max_tokens is enforced server-side; lower-ranked hits are dropped if the budget is tight.
- Citation-ready: the id, title, and pages attributes on <doc> give you everything you need to surface a citation.
See RetrievalContext format for the full grammar.
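To make the sandwich ordering idea concrete, here is one simple way such an ordering can be produced from a best-first ranking. This is only an illustration of the general technique: Grill does the ordering server-side, and nothing here describes its actual algorithm. `sandwich_order` is a hypothetical name.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sandwich_order(ranked: Sequence[T]) -> List[T]:
    """Reorder best-first items so the strongest sit at both ends."""
    front: List[T] = []
    back: List[T] = []
    for i, item in enumerate(ranked):
        # Even ranks fill the front, odd ranks fill the back.
        (front if i % 2 == 0 else back).append(item)
    # Reversing the back half puts the second-best item last.
    return front + back[::-1]


print(sandwich_order([1, 2, 3, 4, 5]))  # [1, 3, 5, 4, 2]
```

With ranks 1 (best) to 5 (worst), the best hit opens the block, the second-best closes it, and the weakest lands in the middle.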
4. Drop the context into an LLM call
import os
import openai
import requests
# Fetch the prompt-ready context from Grill.
r = requests.post(
    "https://api.poma-ai.com/v3/grill/search",
    headers={"authorization": f"Bearer {os.environ['GRILL_KEY']}"},
    json={"query": "How did operating margin change year over year?", "max_tokens": 4000, "include_query": True},
    timeout=30,
)
r.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
g = r.json()
# include_query=True puts the question inside the context, so the context is the whole user turn.
resp = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer using ONLY the context. Cite document ids."},
        {"role": "user", "content": g["context"]},
    ],
)
print(resp.choices[0].message.content)
That is the complete Grill loop: ingest → search → prompt. No vector DB to set up, no reranker to tune, no prompt assembly to hand-write.
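Because the system prompt asks the model to cite document ids, it can help to map those ids back to titles and page counts when displaying the answer. The sketch below pulls the attributes off each `<doc>` opening tag with regular expressions. It assumes the attribute layout shown in the sample context and that attribute values contain no `>` or `"` characters; `list_sources` is a hypothetical helper.

```python
import re
from typing import Dict, List

DOC_TAG = re.compile(r"<doc\s+([^>]*)>")  # opening <doc ...> tags only
ATTR = re.compile(r'(\w+)="([^"]*)"')     # name="value" pairs inside a tag


def list_sources(context: str) -> List[Dict[str, str]]:
    """Return the attributes of every <doc> opening tag, in order."""
    return [dict(ATTR.findall(m.group(1))) for m in DOC_TAG.finditer(context)]
```

Regular expressions are used instead of an XML parser because the block mixes XML tags with Markdown and is not guaranteed to be a single well-formed XML document.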
Using the Python SDK
The same three steps in idiomatic Python:
# pip install poma
import os
os.environ["POMA_GRILL_API_KEY"] = "poma_prod_gr_…"
from poma import Grill
g = Grill() # validates key prefix locally
result = g.ingest("annual-report.pdf") # submit + wait, returns when done
ctx = g.search(
    "How did operating margin change year over year?",
    max_tokens=4000,
)
print(result.job_id, result.status)
print(ctx.context) # the same XML+Markdown block
For async code, use AsyncGrill; every method becomes awaitable:
import asyncio
from poma import AsyncGrill
async def main() -> None:
    async with AsyncGrill() as g:
        await g.ingest("annual-report.pdf")
        ctx = await g.search("operating margin year over year", max_tokens=4000)
        print(ctx.context)
asyncio.run(main())
Full surface: Grill reference, AsyncGrill reference, Grill concepts in the SDK.
Next
- Ingestion: file types, async semantics, redoing a doc.
- Retrieval: top_k vs min_relevance vs max_tokens and when each one matters.
- API reference: every endpoint and field.