Grill Quickstart

This walkthrough takes you from a blank Grill project to prompt-ready context in three calls. You will:

  1. Ingest a PDF with POST /grill/ingest.
  2. Wait for the job to reach done.
  3. Search with POST /grill/search and inspect the RetrievalContext block.

If you have not created a project yet, do that first — see Create a Grill project. You should have a project API key (prefix poma_prod_gr_…) before starting. The account-level POMA_API_KEY (prefix poma_acc_…) cannot call /grill/*.

Prefer Python? The whole loop below collapses to four lines with the SDK — jump to Using the Python SDK at the bottom, or read the Grill reference.

bash
export GRILL="https://api.poma-ai.com/v3"
export GRILL_KEY="poma_prod_gr_…"   # the Python SDK reads this key from POMA_GRILL_API_KEY instead
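If you drive Grill from Python scripts instead of the shell, it can help to reject the wrong key type before any request goes out. The helper below is a hypothetical sketch and is not part of the SDK. It assumes the only local signal is the documented poma_prod_gr_ prefix, and it looks for the key under POMA_GRILL_API_KEY first and GRILL_KEY second.

```python
import os
from typing import Mapping

# Documented prefix for project-scoped Grill keys; account keys start with poma_acc_.
GRILL_KEY_PREFIX = "poma_prod_gr_"


def load_grill_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Grill project key from the environment (hypothetical helper).

    Checks POMA_GRILL_API_KEY (the SDK's variable) first, then GRILL_KEY
    (the shell variable used in this walkthrough).
    """
    key = env.get("POMA_GRILL_API_KEY") or env.get("GRILL_KEY")
    if not key:
        raise RuntimeError("set POMA_GRILL_API_KEY or GRILL_KEY")
    if not key.startswith(GRILL_KEY_PREFIX):
        # An account-level key (poma_acc_...) cannot call /grill/*.
        raise ValueError(f"expected a key starting with {GRILL_KEY_PREFIX!r}")
    return key
```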

1. Ingest a document

Grill ingestion uses the same raw-bytes contract as PrimeCut: application/octet-stream body with a Content-Disposition header that carries the filename.

bash
JOB=$(curl -sS -X POST "$GRILL/grill/ingest" \
  -H "authorization: Bearer $GRILL_KEY" \
  -H "content-type: application/octet-stream" \
  -H 'content-disposition: attachment; filename="annual-report.pdf"' \
  --data-binary @annual-report.pdf)

echo "$JOB" | jq .

The response is a PublicJob:

json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "created_at": "2026-04-30T10:00:00Z",
  "properties": { "file": { "filename": "annual-report.pdf", "size": 1048576 } }
}

Capture the job id:

bash
JOB_ID=$(echo "$JOB" | jq -r .job_id)

Why Content-Disposition? The body is raw bytes, so the filename has to ride in a header. Without it the server cannot infer the file type and rejects the request with 400.
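For reference, here is a sketch of the same raw-bytes request built with Python's standard library. build_ingest_request is a hypothetical helper name. It only prepares the request and does not send it. The quote escaping assumes a plain ASCII filename. Non-ASCII names would need the RFC 6266 filename* form, and this page does not say whether Grill accepts that form.

```python
import urllib.request


def build_ingest_request(base_url: str, key: str, filename: str, data: bytes) -> urllib.request.Request:
    """Prepare (but do not send) a raw-bytes POST to /grill/ingest."""
    # Backslash-escape quotes and backslashes so the quoted filename stays well-formed.
    safe = filename.replace("\\", "\\\\").replace('"', '\\"')
    return urllib.request.Request(
        f"{base_url}/grill/ingest",
        data=data,  # the file bytes are the whole body
        method="POST",
        headers={
            "authorization": f"Bearer {key}",
            "content-type": "application/octet-stream",
            # The filename travels in this header because the body is raw bytes.
            "content-disposition": f'attachment; filename="{safe}"',
        },
    )
```

To send the request, pass it to urllib.request.urlopen and parse the PublicJob body with json.load.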

2. Wait for the job to finish

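Grill reuses the standard POMA job lifecycle: pending → processing → done / failed. You can follow it in two ways.

Poll the status endpoint every two seconds until the job reaches a terminal state:

bash
while :; do
  STATUS=$(curl -sS "$GRILL/jobs/$JOB_ID/status" \
    -H "authorization: Bearer $GRILL_KEY" | jq -r .status)
  echo "status: $STATUS"
  case "$STATUS" in
    done|failed) break ;;
    *) sleep 2 ;;
  esac
done

Or keep a connection open to the status service and watch updates as they arrive (-N turns off curl's output buffering):

bash
curl -N "https://api.poma-ai.com/status/v1/jobs/$JOB_ID" \
  -H "authorization: Bearer $GRILL_KEY"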
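The bash loop above polls forever if a job never leaves processing. Below is a sketch of a Python version with a deadline. fetch_status is an assumption. It can be any zero-argument callable that calls GET /jobs/{job_id}/status and returns the status string, which is what the jq filter extracts above. The sleep and clock hooks exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable

# Terminal states of the POMA job lifecycle.
TERMINAL = {"done", "failed"}


def wait_for_job(
    fetch_status: Callable[[], str],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the job is done or failed and return that status; raise on timeout."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout}s")
        sleep(interval)  # same 2-second cadence as the bash loop by default
```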

When the job reaches done, the document is already indexed in your project namespace. Unlike PrimeCut, you do not need to download the .poma archive. Grill stores the chunks, embeddings, and assets server-side, and you can search them immediately.

3. Search and inspect the context

Send a natural-language query to POST /grill/search:

bash
curl -sS -X POST "$GRILL/grill/search" \
  -H "authorization: Bearer $GRILL_KEY" \
  -H "content-type: application/json" \
  -d '{
    "query": "How did operating margin change year over year?",
    "top_k": 8,
    "min_relevance": 0.35,
    "max_tokens": 4000,
    "include_query": true
  }' | jq -r .context
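The exact ranking and trimming logic runs server-side and is not documented on this page. As a mental model only, assume the three knobs in that request combine like this: keep the top_k best hits, drop anything that scores below min_relevance, then admit passages in rank order until max_tokens is used up. The Hit type and the whitespace word count used as a token estimate are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    text: str
    score: float  # relevance in [0, 1]; higher is better


def select_hits(hits: list[Hit], top_k: int, min_relevance: float, max_tokens: int) -> list[Hit]:
    """Toy model of how top_k, min_relevance and max_tokens could combine."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)[:top_k]
    kept, used = [], 0
    for hit in ranked:
        if hit.score < min_relevance:
            break  # list is ranked, so every later hit scores lower too
        cost = len(hit.text.split())  # crude stand-in for a real tokenizer
        if used + cost > max_tokens:
            break  # lower-ranked hits are dropped once the budget is spent
        kept.append(hit)
        used += cost
    return kept
```

Under this model, a sparse corpus can return fewer than top_k passages because of min_relevance. A verbose corpus can hit max_tokens before top_k is reached.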

The context field of the response holds a RetrievalContext. This is a single string of XML and Markdown that you can paste directly into an LLM prompt:

xml
<query>How did operating margin change year over year?</query>
<doc id="annual-report" title="Annual Report 2025" pages="42">
  ## Operating margin

  Operating margin rose from **18.4%** in FY24 to **21.1%** in FY25, …

  <gap pages="3" />

  Cost-of-goods-sold improvements contributed roughly 1.6 pts …
</doc>

The block is already:

  • Sandwich-ordered: the most relevant passages sit at the start and end of the block, and weaker ones sit in the middle, because LLMs tend to recall the edges of a long prompt best.
  • Gap-marked: <gap pages="N" /> tags show where surrounding content was skipped, so the model knows the passages are not contiguous.
  • Token-budgeted: max_tokens is enforced server-side, and lower-ranked hits are dropped when the budget is tight.
  • Citation-ready: the id, title, and pages attributes on <doc> give you everything you need to surface a citation.
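Sandwich ordering is a general technique. The sketch below shows one common way to implement it: alternate ranked passages between the front and the back, so rank 1 comes first, rank 2 comes last, and the weakest hits meet in the middle. Treat it as an illustration only. This page does not specify the exact placement Grill applies.

```python
def sandwich(ranked: list[str]) -> list[str]:
    """Reorder passages (best first) so the strongest sit at both ends."""
    front, back = [], []
    for i, passage in enumerate(ranked):
        # Ranks 1, 3, 5, ... fill from the front; ranks 2, 4, ... fill from the back.
        (front if i % 2 == 0 else back).append(passage)
    return front + back[::-1]
```

For five ranked passages r1 to r5, the order becomes r1, r3, r5, r4, r2, so the weakest passage lands in the center.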

See RetrievalContext format for the full grammar.
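To render citations next to the model's answer, you can extract the <doc> attributes from the block yourself. The sketch below uses regular expressions rather than an XML parser. The block mixes Markdown with tags and has more than one top-level element, so it is not guaranteed to be well-formed XML. The sketch assumes attributes are double-quoted, as in the example above. The RetrievalContext format page is the authoritative grammar. citations and skipped_pages are hypothetical helper names.

```python
import re

DOC_TAG = re.compile(r"<doc\b([^>]*)>")        # opening <doc ...> tags only
ATTR = re.compile(r'(\w+)="([^"]*)"')           # name="value" pairs
GAP_TAG = re.compile(r'<gap pages="(\d+)"\s*/>')


def citations(context: str) -> list[dict[str, str]]:
    """Return the attributes of every <doc> element in a RetrievalContext string."""
    return [dict(ATTR.findall(m.group(1))) for m in DOC_TAG.finditer(context)]


def skipped_pages(context: str) -> int:
    """Total number of pages the gap markers say were skipped."""
    return sum(int(n) for n in GAP_TAG.findall(context))
```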

4. Drop the context into an LLM call

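python
import os

import openai
import requests

# Fetch prompt-ready context from Grill with the project key exported earlier.
r = requests.post(
    "https://api.poma-ai.com/v3/grill/search",
    headers={"authorization": f"Bearer {os.environ['GRILL_KEY']}"},
    json={
        "query": "How did operating margin change year over year?",
        "max_tokens": 4000,
        "include_query": True,  # embeds the question as <query> in the context block
    },
    timeout=30,
)
r.raise_for_status()  # fail loudly on 4xx/5xx instead of a KeyError below
context = r.json()["context"]

# Module-level client from openai>=1.0; reads OPENAI_API_KEY from the environment.
resp = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer using ONLY the context. Cite document ids."},
        {"role": "user", "content": context},
    ],
)
print(resp.choices[0].message.content)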

That is the complete Grill loop: ingest → search → prompt. No vector DB to set up, no reranker to tune, no prompt assembly to hand-write.

Using the Python SDK

The same three steps in idiomatic Python:

python
# pip install poma
import os
os.environ["POMA_GRILL_API_KEY"] = "poma_prod_gr_…"

from poma import Grill

g = Grill()                                     # validates key prefix locally
result = g.ingest("annual-report.pdf")          # submit + wait, returns when done
ctx = g.search(
    "How did operating margin change year over year?",
    max_tokens=4000,
)
print(result.job_id, result.status)
print(ctx.context)                              # the same XML+Markdown block

For async code, use AsyncGrill — every method becomes await-able:

python
import asyncio
from poma import AsyncGrill

async def main() -> None:
    async with AsyncGrill() as g:
        await g.ingest("annual-report.pdf")
        ctx = await g.search("operating margin year over year", max_tokens=4000)
        print(ctx.context)

asyncio.run(main())
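To run several queries at once, you can issue concurrent search calls with asyncio.gather. The sketch assumes search takes the query as a positional argument plus keyword options, as in the AsyncGrill example above. search_many and its default of four requests in flight are hypothetical choices, because this page does not state Grill's rate limits.

```python
import asyncio
from typing import Any, Protocol


class SupportsSearch(Protocol):
    async def search(self, query: str, **kwargs: Any) -> Any: ...


async def search_many(client: SupportsSearch, queries: list[str], limit: int = 4, **kwargs: Any) -> list[Any]:
    """Run searches concurrently with at most `limit` in flight; results keep input order."""
    sem = asyncio.Semaphore(limit)

    async def one(q: str) -> Any:
        async with sem:  # cap concurrent requests
            return await client.search(q, **kwargs)

    return await asyncio.gather(*(one(q) for q in queries))
```

Inside the async with block above, results = await search_many(g, ["operating margin", "revenue growth"], max_tokens=4000) returns one result per query, in the same order as the input list.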

Full surface: Grill reference, AsyncGrill reference, Grill concepts in the SDK.

Next

  • Ingestion: file types, async semantics, and redoing a doc.
  • Retrieval: top_k vs min_relevance vs max_tokens, and when each one matters.
  • API reference: every endpoint and field.