Grill: Context-Engine Client
Grill is the SDK's client for POMA's RAG / hybrid-search product. Where PrimeCut returns a .poma archive of chunks for you to embed yourself, Grill indexes the document server-side into a project namespace and serves prompt-ready retrieval context directly.
If you are coming from the v2 SDK (PrimeCut + generate_cheatsheets(...)), think of Grill as the same idea — structure-preserving retrieval context for an LLM — but managed server-side: no vector store to run, no embeddings to compute, no cheatsheet assembly to wire up.
Authentication
Grill is project-scoped, not account-scoped. The SDK enforces this locally:
from poma import Grill
# Reads POMA_GRILL_API_KEY from the environment.
# Validates the prefix and raises InvalidGrillApiKeyError before any HTTP call.
g = Grill()

| Env var | Prefix | Source |
|---|---|---|
| POMA_API_KEY | poma_acc_… | Account-level. Works on /primeCut/*; rejected on /grill/*. |
| POMA_GRILL_API_KEY | poma_prod_gr_… | Project-level. Required for every Grill call. Created by POST /projects. |
Full provisioning flow: Grill Authentication and Create a Grill project.
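Because a wrong-prefix key is rejected before any HTTP call, it can be convenient to run a similar check at application startup and fail with a clearer message. The following is a minimal sketch that assumes only the two prefixes listed in the table above. The helper names `classify_poma_key` and `require_grill_key` are hypothetical and not part of the SDK, and the SDK's own validation may be stricter than a prefix check.

```python
import os

# Prefixes taken from the table above; the part after each prefix is not checked here.
ACCOUNT_PREFIX = "poma_acc_"
GRILL_PREFIX = "poma_prod_gr_"


def classify_poma_key(key: str) -> str:
    """Return "grill", "account", or "unknown" based on the key prefix alone."""
    if key.startswith(GRILL_PREFIX):
        return "grill"
    if key.startswith(ACCOUNT_PREFIX):
        return "account"
    return "unknown"


def require_grill_key(env=None) -> str:
    """Return POMA_GRILL_API_KEY, or raise ValueError with a descriptive message.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("POMA_GRILL_API_KEY", "")
    kind = classify_poma_key(key)
    if kind == "account":
        raise ValueError(
            "POMA_GRILL_API_KEY holds an account-level key; "
            "Grill needs a project-level key"
        )
    if kind != "grill":
        raise ValueError(
            "POMA_GRILL_API_KEY is missing or does not start with " + GRILL_PREFIX
        )
    return key
```

Calling `require_grill_key()` once before constructing the client keeps a misconfigured environment from surfacing later as a less obvious error.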
Ingest
Grill.ingest(...) mirrors PrimeCut.ingest(...): submit a file, wait for the job, return a typed result. Unlike PrimeCut, no archive is downloaded — the document is indexed server-side into your project namespace and is immediately searchable.
from poma import Grill
g = Grill()
result = g.ingest("annual-report.pdf")
print(result.job_id, result.status, result.usage)

If you need to submit now and wait later (e.g. for batch ingest, or to free the calling process), use the split form:
job_id = g.submit("annual-report.pdf")
# … minutes or hours later …
result = g.collect(job_id)

Search
g.search(query) runs hybrid search across every document in the project namespace and returns a GrillContext — a single string field of XML + Markdown, structured and token-budgeted, that drops straight into an LLM prompt.
ctx = g.search(
    "How did operating margin change year over year?",
    max_tokens=4000,
)
print(ctx.context)

To scope a search to one document (e.g. an in-document Q&A widget), use g.search_in_doc(query, doc_id):
ctx = g.search_in_doc("summarize section 3", "doc_abc123")

min_relevance, max_tokens, return_assets, return_page_images, and exclude_doc_ids are accepted by both methods; see the Grill reference for the full signature.
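Since the returned context is a single string, wiring it into an LLM prompt is plain string assembly. A minimal sketch follows; the function name `build_prompt`, the instruction wording, and the placeholder context are illustrative assumptions rather than anything the SDK provides.

```python
def build_prompt(context: str, question: str) -> str:
    """Combine retrieved context and a user question into one prompt string."""
    return (
        "Answer the question using only the context below.\n\n"
        "Context:\n"
        f"{context}\n\n"
        f"Question: {question}\n"
    )


# In real use, the context would be ctx.context from g.search(...);
# a placeholder string keeps this snippet runnable without network access.
sample_context = "<doc>placeholder retrieval context</doc>"
prompt = build_prompt(
    sample_context,
    "How did operating margin change year over year?",
)
```

Keeping the question after the context is one common layout; the order and wording are a design choice to tune for the model in use.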
Manage documents
The namespace is queryable and mutable:
# List every document in the project namespace
for doc in g.list_docs():
    print(doc.doc_id, doc.filename, doc.pages, doc.source_job_id)

# Inspect one
info = g.get_doc("doc_abc123")

# Remove one (frees both vectors and stored content)
result = g.delete_doc("doc_abc123")
print(result.vectors_deleted, result.storage_deleted)

DocInfo.source_job_id lets you correlate the doc back to the job_id returned by submit(...) / ingest(...).
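One way to use the source_job_id correlation is to build a lookup from job_id to doc_id after a batch of submissions. The sketch below relies only on the `doc_id` and `source_job_id` attributes shown above. `index_docs_by_job` is a hypothetical helper, and the stand-in objects and their ids are made up so the snippet runs without network access.

```python
from types import SimpleNamespace


def index_docs_by_job(docs):
    """Map source_job_id -> doc_id for every document that carries a job id."""
    by_job = {}
    for doc in docs:
        job_id = getattr(doc, "source_job_id", None)
        if job_id is not None:
            by_job[job_id] = doc.doc_id
    return by_job


# Stand-ins for the DocInfo objects that g.list_docs() yields (ids are invented).
docs = [
    SimpleNamespace(doc_id="doc_a", source_job_id="job_1"),
    SimpleNamespace(doc_id="doc_b", source_job_id="job_2"),
]
lookup = index_docs_by_job(docs)
```

With a real client, `index_docs_by_job(g.list_docs())` should give the same kind of mapping, so a job_id returned by `g.submit(...)` can be resolved to the document it produced.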
Async
AsyncGrill mirrors the sync surface for use inside asyncio code (web servers, agents, pipelines):
import asyncio
from poma import AsyncGrill

async def main() -> None:
    async with AsyncGrill() as g:
        result = await g.ingest("annual-report.pdf")
        ctx = await g.search("operating margin year over year", max_tokens=4000)
        print(ctx.context)

asyncio.run(main())

When to use Grill vs PrimeCut
| You want… | Use |
|---|---|
| Prompt-ready retrieval context without running your own vector store | Grill |
| Raw chunks to embed and feed into LangChain / LlamaIndex / Qdrant yourself | PrimeCut + generate_cheatsheets(...) |
| Hybrid search across many documents | Grill (g.search) |
| One-shot summarization of a single document | Either; PrimeCut if you already have a chunk pipeline |
| Strict on-prem / air-gapped processing | PrimeCut (results are downloadable archives) |
Both clients can live side-by-side in the same project — they read separate env vars (POMA_API_KEY for PrimeCut, POMA_GRILL_API_KEY for Grill).
Next
- Grill reference — every method signature and return type.
- AsyncGrill reference.
- Grill product docs — endpoint-level reference and end-to-end concepts.