
Grill: Context-Engine Client

Grill is the SDK's client for POMA's RAG / hybrid-search product. Where PrimeCut returns a .poma archive of chunks for you to embed yourself, Grill indexes the document server-side into a project namespace and serves prompt-ready retrieval context directly.

If you are coming from the v2 SDK (PrimeCut + generate_cheatsheets(...)), think of Grill as the same idea — structure-preserving retrieval context for an LLM — but managed server-side: no vector store to run, no embeddings to compute, no cheatsheet assembly to wire up.

Authentication

Grill is project-scoped, not account-scoped. The SDK enforces this locally:

```python
from poma import Grill

# Reads POMA_GRILL_API_KEY from the environment.
# Validates the prefix and raises InvalidGrillApiKeyError before any HTTP call.
g = Grill()
```

| Env var | Prefix | Source |
| --- | --- | --- |
| `POMA_API_KEY` | `poma_acc_…` | Account-level. Works on `/primeCut/*`; rejected on `/grill/*`. |
| `POMA_GRILL_API_KEY` | `poma_prod_gr_…` | Project-level. Required for every Grill call; created by `POST /projects`. |

Full provisioning flow: Grill Authentication and Create a Grill project.
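If you want the same fail-fast check in your own tooling (for example, a CI step that validates secrets before a deploy), the rule in the table reduces to a prefix test. This is a hypothetical re-implementation, not the SDK's code: check_grill_key, grill_key_from_env, and KeyPrefixError are illustrative names, and it assumes every project key starts with exactly poma_prod_gr_.

```python
import os
from typing import Mapping, Optional

# Prefixes from the table above. Assumption: every Grill project key starts
# with exactly this string. Illustrative only; not the SDK's own validation.
ACCOUNT_KEY_PREFIX = "poma_acc_"
GRILL_KEY_PREFIX = "poma_prod_gr_"


class KeyPrefixError(ValueError):
    """Hypothetical error type; the SDK itself raises InvalidGrillApiKeyError."""


def check_grill_key(key: str) -> str:
    """Return key unchanged if it looks like a project-level Grill key, else raise."""
    if key.startswith(ACCOUNT_KEY_PREFIX):
        # Account-level keys work on /primeCut/* but are rejected on /grill/*.
        raise KeyPrefixError("account-level key found; Grill needs a project key")
    if not key.startswith(GRILL_KEY_PREFIX):
        raise KeyPrefixError(f"expected a key starting with {GRILL_KEY_PREFIX!r}")
    return key


def grill_key_from_env(env: Optional[Mapping[str, str]] = None) -> str:
    """Read POMA_GRILL_API_KEY (from os.environ by default) and validate its prefix."""
    source = os.environ if env is None else env
    key = source.get("POMA_GRILL_API_KEY", "")
    if not key:
        raise KeyPrefixError("POMA_GRILL_API_KEY is not set")
    return check_grill_key(key)
```

The error messages deliberately never echo the key itself, so the check is safe to run in logged CI output.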

Ingest

Grill.ingest(...) mirrors PrimeCut.ingest(...): submit a file, wait for the job, return a typed result. Unlike PrimeCut, no archive is downloaded — the document is indexed server-side into your project namespace and is immediately searchable.

```python
from poma import Grill

g = Grill()
result = g.ingest("annual-report.pdf")
print(result.job_id, result.status, result.usage)
```

If you need to submit now and wait later (e.g. for batch ingest, or to free the calling process), use the split form:

```python
job_id = g.submit("annual-report.pdf")
# … minutes or hours later …
result = g.collect(job_id)
```
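For a batch, one way to use the split form is to submit every file, persist the returned job IDs, and collect them from a different process later. This sketch assumes job IDs are JSON-serializable strings and accepts any client exposing the submit/collect methods shown above; submit_all, collect_all, and the manifest file are hypothetical names, not part of the SDK.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable


def submit_all(client: Any, paths: Iterable[str], manifest: Path) -> Dict[str, str]:
    """Submit every file without waiting and record {path: job_id} in a JSON manifest."""
    # Assumption: client.submit(path) returns a JSON-serializable job ID.
    job_ids = {path: client.submit(path) for path in paths}
    manifest.write_text(json.dumps(job_ids, indent=2))
    return job_ids


def collect_all(client: Any, manifest: Path) -> Dict[str, Any]:
    """Load the manifest (possibly in another process) and wait for each job."""
    job_ids = json.loads(manifest.read_text())
    # Waits on each job in turn, in submission order.
    return {path: client.collect(job_id) for path, job_id in job_ids.items()}
```

With g = Grill(), you would call submit_all(g, pdf_paths, Path("jobs.json")) now and collect_all(g, Path("jobs.json")) whenever you are ready.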

Search

g.search(query) runs hybrid search across every document in the project namespace and returns a GrillContext. Its single string field, context, holds XML + Markdown that is structured, token-budgeted, and ready to drop straight into an LLM prompt.

```python
ctx = g.search(
    "How did operating margin change year over year?",
    max_tokens=4000,
)
print(ctx.context)
```
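Because ctx.context is already prompt-ready text, the remaining step is usually to wrap it together with the user's question. Here is a minimal sketch; build_prompt and its instruction wording are illustrative choices, not an SDK helper.

```python
def build_prompt(question: str, context: str) -> str:
    """Combine retrieved Grill context and a user question into one prompt string."""
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        # Delimit the retrieved block so the model can tell it apart from the question.
        f"<context>\n{context}\n</context>\n\n"
        f"Question: {question}"
    )
```

In practice you would call build_prompt(question, ctx.context) and send the result to the LLM of your choice.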

To scope a search to one document (e.g. an in-document Q&A widget), use g.search_in_doc(query, doc_id):

```python
ctx = g.search_in_doc("summarize section 3", "doc_abc123")
```

min_relevance, max_tokens, return_assets, return_page_images, and exclude_doc_ids are accepted by both search and search_in_doc; see the Grill reference for the full signatures.
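Since the two methods share these options, an application that only sometimes scopes to one document can route every query through a single helper. retrieve is a hypothetical name; it calls only search and search_in_doc as documented above and forwards keyword options unchanged.

```python
from typing import Any, Optional


def retrieve(client: Any, query: str, doc_id: Optional[str] = None, **options: Any) -> Any:
    """Search the whole namespace, or a single document when doc_id is given.

    options (e.g. max_tokens, min_relevance) are forwarded as-is to whichever
    method is called.
    """
    if doc_id is None:
        return client.search(query, **options)
    return client.search_in_doc(query, doc_id, **options)
```

For example, retrieve(g, "summarize section 3", "doc_abc123", max_tokens=2000) is equivalent to the search_in_doc call above with a token budget added.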

Manage documents

The namespace is queryable and mutable:

```python
# List every document in the project namespace
for doc in g.list_docs():
    print(doc.doc_id, doc.filename, doc.pages, doc.source_job_id)

# Inspect one
info = g.get_doc("doc_abc123")

# Remove one: frees both its vectors and its stored content
result = g.delete_doc("doc_abc123")
print(result.vectors_deleted, result.storage_deleted)
```

DocInfo.source_job_id lets you correlate the doc back to the job_id returned by submit(...) / ingest(...).
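That correlation lets you undo a specific ingest, for example one that indexed the wrong file. The sketch below uses only list_docs, delete_doc, and the DocInfo fields shown above; delete_docs_for_job is a hypothetical helper, not an SDK method.

```python
from typing import Any, List


def delete_docs_for_job(client: Any, job_id: str) -> List[str]:
    """Delete every document whose source_job_id matches job_id; return the deleted IDs."""
    # Collect the matches first so the namespace is not modified while it is
    # still being listed.
    matches = [doc.doc_id for doc in client.list_docs() if doc.source_job_id == job_id]
    for doc_id in matches:
        client.delete_doc(doc_id)
    return matches
```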

Async

AsyncGrill mirrors the sync surface for use inside asyncio code (web servers, agents, pipelines):

```python
import asyncio
from poma import AsyncGrill

async def main() -> None:
    async with AsyncGrill() as g:
        result = await g.ingest("annual-report.pdf")
        ctx = await g.search("operating margin year over year", max_tokens=4000)
        print(ctx.context)

asyncio.run(main())
```
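Inside a single event loop you can also ingest several files concurrently. This sketch bounds concurrency with asyncio.Semaphore. ingest_many and its default limit of 4 are assumptions for illustration, not documented SDK behavior or a documented server limit.

```python
import asyncio
from typing import Any, Iterable, List


async def ingest_many(client: Any, paths: Iterable[str], max_concurrency: int = 4) -> List[Any]:
    """Ingest files concurrently, at most max_concurrency at a time.

    Results come back in the same order as paths (asyncio.gather preserves order).
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def ingest_one(path: str) -> Any:
        # Only max_concurrency coroutines can be inside this block at once.
        async with semaphore:
            return await client.ingest(path)

    return await asyncio.gather(*(ingest_one(p) for p in paths))
```

With AsyncGrill, you would call it as results = await ingest_many(g, ["a.pdf", "b.pdf"]) inside the async with block.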

When to use Grill vs PrimeCut

| You want… | Use |
| --- | --- |
| Prompt-ready retrieval context without running your own vector store | Grill |
| Raw chunks + embeddings to feed into LangChain / LlamaIndex / Qdrant yourself | PrimeCut + generate_cheatsheets(...) |
| Hybrid search across many documents | Grill (g.search) |
| One-shot summarization of a single document | Either; PrimeCut if you already have a chunk pipeline |
| Strict on-prem / air-gapped processing | PrimeCut (results are downloadable archives) |

Both clients can live side by side in the same codebase; they read separate env vars (POMA_API_KEY for PrimeCut, POMA_GRILL_API_KEY for Grill).
