Grill: Context-Engine Client

Grill is the SDK's client for POMA's RAG / hybrid-search product. Where PrimeCut returns a .poma archive of chunks for you to embed yourself, Grill indexes the document server-side into a project namespace and serves prompt-ready retrieval context directly.

If you are coming from the v2 SDK (PrimeCut + generate_cheatsheets(...)), think of Grill as the same idea — structure-preserving retrieval context for an LLM — but managed server-side: no vector store to run, no embeddings to compute, no cheatsheet assembly to wire up.

Authentication

Grill is project-scoped, not account-scoped. The SDK enforces this locally:

python

from poma import Grill

# Reads POMA_GRILL_API_KEY from the environment.
# Validates the prefix and raises InvalidGrillApiKeyError before any HTTP call.
g = Grill()

Env var	Prefix	Scope
`POMA_API_KEY`	`poma_acc_…`	Account-level. Works on `/primeCut/*`; rejected on `/grill/*`.
`POMA_GRILL_API_KEY`	`poma_prod_gr_…`	Project-level. Required for every Grill call. Created by `POST /projects`.

Full provisioning flow: Grill Authentication and Create a Grill project.
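
The local check described above can be pictured with a small pre-flight helper. This is a minimal sketch, not the SDK's own code: `GRILL_KEY_PREFIX`, `looks_like_grill_key`, and `preflight` are hypothetical names, and it assumes the only accepted prefix is the `poma_prod_gr_` shown in the table.

```python
import os
from typing import Mapping, Optional

# Assumption: the Grill key prefix is exactly the one listed in the table above.
GRILL_KEY_PREFIX = "poma_prod_gr_"


def looks_like_grill_key(key: Optional[str]) -> bool:
    """Return True if `key` carries the project-level prefix plus some payload."""
    return bool(key) and key.startswith(GRILL_KEY_PREFIX) and len(key) > len(GRILL_KEY_PREFIX)


def preflight(env: Mapping[str, str] = os.environ) -> str:
    """Fail fast, before any network traffic, if the Grill key is missing or wrong."""
    key = env.get("POMA_GRILL_API_KEY")
    if not looks_like_grill_key(key):
        raise ValueError(
            "POMA_GRILL_API_KEY is missing or is not a project-level key "
            f"(expected prefix {GRILL_KEY_PREFIX!r})"
        )
    return key
```

Calling `preflight()` at application start-up surfaces a misconfigured deployment (for example, an account-level `poma_acc_…` key placed in the wrong variable) before the first request is attempted.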

Ingest

Grill.ingest(...) mirrors PrimeCut.ingest(...): submit a file, wait for the job, return a typed result. Unlike PrimeCut, no archive is downloaded — the document is indexed server-side into your project namespace and is immediately searchable.

python

from poma import Grill

g = Grill()
result = g.ingest("annual-report.pdf")
print(result.job_id, result.status, result.usage)

If you need to submit now and wait later (e.g. for batch ingest, or to free the calling process), use the split form:

python

job_id = g.submit("annual-report.pdf")
# … minutes or hours later …
result = g.collect(job_id)
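
For batch ingest, the split form lets you queue every file before waiting on any of them. The helper below is a sketch under stated assumptions: `ingest_batch` is a hypothetical name, `client` is anything exposing `submit(path)` and `collect(job_id)` as in the snippet above, and a failed job is assumed to surface as an exception from `collect`.

```python
from typing import Any, Dict, Iterable, Tuple


def ingest_batch(
    client: Any, paths: Iterable[str]
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Submit every file first, then collect the results one by one."""
    # Phase 1: queue all jobs without waiting on any of them.
    jobs = {path: client.submit(path) for path in paths}

    # Phase 2: wait for each job; record failures instead of aborting the batch.
    results: Dict[str, Any] = {}
    failures: Dict[str, Exception] = {}
    for path, job_id in jobs.items():
        try:
            results[path] = client.collect(job_id)
        except Exception as exc:  # assumption: failed jobs raise from collect()
            failures[path] = exc
    return results, failures
```

With a real client this would be called as `results, failures = ingest_batch(Grill(), ["q1.pdf", "q2.pdf"])`. Persisting the `jobs` mapping between the two phases would let a different process do the collecting.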

Search

g.search(query) runs hybrid search across every document in the project namespace and returns a GrillContext. Its single string field, context, holds structured, token-budgeted XML + Markdown that drops straight into an LLM prompt.

python

ctx = g.search("How did operating margin change year over year?")
print(ctx.context)
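
Because the context is a plain string, wiring it into a prompt is simple string assembly. The sketch below shows one possible layout; `build_prompt` and the instruction wording are illustrative assumptions, not part of the SDK.

```python
def build_prompt(question: str, context: str) -> str:
    """Combine retrieved context and the user question into one prompt string."""
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"{context.strip()}\n\n"
        f"Question: {question.strip()}"
    )
```

In practice you would call it as `build_prompt(question, g.search(question).context)` and pass the result to whichever LLM client you use.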

To scope a search to one document (e.g. an in-document Q&A widget), use g.search_in_doc(query, doc_id):

python

ctx = g.search_in_doc("summarize section 3", "doc_abc123")

Both methods accept min_relevance, target_tokens, max_tokens, return_assets, return_page_images, and exclude_doc_ids; see the Grill reference for the full signature.

Manage documents

The namespace is queryable and mutable:

python

# List every document in the project namespace
for doc in g.list_docs():
    print(doc.doc_id, doc.filename, doc.pages, doc.source_job_id)

# Inspect one
info = g.get_doc("doc_abc123")

# Remove one — frees both vectors and stored content
result = g.delete_doc("doc_abc123")
print(result.vectors_deleted, result.storage_deleted)

DocInfo.source_job_id lets you correlate the doc back to the job_id returned by submit(...) / ingest(...).
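
One way to do that correlation is to group the listing by job. This is a sketch: `docs_by_job` is a hypothetical helper that relies only on the `source_job_id` attribute shown above, and it keeps a list per job rather than assuming each job yields exactly one document.

```python
from typing import Any, Dict, Iterable, List


def docs_by_job(docs: Iterable[Any]) -> Dict[str, List[Any]]:
    """Group document records by the job that produced them."""
    grouped: Dict[str, List[Any]] = {}
    for doc in docs:
        grouped.setdefault(doc.source_job_id, []).append(doc)
    return grouped
```

Given a `job_id` saved at submit time, `docs_by_job(g.list_docs()).get(job_id, [])` returns the matching records, or an empty list if nothing in the namespace came from that job.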

Async

AsyncGrill mirrors the sync surface for use inside asyncio code (web servers, agents, pipelines):

python

import asyncio
from poma import AsyncGrill

async def main() -> None:
    async with AsyncGrill() as g:
        result = await g.ingest("annual-report.pdf")
        ctx = await g.search("operating margin year over year")
        print(ctx.context)

asyncio.run(main())
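
The async client is most useful when several calls overlap. The sketch below ingests multiple files concurrently with a cap on in-flight requests. `ingest_many` and `limit` are hypothetical names, and it assumes a single AsyncGrill instance can serve concurrent `ingest` calls, which the Grill reference should confirm.

```python
import asyncio
from typing import Any, Dict, Iterable


async def ingest_many(client: Any, paths: Iterable[str], limit: int = 4) -> Dict[str, Any]:
    """Ingest several files concurrently, at most `limit` at a time."""
    gate = asyncio.Semaphore(limit)

    async def one(path: str) -> Any:
        async with gate:  # wait here while `limit` ingests are already running
            return await client.ingest(path)

    path_list = list(paths)
    # return_exceptions=True hands failures back as values instead of
    # raising on the first one, so every file gets an outcome.
    outcomes = await asyncio.gather(*(one(p) for p in path_list), return_exceptions=True)
    return dict(zip(path_list, outcomes))
```

Inside `async with AsyncGrill() as g:` this would be awaited as `await ingest_many(g, ["q1.pdf", "q2.pdf"])`. Each value in the returned mapping is either the ingest result or the exception raised for that file.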

When to use Grill vs PrimeCut

You want…	Use
Prompt-ready retrieval context without running your own vector store	Grill
Raw chunks to embed yourself and feed into LangChain / LlamaIndex / Qdrant	PrimeCut + `generate_cheatsheets(...)`
Hybrid search across many documents	Grill (`g.search`)
One-shot summarization of a single document	Either; PrimeCut if you already have a chunk pipeline
Strict on-prem / air-gapped processing	PrimeCut (results are downloadable archives)

Both clients can live side-by-side in the same project — they read separate env vars (POMA_API_KEY for PrimeCut, POMA_GRILL_API_KEY for Grill).

Next

Grill reference: every method signature and return type.
AsyncGrill reference.
Grill product docs: endpoint-level reference and end-to-end concepts.
