Grill — Context Engine for LLM Applications
Grill is POMA AI's end-to-end context engine. It pairs the PrimeCut ingestion pipeline with a managed retrieval layer that emits prompt-ready context — not raw ranked hits — through a single API call. Ingest a document once, then ask Grill questions in natural language and it returns an XML + Markdown block you can drop straight into an LLM call or Agent Workflow.
Grill is shipped as part of the POMA AI API v3 and is exposed under the /grill/* endpoints on https://api.poma-ai.com/v3.
Grill is the recommended path when you want to skip the work of managing ingestion and retrieval yourself and focus on your own application. If you only need raw chunks to feed your own retrieval stack, use PrimeCut instead.
What you get
| Capability | What Grill does for you |
|---|---|
| Ingestion | Reuses PrimeCut to parse, chunk, embed and persist documents into the project's vector + storage namespace. |
| Hybrid retrieval | Runs hybrid (lexical + semantic) search across that namespace. |
| Context assembly | Wraps hits in <doc> tags, applies sandwich ordering, inserts gap markers between non-adjacent passages, and enforces a token budget. |
| Per-doc filtering | Restrict retrieval to a single document with doc_filter — useful for "chat with this PDF" UX. |
| Document admin | List, inspect, and delete documents inside your namespace without touching the vector store directly. |
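
The context-assembly row can be illustrated with a small Python sketch that wraps one document's passages in a <doc> tag and inserts a gap marker wherever two passages are not adjacent. It is an illustration only, not Grill's server-side implementation: the Passage type, the chunk_index field, the id attribute, and the "[...]" marker text are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    chunk_index: int  # hypothetical: position of the chunk inside its document
    text: str


def wrap_doc(doc_id: str, passages: list[Passage], gap_marker: str = "[...]") -> str:
    """Render one document's passages in reading order, marking skipped spans."""
    ordered = sorted(passages, key=lambda p: p.chunk_index)
    lines: list[str] = []
    previous_index = None
    for passage in ordered:
        # A jump of more than one chunk means some text was skipped in between.
        if previous_index is not None and passage.chunk_index - previous_index > 1:
            lines.append(gap_marker)
        lines.append(passage.text)
        previous_index = passage.chunk_index
    body = "\n".join(lines)
    # The attribute name "id" is an assumption, not the documented format.
    return f'<doc id="{doc_id}">\n{body}\n</doc>'


block = wrap_doc(
    "report-2024",
    [Passage(7, "Revenue grew."), Passage(2, "Intro."), Passage(3, "Scope.")],
)
print(block)
```

Chunks 2 and 3 are adjacent and appear back to back, while the jump from 3 to 7 produces a single marker line between them.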
Why a context engine, not a vector DB
A vector database returns chunks. An LLM call needs context: ordered, deduplicated, gap-marked, budgeted. Building that translation layer well is the bulk of the work in a production RAG project. Grill does it once, server-side, and ships the formatted block back to you. That means:
- No reranker code. Hits are already filtered by min_relevance and ordered for prompt locality.
- No token-budget arithmetic. Pass max_tokens and Grill drops the lowest-relevance hits (sandwich-aware) until the block fits.
- No "where did this answer come from" gap. Each <doc> block is tagged with the source document id, so citations are trivial.
- No duplicate ingestion logic. PrimeCut is the same pipeline Grill uses internally, so what you ingest in PrimeCut works in Grill, and vice versa.
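
To show the kind of arithmetic the first two bullets refer to, here is a minimal Python sketch of relevance filtering, token budgeting, and sandwich ordering. It is not Grill's actual algorithm. It assumes that "sandwich ordering" means placing the strongest hits at the start and end of the block with weaker ones in the middle, it approximates token counts with a whitespace split, and it drops hits before reordering rather than in a sandwich-aware way. The Hit type and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    text: str
    relevance: float


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def fit_budget(hits: list[Hit], min_relevance: float, max_tokens: int) -> list[Hit]:
    """Filter by relevance, then drop the weakest hits until the budget fits."""
    kept = sorted(
        (h for h in hits if h.relevance >= min_relevance),
        key=lambda h: h.relevance,
        reverse=True,
    )
    while kept and sum(count_tokens(h.text) for h in kept) > max_tokens:
        kept.pop()  # sorted by descending relevance, so the last hit is the weakest
    return kept


def sandwich(hits_by_relevance: list[Hit]) -> list[Hit]:
    """Alternate hits between front and back so the weakest end up in the middle."""
    front: list[Hit] = []
    back: list[Hit] = []
    for position, hit in enumerate(hits_by_relevance):
        (front if position % 2 == 0 else back).append(hit)
    return front + back[::-1]


hits = [
    Hit("d1", "one two three", 0.9),
    Hit("d2", "four five six", 0.8),
    Hit("d3", "seven eight nine", 0.6),
    Hit("d4", "ten eleven twelve", 0.4),
    Hit("d5", "below the threshold", 0.1),
]
kept = fit_budget(hits, min_relevance=0.3, max_tokens=9)
ordered = sandwich(kept)
print([h.doc_id for h in ordered])  # ['d1', 'd3', 'd2']
```

In this run d5 is removed by the relevance threshold and d4 is dropped to meet the nine-token budget, leaving the strongest hit first, the second strongest last, and the weakest survivor in the middle.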
How it fits with the rest of POMA AI
| Product | Returns | Best for |
|---|---|---|
| PrimeCut | Structured .poma archives (chunks, chunksets, assets) | You own the retrieval stack and just need high-quality chunks. |
| Grill | A RetrievalContext block of XML + Markdown, prompt-ready | You want POMA to handle ingestion and retrieval and just hand you LLM-ready context. |
Both are billed per project. A project is bound to one product at creation time (primecut or grill).
The shape of the workflow
[your file]  ──▶ POST /grill/ingest ──▶ {job_id}
                                          │
                                          ├── poll   GET /jobs/{job_id}/status
                                          └── stream GET /status/v1/jobs/{job_id} (SSE)
                                          │
                                          ▼ (status: done)
[your query] ──▶ POST /grill/search ──▶ { context: "<doc>…</doc>" }
                                          │
                                          ▼
                                 drop into LLM prompt

There is no separate "create index" step and no embedding model to choose: Grill manages that for you behind the project namespace.
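
The polling and search steps of this workflow could be driven from Python roughly as follows, using only the standard library. This is a sketch under stated assumptions rather than the documented client: the Bearer authorization header, the "query" field name, the "failed" status value, and passing min_relevance, max_tokens, or doc_filter as JSON body fields are all guesses, and the file upload to /grill/ingest is left out because its request format is not described on this page. Only the endpoint paths, the "done" status, and the context field come from the diagram above.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

BASE_URL = "https://api.poma-ai.com/v3"


def build_request(path: str, api_key: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a GET request (no payload) or a JSON POST request."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Authorization": f"Bearer {api_key}"}  # assumed auth scheme
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers)


def send(request: urllib.request.Request) -> dict:
    """Perform the HTTP call and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_job(
    job_id: str,
    api_key: str,
    fetch: Callable[[urllib.request.Request], dict] = send,
    interval: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll GET /jobs/{job_id}/status until the job reports status 'done'."""
    for _ in range(max_polls):
        status = fetch(build_request(f"/jobs/{job_id}/status", api_key)).get("status")
        if status == "done":
            return
        if status == "failed":  # assumed failure value
            raise RuntimeError(f"ingestion job {job_id} failed")
        sleep(interval)
    raise TimeoutError(f"job {job_id} not done after {max_polls} polls")


def search_context(
    query: str,
    api_key: str,
    fetch: Callable[[urllib.request.Request], dict] = send,
    **options,
) -> str:
    """POST /grill/search and return the prompt-ready context block."""
    payload = {"query": query, **options}  # "query" is an assumed field name
    return fetch(build_request("/grill/search", api_key, payload))["context"]


def build_prompt(context: str, question: str) -> str:
    """Drop the returned context block into a plain-text LLM prompt."""
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {question}"
```

A typical call sequence would be wait_for_job(job_id, api_key) after ingestion, then build_prompt(search_context(question, api_key, max_tokens=2000), question). The fetch and sleep parameters exist so the flow can be exercised with stubs instead of real network calls.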
Pages in this section
- Authentication: project API keys and how Grill auth differs from PrimeCut auth.
- Create a Grill project: the one-time api_key returned at project creation.
- Quickstart: ingest → search → prompt in three curl calls.
- Ingestion: how /grill/ingest differs from /primeCut/ingest.
- Retrieval: min_relevance, max_tokens, hybrid scoring.
- RetrievalContext format: sandwich ordering, gap markers, token budgeting, citations.
- Document management: list, inspect, delete documents.
- API reference: every /grill/* endpoint.
Use Grill from an MCP client
Want Grill inside Claude Code, Claude Desktop, or Cursor? Use poma-grill-mcp — POMA's Grill-only MCP server. Seven tools (ingest sync/async/batch/resume, jobs status, search, explain), file_path for large files, Go and Node implementations, plus a fully hosted endpoint at https://mcp.poma-ai.com/grill/v1 if you'd rather skip the local install.
The separate poma-mcp server covers PrimeCut only, with no Grill tools. If you want both products in the same agent, install both binaries side-by-side.