Grill — Context Engine for LLM Applications

Grill is POMA AI's end-to-end context engine. It pairs the PrimeCut ingestion pipeline with a managed retrieval layer that emits prompt-ready context — not raw ranked hits — through a single API call. Ingest a document once, then ask Grill questions in natural language and it returns an XML + Markdown block you can drop straight into an LLM call or Agent Workflow.

Grill is shipped as part of the POMA AI API v3 and is exposed under the /grill/* endpoints on https://api.poma-ai.com/v3.

Grill is the recommended path when you want POMA to manage ingestion and retrieval for you, so you can focus on your own application. If you only need raw chunks to feed your own retrieval stack, use PrimeCut instead.

What you get

| Capability | What Grill does for you |
| --- | --- |
| Ingestion | Reuses PrimeCut to parse, chunk, embed and persist documents into the project's vector + storage namespace. |
| Hybrid retrieval | Runs hybrid (lexical + semantic) search across that namespace. |
| Context assembly | Wraps hits in `<doc>` tags, applies sandwich ordering, drops a `[…]` between non-consecutive chunks of the same document, and enforces a token budget. |
| Per-doc filtering | Restrict retrieval to a single document with `doc_filter`, useful for "chat with this PDF" UX. |
| Document admin | List, inspect, and delete documents inside your namespace without touching the vector store directly. |
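
As a rough illustration of per-document filtering, the sketch below builds a JSON body for `POST /grill/search`. The parameter names `doc_filter`, `min_relevance` and `target_tokens` appear in this section; the `query` field name, the idea that `doc_filter` takes a single document id, and the sample id `doc_123` are assumptions made for the example. Check the API reference for the actual request schema.

```python
from typing import Any, Optional


def build_search_body(
    query: str,
    doc_filter: Optional[str] = None,
    min_relevance: Optional[float] = None,
    target_tokens: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble a request body for POST /grill/search.

    Assumptions (not confirmed by this page): the question is sent in a
    field named "query", and doc_filter holds one document id.
    """
    if not query.strip():
        raise ValueError("query must not be empty")
    body: dict[str, Any] = {"query": query}
    optional = {
        "doc_filter": doc_filter,
        "min_relevance": min_relevance,
        "target_tokens": target_tokens,
    }
    # Send only the options the caller actually set, so server defaults apply.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


# Search the whole project namespace.
whole_namespace = build_search_body("What is the notice period?")

# "Chat with this PDF": restrict retrieval to one (hypothetical) document id.
single_pdf = build_search_body(
    "What is the notice period?", doc_filter="doc_123", target_tokens=1500
)
```

Leaving unset options out of the body keeps the "chat with this PDF" case a one-argument change from the namespace-wide search.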

Why a context engine, not a vector DB

A vector database returns chunks. An LLM call needs context: ordered, deduplicated, skip-aware, budgeted. Building that translation layer well is a large share of the work in a production RAG project. Grill does it once, server-side, and ships the formatted block back to you. That means:

No reranker code. Hits are already filtered by min_relevance and ordered for prompt locality.
No token-budget arithmetic. Set target_tokens for the answer size you want; Grill admits the best hits (sandwich-aware) until the block fits.
No "where did this answer come from" gap. Each <doc> block is tagged with the source document id, so citations are trivial.
No duplicate ingestion logic. PrimeCut is the same pipeline Grill uses internally — what you ingest in PrimeCut works in Grill, and vice versa.
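
To make the size of that translation layer concrete, here is a minimal client-side sketch of the kind of code Grill is meant to replace. It is not Grill's implementation. The `Hit` shape, the whitespace token estimate, the greedy admission rule and the `id` attribute on `<doc>` are all assumptions for illustration, and sandwich ordering is not modeled (documents are simply emitted in id order).

```python
import re
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Hit:
    # Hypothetical hit shape; a real vector store returns its own structure.
    doc_id: str
    position: int  # index of the chunk inside its document
    score: float
    text: str


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def assemble_context(hits: list[Hit], min_relevance: float, target_tokens: int) -> str:
    # 1. Filter by relevance and deduplicate on (doc_id, position), keeping the best score.
    best: dict[tuple[str, int], Hit] = {}
    for hit in hits:
        key = (hit.doc_id, hit.position)
        if hit.score >= min_relevance and (key not in best or hit.score > best[key].score):
            best[key] = hit

    # 2. Admit hits best-first; a hit that would overflow the budget is skipped.
    admitted: list[Hit] = []
    used = 0
    for hit in sorted(best.values(), key=lambda h: h.score, reverse=True):
        cost = estimate_tokens(hit.text)
        if used + cost <= target_tokens:
            admitted.append(hit)
            used += cost

    # 3. Restore reading order and mark gaps between non-consecutive chunks.
    admitted.sort(key=lambda h: (h.doc_id, h.position))
    blocks = []
    for doc_id, group in groupby(admitted, key=lambda h: h.doc_id):
        parts: list[str] = []
        previous = None
        for hit in group:
            if previous is not None and hit.position != previous + 1:
                parts.append("[…]")
            parts.append(hit.text)
            previous = hit.position
        blocks.append(f'<doc id="{doc_id}">\n' + "\n".join(parts) + "\n</doc>")
    return "\n".join(blocks)


def cited_doc_ids(context: str) -> list[str]:
    # Recover source ids for citations (assumes the id attribute format used above).
    return re.findall(r'<doc id="([^"]+)">', context)


hits = [
    Hit("a", 0, 0.9, "alpha one"),
    Hit("a", 1, 0.2, "alpha two"),       # below min_relevance, dropped
    Hit("a", 2, 0.8, "alpha three"),
    Hit("b", 5, 0.7, "beta six words"),  # dropped when the budget is tight
    Hit("a", 0, 0.5, "alpha one"),       # duplicate of the first hit
]
context = assemble_context(hits, min_relevance=0.5, target_tokens=6)
```

Even this simplified version needs deduplication, budgeting and gap tracking before the first prompt is sent. With Grill, the equivalent block arrives already assembled in the search response.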

How it fits with the rest of POMA AI

| Product | Returns | Best for |
| --- | --- | --- |
| PrimeCut | Structured `.poma` archives (chunks, chunksets, assets) | You own the retrieval stack and just need high-quality chunks. |
| Grill | A `RetrievalContext` block of XML + Markdown, prompt-ready | You want POMA to handle ingestion and retrieval and just hand you LLM-ready context. |

Both are billed per project. A project is bound to one product at creation time (primecut or grill).

The shape of the workflow

[your file] ──▶ POST /grill/ingest ──▶ {job_id}
                                          │
                                          ├── poll  GET /jobs/{job_id}/status
                                          └── stream GET /status/v1/jobs/{job_id}   (SSE)
                                                       │
                                                       ▼ (status: done)
[your query] ──▶ POST /grill/search ──▶ { context: "<doc>…</doc>" }
                                          │
                                          ▼
                                 drop into LLM prompt

There is no separate "create index" step and no embedding model to choose — Grill manages that for you behind the project namespace.
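
The poll and search legs of the workflow above could be sketched in Python roughly as follows. The endpoint paths, the `done` status and the `context` response field come from the diagram. The bearer-token header, the `query` body field, the assumption that `/jobs/{job_id}/status` lives under the same v3 base URL, and every helper name are illustrative guesses rather than the documented API. The upload to `/grill/ingest` is left out because its request format is not described on this page.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://api.poma-ai.com/v3"


def job_status_request(job_id: str, api_key: str) -> urllib.request.Request:
    # Assumption: project API keys are sent as a bearer token.
    return urllib.request.Request(
        f"{BASE_URL}/jobs/{job_id}/status",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def search_request(query: str, api_key: str) -> urllib.request.Request:
    # Assumption: the question is sent as JSON in a field named "query".
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/grill/search",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def wait_until_done(
    fetch_status: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call fetch_status repeatedly until it returns "done" or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch_status() == "done":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("ingestion job did not finish in time")
        sleep(interval_s)


def build_prompt(context: str, question: str) -> str:
    # The context block is inserted as-is; no re-ranking or trimming on the client.
    return (
        "Answer using only the context below.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


# Usage outline (performs network calls, so it is shown as comments only):
#   def fetch_status() -> str:
#       with urllib.request.urlopen(job_status_request(job_id, api_key)) as resp:
#           return json.load(resp)["status"]
#   wait_until_done(fetch_status)
#   with urllib.request.urlopen(search_request("What changed in v3?", api_key)) as resp:
#       prompt = build_prompt(json.load(resp)["context"], "What changed in v3?")
```

Passing the status fetcher in as a callable keeps the polling loop independent of the HTTP client, so the same loop can be exercised against a stub or swapped for the SSE stream.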

Pages in this section

Authentication — project API keys and how Grill auth differs from PrimeCut auth.
Create a Grill project — the one-time api_key returned at project creation.
Quickstart — ingest → search → prompt in three curl calls.
Ingestion — how /grill/ingest differs from /primeCut/ingest.
Retrieval — min_relevance, target_tokens / max_tokens, hybrid scoring.
RetrievalContext format — sandwich ordering, inline markers ([pN], […]), token budgeting, citations.
Document management — list, inspect, delete documents.
API reference — every /grill/* endpoint.

Use Grill from an MCP client

Want Grill inside Claude Code, Claude Desktop, or Cursor? Use poma-grill-mcp, POMA's Grill-only MCP server. It provides seven tools (ingest sync/async/batch/resume, jobs status, search, explain), file_path support for large files, and both Go and Node implementations. A fully hosted endpoint is also available at https://mcp.poma-ai.com/grill/v1 if you'd rather skip the local install.

The separate poma-mcp server covers PrimeCut only — no Grill tools. If you want both products in the same agent, install both binaries side-by-side.
