Grill — Context Engine for LLM Applications

Grill is POMA AI's end-to-end context engine. It pairs the PrimeCut ingestion pipeline with a managed retrieval layer that emits prompt-ready context, rather than raw ranked hits, through a single API call. Ingest a document once, then query Grill in natural language; it returns an XML + Markdown block you can drop straight into an LLM call or agent workflow.

Grill ships as part of POMA AI API v3 and is exposed under the /grill/* endpoints at https://api.poma-ai.com/v3.

Grill is the recommended path when you want to skip the ingestion and retrieval plumbing needed to produce high-quality context and focus on your own application. If you only need raw chunks to feed your own retrieval stack, use PrimeCut instead.

What you get

| Capability | What Grill does for you |
| --- | --- |
| Ingestion | Reuses PrimeCut to parse, chunk, embed and persist documents into the project's vector + storage namespace. |
| Hybrid retrieval | Runs hybrid (lexical + semantic) search across that namespace. |
| Context assembly | Wraps hits in <doc> tags, applies sandwich ordering, inserts gap markers between non-adjacent passages, and enforces a token budget. |
| Per-doc filtering | Restrict retrieval to a single document with doc_filter, useful for "chat with this PDF" UX. |
| Document admin | List, inspect, and delete documents inside your namespace without touching the vector store directly. |
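
To make the context-assembly row concrete, here is a minimal sketch of what such a step could look like. It is not Grill's implementation: the Hit fields, the word-count tokenizer, the `id` attribute on <doc>, the "[...]" gap marker, and the choice to apply sandwich ordering across documents (strongest at both ends, weakest in the middle) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str       # source document id (hypothetical field name)
    position: int     # index of the passage inside its document
    relevance: float  # retrieval score, higher is better
    text: str


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def fit_budget(hits, max_tokens):
    """Drop the lowest-relevance hits until the total fits the budget."""
    kept = sorted(hits, key=lambda h: h.relevance, reverse=True)
    while kept and sum(count_tokens(h.text) for h in kept) > max_tokens:
        kept.pop()  # remove the current lowest-relevance hit
    return kept


def sandwich(ranked):
    """Given items ranked best-first, place the strongest at both ends."""
    front, back = [], []
    for i, item in enumerate(ranked):
        (front if i % 2 == 0 else back).append(item)
    return front + back[::-1]


def assemble(hits, max_tokens):
    """Build a <doc>-wrapped context block from raw hits."""
    by_doc = {}
    for hit in fit_budget(hits, max_tokens):
        by_doc.setdefault(hit.doc_id, []).append(hit)

    # Rank documents by their best surviving hit, then sandwich-order them.
    ranked = sorted(
        by_doc,
        key=lambda d: max(h.relevance for h in by_doc[d]),
        reverse=True,
    )

    blocks = []
    for doc_id in sandwich(ranked):
        passages = sorted(by_doc[doc_id], key=lambda h: h.position)
        parts = []
        for prev, cur in zip([None] + passages, passages):
            # Mark a gap when two kept passages are not adjacent in the source.
            if prev is not None and cur.position != prev.position + 1:
                parts.append("[...]")  # assumed gap-marker format
            parts.append(cur.text)
        blocks.append(f'<doc id="{doc_id}">\n' + "\n".join(parts) + "\n</doc>")
    return "\n".join(blocks)
```

With Grill this logic runs server-side; the sketch only shows why the output is more than a sorted list of chunks.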

Why a context engine, not a vector DB

A vector database returns chunks. An LLM call needs context: ordered, deduplicated, gap-marked, budgeted. Building that translation layer well is the bulk of the work in most production RAG projects. Grill does it once, server-side, and ships the formatted block back to you. That means:

  • No reranker code. Hits are already filtered by min_relevance and ordered for prompt locality.
  • No token-budget arithmetic. Pass max_tokens and Grill drops the lowest-relevance hits (sandwich-aware) until the block fits.
  • No "where did this answer come from" gap. Each <doc> block is tagged with the source document id, so citations are trivial.
  • No duplicate ingestion logic. PrimeCut is the same pipeline Grill uses internally — what you ingest in PrimeCut works in Grill, and vice versa.
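
As an illustration of the citation point above, the following sketch splits a returned context block into (document id, body) pairs. It assumes the id is carried in an `id` attribute on each <doc> tag, which this page does not specify. A regular expression is used instead of an XML parser because the bodies are Markdown and may not be well-formed XML.

```python
import re

# Matches one <doc ...>...</doc> block; the attribute layout is an assumption.
DOC_BLOCK = re.compile(r"<doc\b([^>]*)>(.*?)</doc>", re.DOTALL)
ID_ATTR = re.compile(r'\bid="([^"]*)"')


def split_context(context: str):
    """Return a list of (doc_id, body) pairs found in a context block.

    doc_id is None when a <doc> tag carries no id attribute.
    """
    pairs = []
    for attrs, body in DOC_BLOCK.findall(context):
        match = ID_ATTR.search(attrs)
        pairs.append((match.group(1) if match else None, body.strip()))
    return pairs


def cited_documents(context: str):
    """Unique document ids in order of first appearance."""
    seen = []
    for doc_id, _ in split_context(context):
        if doc_id is not None and doc_id not in seen:
            seen.append(doc_id)
    return seen
```

The ids returned by `cited_documents` can be listed next to the model's answer as its sources.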

How it fits with the rest of POMA AI

| Product | Returns | Best for |
| --- | --- | --- |
| PrimeCut | Structured .poma archives (chunks, chunksets, assets) | You own the retrieval stack and just need high-quality chunks. |
| Grill | A RetrievalContext block of XML + Markdown, prompt-ready | You want POMA to handle ingestion and retrieval and just hand you LLM-ready context. |

Both are billed per project. A project is bound to one product at creation time (primecut or grill).

The shape of the workflow

```text
[your file] ──▶ POST /grill/ingest ──▶ {job_id}
                                          │
                                          ├── poll   GET /jobs/{job_id}/status
                                          └── stream GET /status/v1/jobs/{job_id}   (SSE)
                                                       │
                                                       ▼ (status: done)
[your query] ──▶ POST /grill/search ──▶ { context: "<doc>…</doc>" }
                                          │
                                          ▼
                                 drop into LLM prompt
```

There is no separate "create index" step and no embedding model to choose — Grill manages that for you behind the project namespace.
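
The polling and search steps of this workflow could be driven from Python roughly as follows. This is a sketch under several assumptions that this page does not confirm: Bearer-token authentication, a JSON request body for /grill/search with a `query` field, a `status` field in the job response, and the failure state names. The upload encoding for POST /grill/ingest is not described here, so the sketch starts from a job_id you already hold. The `call` parameter is any function `(method, path, payload=None) -> dict`, which keeps the logic testable without network access.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.poma-ai.com/v3"


def http_json(method, path, api_key, payload=None):
    """Send a request to the API and decode the JSON response."""
    data = json.dumps(payload).encode() if payload is not None else None
    request = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_until_done(call, job_id, interval=2.0, timeout=300.0):
    """Poll GET /jobs/{job_id}/status until the job reports 'done'."""
    deadline = time.monotonic() + timeout
    while True:
        status = call("GET", f"/jobs/{job_id}/status").get("status")
        if status == "done":
            return
        if status in ("failed", "error"):  # assumed failure states
            raise RuntimeError(f"ingest job {job_id} ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"ingest job {job_id} not done after {timeout}s")
        time.sleep(interval)


def search(call, query, max_tokens=None, min_relevance=None, doc_filter=None):
    """POST /grill/search and return the prompt-ready context string."""
    payload = {"query": query}  # field name "query" is an assumption
    optional = {
        "max_tokens": max_tokens,
        "min_relevance": min_relevance,
        "doc_filter": doc_filter,
    }
    # Only send the parameters the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return call("POST", "/grill/search", payload)["context"]
```

For real requests, bind your key once, for example `call = lambda m, p, payload=None: http_json(m, p, API_KEY, payload)`, then run `wait_until_done(call, job_id)` followed by `search(call, "your question", max_tokens=2000)`.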

Use Grill from an MCP client

Want Grill inside Claude Code, Claude Desktop, or Cursor? Use poma-grill-mcp — POMA's Grill-only MCP server. Seven tools (ingest sync/async/batch/resume, jobs status, search, explain), file_path for large files, Go and Node implementations, plus a fully hosted endpoint at https://mcp.poma-ai.com/grill/v1 if you'd rather skip the local install.

The separate poma-mcp server covers PrimeCut only — no Grill tools. If you want both products in the same agent, install both binaries side-by-side.

See also