PrimeCut

The RAG ingestion engine for documents that actually have structure.

PrimeCut is POMA AI's document ingestion product. You hand it a PDF, DOCX, HTML page, or image and it returns the same document as a hierarchy of chunks and chunksets — typed units that preserve the document's structure so retrieval never loses context.

It's the layer that sits between "raw documents in your bucket" and "retrieval calls in your RAG pipeline." Drop it in front of LangChain, LlamaIndex, Qdrant, pgvector, or your own stack.

Where RAG begins

Most RAG pipelines fail at the same point: chunking. Fixed-size character splitters cut tables mid-row, separate headers from their content, and force you to inflate chunk_overlap until your index is full of near-duplicates. The model sees orphaned facts and answers them out of context.
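
The failure mode is easy to reproduce. The sketch below is a deliberately naive fixed-size splitter, not LangChain's implementation; `split_fixed` and the sample text (including the fee values) are invented for illustration.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Naive fixed-size character splitter with optional overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# Invented sample: a heading followed by a small table.
doc = (
    "3.2 Fees\n"
    "| Plate type | Fee |\n"
    "| Standard | 40 EUR |\n"
    "| Personalised | 300 EUR |\n"
)

chunks = split_fixed(doc, chunk_size=40)
for i, chunk in enumerate(chunks):
    print(i, repr(chunk))
```

With these inputs the "Standard" row is cut in the middle, and the second chunk no longer carries the "3.2 Fees" heading. Raising `chunk_overlap` narrows the gap but duplicates text across chunks.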

PrimeCut chunks by document hierarchy, not by character count. Every retrieved sentence arrives with its breadcrumb trail intact — chapter → section → subsection → paragraph — so the LLM reads the leaf and the lineage that gives it meaning.

In our reference benchmark on a notoriously hard legal document (Andorra's personalised licence-plate law), the same query needed:

Approach	Tokens of retrieved context	Information loss
Recursive character splitter (LangChain default)	1,542 tokens	Material — context fragments orphaned
PrimeCut chunksets + cheatsheets	337 tokens	None

That's a 78% reduction in retrieved tokens (1,542 → 337) with no information loss on this query. Multiply that across a million queries.
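
The reduction can be checked directly from the two token counts in the table. The variable names are ours, and the one-million-query figure is simple multiplication, not a measured result.

```python
baseline_tokens = 1542  # recursive character splitter, from the table
primecut_tokens = 337   # PrimeCut chunksets + cheatsheets, from the table

saved = baseline_tokens - primecut_tokens
reduction = saved / baseline_tokens

print(f"tokens saved per query: {saved}")
print(f"reduction: {reduction:.1%}")
print(f"tokens saved over 1,000,000 queries: {saved * 1_000_000:,}")
```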

Full methodology and comparison: The Expert's Guide to Document Ingestion & Chunking.

How PrimeCut fits

your document  →  PrimeCut  →  hierarchy of chunks + chunksets  →  your retrieval stack  →  prompt-ready context

PrimeCut does the structurally-hard part (parse + chunk + group) and hands you the result. You keep ownership of:

The vector store (Qdrant, pgvector, Pinecone, Weaviate — your choice)
The embedding model (OpenAI, Cohere, BGE, your own — your choice)
The retrieval strategy (dense, sparse, hybrid, reranked — your choice)
The LLM call (anywhere, any model)

POMA's job is the input. Everything downstream stays yours.

Looking for an end-to-end managed RAG endpoint — chunking + indexing + hybrid search + prompt-ready context, all server-side? That's Grill. PrimeCut is for teams that want to own their retrieval stack.

What you get out

PrimeCut emits two structural artifacts per document, plus a portable archive.

Chunks — the typed units

Each chunk is a sentence or paragraph with a depth integer (its level in the document hierarchy), a to_embed field (normalized text ready for embeddings), a page number, and optional references to images, tables, or code blocks. Tables stay as HTML so rows and columns survive embedding. Images get descriptions interleaved into the chunk stream so multimodal context is preserved.

python

PomaChunk(
    chunk_index=42,
    content="Operating margin rose from 18.4% to 21.1% in FY25.",
    depth=3,
    file_id="annual-report-2025",
    to_embed="operating margin rose from 18.4% to 21.1% in fy25",
)

Full schema: PrimeCut SDK results.

Chunksets — the retrieval units

A chunkset is a root-to-leaf path through the document's hierarchy — the chunk you want, plus every ancestor breadcrumb it depends on. No more "the section header is in chunk 17 but the answer is in chunk 19" failures.

Annual Report 2025 → Financial highlights → Operating margin →
  "Operating margin rose from 18.4% to 21.1% in FY25."

Embed and retrieve at the chunkset level and your context is always self-explanatory to the model. Full theory: POMA chunksets.
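
To make the idea concrete, here is a minimal sketch of deriving root-to-leaf paths from a flat list of chunks with `depth` values. It is not PrimeCut's algorithm. The `Chunk` class is an illustrative stand-in, the "Outlook" branch is invented sample data, and the sketch assumes chunks arrive in document order with each chunk's parent being the nearest preceding chunk of smaller depth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Illustrative stand-in for a chunk (only the fields used here)."""
    chunk_index: int
    content: str
    depth: int


def root_to_leaf_paths(chunks: list[Chunk]) -> list[list[Chunk]]:
    """Return one ancestor path per leaf chunk.

    Assumes document order, and that a chunk's parent is the nearest
    preceding chunk with a smaller depth.
    """
    paths: list[list[Chunk]] = []
    stack: list[Chunk] = []  # current ancestors, shallowest first
    for i, chunk in enumerate(chunks):
        # Pop siblings and deeper chunks so the stack holds only ancestors.
        while stack and stack[-1].depth >= chunk.depth:
            stack.pop()
        stack.append(chunk)
        next_depth = chunks[i + 1].depth if i + 1 < len(chunks) else -1
        if next_depth <= chunk.depth:  # no child follows, so this is a leaf
            paths.append(list(stack))
    return paths


chunks = [
    Chunk(0, "Annual Report 2025", 0),
    Chunk(1, "Financial highlights", 1),
    Chunk(2, "Operating margin", 2),
    Chunk(3, "Operating margin rose from 18.4% to 21.1% in FY25.", 3),
    Chunk(4, "Outlook", 1),                                        # invented
    Chunk(5, "Management expects margins to remain stable.", 2),   # invented
]

paths = root_to_leaf_paths(chunks)
for path in paths:
    print(" → ".join(c.content for c in path))
```

Each printed line is one retrieval unit: the leaf text preceded by every heading it sits under.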

The `.poma` archive

A single ZIP file that bundles chunks, chunksets, image renders, page images, and metadata. Portable, inspectable, durable. Works the same way locally, on-prem, or in the cloud.

my-doc.poma
├── chunks.json
├── chunksets.json
├── meta.json
├── images/
│   ├── image_00001.jpeg
│   └── ...
└── pages/
    ├── page_001.png
    └── ...

Full format: Results and archives.
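
Because the archive is a plain ZIP, Python's standard `zipfile` module is enough to read it. The sketch below assumes the three JSON files sit at the archive root, as in the tree above. `read_poma` is a hypothetical helper, and the sample archive it builds contains invented placeholder JSON, not the real schema.

```python
import json
import tempfile
import zipfile
from pathlib import Path


def read_poma(path: Path) -> dict:
    """Load the three JSON members of a .poma archive into a dict."""
    with zipfile.ZipFile(path) as archive:
        return {
            name: json.loads(archive.read(f"{name}.json"))
            for name in ("chunks", "chunksets", "meta")
        }


# Build a tiny stand-in archive so the example runs without a real .poma file.
# The JSON contents are invented placeholders, not the real schema.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "my-doc.poma"
    with zipfile.ZipFile(sample, "w") as archive:
        archive.writestr("chunks.json", json.dumps([{"chunk_index": 0, "depth": 0}]))
        archive.writestr("chunksets.json", json.dumps([{"chunks": [0]}]))
        archive.writestr("meta.json", json.dumps({"file_id": "my-doc"}))

    data = read_poma(sample)
    print(sorted(data))
    print(len(data["chunks"]), "chunk(s),", len(data["chunksets"]), "chunkset(s)")
```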

Built for high-stakes documents

PrimeCut earns its keep where context loss has a cost — where the answer to "is this clause enforceable?" needs the surrounding contract, not a fragment from page 47.

Strong fit:

Legal & compliance — contracts, policies, regulations, tax rulings. Hierarchical sections + nested clauses + tables that must not be shredded.
Financial — annual reports, prospectuses, term sheets. Long-form prose interleaved with multi-page tables.
Technical manuals & specs — engineering docs, ISO standards, API specs. Deep H1 > H2 > H3 > H4 hierarchies with code blocks.
Internal knowledge bases — onboarding handbooks, runbooks, HR policies. Where "where in the doc did that come from?" is a real question.

Weaker fit (and you should know):

Chat transcripts, support tickets, social-media corpora — flat, short, no real hierarchy. A simple sentence splitter is often enough.
Single-paragraph FAQs — too small to chunk; embed whole.

Supported formats

Documents, presentations, spreadsheets, images:

pdf, doc, docx, dotx, rtf, txt, md, html, htm, xml, ppt, pptx, pps, ppsx, pot, potx, key, xls, xlsx, xlsb, xltx, csv, numbers, ods, odc, png, jpg, jpeg, gif, bmp, tif, tiff, svg, webp, ico, heic, heif, psd, epub, mobi, djvu, dwg, dxf, dwf, dwfx, vsd, vsdx, ai, eps, ps, prn, xps, oxps, pub, mdi, pages, odp, odf, odt.

OCR is automatic for image-only PDFs. Tables are detected and preserved structurally, not flattened to text.

Ingestion modes

Mode	When to use	Cost vs Pro
Pro (default)	High-stakes documents — legal, financial, technical specs. Maximum structural fidelity.	100%
Eco	Internal knowledge bases, low-stakes content, exploratory pipelines. Slightly lower fidelity for ~40% lower cost.	~60%

Full comparison: Eco ingestion.

Get started

The four options below, in order: Python SDK · CLI · REST API · MCP (from Claude / Cursor)

python

# pip install poma
from poma import PrimeCut

client = PrimeCut()  # reads POMA_API_KEY from env
result = client.ingest("contract.pdf")

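# Now you have typed chunks + chunksets. Feed them into any vector store.
# embed_and_index is a placeholder for a function you supply: it should embed
# the text and write the vector plus metadata to your store.
for chunkset in result.chunksets:
    embed_and_index(chunkset.to_embed, metadata={"chunk_ids": chunkset.chunks})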

bash

# Install via Homebrew, Go, or release archive — see /cli/
export POMA_API_KEY="poma_acc_…"

poma primecut ingest-sync --file contract.pdf --output result.poma
# → bin/<job_id>.poma — chunks.json, chunksets.json, images/, pages/

bash

# Submit the file
JOB=$(curl -sS -X POST "https://api.poma-ai.com/v3/primeCut/ingest" \
  -H "authorization: Bearer $POMA_API_KEY" \
  -H "content-type: application/octet-stream" \
  -H 'content-disposition: attachment; filename="contract.pdf"' \
  --data-binary @contract.pdf)

# Poll, download the .poma archive, unzip, read the JSONs.
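
The polling step can be sketched in Python as a small loop with a timeout. This is a generic pattern, not POMA client code: `wait_for_job` is a hypothetical helper, and the status strings `"done"` and `"failed"` are placeholders, so check the API docs for the real response fields. In practice `get_status` would wrap a GET to the job's status endpoint; a fake stands in here so the loop runs offline.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    *,
    done: str = "done",      # placeholder status value
    failed: str = "failed",  # placeholder status value
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call get_status() until it returns `done`; raise on `failed` or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == done:
            return
        if status == failed:
            raise RuntimeError("ingestion job failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} s")
        sleep(interval)


# Stand-in for a real status call, so the loop can be exercised offline.
fake_statuses = iter(["processing", "processing", "done"])
calls: list[str] = []


def fake_get_status() -> str:
    status = next(fake_statuses)
    calls.append(status)
    return status


wait_for_job(fake_get_status, interval=0.0)
print(calls)
```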

javascript

// Wire poma-mcp into your MCP client (see /mcp/poma-mcp)
// Then ask the agent: "Ingest ~/contracts/q3.pdf with POMA PrimeCut"
// The agent calls primecut_ingest and returns chunks + chunksets.

Then plug the result into your retrieval stack of choice:

Qdrant integration — PomaQdrant subclass with hybrid search built in.
LangChain integration — drop-in Retriever for any LangChain chain.
LlamaIndex integration — Reader + NodeParser that produce LlamaIndex-native nodes.
Cheatsheets at query time — re-assemble retrieved chunksets into a prompt-ready block.
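
As a rough illustration of that last step, the sketch below merges retrieved root-to-leaf paths into one indented block, emitting shared ancestors once. It is not POMA's cheatsheet implementation. `assemble_context` is a hypothetical helper, the "Revenue" path is invented sample data, and ancestors are matched by their text here, where a real pipeline would more likely match on chunk IDs.

```python
def assemble_context(paths: list[list[str]]) -> str:
    """Merge root-to-leaf paths into one indented outline.

    Ancestors shared by consecutive paths are emitted once, so paths
    should be passed in document order.
    """
    lines: list[str] = []
    previous: list[str] = []
    for path in paths:
        # Length of the prefix this path shares with the previous one.
        shared = 0
        while (
            shared < len(path)
            and shared < len(previous)
            and path[shared] == previous[shared]
        ):
            shared += 1
        for depth in range(shared, len(path)):
            lines.append("  " * depth + path[depth])
        previous = path
    return "\n".join(lines)


retrieved = [
    ["Annual Report 2025", "Financial highlights", "Operating margin",
     "Operating margin rose from 18.4% to 21.1% in FY25."],
    ["Annual Report 2025", "Financial highlights", "Revenue",
     "Revenue grew year over year."],  # invented sample path
]

context = assemble_context(retrieved)
print(context)
```

The two shared headings appear once at the top, which is where the token savings over pasting each path in full come from.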

Reference

Topic	Where to look
Concepts	Ingestion · Results & archives · Cheatsheets · Eco ingestion
Python SDK class	`PrimeCut` · `AsyncPrimeCut`
REST API	`api.poma-ai.com/v3/docs` — `/primeCut/ingest`, `/jobs/{id}/status`, `/jobs/{id}/download`
CLI	`poma primecut ingest`
MCP	`poma-mcp` — `primecut_ingest`, `primecut_status`, `primecut_resume`, `primecut_get_result`
Auth	Authentication — account-level API key (prefix `poma_acc_…`)

Pricing

PrimeCut is billed per page. Free tier covers 1,000 pages — enough to evaluate the engine on real documents before committing.

See pricing for current rates.

Compare with Grill

If you've read this far and you're wondering "do I need to run my own retrieval, or do I want POMA to do all of it?" — that's the PrimeCut vs Grill question.

Aspect	PrimeCut	Grill
What POMA does	Ingestion → chunks + chunksets in a `.poma` archive	Ingestion + indexing + hybrid search → prompt-ready context
You run a vector store	Yes	No
You write retrieval code	Yes	No (POMA does it)
Output	`chunks.json`, `chunksets.json`, images, pages	A single `context` string per query
Best for	Teams with an existing retrieval stack, on-prem requirements, or strict ownership constraints	Teams that want a managed RAG endpoint with one HTTP call
API	v2 + v3 (account-scoped key)	v3 only (project-scoped key)

Pick PrimeCut when you want chunks. Pick Grill when you want answer-shaped context. Both run on the same ingestion engine, so the structural fidelity is identical; they differ only in what happens after the chunks are produced.

Open Grill →
