PrimeCut

The RAG ingestion engine for documents that actually have structure.

PrimeCut is POMA AI's document ingestion product. You hand it a PDF, DOCX, HTML page, or image and it returns the same document as a hierarchy of chunks and chunksets — typed units that preserve the document's structure so retrieval never loses context.

It's the layer that sits between "raw documents in your bucket" and "retrieval calls in your RAG pipeline." Drop it in front of LangChain, LlamaIndex, Qdrant, pgvector, or your own stack.

Where RAG begins

Most RAG pipelines fail at the same point: chunking. Fixed-size character splitters cut tables mid-row, separate headers from their content, and force you to inflate chunk_overlap until your index is full of near-duplicates. The model sees orphaned facts and answers them out of context.

PrimeCut chunks by document hierarchy, not by character count. Every retrieved sentence arrives with its breadcrumb trail intact — chapter → section → subsection → paragraph — so the LLM reads the leaf and the lineage that gives it meaning.

In our reference benchmark on a notoriously hard legal document (Andorra's personalised licence-plate law), the same query needed:

| Approach | Tokens of retrieved context | Information loss |
| --- | --- | --- |
| Recursive character splitter (LangChain default) | 1,542 tokens | Material: context fragments orphaned |
| PrimeCut chunksets + cheatsheets | 337 tokens | None |

That's a roughly 78% token reduction (1,542 → 337 tokens) with no information loss. Multiply that across a million queries.
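The reduction follows directly from the two token counts in the table. A minimal worked calculation, using only those two figures and the one-million-query scale mentioned above (the variable names are illustrative):

```python
# Token counts taken from the benchmark table above.
baseline_tokens = 1542   # recursive character splitter
primecut_tokens = 337    # PrimeCut chunksets + cheatsheets

saved_per_query = baseline_tokens - primecut_tokens
reduction = saved_per_query / baseline_tokens

# Scale to one million queries, as the text suggests.
queries = 1_000_000
saved_total = saved_per_query * queries

print(f"saved per query: {saved_per_query} tokens")
print(f"reduction: {reduction:.1%}")
print(f"saved over {queries:,} queries: {saved_total:,} tokens")
```

This counts retrieved-context tokens only; prompt templates, questions, and completions are outside the comparison.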

Full methodology and comparison: The Expert's Guide to Document Ingestion & Chunking.

How PrimeCut fits

your document  →  PrimeCut  →  hierarchy of chunks + chunksets  →  your retrieval stack  →  prompt-ready context

PrimeCut does the structurally-hard part (parse + chunk + group) and hands you the result. You keep ownership of:

  • The vector store (Qdrant, pgvector, Pinecone, Weaviate — your choice)
  • The embedding model (OpenAI, Cohere, BGE, your own — your choice)
  • The retrieval strategy (dense, sparse, hybrid, reranked — your choice)
  • The LLM call (anywhere, any model)

POMA's job is the input. Everything downstream stays yours.

Looking for an end-to-end managed RAG endpoint — chunking + indexing + hybrid search + prompt-ready context, all server-side? That's Grill. PrimeCut is for teams that want to own their retrieval stack.

What you get out

PrimeCut emits two structural artifacts per document, plus a portable archive.

Chunks — the typed units

Each chunk is a sentence or paragraph with a depth integer (its level in the document hierarchy), a to_embed field (normalized text ready for embeddings), a page number, and optional references to images, tables, or code blocks. Tables stay as HTML so rows and columns survive embedding. Images get descriptions interleaved into the chunk stream so multimodal context is preserved.

python
PomaChunk(
    chunk_index=42,
    content="Operating margin rose from 18.4% to 21.1% in FY25.",
    depth=3,
    file_id="annual-report-2025",
    to_embed="operating margin rose from 18.4% to 21.1% in fy25",
)

Full schema: PrimeCut SDK results.

Chunksets — the retrieval units

A chunkset is a root-to-leaf path through the document's hierarchy — the chunk you want, plus every ancestor breadcrumb it depends on. No more "the section header is in chunk 17 but the answer is in chunk 19" failures.

Annual Report 2025 → Financial highlights → Operating margin →
  "Operating margin rose from 18.4% to 21.1% in FY25."

Embed and retrieve at the chunkset level and your context is always self-explanatory to the model. Full theory: POMA chunksets.
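To make the root-to-leaf idea concrete, here is a minimal sketch of how a breadcrumb path could be derived from depth integers alone. It is not PrimeCut's implementation: the `Chunk` class is a hypothetical stand-in for the SDK's chunk type, and the sketch assumes chunks arrive in document order, that the root has depth 0, and that a chunk's parent is the closest preceding chunk with a smaller depth.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for a chunk; not the SDK's PomaChunk class."""
    chunk_index: int
    depth: int
    content: str


def root_to_leaf_path(chunks: list[Chunk], leaf_index: int) -> list[Chunk]:
    """Return the leaf plus its nearest ancestor at each shallower depth.

    Assumes `chunks` is in document order and that a chunk's parent is the
    closest preceding chunk with a smaller depth.
    """
    by_index = {c.chunk_index: pos for pos, c in enumerate(chunks)}
    pos = by_index[leaf_index]
    path = [chunks[pos]]
    wanted_depth = chunks[pos].depth
    # Walk backwards, keeping the first chunk seen at each shallower depth.
    for candidate in reversed(chunks[:pos]):
        if candidate.depth < wanted_depth:
            path.append(candidate)
            wanted_depth = candidate.depth
        if wanted_depth == 0:
            break
    return list(reversed(path))


chunks = [
    Chunk(0, 0, "Annual Report 2025"),
    Chunk(1, 1, "Letter to shareholders"),
    Chunk(2, 1, "Financial highlights"),
    Chunk(3, 2, "Revenue"),
    Chunk(4, 2, "Operating margin"),
    Chunk(5, 3, "Operating margin rose from 18.4% to 21.1% in FY25."),
]
path = root_to_leaf_path(chunks, leaf_index=5)
breadcrumb = " → ".join(c.content for c in path)
print(breadcrumb)
```

The sibling headings ("Letter to shareholders", "Revenue") are skipped because they are not ancestors of the leaf, which is what keeps a chunkset compact.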

The .poma archive

A single ZIP file that bundles chunks, chunksets, image renders, page images, and metadata. Portable, inspectable, durable. Works the same way locally, on-prem, or in the cloud.

my-doc.poma
├── chunks.json
├── chunksets.json
├── meta.json
├── images/
│   ├── image_00001.jpeg
│   └── ...
└── pages/
    ├── page_001.png
    └── ...

Full format: Results and archives.
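Because the archive is a plain ZIP, it can be inspected with standard tooling. The sketch below uses only Python's standard library; `load_poma` is a hypothetical helper, not part of the SDK. It assumes the JSON files sit at the archive root as in the layout above and does not interpret their internal schema. The demo archive and its JSON contents are made up for illustration.

```python
import io
import json
import zipfile


def load_poma(source) -> dict:
    """Load the JSON members of a .poma archive into a dict.

    `source` is a path or a binary file-like object. Assumes the JSON files
    sit at the archive root; their internal schema is not interpreted here.
    """
    loaded = {}
    with zipfile.ZipFile(source) as archive:
        names = set(archive.namelist())
        for member in ("chunks.json", "chunksets.json", "meta.json"):
            if member in names:
                with archive.open(member) as fh:
                    loaded[member] = json.load(fh)
        # List page renders without reading them into memory.
        loaded["pages"] = sorted(
            n for n in names if n.startswith("pages/") and not n.endswith("/")
        )
    return loaded


# Demo with a tiny in-memory archive; the JSON contents are made up.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("chunks.json", json.dumps([{"chunk_index": 0, "depth": 0}]))
    demo.writestr("chunksets.json", json.dumps([{"chunks": [0]}]))
    demo.writestr("pages/page_001.png", b"")
buffer.seek(0)

result = load_poma(buffer)
print(sorted(result))
```

For a real archive, pass the file path instead of the in-memory buffer, for example `load_poma("my-doc.poma")`.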

Built for high-stakes documents

PrimeCut earns its keep where context loss has a cost — where the answer to "is this clause enforceable?" needs the surrounding contract, not a fragment from page 47.

Strong fit:

  • Legal & compliance — contracts, policies, regulations, tax rulings. Hierarchical sections + nested clauses + tables that must not be shredded.
  • Financial — annual reports, prospectuses, term sheets. Long-form prose interleaved with multi-page tables.
  • Technical manuals & specs — engineering docs, ISO standards, API specs. Deep H1 > H2 > H3 > H4 hierarchies with code blocks.
  • Internal knowledge bases — onboarding handbooks, runbooks, HR policies. Where "where in the doc did that come from?" is a real question.

Weaker fit (and you should know):

  • Chat transcripts, support tickets, social-media corpora — flat, short, no real hierarchy. A simple sentence splitter is often enough.
  • Single-paragraph FAQs — too small to chunk; embed whole.

Supported formats

Documents, presentations, spreadsheets, images:

pdf, doc, docx, dotx, rtf, txt, md, html, htm, xml, ppt, pptx, pps, ppsx, pot, potx, key, xls, xlsx, xlsb, xltx, csv, numbers, ods, odc, png, jpg, jpeg, gif, bmp, tif, tiff, svg, webp, ico, heic, heif, psd, epub, mobi, djvu, dwg, dxf, dwf, dwfx, vsd, vsdx, ai, eps, ps, prn, xps, oxps, pub, mdi, pages, odp, odf, odt.

OCR is automatic for image-only PDFs. Tables are detected and preserved structurally, not flattened to text.

Ingestion modes

| Mode | When to use | Cost vs Pro |
| --- | --- | --- |
| Pro (default) | High-stakes documents: legal, financial, technical specs. Maximum structural fidelity. | 100% |
| Eco | Internal knowledge bases, low-stakes content, exploratory pipelines. Slightly lower fidelity for ~40% lower cost. | ~60% |

Full comparison: Eco ingestion.
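For budgeting, the table's cost ratio can be turned into a rough estimate. This is a sketch only: `estimate_cost` is a hypothetical helper, the per-page rate is a placeholder rather than a real price, the 0.60 ratio is the approximate figure from the table, and any free-tier allowance is ignored.

```python
ECO_COST_RATIO = 0.60  # "~60%" of Pro, per the table above (approximate)


def estimate_cost(pages: int, pro_rate_per_page: float, mode: str = "pro") -> float:
    """Estimate ingestion cost for `pages` pages in the given mode."""
    if mode not in ("pro", "eco"):
        raise ValueError(f"unknown mode: {mode!r}")
    ratio = 1.0 if mode == "pro" else ECO_COST_RATIO
    return pages * pro_rate_per_page * ratio


# Placeholder rate for illustration only; see the pricing page for real rates.
PLACEHOLDER_RATE = 0.01
pro_cost = estimate_cost(10_000, PLACEHOLDER_RATE, mode="pro")
eco_cost = estimate_cost(10_000, PLACEHOLDER_RATE, mode="eco")
print(f"pro: {pro_cost:.2f}, eco: {eco_cost:.2f}, saved: {pro_cost - eco_cost:.2f}")
```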

Get started

python
# pip install poma
from poma import PrimeCut

client = PrimeCut()  # reads POMA_API_KEY from env
result = client.ingest("contract.pdf")

# Now you have typed chunks + chunksets; feed them into any vector store.
# embed_and_index is a placeholder for your own embedding + indexing function.
for chunkset in result.chunksets:
    embed_and_index(chunkset.to_embed, metadata={"chunk_ids": chunkset.chunks})

bash
# Install via Homebrew, Go, or release archive — see /cli/
export POMA_API_KEY="poma_acc_…"

poma primecut ingest-sync --file contract.pdf --output result.poma
# → bin/<job_id>.poma: chunks.json, chunksets.json, images/, pages/

bash
# Submit the file
JOB=$(curl -sS -X POST "https://api.poma-ai.com/v3/primeCut/ingest" \
  -H "authorization: Bearer $POMA_API_KEY" \
  -H "content-type: application/octet-stream" \
  -H 'content-disposition: attachment; filename="contract.pdf"' \
  --data-binary @contract.pdf)

# Poll, download the .poma archive, unzip, read the JSONs.

javascript
// Wire poma-mcp into your MCP client (see /mcp/poma-mcp)
// Then ask the agent: "Ingest ~/contracts/q3.pdf with POMA PrimeCut"
// The agent calls primecut_ingest and returns chunks + chunksets.

Then plug the result into your retrieval stack of choice.
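As an illustration of that hand-off, the sketch below indexes chunkset text in a toy in-memory store and retrieves the best match for a query. Everything here is a stand-in: `embed` is a bag-of-words counter rather than a real embedding model, `InMemoryIndex` takes the place of Qdrant, pgvector, or similar, and the chunkset texts and chunk IDs are made up instead of coming from `result.chunksets`.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; swap in a real embedding model."""
    return Counter(re.findall(r"[a-z0-9.%]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Stand-in for a vector store such as Qdrant or pgvector."""

    def __init__(self):
        self.rows = []  # (vector, text, metadata)

    def add(self, text, metadata):
        self.rows.append((embed(text), text, metadata))

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [(text, meta) for _, text, meta in ranked[:k]]


# Made-up chunkset texts; in practice these would come from result.chunksets.
index = InMemoryIndex()
index.add(
    "annual report 2025 financial highlights operating margin "
    "operating margin rose from 18.4% to 21.1% in fy25",
    {"chunk_ids": [0, 2, 4, 5]},
)
index.add(
    "annual report 2025 letter to shareholders "
    "we thank our employees for a strong year",
    {"chunk_ids": [0, 1, 6]},
)

top_text, top_meta = index.search("how did operating margin change?", k=1)[0]
print(top_meta)
```

Storing the chunk IDs as metadata keeps each hit traceable to the chunks it was built from, so the matching images or page renders can be looked up afterwards.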

Reference

| | Where to look |
| --- | --- |
| Concepts | Ingestion · Results & archives · Cheatsheets · Eco ingestion |
| Python SDK class | PrimeCut · AsyncPrimeCut |
| REST API | api.poma-ai.com/v3/docs · /primeCut/ingest, /jobs/{id}/status, /jobs/{id}/download |
| CLI | poma primecut ingest |
| MCP | poma-mcp · primecut_ingest, primecut_status, primecut_resume, primecut_get_result |
| Auth | Authentication: account-level API key (prefix poma_acc_…) |

Pricing

PrimeCut is billed per page. Free tier covers 1,000 pages — enough to evaluate the engine on real documents before committing.

See pricing for current rates.

Compare with Grill

If you've read this far and you're wondering "do I need to run my own retrieval, or do I want POMA to do all of it?" — that's the PrimeCut vs Grill question.

| | PrimeCut | Grill |
| --- | --- | --- |
| What POMA does | Ingestion → chunks + chunksets in a .poma archive | Ingestion + indexing + hybrid search → prompt-ready context |
| You run a vector store | Yes | No |
| You write retrieval code | Yes | No (POMA does it) |
| Output | chunks.json, chunksets.json, images, pages | A single context string per query |
| Best for | Teams with an existing retrieval stack, on-prem requirements, or strict ownership constraints | Teams that want a managed RAG endpoint with one HTTP call |
| API | v2 + v3 (account-scoped key) | v3 only (project-scoped key) |

Pick PrimeCut when you want chunks. Pick Grill when you want answer-shaped context. Both run on the same ingestion engine, so the structural fidelity is identical; they differ only in what happens after the chunks are produced.

Open Grill →