
POMA AI builds the missing layer between your documents and your LLMs. PrimeCut is our ingestion and hierarchical chunking engine — the foundation that prepares clean, Retrieval-Augmented Generation (RAG)-ready context.

PrimeCut understands your document's content hierarchy before chunking — preserving structural relationships, eliminating context poisoning, and producing semantically coherent chunks that make every downstream RAG component more accurate by default.

from poma import PrimeCut
prime_cut = PrimeCut(api_key="your_key")
result = prime_cut.ingest("msci_world_index.pdf")
print(result.chunksets[2])

Four lines. Integrate in minutes. Full SDK docs on GitHub.

MSCI World Index Overview and Performance Analysis 2026
[…]
CUMULATIVE INDEX PERFORMANCE – GROSS RETURNS (USD) (FEB 2011 – FEB 2026)
This line chart displays the performance of three MSCI indices over time.
The y-axis represents index values, ranging from 50 to over 400.
MSCI World Index Overview and Performance Analysis 2026
[…]
CUMULATIVE INDEX PERFORMANCE – GROSS RETURNS (USD) (FEB 2011 – FEB 2026)
This line chart displays the performance of three MSCI indices over time.
[…]
On February 26, the MSCI World index reached 478.49, MSCI ACWI reached 438.40, and MSCI Emerging Markets reached 221.49.
MSCI World Index Overview and Performance Analysis 2026
[…]
Year MSCI World MSCI Emerging Markets MSCI ACWI
[…]
2013 27.37 -2.27 23.44
2012 16.54 18.63 16.80

Don't Take Our Word for It

POMA AI’s PrimeCut caught our attention with its structured, hierarchical approach. Throw in their ingestion capability, and the result is a tool that seems like it could make life easier for a lot of devs out there.

Neil Kanungo, Qdrant Head of Developer Relations

What convinced us about POMA was the engineering rigor behind a deceptively simple insight.

Till Faida, Co-founder AdBlock · Investor & Advisor, POMA AI

Context Poisoning Is Where Most RAG Failures Start
Most Retrieval Problems Are Actually Ingestion Failures

Most RAG debugging happens at retrieval — adjusting thresholds, re-ranking, switching embedding models. These are symptom treatments. The root cause is upstream: how documents are parsed and chunked before a single vector is written.

Failure Mode 01

Context Poisoning

A Retrieval Accuracy Failure Caused by Semantic Dilution at the Chunk Level.

When a chunk contains content from multiple unrelated sections — a paragraph about pricing followed by a support policy clause — the resulting embedding vector represents neither topic accurately. Queries return chunks where only a fraction of the content is relevant, and the irrelevant content degrades generation quality downstream.

The consequence: The LLM receives contradictory context. Hallucination rates rise.
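The dilution effect described above can be illustrated with a toy calculation. This is a minimal sketch, not POMA's embedding model: two orthogonal unit vectors stand in for the "pricing" and "support policy" topics, and a poisoned chunk that mixes both embeds near their average.

```python
import math

# Toy topic directions in a 2-D "embedding space" (illustrative only).
pricing = [1.0, 0.0]   # "pricing" topic
support = [0.0, 1.0]   # "support policy" topic

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# A poisoned chunk mixing both topics embeds near their average.
mixed_chunk = [(p + s) / 2 for p, s in zip(pricing, support)]

# A pure "pricing" query matches a clean pricing chunk perfectly,
# but only partially matches the mixed chunk (~0.707), so it can
# rank below chunks that are less relevant but more topically pure.
print(cosine(pricing, pricing))                 # 1.0
print(round(cosine(pricing, mixed_chunk), 3))   # 0.707
```

Real embedding models are far higher-dimensional, but the geometry is the same: a chunk that averages two topics is a worse match for either one.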
Failure Mode 02

Structural Signal Loss

A Parsing Failure That Collapses Document Hierarchy Into Undifferentiated Text.

Most extractors treat a document as a flat sequence of characters. H1s, H2s, table headers, list items, figure captions — all flattened to the same representational weight as body text. The structural signals that indicate what content means in relation to other content are discarded before chunking begins.

The consequence: Hierarchical queries fail. The vector store cannot distinguish a section heading from a footnote.
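The difference between a flattened document and a structure-preserving one can be sketched in a few lines. The dictionary shape and the ancestor-path helper below are hypothetical, not PrimeCut's parser or schema; they only show what flattening throws away.

```python
# Flat extraction: heading and body carry equal weight, and the link
# between "Fees" and "0.5%" is gone.
flat = (
    "Fees Annual fee Management fee 0.5% "
    "Support Refunds within 30 days"
)

# Structure-preserving extraction: each body keeps a path to its ancestors.
structured = {
    "heading": "Fees",
    "children": [
        {"heading": "Annual fee",
         "body": "Management fee 0.5%"},
    ],
}

def ancestor_path(node, parents=()):
    """Yield (ancestor_headings, body_text) for every body in the tree."""
    path = parents + (node["heading"],)
    for child in node.get("children", []):
        yield from ancestor_path(child, path)
    if "body" in node:
        yield path, node["body"]

for path, body in ancestor_path(structured):
    print(" > ".join(path), "::", body)
# Fees > Annual fee :: Management fee 0.5%
```

In the flat string, nothing distinguishes the heading "Fees" from body text; in the tree, every body can be embedded together with the headings that give it meaning.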
Failure Mode 03

Boundary Blindness

A Chunking Failure That Severs Semantic Units at Arbitrary Character or Token Boundaries.

Fixed-size chunking — by token count, character count, or line break — will split sentences mid-clause, cut tables across chunks, and separate a bullet list from the heading that gives it meaning. The chunk that reaches the embedding model is grammatically and semantically incomplete.

The consequence: High-relevance content becomes unfindable even when it exists in the corpus.
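Fixed-size chunking is simple enough to reproduce, which is exactly the problem. The sketch below (no library assumed) slices a short policy sentence at a hard 40-character boundary and severs it mid-clause.

```python
text = ("Refunds: customers may return any item within 30 days "
        "for a full refund, excluding shipping costs.")

def fixed_size_chunks(s, size=40):
    """Naive fixed-size chunking by character count."""
    return [s[i:i + size] for i in range(0, len(s), size)]

for chunk in fixed_size_chunks(text):
    print(repr(chunk))
# The 40-character boundary falls mid-clause: the 30-day condition and
# the refund terms land in different chunks, so neither chunk embeds
# the complete policy.
```

Token-count and line-break splitters fail the same way on tables and bullet lists, just at different boundaries.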

These are not edge cases. They are the default behavior of general-purpose text splitters applied to structured documents. Fixing retrieval without fixing ingestion is optimizing the wrong variable.

Most teams fix retrieval. We fix what retrieval reads.
PrimeCut — Ingestion and Hierarchical Chunking Engine for RAG

PrimeCut

Document ingestion and chunking engine for RAG pipelines.



Formats
.PDF
.DOCX
.PPTX
.XLSX
.HTML
and over 50 more
Output
Structured JSON chunksets with full ancestor-relationship metadata + ready-to-embed traversal paths

From
€0.003 / page
PrimeCut Eco and Pro available
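To make "structured JSON chunksets with ancestor-relationship metadata" concrete, here is a hypothetical record shape. The field names (`chunk_id`, `ancestors`, `traversal_path`) are assumptions for illustration, not PrimeCut's actual schema.

```python
import json

# Hypothetical chunkset record (illustrative field names, not the real schema).
chunkset = {
    "chunk_id": "c-0042",
    "ancestors": [
        "MSCI World Index Overview and Performance Analysis 2026",
        "CUMULATIVE INDEX PERFORMANCE – GROSS RETURNS (USD)",
    ],
    "text": "This line chart displays the performance of three MSCI indices over time.",
    "traversal_path": "doc > section[1] > figure[0]",
}

print(json.dumps(chunkset, indent=2))
```

The point of the ancestor list is that each chunk can be embedded, or handed to an LLM, together with the headings that situate it in the document.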

RAG Chunking Performance Benchmarks
POMA Uses 77% Fewer Tokens to Achieve 100% Context Recall

Context recall measures how much of the evidence needed to answer a question is actually present in the chunks retrieved — the higher the recall, the less your LLM is working with incomplete information.
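The definition above reduces to a simple fraction. This toy computation (not the benchmark's actual scoring code) checks which required evidence spans appear in the retrieved chunks.

```python
# Context recall = fraction of required evidence present in retrieved chunks.
required_evidence = {"fee is 0.5%", "refunds within 30 days"}
retrieved_chunks = [
    "The management fee is 0.5% per year.",
    "Shipping is free over 50 EUR.",
]

found = {ev for ev in required_evidence
         if any(ev in chunk.lower() for chunk in retrieved_chunks)}
recall = len(found) / len(required_evidence)

print(recall)  # 0.5 -- only half the needed evidence was retrieved
```

Published recall metrics typically use an LLM or annotator to judge whether each evidence claim is supported, rather than exact substring matching, but the ratio being computed is the same.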

Compact token comparison benchmark between PrimeCut and baselines

PrimeCut Use Cases
Every Industry, Every Document Type: Retrieval Fails When Chunks Lose Their Context

POMA AI adapts to the needs of companies of all sizes, delivering optimized retrieval regardless of your sector or document complexity.

Legal & Regulation

Contract clauses split from their governing definitions.

Finance & Insurance

Policy terms separated from their coverage tables.

Engineering

Specifications divorced from their tolerance tables.

Medical

Requirements cut from their applicability criteria.


Why POMA AI
We Built the Missing Infrastructure Layer Everyone Assumed Was Already Solved

The chunking problem isn't new. What's new is solving it with structure-preserving technology that treats documents the way they were actually written — as hierarchies, not strings. The patent is the outcome of that engineering work, not the point of it.

Elevator Pitch
Dr. Alexander Kihm

The People Behind POMA AI
Built By Engineers Who've Been Inside the Pipelines Everyone Else Is Still Building

POMA AI Team

Featured In

Podcast with Terry Tiu
The Art of Chunking with Dr. Alexander Kihm
Listen Here
Webinar with Qdrant
Improving Text Retrieval with Smarter Chunking
Watch Here
Podcast with CodeStory
Insights from Startup Tech Leaders
Listen Here
Podcast with How I AI
How an Engineer and Founder Builds Energy-Efficient AI for Smarter Results
Listen Here
Featured in LA Voice
How Innovators Are Engineering a Sustainable Future for AI
Read Here
Podcast with A Beginner's Guide to AI
The secret behind most AI tools: RAG. Alex Kihm explains it simply.
Listen Here

Got Questions?
Infrequently Asked Questions

Ever since we launched POMA AI, people have been asking us a lot of questions about the industry: about Context Engineering, RAG, chunking methods, LLM development trajectories, and so on. This article provides short-yet-detailed responses to the most interesting topics people have raised.

If you have a question that doesn’t appear on this list, but you think it should, we’d love to hear your argument at iaq@poma-ai.com. We’ll be adding to this list on a semi-regular basis—and we’re happy to hear new thought-provoking questions!

Does Context Engineering Still Matter with the Arrival of MCP?

Yes, in the same sense that voice calls still matter after the launch of the internet. Technologies can overlap without being in direct competition (and sometimes even benefit from each other). Here’s a short explanation of how this works in the case of Context Engineering and MCP.

Released by Anthropic in late 2024, the Model Context Protocol (MCP) provides a standardized way for AI tools to connect with data sources and other tools. From a developer’s standpoint, the benefits are obvious: this one technology allows an AI tool to connect with any content library, staging environment, and so on.

Universal compatibility makes the question of “how do I make X work with Y?” much easier to answer. However, connecting to a database and pulling specific bits of relevant information from it are very different things.

POMA AI specializes in the latter. Using it in conjunction with MCP increases the utility of both technologies, but their core functions are quite different. In fact, as adoption of MCP increases, so does the need for POMA AI—because a bigger pool of information is only useful if you have a way to find the data you actually need.

What Happens When Context Windows Get (Much) Larger?

An LLM’s context window is often compared to its “working memory.” This analogy is both helpful (in understanding its function) and potentially misleading (in understanding its practical applications).

Theoretically, a much larger context window would solve many of the most pressing issues currently facing LLMs. For example: a supersized context window could be expected to greatly reduce hallucinations, since this would enable an LLM to “remember” more information it ingests. As a result, there would be fewer gaps in the LLM’s knowledge base and fewer opportunities for it to invent incorrect information to fill those gaps.

But it’s a bit more complicated in practice.

Larger context windows require more computing power—a lot more, since compute requirements increase quadratically in comparison with the input. In other words: if the context window doubles in size, the LLM requires four times as much power to process the information in it.
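The quadratic scaling described above is easy to express directly: self-attention cost grows with the square of the context length, so the relative cost of a longer window is the squared ratio of the lengths.

```python
def relative_attention_cost(context_len, baseline_len):
    """Relative self-attention compute cost vs. a baseline context length,
    assuming cost scales quadratically with context length."""
    return (context_len / baseline_len) ** 2

# Doubling the window quadruples the attention cost...
print(relative_attention_cost(16_000, 8_000))     # 4.0
# ...and a 1M-token window costs ~15,625x an 8K baseline.
print(relative_attention_cost(1_000_000, 8_000))  # 15625.0
```

This is a back-of-the-envelope model of attention alone; real serving costs also depend on KV-cache memory, batching, and architecture-specific optimizations.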

In addition to increased compute costs, larger context windows haven’t always led to improved accuracy in real-world use cases. Perhaps the most notorious example is the case of an Australian lawyer who was recently stripped of his ability to practice as a principal lawyer after submitting court documents riddled with nonexistent AI-generated citations. Despite being produced by “reputable” AI-powered legal software, the output was still plagued by hallucinations.

Here’s where the “working memory” analogy is unintentionally apt. The larger the context window, the more information is contained in the middle of the window. And in real-world use cases, LLMs have a tendency to skim over those details—much like a human reader confronted with a solid page of text. Whether you’re a robot or a meatsack, it’s easy to get lost in the middle.

The term “bathtub curve” has become popular among engineers for describing the failure rates of the products they build. This mental image—with the ends of the tub clearly visible, and the middle entirely submerged—can also be useful for understanding how information gets lost even in the biggest context windows.

LLMs might read the first few sentences carefully, but soon their eyes glaze over. Upon reaching the end of the page, their attention may perk up again, but their comprehension of everything in the middle remains hazy. As a result, the LLM remains prone to introducing incorrect information into its outputs.

Will Context Engineering Stay Relevant?

The dramatic increase in size of context windows has led some people to wonder if Context Engineering is still necessary for LLMs. After all, if an entire database can fit in an LLM’s context window, why would it need to consult an external database?

At the risk of oversimplifying: just because an LLM’s context window can fit a huge amount of information in it doesn’t mean the LLM can make effective use of that information.

The appeal of Context Engineering is its precision. For industries where accuracy is at a premium—like healthcare or the legal profession—simply having a huge amount of information available doesn’t address the actual needs. It would be like having a set of beautiful encyclopedias open to all pages at all times. If that idea breaks your brain a little, that’s the point. It’s physically impossible.

Context Engineering’s ability to quickly (and cost-effectively) provide exactly the information required for a certain task means that it’s highly likely to stay relevant long into the future, regardless of how large context windows may grow.

What Happens If AI Gets Smarter?

If Sam Altman and friends do create an omnipotent digital god, then the question is moot and you either have nothing to worry about or much bigger things to worry about.

But in the event that AI development continues on its current trajectory—i.e. lots of incremental improvements with the occasional leap forward—then Context Engineering will certainly be a key driver of this progress, and retain its usefulness for the foreseeable future.

This is because no matter how “smart” AI models become, they’ll still face the fundamental issues of today’s models. To be more specific: it’s impossible to keep every possible piece of information permanently poised at the tip of your tongue (or LLM prompt context), and it’s not cost-effective to read an entire book every time you need to quote a line from Chapter 3.

To use a human analogy: an excellent set of notes is vital whether you’re a 12-year-old student or a 32-year-old finishing their PhD.

Why Didn’t the Big AI Labs (Instead of POMA AI) Solve the Problem of Efficient, Context-Rich Chunking?

In a nutshell, they’d make less money if they did.

The behemoths of AI development—OpenAI, Anthropic, etc.—generate revenue when their users burn tokens. It’s not in their financial interest for you to use n tokens to get the context you wanted, when you could’ve used n² tokens instead. Plus, their file search tools rely on rough chunking strategies. That means when you go back to retrieve the files you need, you’ll burn even more tokens. Making this process more efficient might save you money, but it cuts into the big AI labs’ profit margins.

In fairness, there’s another reason big AI labs didn’t solve this particular puzzle: it’s really f****** hard. And the OpenAIs of the world have a lot of other initiatives competing for resources and their teams’ attention, from developing chatbots to building short-form video platforms.

POMA AI, on the other hand, is solely focused on making chunking as efficient and effective as possible. This is what our team does all day, so we do it well.

What’s the Difference Between POMA AI and Unstructured.io?

If you’ve already checked out our Pricing pages, you probably know that’s not the answer. And if you continued clicking around our websites, you might’ve seen many of the same terms: structured data, RAG, and so on.

So it’s understandable if you’re scratching your head right now.

Unstructured.io is primarily an element extractor. It pulls elements like images and tables from a document and converts them into text digestible by LLMs. It does have a built-in chunking function, but that’s more of a side dish than a main course.

POMA AI, on the other hand, is all about chunking. We specialize in turning entire documents into chunks, so the information contained in them can be accurately (and efficiently) utilized by RAG-enabled LLMs.

Which tool is best? It depends on your needs. If you’re curious about how POMA AI might work for yours, give our do-it-yourself Demo a try.