Examples — PrimeCut vs. conventional chunking
PrimeCut chunks documents along their hierarchy: every chunkset is a root-to-leaf path through the structure, so a single retrieval hit carries its full ancestry (heading → subheading → clause) instead of an arbitrary text window. Conventional chunkers split the same document into fixed token windows with overlap. That is fast and simple, but it routinely splits a single argument across several chunks and spends tokens on duplicated overlap text.
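To make the "root-to-leaf path" idea concrete, here is a minimal sketch in Python. It is not PrimeCut's implementation or output format: the `Node` class, the `root_to_leaf_paths` helper, and the sample document tree are all hypothetical and exist only to illustrate how a path through a hierarchy carries its ancestry along.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One structural unit of a document (hypothetical model, not PrimeCut's)."""
    text: str
    children: "list[Node]" = field(default_factory=list)


def root_to_leaf_paths(node, prefix=()):
    """Yield every root-to-leaf path as a tuple of node texts."""
    path = prefix + (node.text,)
    if not node.children:
        # A leaf closes the path: heading -> subheading -> clause.
        yield path
        return
    for child in node.children:
        yield from root_to_leaf_paths(child, path)


# Made-up sample tree, loosely shaped like a contract.
doc = Node("Agreement", [
    Node("1. Definitions", [
        Node("1.1 Processor"),
        Node("1.2 Controller"),
    ]),
    Node("2. Obligations", [
        Node("2.1 Confidentiality"),
    ]),
])

paths = list(root_to_leaf_paths(doc))
for p in paths:
    print(" → ".join(p))
```

Each printed line contains the leaf together with every heading above it, which is the property the paragraph above describes: a hit on "1.1 Processor" arrives with "Agreement" and "1. Definitions" attached.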
The viewer below lets you flip between the two on seven pre-ingested documents:
- Factsheet MSCI World ETF — financial product factsheet
- DPA POMA AI · OpenAI — Data Processing Agreement (contract)
- IFRS Regulation on Insurance Contracts — accounting standard
- Attention is All You Need — research paper with figures, tables, equations
- Sample Medical History — semi-structured clinical notes
- DSGVO — the German GDPR text, deeply nested regulation
- Insurance Contract — multi-section policy document
Pick POMA Chunksets to see the hierarchical groupings as coloured highlights (click a chunkset to layer it on); pick Conventional · 128 tok or 512 tok to see what naive fixed-window chunking produces from the same text.
How to read it
- POMA Chunksets mode. Each chunkset (the numbered buttons up top) is a group of related chunks that PrimeCut joined along the document's hierarchy. Click one chunkset to highlight the lines it covers; click several to see how they overlap. Notice how chunksets follow section boundaries, table rows, figure captions — the structure the document already has.
- Conventional · 128 / 512 tok modes. The document is re-tokenised with gpt-tokenizer in your browser and sliced into fixed-size windows with overlap: 32 tokens for the 128-token windows (25%) and 64 tokens for the 512-token windows (12.5%). Alternating yellow/blue colours show the chunk boundaries; the striped regions are the overlap. Notice where a window cuts across a section, a table row, or the middle of a paragraph: those are the seams retrieval has to deal with later.
- Show figures. When enabled, the viewer renders the document's inline figures inside the POMA view. Toggle it off for a denser, text-only read.
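The fixed-window slicing used by the conventional modes can be sketched as follows. This is an illustrative Python version, not the viewer's code: the viewer tokenises with gpt-tokenizer in the browser, whereas this sketch takes any token list and uses plain integers as stand-in token ids. The function name `fixed_windows` and its parameters are assumptions made for the example.

```python
def fixed_windows(tokens, size, overlap):
    """Slice tokens into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap  # how far each new window advances
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            # This window already reaches the end of the document.
            break
    return windows


# Stand-in for real token ids: a 300-token document.
tokens = list(range(300))

# The 128-token mode with its 32-token overlap.
windows = fixed_windows(tokens, size=128, overlap=32)
print([len(w) for w in windows])  # [128, 128, 108]

# The last 32 tokens of one window reappear at the start of the next.
print(windows[0][-32:] == windows[1][:32])  # True
```

The windows advance by `size - overlap` tokens regardless of what the text contains, which is why a boundary can land in the middle of a table row or a paragraph.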
What this demo doesn't do
This page is a viewer over pre-ingested PrimeCut output. It doesn't upload, doesn't ingest, doesn't call any POMA API at runtime — everything is static JSON shipped with the docs.
To run PrimeCut on your own documents, use the POMA Console for one-off ingestion, the SDK for scripted pipelines, or the CLI for command-line work.
See also
- Concepts → Ingestion — what happens inside the chunking engine
- Concepts → Cheatsheets — how chunksets combine into prompt-ready snippets
- PrimeCut overview — the product page