Examples — PrimeCut vs. conventional chunking

PrimeCut chunks documents along their hierarchy: every chunkset is a root-to-leaf path through the structure, so a single retrieval hit carries its full ancestry (heading → subheading → clause) instead of an arbitrary text window. Conventional chunkers split the same document into fixed token windows with overlap. That is fast and simple, but a window boundary can fall anywhere, so a single argument often ends up spread across several chunks, and the overlapping regions spend tokens on duplicated text.
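As a rough illustration of the "root-to-leaf path" idea, the sketch below walks a toy document tree and emits one path per leaf. It is not PrimeCut's implementation: the `Node` class, the `root_to_leaf_paths` function, and the sample outline are all hypothetical, chosen only to show why each resulting group carries its full ancestry.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One structural element of a document (hypothetical type, not a PrimeCut API)."""
    text: str
    children: list["Node"] = field(default_factory=list)


def root_to_leaf_paths(node: Node, ancestry: tuple[str, ...] = ()) -> list[list[str]]:
    """Return one list of texts per leaf, each running from the root down to that leaf."""
    path = ancestry + (node.text,)
    if not node.children:
        # A leaf closes the path: heading -> subheading -> clause.
        return [list(path)]
    paths: list[list[str]] = []
    for child in node.children:
        paths.extend(root_to_leaf_paths(child, path))
    return paths


# Toy outline, invented for illustration only.
doc = Node("Agreement", [
    Node("1. Definitions", [Node("Clause 1.1")]),
    Node("2. Obligations", [Node("Clause 2.1"), Node("Clause 2.2")]),
])

chunksets = root_to_leaf_paths(doc)
for chunkset in chunksets:
    print(" > ".join(chunkset))
```

Each printed line starts at the document root, so a hit on "Clause 2.2" arrives together with "2. Obligations" and "Agreement" rather than with whatever text happened to sit next to it.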

The viewer below lets you flip between the two on seven pre-ingested documents:

  • Factsheet MSCI World ETF — financial product factsheet
  • DPA POMA AI · OpenAI — Data Processing Agreement (contract)
  • IFRS Regulation on Insurance Contracts — accounting standard
  • Attention is All You Need — research paper with figures, tables, equations
  • Sample Medical History — semi-structured clinical notes
  • DSGVO — the German GDPR text, deeply nested regulation
  • Insurance Contract — multi-section policy document

Pick POMA Chunksets to see the hierarchical groupings as coloured highlights (click a chunkset to layer it on); pick Conventional · 128 tok or 512 tok to see what naive fixed-window chunking produces from the same text.

How to read it

  • POMA Chunksets mode. Each chunkset (the numbered buttons up top) is a group of related chunks that PrimeCut joined along the document's hierarchy. Click one chunkset to highlight the lines it covers; click several to see how they overlap. Notice how chunksets follow section boundaries, table rows, figure captions — the structure the document already has.
  • Conventional · 128 / 512 tok modes. The document is re-tokenised with gpt-tokenizer in your browser and sliced into fixed-size windows with overlap: 32 tokens for the 128-token windows (25%) and 64 tokens for the 512-token windows (12.5%). Alternating yellow/blue colours show the chunk boundaries; the striped regions are the overlap. Notice where a window cuts across a section, a table row, or the middle of a paragraph: those are the seams retrieval has to deal with later.
  • Show figures. When enabled, the viewer renders the document's inline figures inside the POMA view. Toggle it off for a denser, text-only read.
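The fixed-window slicing behind the Conventional modes can be sketched as follows. This is a minimal approximation, not the viewer's code: the viewer tokenises with gpt-tokenizer in the browser, whereas this sketch assumes the text is already a list of tokens, and the function name `fixed_windows` is hypothetical. The window and overlap sizes are the ones quoted above (32-token overlap for 128-token windows, 64 for 512).

```python
def fixed_windows(tokens: list, size: int, overlap: int) -> list[list]:
    """Slice tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # how far the window start advances each time
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return windows


# Stand-in token ids; a real pipeline would get these from a tokenizer.
tokens = list(range(300))

windows = fixed_windows(tokens, size=128, overlap=32)
print(len(windows))  # 3 windows: tokens 0-127, 96-223, 192-299
```

The slicing never looks at the document's structure, which is why a boundary can land in the middle of a table row or paragraph; the overlap only softens the cut by repeating the tokens on either side of it.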

What this demo doesn't do

This page is a viewer over pre-ingested PrimeCut output. It doesn't upload, doesn't ingest, doesn't call any POMA API at runtime — everything is static JSON shipped with the docs.

To run PrimeCut on your own documents, use the POMA Console for one-off ingestion, the SDK for scripted pipelines, or the CLI for command-line work.

See also