Chunking strategy landscape
If you are trying to find the right chunking strategy, chunk size, and chunk overlap, the first useful step is to understand the landscape before jumping into tooling defaults.
Chunking splits large text into smaller units so that embedding models do not truncate your input and so that retrieval returns self-contained pieces that are actually useful for search and answering.
The sweet spot is chunks small enough for precise retrieval, but complete enough to read sensibly on their own. What confuses a human also confuses the model.
Which chunking strategies are common?
Ever since RAG was first developed, researchers have experimented with a wide range of chunking strategies. Some approaches are simplistic, others are much more advanced, but nearly all of them are still trying to answer the same question: where should the boundaries go?
No chunking
Also known as: full-document embedding, whole-document embedding, single chunk
What it is: You embed an entire document as one vector and retrieve whole documents. This only works when your documents are naturally small enough to stay inside model limits.
Upsides
- Simplest pipeline: no boundary bugs and no overlap tuning.
- Works when the stored units are already tiny.
Downsides
- Often impossible because embedding models have token limits.
- Whole-document vectors dilute fine-grained facts.
Fixed-size chunking
Also known as: token chunking, length-based chunking, naive chunking
What it is: Split every N tokens, characters, or words, optionally with overlap to reduce boundary loss.
Upsides
- Fast, deterministic baseline.
- Often good enough for a first prototype or messy text.
Downsides
- Can cut mid-sentence or mid-idea.
- Overlap helps, but it increases duplication and index size.
Chunk size and overlap
- A common baseline is 512 tokens with 50 to 100 tokens of overlap.
- Some teams test from 128 or 256 up through 512 or 1024 depending on how much context the question needs.
- The best fixed chunk size can swing widely by dataset and embedding model, so defaults are only starting points.
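As a concrete illustration, here is a minimal fixed-size splitter with overlap. It counts whitespace-separated words as a stand-in for tokens; a real pipeline would count with the embedding model's own tokenizer. The function name and defaults are illustrative, not from any particular library.

```python
def fixed_size_chunks(text, size=512, overlap=64):
    """Split text into word-based chunks of `size` words, with `overlap`
    words shared between consecutive chunks to soften boundary loss.
    Words approximate tokens here; use a real tokenizer in production."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # last window already covers the tail
    return chunks
```

With `size=4, overlap=2`, a ten-word input yields four chunks, each repeating two words of its neighbor, which is exactly the duplication trade-off noted above.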
Sliding-window chunking
Also known as: windowed chunking, stride-based overlap
What it is: A fixed-size window moves across the text with a stride smaller than the window, so consecutive chunks overlap heavily by design.
Upsides
- Strong continuity across boundaries.
- Facts near boundaries appear in multiple chunks.
Downsides
- Lots of near-duplicates.
- Retrieval gets noisy unless you dedupe or rerank.
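A sketch of both halves of the trade-off: stride-based windowing, plus a greedy Jaccard dedupe of the kind you would run on retrieved windows to cut the near-duplicate noise. Names and the 0.6 threshold are illustrative assumptions.

```python
def sliding_windows(tokens, window=8, stride=2):
    """Overlapping windows: each shares `window - stride` tokens
    with its neighbor, so boundary facts land in several chunks."""
    if stride >= window:
        raise ValueError("stride should be smaller than window for overlap")
    windows, start = [], 0
    while start < len(tokens):
        windows.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
        start += stride
    return windows

def jaccard(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def dedupe(windows, threshold=0.6):
    """Greedy dedupe: drop a window that overlaps a kept one too much."""
    kept = []
    for w in windows:
        if all(jaccard(w, k) < threshold for k in kept):
            kept.append(w)
    return kept
```

With `window=4, stride=1`, adjacent windows share three of five distinct tokens (Jaccard 0.6), so the dedupe pass drops every other window.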
Sentence or paragraph chunking
Also known as: sentence splitting, passage splitting
What it is: Split on sentence or paragraph boundaries, often with a max-size cap.
Upsides
- Natural boundaries.
- Avoids mid-sentence cuts.
Downsides
- Sentences are often too small to answer questions well.
- Retrieval may need a larger top-k and more prompt stuffing.
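A minimal version of this strategy: split on sentence-ending punctuation, then pack consecutive sentences into chunks under a size cap so single sentences do not end up too small to answer anything. The regex is a naive sentence splitter, not a substitute for a real sentence tokenizer.

```python
import re

def sentence_chunks(text, max_chars=200):
    """Split on sentence boundaries, then pack consecutive sentences
    into chunks of at most max_chars, so no chunk cuts mid-sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sent in sentences:
        if current and len(current) + 1 + len(sent) > max_chars:
            chunks.append(current)
            current = sent
        else:
            current = f"{current} {sent}" if current else sent
    if current:
        chunks.append(current)
    return chunks
```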
Recursive delimiter chunking
Also known as: recursive chunking, recursive character splitting
What it is: Try higher-level separators first, such as paragraph breaks, then fall back to smaller separators only when needed.
Upsides
- Good general-purpose default.
- Respects structure better than blind fixed-size splitting.
Downsides
- Separator lists become a maintenance problem across different formats.
- Still fundamentally slices text into isolated chunks.
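The fallback logic reads most clearly in code. This sketch tries the coarsest separator first and recurses with finer ones only on pieces that are still too long, ending in a hard character cut; real implementations also merge undersized neighboring pieces, which is omitted here for brevity.

```python
def recursive_split(text, max_len=100, separators=("\n\n", "\n", " ", "")):
    """Split on the coarsest separator that works; recurse with finer
    separators only on pieces still longer than max_len."""
    if len(text) <= max_len:
        return [text]
    sep, *rest = separators
    if sep == "":
        # last resort: hard character cut
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    chunks = []
    for piece in text.split(sep):
        if len(piece) <= max_len:
            if piece:
                chunks.append(piece)
        else:
            chunks.extend(recursive_split(piece, max_len, tuple(rest)))
    return chunks
```

Note how the short paragraphs survive untouched and only the oversized run gets cut at the character level.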
Structure-aware chunking
Also known as: document-structure chunking, header-based chunking, element packing
What it is: Use the document's native structure such as headings, pages, tags, tables, and parsed elements to decide safe boundaries.
Format-structure chunking
Split Markdown by headings, HTML by tags, and code by functions or classes so units align with what the author meant.
Partition-then-pack
Instead of splitting raw text directly, first parse the document into semantic elements such as paragraphs, list items, titles, and tables. Then pack consecutive elements into chunks up to a max size.
Upsides
- Produces fewer nonsense splits.
- Keeps titles, tables, and lists more coherent.
Downsides
- Depends heavily on extraction quality.
- PDFs and scanned documents can still fail upstream.
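For the Markdown case, a minimal structure-aware splitter looks like this: cut at heading lines so each chunk is one section, with the heading kept next to its body. It assumes the Markdown was extracted cleanly, which, as noted above, is the real risk.

```python
import re

def markdown_section_chunks(md):
    """Split Markdown at heading lines (#, ##, ...) so each chunk is
    one section with its heading attached."""
    sections, current = [], []
    for line in md.splitlines():
        if re.match(r"^#{1,6}\s", line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [s for s in sections if s]
```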
Semantic similarity chunking
Also known as: semantic chunking, meaning-based chunking
What it is: Use embeddings to detect topic shifts and cut where meaning changes rather than where characters hit a limit.
Upsides
- Usually improves coherence over delimiter-only methods.
- Can help precision in dense technical text.
Downsides
- Costs more at ingest time.
- Still assumes the right answer is a contiguous slice of text.
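The boundary rule can be sketched without a real embedding model: given one vector per sentence (in practice produced by an embedding model; toy vectors below), start a new chunk wherever cosine similarity between consecutive sentences drops below a threshold, i.e. where the topic shifts.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def semantic_chunks(sentences, embeddings, threshold=0.5):
    """Cut where cosine similarity between consecutive sentence
    embeddings drops below `threshold` (a likely topic shift)."""
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if cosine(embeddings[i - 1], embeddings[i]) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```

In the toy example below, the first two vectors point roughly the same way and the last two point another way, so the cut lands exactly at the topic change.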
LLM-based or agentic chunking
Also known as: proposition extraction, LLM-decided chunking
What it is: An LLM chooses boundaries, rewrites content into retrieval-friendly units, or selects among chunking strategies.
Upsides
- Can align chunks with the downstream QA task.
- Often improves semantic quality on complicated documents.
Downsides
- Higher cost and latency.
- More nondeterminism.
Neural chunking
Also known as: learned boundary detection
What it is: A trained model predicts good boundaries based on learned coherence patterns.
Upsides
- Can outperform hand-built heuristics in the right domain.
Downsides
- Harder to debug.
- Can fail quietly under domain shift.
Late chunking
Also known as: embed first, split second
What it is: Embed the full document with a long-context embedding model and derive chunk embeddings afterward so each chunk embedding stays aware of surrounding context.
Upsides
- Directly attacks context loss from independently embedded chunks.
Downsides
- Requires long-context embedding infrastructure and token-level outputs.
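The "embed first, split second" step reduces to pooling. Assuming you already have token-level embeddings from one pass of a long-context model over the whole document (toy vectors stand in below), each chunk's embedding is just the mean of its tokens' vectors, so every chunk vector inherits document-wide context from that single pass.

```python
def late_chunk_embeddings(token_embeddings, chunk_spans):
    """Mean-pool token vectors (from ONE full-document encoder pass)
    over each (start, end) span to get context-aware chunk vectors."""
    chunk_vecs = []
    for start, end in chunk_spans:
        span = token_embeddings[start:end]
        dim = len(span[0])
        chunk_vecs.append(
            [sum(vec[d] for vec in span) / len(span) for d in range(dim)]
        )
    return chunk_vecs
```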
Hierarchical chunking
Also known as: parent-child chunking, multi-level chunking
What it is: Create multiple chunk layers such as large section-level chunks and smaller detail chunks, then retrieve coarse-to-fine when needed.
Upsides
- Handles broad and specific questions without forcing one chunk size.
Downsides
- Adds complexity in indexing and retrieval.
- Still breaks text into separate units.
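A minimal sketch of the parent-child pattern: index small child chunks for precise matching, but have each child point back to its parent section so retrieval can return the broader context. The substring-match "retrieval" is a placeholder for real vector search.

```python
def build_hierarchy(sections, child_size=2):
    """Index each section twice: as one parent chunk, and as small
    child chunks (groups of sentences) that reference the parent."""
    parents, children = {}, []
    for pid, section in enumerate(sections):
        parents[pid] = section
        sentences = section.split(". ")
        for i in range(0, len(sentences), child_size):
            children.append({
                "parent_id": pid,
                "text": ". ".join(sentences[i:i + child_size]),
            })
    return parents, children

def retrieve_parent(children, parents, query_word):
    """Toy retrieval: match a child, return its full parent section."""
    for child in children:
        if query_word in child["text"]:
            return parents[child["parent_id"]]
    return None
```

The query matches a small child chunk, but the caller gets the whole parent section back, which is the coarse-to-fine behavior described above.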
TL;DR
Continue with common failure modes or jump straight to POMA chunksets.
Continue reading
- Common failure modes — why even advanced strategies lose context
- POMA chunksets — a non-breaking alternative to traditional chunking
- Strategy comparison table — all 15 strategies compared side by side
- The full chunking guide — the complete deep dive