Chunking strategy landscape
If you are trying to find the right chunking strategy, chunk size, and chunk overlap, the first useful step is to understand the landscape before jumping into tooling defaults.
Chunking splits large text into smaller units so that embedding models do not truncate your input and so that retrieval returns self-contained pieces that are actually useful for search and answering.
The sweet spot is chunks small enough for precise retrieval, but complete enough to read sensibly on their own. What confuses a human also confuses the model.
Which chunking strategies are common?
Ever since RAG was first developed, researchers have experimented with a wide range of chunking strategies. Some approaches are simplistic, others are much more advanced, but nearly all of them are still trying to answer the same question: where should the boundaries go?
No chunking
Also known as: full-document embedding, whole-document embedding, single chunk
What it is: You embed an entire document as one vector and retrieve whole documents. This only works when your documents are naturally small enough to stay inside model limits.
Upsides
- Simplest pipeline: no boundary bugs and no overlap tuning.
- Works when the stored units are already tiny.
Downsides
- Often impossible because embedding models have token limits.
- Whole-document vectors dilute fine-grained facts.
Fixed-size chunking
Also known as: token chunking, length-based chunking, naive chunking
What it is: Split every N tokens, characters, or words, optionally with overlap to reduce boundary loss.
Upsides
- Fast, deterministic baseline.
- Often good enough for a first prototype or messy text.
Downsides
- Can cut mid-sentence or mid-idea.
- Overlap helps, but it increases duplication and index size.
Chunk size and overlap
- A common baseline is 512 tokens with 50 to 100 tokens of overlap.
- Some teams test from 128 or 256 up through 512 or 1024 depending on how much context the question needs.
- The best fixed chunk size can swing widely by dataset and embedding model, so defaults are only starting points.
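As a concrete illustration, here is a minimal fixed-size splitter with overlap. It counts whitespace-separated words as a stand-in for tokens; a real pipeline would count with the embedding model's own tokenizer. The function name and defaults are illustrative, not from any particular library.

```python
def fixed_size_chunks(text, size=512, overlap=64):
    """Split text into word-based chunks of `size` words, with `overlap`
    words shared between consecutive chunks to soften boundary loss.
    Words approximate tokens here; use a real tokenizer in production."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # last window already covers the tail
    return chunks
```

With `size=4, overlap=2`, a ten-word input yields four chunks, each repeating two words of its neighbor, which is exactly the duplication trade-off noted above.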
Sliding-window chunking
Also known as: windowed chunking, stride-based overlap
What it is: A fixed-size window moves across the text with a stride smaller than the window, so consecutive chunks overlap heavily by design.
Upsides
- Strong continuity across boundaries.
- Facts near boundaries appear in multiple chunks.
Downsides
- Lots of near-duplicates.
- Retrieval gets noisy unless you dedupe or rerank.
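A sketch of both halves of the trade-off: stride-based windowing, plus a greedy Jaccard dedupe of the kind you would run on retrieved windows to cut the near-duplicate noise. Names and the 0.6 threshold are illustrative assumptions.

```python
def sliding_windows(tokens, window=8, stride=2):
    """Overlapping windows: each shares `window - stride` tokens
    with its neighbor, so boundary facts land in several chunks."""
    if stride >= window:
        raise ValueError("stride should be smaller than window for overlap")
    windows, start = [], 0
    while start < len(tokens):
        windows.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
        start += stride
    return windows

def jaccard(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def dedupe(windows, threshold=0.6):
    """Greedy dedupe: drop a window that overlaps a kept one too much."""
    kept = []
    for w in windows:
        if all(jaccard(w, k) < threshold for k in kept):
            kept.append(w)
    return kept
```

With `window=4, stride=1`, adjacent windows share three of five distinct tokens (Jaccard 0.6), so the dedupe pass drops every other window.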
Sentence or paragraph chunking
Also known as: sentence splitting, passage splitting
What it is: Split on sentence or paragraph boundaries, often with a max-size cap.
Upsides
- Natural boundaries.
- Avoids mid-sentence cuts.
Downsides
- Sentences are often too small to answer questions well.
- Retrieval may need a larger top-k and more prompt stuffing.
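A minimal version of this strategy: split on sentence-ending punctuation, then pack consecutive sentences into chunks under a size cap so single sentences do not end up too small to answer anything. The regex is a naive sentence splitter, not a substitute for a real sentence tokenizer.

```python
import re

def sentence_chunks(text, max_chars=200):
    """Split on sentence boundaries, then pack consecutive sentences
    into chunks of at most max_chars, so no chunk cuts mid-sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sent in sentences:
        if current and len(current) + 1 + len(sent) > max_chars:
            chunks.append(current)
            current = sent
        else:
            current = f"{current} {sent}" if current else sent
    if current:
        chunks.append(current)
    return chunks
```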
Recursive delimiter chunking
Also known as: recursive chunking, recursive character splitting
What it is: Try higher-level separators first, such as paragraph breaks, then fall back to smaller separators only when needed.
Upsides
- Good general-purpose default.
- Respects structure better than blind fixed-size splitting.
Downsides
- Separator lists become a maintenance problem across different formats.
- Still fundamentally slices text into isolated chunks.
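The fallback logic reads most clearly in code. This sketch tries the coarsest separator first and recurses with finer ones only on pieces that are still too long, ending in a hard character cut; real implementations also merge undersized neighboring pieces, which is omitted here for brevity.

```python
def recursive_split(text, max_len=100, separators=("\n\n", "\n", " ", "")):
    """Split on the coarsest separator that works; recurse with finer
    separators only on pieces still longer than max_len."""
    if len(text) <= max_len:
        return [text]
    sep, *rest = separators
    if sep == "":
        # last resort: hard character cut
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    chunks = []
    for piece in text.split(sep):
        if len(piece) <= max_len:
            if piece:
                chunks.append(piece)
        else:
            chunks.extend(recursive_split(piece, max_len, tuple(rest)))
    return chunks
```

Note how the short paragraphs survive untouched and only the oversized run gets cut at the character level.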
Structure-aware chunking
Also known as: document-structure chunking, header-based chunking, element packing
What it is: Use the document's native structure such as headings, pages, tags, tables, and parsed elements to decide safe boundaries.
Format-structure chunking
Split Markdown by headings, HTML by tags, and code by functions or classes so units align with what the author meant.
Partition-then-pack
Instead of splitting raw text directly, first parse the document into semantic elements such as paragraphs, list items, titles, and tables. Then pack consecutive elements into chunks up to a max size.
Upsides
- Produces fewer nonsense splits.
- Keeps titles, tables, and lists more coherent.
Downsides
- Depends heavily on extraction quality.
- PDFs and scanned documents can still fail upstream.
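For the Markdown case, a minimal structure-aware splitter looks like this: cut at heading lines so each chunk is one section, with the heading kept next to its body. It assumes the Markdown was extracted cleanly, which, as noted above, is the real risk.

```python
import re

def markdown_section_chunks(md):
    """Split Markdown at heading lines (#, ##, ...) so each chunk is
    one section with its heading attached."""
    sections, current = [], []
    for line in md.splitlines():
        if re.match(r"^#{1,6}\s", line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [s for s in sections if s]
```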
Semantic similarity chunking
Also known as: semantic chunking, meaning-based chunking
What it is: Use embeddings to detect topic shifts and cut where meaning changes rather than where characters hit a limit.
Upsides
- Usually improves coherence over delimiter-only methods.
- Can help precision in dense technical text.
Downsides
- Costs more at ingest time.
- Still assumes the right answer is a contiguous slice of text.
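The boundary rule can be sketched without a real embedding model: given one vector per sentence (in practice produced by an embedding model; toy vectors below), start a new chunk wherever cosine similarity between consecutive sentences drops below a threshold, i.e. where the topic shifts.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def semantic_chunks(sentences, embeddings, threshold=0.5):
    """Cut where cosine similarity between consecutive sentence
    embeddings drops below `threshold` (a likely topic shift)."""
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if cosine(embeddings[i - 1], embeddings[i]) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```

In the toy example below, the first two vectors point roughly the same way and the last two point another way, so the cut lands exactly at the topic change.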
LLM-based or agentic chunking
Also known as: proposition extraction, LLM-decided chunking
What it is: An LLM chooses boundaries, rewrites content into retrieval-friendly units, or selects among chunking strategies.
Upsides
- Can align chunks with the downstream QA task.
- Often improves semantic quality on complicated documents.
Downsides
- Higher cost and latency.
- More nondeterminism.
Neural chunking
Also known as: learned boundary detection
What it is: A trained model predicts good boundaries based on learned coherence patterns.
Upsides
- Can outperform hand-built heuristics in the right domain.
Downsides
- Harder to debug.
- Can fail quietly under domain shift.
Late chunking
Also known as: embed first, split second
What it is: Embed the full document with a long-context embedding model and derive chunk embeddings afterward so each chunk embedding stays aware of surrounding context.
Upsides
- Directly attacks context loss from independently embedded chunks.
Downsides
- Requires long-context embedding infrastructure and token-level outputs.
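The "embed first, split second" step reduces to pooling. Assuming you already have token-level embeddings from one pass of a long-context model over the whole document (toy vectors stand in below), each chunk's embedding is just the mean of its tokens' vectors, so every chunk vector inherits document-wide context from that single pass.

```python
def late_chunk_embeddings(token_embeddings, chunk_spans):
    """Mean-pool token vectors (from ONE full-document encoder pass)
    over each (start, end) span to get context-aware chunk vectors."""
    chunk_vecs = []
    for start, end in chunk_spans:
        span = token_embeddings[start:end]
        dim = len(span[0])
        chunk_vecs.append(
            [sum(vec[d] for vec in span) / len(span) for d in range(dim)]
        )
    return chunk_vecs
```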
Hierarchical chunking
Also known as: parent-child chunking, multi-level chunking
What it is: Create multiple chunk layers such as large section-level chunks and smaller detail chunks, then retrieve coarse-to-fine when needed.
Upsides
- Handles broad and specific questions without forcing one chunk size.
Downsides
- Adds complexity in indexing and retrieval.
- Still breaks text into separate units.
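A minimal sketch of the parent-child pattern: index small child chunks for precise matching, but have each child point back to its parent section so retrieval can return the broader context. The substring-match "retrieval" is a placeholder for real vector search.

```python
def build_hierarchy(sections, child_size=2):
    """Index each section twice: as one parent chunk, and as small
    child chunks (groups of sentences) that reference the parent."""
    parents, children = {}, []
    for pid, section in enumerate(sections):
        parents[pid] = section
        sentences = section.split(". ")
        for i in range(0, len(sentences), child_size):
            children.append({
                "parent_id": pid,
                "text": ". ".join(sentences[i:i + child_size]),
            })
    return parents, children

def retrieve_parent(children, parents, query_word):
    """Toy retrieval: match a child, return its full parent section."""
    for child in children:
        if query_word in child["text"]:
            return parents[child["parent_id"]]
    return None
```

The query matches a small child chunk, but the caller gets the whole parent section back, which is the coarse-to-fine behavior described above.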
TL;DR
Continue with common failure modes or jump straight to POMA chunksets.
Continue reading
- Common failure modes — why even advanced strategies lose context
- POMA chunksets — a non-breaking alternative to traditional chunking
- Strategy comparison table — all 15 strategies compared side by side
- The full chunking guide — the complete deep dive