Why RAG needs chunking after ingestion

Even the cleanest ingestion pipeline only gets you to normalized document representations, not to something a large language model can query efficiently.

Retrieval operates over chunks, not whole documents

RAG assumes you can quickly retrieve the most relevant context for a query from a large corpus. That means building vector or keyword indexes over smaller chunks, not over entire documents.

A single long PDF can yield hundreds of kilobytes of text. Feeding all of that into a prompt at once usually blows past the context window, and it is unnecessary besides: only a small fraction of the document is relevant to any given query.
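To make the idea concrete, here is a minimal sketch of retrieval operating over chunks rather than documents. The names (`score`, `retrieve`) and the word-overlap scoring are illustrative stand-ins for a real similarity function such as BM25 or embedding cosine distance:

```python
def score(query: str, chunk: str) -> int:
    """Count query terms that appear in the chunk -- a toy
    stand-in for BM25 or embedding similarity."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k best-scoring chunks to place in the prompt."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

chunks = [
    "Invoices are due within 30 days of receipt.",
    "The contractor provides monthly status reports.",
    "Late invoices accrue interest at 2% per month.",
]
print(retrieve("when are invoices due", chunks, k=1))
# ['Invoices are due within 30 days of receipt.']
```

Only the winning chunk enters the prompt; the rest of the corpus stays in the index.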

Embeddings need locality

The relevance signal comes from embeddings. If you generate one embedding for an entire document:

  • You lose locality, because the model cannot distinguish which part of the document addresses the query.
  • You dilute the signal, because one vector now represents unrelated sections.
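The dilution effect above can be demonstrated with toy two-dimensional "embeddings", where a whole-document vector is modeled (as an assumption for illustration) as the average of its section vectors:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy vectors: dimension 0 = topic A, dimension 1 = topic B.
query          = [1.0, 0.0]   # the query asks about topic A
section_a      = [1.0, 0.0]   # the section that answers it
section_b      = [0.0, 1.0]   # an unrelated section
whole_document = [(x + y) / 2 for x, y in zip(section_a, section_b)]

print(cosine(query, section_a))       # 1.0   -- chunk-level match is sharp
print(cosine(query, whole_document))  # ~0.71 -- one doc vector dilutes it
```

With chunk-level embeddings, the relevant section scores a perfect match; with a single document vector, the unrelated section drags the similarity down.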

Token budgets are finite and expensive

Every token you place in a prompt costs money and latency, and context windows are hard limits. A good chunking strategy lets you control chunk size and overlap so the model sees enough context without wasting tokens on noise.
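A sliding-window splitter is the simplest way to control both knobs. This sketch works in words rather than model tokens (a simplifying assumption; a real pipeline would count tokens with the model's tokenizer):

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap`
    words with the previous window so boundary context is not lost."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

doc = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(doc, size=4, overlap=2):
    print(chunk)
# w0 w1 w2 w3
# w2 w3 w4 w5
# ...
```

Raising `overlap` trades index size for continuity across chunk boundaries; raising `size` trades retrieval precision for per-chunk context.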

Different document structures demand different chunking policies

A legal contract, a Jupyter notebook, and an FAQ page all need different treatment. Ingestion should surface:

  • document types and templates
  • structural hints such as headings, tables, slides, and code blocks
  • logical units that can become chunk candidates

Chunking then uses that information to respect boundaries instead of cutting through important units blindly.
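As a minimal example of boundary-respecting chunking, this sketch splits a Markdown document at heading lines, so each chunk is one logical section rather than an arbitrary slice (the function name and the choice of Markdown are assumptions for illustration):

```python
import re

def chunk_by_headings(markdown: str) -> list[str]:
    """Split Markdown at heading lines so each chunk is one logical
    section, never a cut through the middle of one."""
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        if re.match(r"#{1,6} ", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return chunks

doc = "# Terms\nPayment is due in 30 days.\n## Penalties\nLate fees apply."
print(chunk_by_headings(doc))
```

The same pattern generalizes: split a notebook at cell boundaries, an FAQ at question markers, a contract at clause numbers.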

What chunking adds

Chunking is the process of splitting ingested document content into units that are small enough for efficient indexing and retrieval, yet large enough to retain meaning.

Good chunking strategies use the structure preserved during ingestion to create units that map to human-understandable logical units such as sections, subsections, tables, paragraphs, or code blocks.

If you want the chunking side in more detail, continue with the RAG chunking guide or the chunking learning section.

TL;DR

Ingestion gives you normalized structure. Chunking turns that structure into retrieval-sized units that embeddings, indexes, and prompts can actually use efficiently.