RAG Chunking Guide — Where to Start
Retrieval-augmented generation (RAG) has been used to boost large language models (LLMs) since the early 2020s. By letting LLMs draw on sources outside their training data, RAG works around the limits of a static knowledge base.
But just as you cannot instantly absorb all the information in a book by glancing at it, RAG cannot magically transfer all the relevant information from a source document into an LLM pipeline. The solution is called chunking.
Chunking means splitting large text into smaller units so that embedding models do not truncate your input and retrieval returns self-contained passages that are actually useful for search and answering. The challenge is hitting the sweet spot: chunks must be small enough for precise retrieval yet complete enough to make sense on their own.
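To make the idea concrete, here is a minimal sketch of the simplest approach: fixed-size chunking with character overlap. The function name and default sizes are illustrative, not part of any particular product or library.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks, with each chunk
    repeating the last `overlap` characters of the previous one so
    sentences cut at a boundary still appear whole in one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

Real pipelines usually refine this by splitting on sentence or paragraph boundaries and by measuring size in tokens rather than characters, but the size/overlap trade-off stays the same.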
Start with these docs
What this guide is meant to answer
- Which chunking strategies are common in modern RAG systems.
- How chunk size and overlap shape retrieval quality and token cost.
- Why most chunking methods still fail in similar ways.
- How POMA chunksets and cheatsheets change the retrieval unit itself.
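On the size-and-overlap point above, the token cost of overlap is easy to estimate: each chunk advances the window by only `chunk_size - overlap` tokens, so the corpus is embedded and stored roughly `chunk_size / (chunk_size - overlap)` times over. The numbers below are illustrative, not a recommendation.

```python
def overlap_overhead(chunk_size: int, overlap: int) -> float:
    """Factor by which overlap inflates total tokens embedded and stored."""
    return chunk_size / (chunk_size - overlap)

# e.g. 512-token chunks with a 64-token overlap cost ~14% extra tokens
print(round(overlap_overhead(512, 64), 3))  # → 1.143
```

Doubling the overlap roughly doubles that surcharge, which is why overlap is usually kept to a small fraction of the chunk size.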
TL;DR
Recommended path
If you want the quick structural version, go straight to the Chunking learning section. If you want the big-picture narrative first, use this page as the entry point and then move through the four topic pages above in order.
Ready to try hierarchical chunking?
- Try PrimeCut for free — upload a document and inspect the chunks
- PrimeCut product page — how it works
- Pricing — from €0.003/page