LangChain Integration
Most LangChain RAG pipelines start with RecursiveCharacterTextSplitter — fast and simple, but it loses document structure, breaks tables mid-row, and requires manual chunk size and overlap tuning. POMA's LangChain integration replaces the splitter step with hierarchical chunksets that preserve full document context, while keeping the rest of your LangChain pipeline unchanged.
The integration provides drop-in replacements for LangChain's document loading, splitting, and retrieval steps — no need to rewrite your chain.
Installation
Install the integration:
```bash
pip install 'poma[langchain]'
```

The LangChain integration gives you three helpers:

- PomaFileLoader to load files from a path
- PomaChunksetSplitter to turn documents into POMA chunkset documents
- PomaCheatsheetRetrieverLC to wrap a LangChain vector store and return cheatsheet documents
Chunk documents with PrimeCut
```python
from poma import PrimeCut
from poma.integrations.langchain import PomaFileLoader, PomaChunksetSplitter

client = PrimeCut()
documents = PomaFileLoader("./docs").load()

splitter = PomaChunksetSplitter(client, verbose=True)
chunkset_docs = splitter.split_documents(documents)

print(len(chunkset_docs))
print(chunkset_docs[0].metadata.keys())
```

Each output Document stores the chunkset text in page_content and includes the source chunks in metadata. The splitter expects each input document to carry a valid metadata["source_path"].
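Because the splitter depends on metadata["source_path"], a quick pre-flight check can catch bad inputs before a run. A minimal sketch — the SimpleDoc stand-in only mimics the shape of a LangChain Document (page_content plus a metadata dict) and is not the real class:

```python
from dataclasses import dataclass, field

@dataclass
class SimpleDoc:
    """Stand-in for a LangChain Document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def missing_source_paths(documents):
    """Return indices of documents lacking a usable metadata['source_path']."""
    return [
        i for i, doc in enumerate(documents)
        if not doc.metadata.get("source_path")
    ]

docs = [
    SimpleDoc("auth guide", {"source_path": "docs/auth.md"}),
    SimpleDoc("orphan text"),  # no source_path -> flagged
]
print(missing_source_paths(docs))  # [1]
```

Running such a check before split_documents keeps the failure at your door rather than inside the pipeline.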
Add the chunksets to your vector store
```python
# Replace this with your preferred LangChain vector store.
vector_store = ...
vector_store.add_documents(chunkset_docs)
```

Retrieve cheatsheets instead of raw chunksets
```python
from poma.integrations.langchain import PomaCheatsheetRetrieverLC

retriever = PomaCheatsheetRetrieverLC(vector_store, top_k=4)
cheatsheet_docs = retriever.invoke("How do I authenticate?")
print(cheatsheet_docs[0].page_content)
```

PomaCheatsheetRetrieverLC groups hits by document and returns one cheatsheet Document per document.
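The grouping step is easy to picture. A rough sketch of the idea only — not POMA's actual implementation: collect retrieved chunks per source document, then emit one combined text per document.

```python
from collections import defaultdict

def group_hits(hits):
    """Group (source_path, chunk_text) hits into one combined text per document."""
    grouped = defaultdict(list)
    for source_path, chunk in hits:
        grouped[source_path].append(chunk)
    # One "cheatsheet" string per source document, chunks joined in hit order.
    return {src: "\n\n".join(chunks) for src, chunks in grouped.items()}

hits = [
    ("docs/auth.md", "Use an API key."),
    ("docs/setup.md", "pip install poma"),
    ("docs/auth.md", "Keys rotate every 90 days."),
]
cheatsheets = group_hits(hits)
print(len(cheatsheets))  # 2: one per source document
```

So four retrieved hits spread over two documents come back as two cheatsheet Documents, not four fragments.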
Why replace RecursiveCharacterTextSplitter?
LangChain's default text splitter cuts documents into fixed-size fragments that lose structural context — section headers get separated from content, tables break mid-row, and overlap inflates your index with near-duplicates. POMA's PomaChunksetSplitter preserves the full document hierarchy as chunksets, so every retrieved fact arrives with its lineage (chapter → section → paragraph). The result: more accurate retrieval with typically 77% fewer tokens.
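To see why lineage matters, compare a fixed-size cut with a hierarchy-aware one. A toy sketch with assumed data shapes (not POMA's internals): fixed-size splitting strands a fact from its headings, while a lineage-aware chunk carries its chapter → section path along.

```python
def fixed_size_chunks(text, size=40):
    """Naive fixed-size splitting: headings and content get separated."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def lineage_chunks(sections):
    """Hierarchy-aware chunking: each paragraph keeps its heading path."""
    return [
        {"lineage": " → ".join(path), "text": para}
        for path, para in sections
    ]

sections = [
    (("API Guide", "Authentication"), "Send the key in the X-API-Key header."),
    (("API Guide", "Rate limits"), "Burst limit is 100 requests per minute."),
]
for chunk in lineage_chunks(sections):
    print(chunk["lineage"], "|", chunk["text"])
```

A retriever that sees "API Guide → Authentication | Send the key…" can answer an authentication question correctly; a bare 40-character fragment often cannot.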
For a detailed comparison of all chunking strategies, see The Ultimate Guide to RAG Chunking Strategies.
Continue reading
- LlamaIndex integration — same approach for LlamaIndex pipelines
- Quickstart — get started with PrimeCut in 4 lines
- Chunking strategies comparison — RecursiveCharacterTextSplitter vs. all alternatives
- Pricing — from €0.003/page