LangChain Integration

Most LangChain RAG pipelines start with RecursiveCharacterTextSplitter — fast and simple, but it loses document structure, breaks tables mid-row, and requires manual chunk size and overlap tuning. POMA's LangChain integration replaces the splitter step with hierarchical chunksets that preserve full document context, while keeping the rest of your LangChain pipeline unchanged.

The integration provides drop-in replacements for LangChain's document loading, splitting, and retrieval steps — no need to rewrite your chain.

Installation

Install the integration:

bash

pip install 'poma[langchain]'

The LangChain integration gives you three helpers:

PomaFileLoader to load files from a path
PomaChunksetSplitter to turn documents into POMA chunkset documents
PomaCheatsheetRetrieverLC to wrap a LangChain vector store and return cheatsheet documents

Chunk documents with PrimeCut

python

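from poma import PrimeCut
from poma.integrations.langchain import PomaFileLoader, PomaChunksetSplitter

# Create the PrimeCut client and load files from a path.
client = PrimeCut()
documents = PomaFileLoader("./docs").load()

# Turn the loaded documents into POMA chunkset documents.
splitter = PomaChunksetSplitter(client, verbose=True)
chunkset_docs = splitter.split_documents(documents)

# Inspect how many chunkset documents were produced and which metadata keys they carry.
print(len(chunkset_docs))
print(chunkset_docs[0].metadata.keys())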

Each output Document stores the chunkset text in page_content and includes the source chunks in metadata. The splitter expects each input document to carry a valid metadata["source_path"].
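
Because the splitter relies on metadata["source_path"], it can be worth checking documents that did not come from PomaFileLoader before splitting them. The following is a minimal sketch, not part of the POMA API: the helper name find_docs_missing_source_path is hypothetical, SimpleNamespace objects stand in for LangChain Document objects, and "valid" is assumed to mean "a non-empty string" (the splitter itself may apply stricter checks).

```python
from types import SimpleNamespace


def find_docs_missing_source_path(docs):
    """Return indices of documents lacking a usable metadata["source_path"].

    Hypothetical helper, not part of POMA. "Usable" here only means a
    non-empty string; that definition is an assumption.
    """
    missing = []
    for index, doc in enumerate(docs):
        source_path = (doc.metadata or {}).get("source_path")
        if not isinstance(source_path, str) or not source_path.strip():
            missing.append(index)
    return missing


# Stand-ins for LangChain Document objects (anything with a .metadata dict works).
docs = [
    SimpleNamespace(page_content="a", metadata={"source_path": "./docs/guide.pdf"}),
    SimpleNamespace(page_content="b", metadata={"source": "./docs/faq.md"}),
    SimpleNamespace(page_content="c", metadata={}),
]

missing = find_docs_missing_source_path(docs)
print(missing)
```

Documents flagged this way can be fixed up or dropped before they reach split_documents.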

Add the chunksets to your vector store

python

# Replace this with your preferred LangChain vector store.
vector_store = ...
vector_store.add_documents(chunkset_docs)

Retrieve cheatsheets instead of raw chunksets

python

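from poma.integrations.langchain import PomaCheatsheetRetrieverLC

# Wrap the vector store that holds the chunkset documents.
retriever = PomaCheatsheetRetrieverLC(vector_store, top_k=4)

# Query the retriever; the results are cheatsheet documents.
cheatsheet_docs = retriever.invoke("How do I authenticate?")

print(cheatsheet_docs[0].page_content)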

PomaCheatsheetRetrieverLC groups hits by document and returns one cheatsheet Document per document.
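
The grouping behavior can be pictured with a small plain-Python sketch. This illustrates the idea only and is not POMA's implementation: the doc_id metadata key, the group_hits_by_document helper, and the dict-based hits are all hypothetical, and the real retriever builds a cheatsheet from each group rather than returning the raw texts.

```python
from collections import defaultdict


def group_hits_by_document(hits, id_key="doc_id"):
    """Group retrieval hits by source document, keeping first-seen order.

    Hypothetical illustration: `id_key` is an assumed metadata field name.
    """
    groups = defaultdict(list)
    for hit in hits:
        groups[hit["metadata"][id_key]].append(hit["text"])
    return dict(groups)


# Four hits that come from two different documents.
hits = [
    {"text": "API keys are sent in a header.", "metadata": {"doc_id": "auth.md"}},
    {"text": "Tokens expire after one hour.", "metadata": {"doc_id": "auth.md"}},
    {"text": "Install the SDK with pip.", "metadata": {"doc_id": "setup.md"}},
    {"text": "Rotate keys from the dashboard.", "metadata": {"doc_id": "auth.md"}},
]

grouped = group_hits_by_document(hits)
for doc_id, texts in grouped.items():
    print(doc_id, len(texts))
```

In this sketch four hits collapse into two groups, which mirrors why the number of cheatsheet documents returned can be smaller than the number of hits.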

Why replace RecursiveCharacterTextSplitter?

RecursiveCharacterTextSplitter cuts documents into size-limited fragments without regard for structure: section headers get separated from their content, tables break mid-row, and overlap inflates your index with near-duplicates. POMA's PomaChunksetSplitter preserves the full document hierarchy as chunksets, so every retrieved fact arrives with its lineage (chapter → section → paragraph). The result: more accurate retrieval with typically 77% fewer tokens.
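
To make these failure modes concrete, here is a deliberately naive fixed-size splitter with overlap. It is a simplified stand-in written for this illustration, not LangChain's RecursiveCharacterTextSplitter (which first tries to split on separators such as blank lines); the naive_split function, the sample text, and the chunk sizes are all assumptions.

```python
def naive_split(text, chunk_size, overlap):
    """Cut text into windows of chunk_size characters that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


# A heading followed by a small table, as it might appear in extracted text.
text = (
    "## Rate limits\n"
    "| Plan | Requests/min |\n"
    "| Free | 60 |\n"
    "| Pro | 600 |\n"
)

chunks = naive_split(text, chunk_size=40, overlap=10)

# Characters stored across all chunks versus characters in the source.
stored = sum(len(chunk) for chunk in chunks)
print(f"{len(text)} source characters, {stored} stored characters")

for chunk in chunks:
    print(repr(chunk))
```

With these settings the second chunk starts in the middle of a table row and no longer carries the heading, and the overlapping characters are stored more than once.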

For a detailed comparison of all chunking strategies, see The Ultimate Guide to RAG Chunking Strategies.

Continue reading

LlamaIndex integration — same approach for LlamaIndex pipelines
Quickstart — get started with PrimeCut in 4 lines
Chunking strategies comparison — RecursiveCharacterTextSplitter vs. all alternatives
Pricing — from €0.003/page
