```python
from poma.integrations.langchain import (
    PomaCheatsheetRetrieverLC,
    PomaChunksetSplitter,
    PomaFileLoader,
)
```

PomaFileLoader
```python
PomaFileLoader(input_path: str | Path)
```

Load one file, or every supported file under a directory, into LangChain `Document` objects.

Method:
`load() -> list[Document]`
Behavior notes:
- Each output `Document` includes `metadata["source_path"]` and `metadata["doc_id"]`.
- PDF files are represented with empty `page_content`; the actual ingestion happens later through `PrimeCut`.
- Unsupported or unreadable binary files are skipped.
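
The loader behavior described in the notes above can be approximated with the standard library alone. The following is a minimal sketch, not the POMA implementation: `Doc` is a hypothetical stand-in for LangChain's `Document`, `load_sketch` is an illustrative name, and the hash-based `doc_id` is an assumption, since the source does not say how `doc_id` is derived.

```python
import hashlib
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_sketch(input_path):
    """Mimic the documented loader notes for one file or a directory tree."""
    root = Path(input_path)
    if root.is_file():
        paths = [root]
    else:
        paths = sorted(p for p in root.rglob("*") if p.is_file())

    docs = []
    for path in paths:
        if path.suffix.lower() == ".pdf":
            # PDFs carry no text at load time; ingestion happens later.
            text = ""
        else:
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                # Unreadable or binary file: skip it.
                continue
        # Assumption: a short hash of the path serves as doc_id here.
        doc_id = hashlib.sha1(str(path).encode("utf-8")).hexdigest()[:12]
        docs.append(Doc(text, {"source_path": str(path), "doc_id": doc_id}))
    return docs
```

Calling `load_sketch` on a directory holding a text file, a PDF, and an undecodable binary yields two entries: the text file with its contents and the PDF with empty `page_content`.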
PomaChunksetSplitter
```python
PomaChunksetSplitter(
    client: PrimeCut,
    *,
    verbose: bool = False,
    **kwargs,
)
```

Call the POMA API for each input document and return chunkset `Document` objects.

Method:
`split_documents(documents: Iterable[Document]) -> list[Document]`
Behavior notes:
- Input documents must include a valid `metadata["source_path"]`.
- Output documents store chunkset text in `page_content`.
- Output metadata includes `doc_id`, `chunkset_index`, `chunkset`, `chunks`, and `source_path`.
- `split_text(...)` is intentionally not implemented; use `split_documents(...)`.
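
Because the splitter requires a valid `metadata["source_path"]` on every input, it can help to check documents before spending API calls on them. The helper below is a hypothetical pre-flight check, not part of the POMA package; it assumes that "valid" means the path points to an existing file, which the source does not spell out.

```python
from pathlib import Path
from typing import Iterable


def missing_source_paths(metadatas: Iterable[dict]) -> list[int]:
    """Return the indexes of metadata dicts that lack a usable source_path."""
    bad = []
    for index, meta in enumerate(metadatas):
        source = meta.get("source_path")
        # Assumption: a valid source_path is one that names an existing file.
        if not source or not Path(source).is_file():
            bad.append(index)
    return bad
```

With LangChain documents, this would be called as `missing_source_paths(doc.metadata for doc in documents)`; an empty result means every document passes the check.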
PomaCheatsheetRetrieverLC
```python
PomaCheatsheetRetrieverLC(
    vector_store: VectorStore,
    *,
    top_k: int = 6,
    **kwargs,
)
```

Wrap a LangChain vector store and return one cheatsheet `Document` per source document.
Behavior notes:
- Retrieval uses `vector_store.similarity_search(query, k=top_k)`.
- Hits are grouped by `doc_id`, then converted into one cheatsheet `Document` per source document.
- Use `.invoke(query)` through the standard LangChain retriever interface.
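
The grouping step in the notes above can be illustrated in plain Python. This is a sketch under stated assumptions: hits are modeled as dictionaries with a `metadata` key rather than LangChain `Document` objects, `group_hits_by_doc_id` is a hypothetical name, and the conversion of each group into a cheatsheet is left out because the source does not describe it.

```python
def group_hits_by_doc_id(hits):
    """Group similarity-search hits by doc_id, keeping first-seen order."""
    grouped = {}
    for hit in hits:
        doc_id = hit["metadata"]["doc_id"]
        # setdefault creates the list the first time a doc_id appears.
        grouped.setdefault(doc_id, []).append(hit)
    return grouped
```

Each key in the result corresponds to one source document, so a retriever built this way would emit as many cheatsheet documents as there are keys, regardless of how many of the `top_k` hits came from the same source.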