python
from poma.integrations.langchain import (
    PomaCheatsheetRetrieverLC,
    PomaChunksetSplitter,
    PomaFileLoader,
)

PomaFileLoader

python
PomaFileLoader(input_path: str | Path)

Load one file or every supported file under a directory into LangChain Document objects.

Method:

  • load() -> list[Document]

Behavior notes:

  • Each output Document includes metadata["source_path"] and metadata["doc_id"].
  • PDF files are represented with empty page_content; the actual ingestion happens later through PrimeCut.
  • Unsupported or unreadable binary files are skipped.
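The notes above imply a simple contract that downstream code can check. The following is a minimal sketch of such a check, not part of POMA itself: the `Document` dataclass is a stand-in for LangChain's `Document`, `find_pdf_placeholders` is a hypothetical helper, and the sample paths and ids are made up for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Document:
    """Stand-in for LangChain's Document (illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def find_pdf_placeholders(docs):
    """Verify the documented metadata keys and return doc_ids of empty PDF entries."""
    placeholders = []
    for doc in docs:
        # Both keys are documented as present on every loader output.
        for key in ("source_path", "doc_id"):
            if key not in doc.metadata:
                raise KeyError(f"document is missing metadata[{key!r}]")
        is_pdf = Path(doc.metadata["source_path"]).suffix.lower() == ".pdf"
        # PDFs are documented as carrying empty page_content at this stage.
        if is_pdf and doc.page_content == "":
            placeholders.append(doc.metadata["doc_id"])
    return placeholders


# Hypothetical sample values shaped like the loader's output.
docs = [
    Document("Plain text body.", {"source_path": "notes/a.txt", "doc_id": "a"}),
    Document("", {"source_path": "reports/b.pdf", "doc_id": "b"}),
]
placeholders = find_pdf_placeholders(docs)
```

An empty `page_content` on a PDF entry is expected rather than an error, so code that filters out empty documents before splitting would likely drop PDFs by mistake.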

PomaChunksetSplitter

python
PomaChunksetSplitter(
    client: PrimeCut,
    *,
    verbose: bool = False,
    **kwargs,
)

Call the POMA API for each input document and return chunkset Document objects.

Method:

  • split_documents(documents: Iterable[Document]) -> list[Document]

Behavior notes:

  • Input documents must include a valid metadata["source_path"].
  • Output documents store chunkset text in page_content.
  • Output metadata includes doc_id, chunkset_index, chunkset, chunks, and source_path.
  • split_text(...) is intentionally not implemented; use split_documents(...).
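Because the splitter depends on `metadata["source_path"]`, it may help to screen documents before calling `split_documents(...)`. The sketch below assumes that "valid" at least means present and non-empty; the real validation rules are not stated here. `Document` is a stand-in for LangChain's class and `partition_by_source_path` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in for LangChain's Document (illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def partition_by_source_path(documents):
    """Split documents into those with a non-empty source_path and the rest."""
    ready, rejected = [], []
    for doc in documents:
        # Assumption: a missing or empty source_path is never valid.
        if doc.metadata.get("source_path"):
            ready.append(doc)
        else:
            rejected.append(doc)
    return ready, rejected


# Hypothetical inputs: one from a loader, one built by hand without a path.
docs = [
    Document("", {"source_path": "reports/b.pdf", "doc_id": "b"}),
    Document("Ad hoc text.", {}),
]
ready, rejected = partition_by_source_path(docs)
```

Only the `ready` list would then be passed to `split_documents(...)`; the `rejected` list can be logged or routed elsewhere.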

PomaCheatsheetRetrieverLC

python
PomaCheatsheetRetrieverLC(
    vector_store: VectorStore,
    *,
    top_k: int = 6,
    **kwargs,
)

Wrap a LangChain vector store and return one cheatsheet Document per source document.

Behavior notes:

  • Retrieval uses vector_store.similarity_search(query, k=top_k).
  • Hits are grouped by doc_id, then converted into one cheatsheet Document per source document.
  • Use .invoke(query) through the standard LangChain retriever interface.
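The grouping step described above can be pictured as follows. This is a sketch of the idea only, not POMA's implementation: `Document` is a stand-in for LangChain's class, `group_hits_by_doc_id` is a hypothetical helper, and the conversion of each group into a cheatsheet is left out because it is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in for LangChain's Document (illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def group_hits_by_doc_id(hits):
    """Group similarity-search hits by doc_id, keeping first-seen order."""
    grouped = {}
    for hit in hits:
        grouped.setdefault(hit.metadata["doc_id"], []).append(hit)
    return grouped


# Hypothetical hits, as if returned by similarity_search(query, k=3).
hits = [
    Document("chunkset 0 of a", {"doc_id": "a", "chunkset_index": 0}),
    Document("chunkset 4 of b", {"doc_id": "b", "chunkset_index": 4}),
    Document("chunkset 2 of a", {"doc_id": "a", "chunkset_index": 2}),
]
grouped = group_hits_by_doc_id(hits)
```

Since several hits can share a `doc_id`, the retriever returns at most `top_k` documents and often fewer; in this example three hits collapse into two groups.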