```python
from poma.integrations.llamaindex import (
    PomaCheatsheetRetrieverLI,
    PomaChunksetNodeParser,
    PomaFileReader,
)
```

PomaFileReader
```python
PomaFileReader()
```

Load one file, or every supported file under a directory, into LlamaIndex `Document` objects.
Method:

- `load_data(input_path: str | Path) -> list[Document]`
Behavior notes:

- Each output `Document` includes `metadata["source_path"]` and `metadata["doc_id"]`.
- PDF files are represented with empty text; the actual ingestion happens later through `PrimeCut`.
- Unsupported or unreadable binary files are skipped.
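The loading rules above can be modeled in plain Python to make them concrete. This is a minimal sketch, not the library's implementation: `SimpleDoc` is a hypothetical stand-in for LlamaIndex `Document`, the `SUPPORTED_SUFFIXES` set and the hash-based `doc_id` are assumptions (the real supported formats and ID scheme are not documented here), and "unreadable binary" is approximated as "fails UTF-8 decoding".

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from pathlib import Path


# Hypothetical stand-in for llama_index Document: text plus a metadata dict.
@dataclass
class SimpleDoc:
    text: str
    metadata: dict = field(default_factory=dict)


# Assumption: the real set of supported formats is not documented here.
SUPPORTED_SUFFIXES = {".txt", ".md", ".pdf"}


def _doc_id(path: Path) -> str:
    # Assumption: placeholder ID derived from the resolved path.
    return hashlib.sha1(str(path.resolve()).encode("utf-8")).hexdigest()[:12]


def load_data_sketch(input_path: str | Path) -> list[SimpleDoc]:
    root = Path(input_path)
    # One file, or every file under a directory (recursive, sorted for determinism).
    if root.is_file():
        paths = [root]
    else:
        paths = sorted(p for p in root.rglob("*") if p.is_file())

    docs = []
    for path in paths:
        suffix = path.suffix.lower()
        if suffix not in SUPPORTED_SUFFIXES:
            continue  # Unsupported format: skip.
        if suffix == ".pdf":
            text = ""  # PDFs carry empty text; real ingestion happens later.
        else:
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # Unreadable or binary content: skip.
        docs.append(
            SimpleDoc(
                text=text,
                metadata={"source_path": str(path), "doc_id": _doc_id(path)},
            )
        )
    return docs
```

Keeping PDFs as empty-text placeholders rather than dropping them means the downstream parser still sees every `source_path` it is expected to ingest.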
PomaChunksetNodeParser

```python
PomaChunksetNodeParser(*, client: PrimeCut)
```

Call the POMA API for each input document and return chunkset nodes.
Use the standard parser entry point:

- `get_nodes_from_documents(documents, show_progress: bool = False) -> list[BaseNode]`
Behavior notes:

- Input documents must include a valid `metadata["source_path"]`.
- Output nodes are `TextNode` values containing chunkset text.
- Output metadata includes `doc_id`, `chunkset_index`, `chunkset`, `chunks`, and `source_path`.
- The parser excludes metadata fields from embeddings, so only chunkset content is embedded.
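The node-building rules above can be illustrated with a small plain-Python model. This is a sketch under stated assumptions, not the parser's code: `SketchNode` is a hypothetical stand-in for a LlamaIndex `TextNode` (whose real counterpart field is `excluded_embed_metadata_keys`), the shape of each chunkset dict (`"text"` and `"chunks"` keys) is invented for illustration since the POMA response format is not documented here, and the `source_path` check only tests for presence.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SketchNode:
    # Hypothetical stand-in for a LlamaIndex TextNode: text, metadata, and the
    # metadata keys hidden from the embedding model.
    text: str
    metadata: dict = field(default_factory=dict)
    excluded_embed_metadata_keys: list = field(default_factory=list)

    def embed_text(self) -> str:
        # Prepend only the metadata that is NOT excluded, then the content.
        visible = {
            k: v for k, v in self.metadata.items()
            if k not in self.excluded_embed_metadata_keys
        }
        header = "\n".join(f"{k}: {v}" for k, v in visible.items())
        return f"{header}\n\n{self.text}" if header else self.text


def nodes_from_chunksets(doc_metadata: dict, chunksets: list[dict]) -> list[SketchNode]:
    # Presence check only; a real parser may also validate the path itself.
    source_path = doc_metadata.get("source_path")
    if not source_path:
        raise ValueError("document metadata must include 'source_path'")

    nodes = []
    for index, chunkset in enumerate(chunksets):
        metadata = {
            "doc_id": doc_metadata.get("doc_id"),
            "chunkset_index": index,
            "chunkset": chunkset,  # Hypothetical: the raw chunkset record.
            "chunks": chunkset.get("chunks", []),
            "source_path": source_path,
        }
        nodes.append(
            SketchNode(
                text=chunkset["text"],
                metadata=metadata,
                # Exclude every metadata key so only chunkset text is embedded.
                excluded_embed_metadata_keys=list(metadata),
            )
        )
    return nodes
```

Excluding all metadata keys keeps large structures such as `chunks` out of the embedding input while leaving them available on the node for later retrieval steps.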
PomaCheatsheetRetrieverLI

```python
PomaCheatsheetRetrieverLI(base: BaseRetriever)
```

Wrap an existing LlamaIndex retriever and turn grouped hits into cheatsheet nodes.
Methods:

- `as_query_engine(**kwargs)`
- the standard retriever `.retrieve(...)` flow
Behavior notes:

- Retrieval groups hits by `doc_id`.
- Each grouped result becomes one cheatsheet `TextNode`.
- Returned `NodeWithScore` values keep the best score seen for that document.
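The grouping rule above can be sketched in plain Python. This is a minimal illustration, not the retriever's implementation: `Hit` and `GroupedResult` are hypothetical stand-ins for LlamaIndex `NodeWithScore` hits and the resulting cheatsheet nodes, and joining hit texts with blank lines is a placeholder for the actual cheatsheet construction, which is not documented here. Groups keep first-seen order, which matches best-score order when the base retriever returns hits sorted by score.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical stand-in for a retrieved NodeWithScore.
    doc_id: str
    text: str
    score: float


@dataclass
class GroupedResult:
    # Hypothetical stand-in for one cheatsheet node plus its score.
    doc_id: str
    text: str
    score: float


def group_hits(hits: list[Hit]) -> list[GroupedResult]:
    groups: dict[str, dict] = {}  # dict preserves first-seen order
    for hit in hits:
        group = groups.setdefault(hit.doc_id, {"texts": [], "score": hit.score})
        group["texts"].append(hit.text)
        # Keep the best score seen for this document.
        group["score"] = max(group["score"], hit.score)

    # Placeholder: one result per document, texts joined with blank lines.
    return [
        GroupedResult(doc_id=doc_id, text="\n\n".join(g["texts"]), score=g["score"])
        for doc_id, g in groups.items()
    ]
```

Using the maximum rather than an average keeps a document's rank tied to its single most relevant hit, so several weak matches from one file do not dilute a strong one.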