
PrimeCut.ingest(...) and PrimeCut.collect(...) both return a PomaResult.

Result shape

```python
from poma import PrimeCut

client = PrimeCut()
result = client.ingest("example.pdf")  # returns a PomaResult

print(type(result.chunks[0]).__name__)     # PomaChunk
print(type(result.chunksets[0]).__name__)  # PomaChunkSet
print(result.images.keys())                # keys of the extracted-image dictionary
```

PomaResult contains:

  • chunks: a list of PomaChunk
  • chunksets: a list of PomaChunkSet
  • images: a dictionary of extracted images as base64 data URIs
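Because each entry in images is a base64 data URI, turning it back into raw image bytes only requires parsing the URI. The sketch below uses just the standard library; decode_data_uri and save_images are hypothetical helpers, not part of the poma SDK, and the sketch assumes the dictionary keys are image identifiers that can be turned into file names.

```python
import base64
import mimetypes
from pathlib import Path


def decode_data_uri(uri: str) -> tuple[str, bytes]:
    """Split a base64 data URI such as 'data:image/png;base64,...' into (media type, bytes)."""
    header, sep, payload = uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(payload)


def save_images(images: dict[str, str], out_dir: Path) -> list[Path]:
    """Hypothetical helper: write each data-URI image into out_dir and return the written paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for key, uri in images.items():
        media_type, data = decode_data_uri(uri)
        # Fall back to .bin when the media type has no well-known extension.
        ext = mimetypes.guess_extension(media_type) or ".bin"
        # Assumption: keys are opaque IDs; keep only filename-safe characters.
        safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in key)
        path = out_dir / f"{safe}{ext}"
        path.write_bytes(data)
        written.append(path)
    return written
```

With a real result you might call save_images(result.images, Path("store/images")). Replacing every character except letters, digits, "-" and "_" means an unexpected key format such as "../x" cannot write outside out_dir.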

What's inside a .poma archive?

For a full breakdown of the files embedded in a .poma archive, see the PomaArchive reference. That page includes an interactive archive explorer and shows which files are core to the SDK and which are broader processing artifacts.

Reopen a saved .poma archive

```python
from poma import PomaArchive

# Open a .poma file previously saved to disk
archive = PomaArchive(path="store/example.poma")
result = archive.unpack()
```

You can also construct a PomaArchive from in-memory bytes:

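```python
from pathlib import Path
from poma import PomaArchive

raw_poma_bytes = Path("store/example.poma").read_bytes()  # any bytes source works
archive = PomaArchive(data=raw_poma_bytes)
result = archive.unpack()
```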

Alternatively, the module-level unpack function accepts either a path or raw bytes:

```python
from poma import unpack

# Accepts a path (shown here) or raw .poma bytes
result = unpack("store/example.poma")
```

Converting to dictionaries

```python
# Convert individual items to plain Python dictionaries
chunk_dict = result.chunks[0].to_dict()
chunkset_dict = result.chunksets[0].to_dict()
```
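Since to_dict() yields plain dictionaries, chunks and chunksets can be persisted in ordinary formats. The following sketch writes them as JSON Lines; write_jsonl is a hypothetical helper, and it assumes the values returned by to_dict() are JSON-serializable, which this page does not state.

```python
import json
from pathlib import Path
from typing import Any, Iterable, Protocol


class SupportsToDict(Protocol):
    """Anything with a to_dict() method, such as PomaChunk or PomaChunkSet."""

    def to_dict(self) -> dict[str, Any]: ...


def write_jsonl(items: Iterable[SupportsToDict], path: Path) -> int:
    """Hypothetical helper: write one JSON object per line and return the record count."""
    count = 0
    with path.open("w", encoding="utf-8") as fh:
        for item in items:
            # Assumption: to_dict() returns only JSON-serializable values.
            fh.write(json.dumps(item.to_dict(), ensure_ascii=False) + "\n")
            count += 1
    return count
```

For example, write_jsonl(result.chunks, Path("store/example.chunks.jsonl")). JSON Lines stores one record per line, so large outputs can be read back incrementally instead of parsing one large JSON document.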

Use file_id for document identity. The older tag field is deprecated.
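When chunks from several documents end up in one collection, file_id is the field to key on. This sketch buckets chunk dictionaries by file_id; group_by_file_id is a hypothetical helper, and it assumes each dictionary produced by to_dict() exposes a "file_id" key, which this page does not confirm.

```python
from collections import defaultdict
from typing import Any, Iterable


def group_by_file_id(records: Iterable[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Hypothetical helper: bucket records by file_id, keeping input order within each bucket."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for record in records:
        # Assumption: to_dict() output carries a "file_id" key.
        file_id = record.get("file_id")
        if file_id is None:
            # Deliberately no fallback to the deprecated tag field.
            raise KeyError("record has no file_id")
        groups[file_id].append(record)
    return dict(groups)
```

For example, group_by_file_id(chunk.to_dict() for chunk in result.chunks). Raising on a missing file_id, rather than falling back to tag, avoids quietly depending on the deprecated field.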