POMA Results and .poma Archive Format

PrimeCut.ingest(...) and PrimeCut.collect(...) both return a PomaResult.

Result shape

python

from poma import PrimeCut

client = PrimeCut()
result = client.ingest("example.pdf")

print(type(result.chunks[0]).__name__)
print(type(result.chunksets[0]).__name__)
print(result.images.keys())

PomaResult contains:

chunks: a list of PomaChunk
chunksets: a list of PomaChunkSet
images: a dictionary of extracted images as base64 data URIs
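
Because the images arrive as base64 data URIs, they need to be decoded before they can be written to disk or handed to an image library. The following is a minimal sketch using only the standard library. It assumes each value has the usual `data:<mime>;base64,<payload>` form. The helper name `decode_data_uri` and the sample key `image_1` are hypothetical and not part of the SDK.

```python
import base64
from typing import Tuple


def decode_data_uri(uri: str) -> Tuple[str, bytes]:
    """Split a base64 data URI into (mime_type, raw_bytes)."""
    header, sep, payload = uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    # Strip the "data:" prefix and the ";base64" suffix to get the media type.
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


# Stand-in for result.images; the key name is made up for illustration.
images = {
    "image_1": "data:image/png;base64," + base64.b64encode(b"\x89PNG").decode("ascii")
}

for name, uri in images.items():
    mime_type, raw = decode_data_uri(uri)
    print(name, mime_type, len(raw))
```

With a real result, the same loop would run over `result.images.items()`.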

What's inside a `.poma` archive?

For the full embedded archive breakdown, see PomaArchive. That reference page includes the interactive archive explorer and shows which files are core to the SDK versus broader processing artifacts.

Reopen a saved `.poma` archive

python

from poma import PomaArchive

archive = PomaArchive(path="store/example.poma")
result = archive.unpack()

You can also build PomaArchive from in-memory bytes:

python

archive = PomaArchive(data=raw_poma_bytes)
result = archive.unpack()

Alternatively, the module-level unpack function accepts either bytes or a path:

python

from poma import unpack

result = unpack("store/example.poma")

Converting to dictionaries

python

chunk_dict = result.chunks[0].to_dict()
chunkset_dict = result.chunksets[0].to_dict()

Use file_id for document identity. The older tag field is deprecated.
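
When reading dictionaries that may come from older data, one way to follow this guidance is to prefer `file_id` and fall back to `tag` only when `file_id` is missing. The sketch below works on plain dictionaries. It assumes both fields appear as top-level keys named `file_id` and `tag` in the `to_dict()` output, which this page does not confirm. `document_id` is a hypothetical helper, not an SDK function.

```python
from typing import Any, Mapping, Optional


def document_id(record: Mapping[str, Any]) -> Optional[str]:
    """Return file_id if present, otherwise the deprecated tag (assumed key names)."""
    file_id = record.get("file_id")
    if file_id is not None:
        return file_id
    # Fall back to the deprecated field only when file_id is absent.
    return record.get("tag")


# Hand-written stand-ins for chunk.to_dict() output.
print(document_id({"file_id": "doc-123", "tag": "legacy"}))
print(document_id({"tag": "legacy"}))
```

New code should write and read `file_id` only. The fallback is meant purely for tolerating older records.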
