Grill API Reference

Reference for every /grill/* endpoint on POMA AI API v3.

  • Base URL: https://api.poma-ai.com/v3
  • Auth: Authorization: Bearer <token> for every endpoint. The token may be a login token, an account API key, or a project API key for a project with product: "grill". When you authenticate with account-level credentials, add the X-Project-ID header to select the target project (protected projects reject account credentials — use a project API key).
  • Spec: Swagger UI
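The auth rules above can be sketched as a small helper. This is a minimal illustration, not an official client: the function name `auth_headers` and its parameters are hypothetical, and only the header names and base URL come from this reference.

```python
from typing import Optional

BASE_URL = "https://api.poma-ai.com/v3"  # base URL from this reference


def auth_headers(token: str, project_id: Optional[str] = None) -> dict[str, str]:
    """Build the headers shared by every /grill/* call.

    Pass project_id only with account-level credentials (login token or
    account API key); a project API key already identifies its project.
    """
    if not token:
        raise ValueError("a bearer token is required")
    headers = {"Authorization": f"Bearer {token}"}
    if project_id is not None:
        headers["X-Project-ID"] = project_id
    return headers
```

With a project API key, call `auth_headers(key)`; with a login token or account API key, call `auth_headers(token, project_id)`.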

For higher-level walkthroughs, see Ingestion, Retrieval, RetrievalContext format, and Document management.

Endpoint summary

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /grill/ingest | Submit a file for ingestion into the project's Grill namespace (full pipeline). |
| POST | /grill/ingestEco | Same as /grill/ingest but runs the cheaper Eco pipeline. |
| POST | /grill/search | Hybrid search across the namespace; returns prompt-ready context. |
| POST | /grill/searchInDoc | Same as /grill/search but doc_filter is required. |
| GET | /grill/docs | List documents in the namespace. |
| GET | /grill/docs/{docId} | Get metadata for one document. |
| DELETE | /grill/docs/{docId} | Remove a document's vectors and storage. |

The standard /jobs/{job_id}/... endpoints (status, delete) are reused by Grill ingest jobs — they are documented under the main API.


POST /grill/ingest

Create an ingestion job (full pipeline). Either send the file as raw bytes (application/octet-stream, filename carried in Content-Disposition) or point at a public URL with X-Remote-URL and let the server fetch it. Multipart is not accepted.
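The two submission modes can be sketched with the standard library as follows. This is an illustration under assumptions: `build_ingest_request` is a hypothetical helper, it simplifies by requiring exactly one of raw bytes or a remote URL, and it does not escape quotes in the filename. The request is only built here; pass it to `urllib.request.urlopen` to send it.

```python
import urllib.request
from typing import Optional

BASE_URL = "https://api.poma-ai.com/v3"  # base URL from this reference


def build_ingest_request(
    token: str,
    *,
    data: Optional[bytes] = None,
    filename: Optional[str] = None,
    remote_url: Optional[str] = None,
    eco: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) a POST to /grill/ingest or /grill/ingestEco."""
    # Simplification for this sketch: exactly one source of bytes.
    if (data is None) == (remote_url is None):
        raise ValueError("provide exactly one of data or remote_url")
    headers = {"Authorization": f"Bearer {token}"}
    if remote_url is not None:
        headers["X-Remote-URL"] = remote_url  # the server fetches the file
        body = None
    else:
        if not data or not filename:
            raise ValueError("raw upload needs a non-empty body and a filename")
        headers["Content-Type"] = "application/octet-stream"
        # The extension in the filename drives parser selection.
        headers["Content-Disposition"] = f'attachment; filename="{filename}"'
        body = data
    path = "/grill/ingestEco" if eco else "/grill/ingest"
    return urllib.request.Request(BASE_URL + path, data=body, headers=headers, method="POST")
```

Multipart bodies are deliberately not built here, since the endpoint rejects them.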

Request headers

| Header | Required | Notes |
| --- | --- | --- |
| Authorization | yes | Bearer <token>: login token, account API key, or project API key (project must have product: "grill"). |
| X-Project-ID | conditional | Required only when authenticating with account-level credentials, to select the target project. Omit when using a project API key. |
| Content-Type | conditional | application/octet-stream for a raw-bytes upload. Not needed when using X-Remote-URL. |
| Content-Disposition | conditional | attachment; filename="<name>.<ext>". The extension drives parser selection. Not required when using X-Remote-URL. |
| X-Remote-URL | no | A publicly accessible URL to fetch the file from instead of uploading bytes. When set, Content-Disposition and the request body are optional. |
| X-Labels | no | JSON array of categorical string tags for query-time label filtering. Stored as the document's meta_tags, HMAC'd per-tenant (opaque: the vector DB never sees the plaintext), so matching is equality-only. Max 64 tags, ≤ 128 chars each, ≤ 4096 chars total (else 400). Example: ["year:1982", "source:treasury"]. Filter at search time with meta_tags_any / meta_tags_all. |
| X-Meta-Int-1 | no | Customer-defined integer stored plaintext so it can be range-queried. Recommended convention: Unix epoch seconds. Bounded to the JS-safe int range. Filter with meta_int_1_gte / meta_int_1_lte. |
| X-Meta-Int-2 | no | Second customer-defined integer (recommended: revision/version number). Plaintext, range-queryable via meta_int_2_gte / meta_int_2_lte. |
| X-Unencrypted-Strings | no | JSON object of plaintext key→value metadata for wildcard/glob filtering. Stored unencrypted (vendor-visible; do not put secrets/PII here, use X-Labels for sensitive tags). Max 32 keys; keys [A-Za-z0-9_] (case-insensitive) ≤ 64 chars, values ≤ 1024 chars, ≤ 16384 chars total. Example: {"project": "acme-merger", "dept": "legal"}. Filter at search time with unencrypted_strings_match. |
| X-Base-URL | no | Base URL used to resolve relative image links in the input file. |
| X-Completion | no | URL (and optional headers) to send a completion webhook to when the job finishes. |

X-Labels (opaque) vs. X-Unencrypted-Strings (plaintext). Both attach filterable metadata. X-Labels is HMAC'd — the vector DB never sees the values — so it's safe for sensitive tags, at the cost of equality-only matching (filter via meta_tags_any / meta_tags_all). X-Unencrypted-Strings is stored in plaintext (vendor-visible) precisely so its values can be wildcard-matched (filter via unencrypted_strings_match) — never put secrets or PII there. (PrimeCut also accepts X-Labels.)
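Validating metadata client-side before upload avoids a round trip that ends in a 400. The sketch below serializes the metadata headers and enforces the documented limits. It rests on labeled assumptions: the helper name `metadata_headers` is hypothetical, "JS-safe" is read as ±(2^53 − 1), and the "chars total" limits are read as the sum of the string lengths rather than the serialized header size.

```python
import json
import re
from typing import Optional

JS_SAFE_INT = 2**53 - 1  # assumption: "JS-safe" means |n| <= Number.MAX_SAFE_INTEGER
KEY_RE = re.compile(r"[A-Za-z0-9_]{1,64}")


def metadata_headers(
    labels: Optional[list[str]] = None,
    meta_int_1: Optional[int] = None,
    meta_int_2: Optional[int] = None,
    unencrypted: Optional[dict[str, str]] = None,
) -> dict[str, str]:
    """Serialize ingest metadata into headers, checking the documented limits."""
    headers: dict[str, str] = {}
    if labels is not None:
        # Assumption: the 4096-char total is the sum of the tag lengths.
        if len(labels) > 64 or any(len(t) > 128 for t in labels) or sum(map(len, labels)) > 4096:
            raise ValueError("X-Labels exceeds the documented limits")
        headers["X-Labels"] = json.dumps(labels)
    for name, value in (("X-Meta-Int-1", meta_int_1), ("X-Meta-Int-2", meta_int_2)):
        if value is None:
            continue
        if not -JS_SAFE_INT <= value <= JS_SAFE_INT:
            raise ValueError(f"{name} is outside the JS-safe integer range")
        headers[name] = str(value)
    if unencrypted is not None:
        # Assumption: the 16384-char total is the sum of key and value lengths.
        total = sum(len(k) + len(v) for k, v in unencrypted.items())
        bad_key = any(not KEY_RE.fullmatch(k) for k in unencrypted)
        bad_value = any(len(v) > 1024 for v in unencrypted.values())
        if len(unencrypted) > 32 or bad_key or bad_value or total > 16384:
            raise ValueError("X-Unencrypted-Strings exceeds the documented limits")
        headers["X-Unencrypted-Strings"] = json.dumps(unencrypted)
    return headers
```

Sensitive tags belong in `labels`; anything passed as `unencrypted` is stored in plaintext.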

Response (201 Created): PublicJob

```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "created_at": "2026-04-30T10:00:00Z",
  "properties": {
    "file": { "filename": "manual.pdf", "size": 1048576 }
  }
}
```

Status codes

| Status | Meaning |
| --- | --- |
| 201 | Job created. Poll status with GET /jobs/{job_id}/status or stream via GET /status/v1/jobs/{job_id}. |
| 400 | Bad request (no X-Remote-URL and missing Content-Disposition, unsupported MIME, or empty body). |
| 401 | Missing/invalid Bearer token. |
| 403 | Multipart submitted, or the project's product is primecut. |
| 500 | Server error. |
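Polling after a 201 can be sketched as a loop that stops on a terminal state. This is a hedged illustration: `wait_for_job` and its parameters are hypothetical, and the body of GET /jobs/{job_id}/status is assumed to carry a `status` field with the values listed in the PublicJob schema (pending, processing, done, failed). The HTTP call is injected as a callable so the loop itself stays transport-agnostic.

```python
import time
from typing import Callable

TERMINAL_STATES = {"done", "failed"}  # from the PublicJob status values


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the job is done or failed, then return that body.

    fetch_status should perform GET /jobs/{job_id}/status and return the
    parsed JSON (assumed shape: {"job_id": ..., "status": ..., "code": ...}).
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get("status") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not reach a terminal state in time")
        sleep(interval)
```

For long-running ingests, the X-Completion webhook avoids polling altogether.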

See Ingestion for the full lifecycle.


POST /grill/ingestEco

Identical request shape to /grill/ingest — same headers (Content-Disposition / X-Remote-URL, X-Labels, X-Meta-Int-1, X-Meta-Int-2, X-Unencrypted-Strings, X-Base-URL, X-Completion), same application/octet-stream body, same 201 → PublicJob response and status codes. The only difference is that it runs Grill's Eco pipeline — a cheaper, lighter-weight ingest tier. Documents land in the same namespace and are searchable through /grill/search exactly like full-pipeline ingests.

Use /grill/ingestEco for high-volume or cost-sensitive corpora where the full pipeline's extraction depth isn't required; use /grill/ingest when you want maximum extraction quality.


POST /grill/search

Hybrid retrieval. Returns a prompt-ready RetrievalContext.

Request body: GrillSearchRequest

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | yes | Natural-language query. |
| doc_filter | string | no | Optional single-document id filter. Use /grill/searchInDoc when this is required. |
| doc_ids | array of string | no | Restrict search to this set of document ids. |
| exclude_doc_ids | array of string | no | Doc ids to exclude from results (max 100). Useful in agent loops to avoid re-citing docs already shown. |
| meta_tags_any | array of string | no | Match documents carrying any of these labels (set at ingest via X-Labels). |
| meta_tags_all | array of string | no | Match documents carrying all of these labels. |
| meta_int_1_gte / meta_int_1_lte | integer | no | Range filter on X-Meta-Int-1. gte is the lower bound (≥) and lte is the upper bound (≤); both are inclusive, and either may be sent alone. |
| meta_int_2_gte / meta_int_2_lte | integer | no | Same, for X-Meta-Int-2. |
| unencrypted_strings_match | object (string→string) | no | Match on plaintext X-Unencrypted-Strings: a key→glob-pattern map. Case-insensitive Unix glob (* = any run, ? = one char; a pattern with no wildcard is an exact match). Multiple keys AND together. |
| return_assets | boolean | no | Return cited docs' figures (and tables, where available) in an assets field keyed by doc_id; images are base64 data URIs, referenced as [IMAGE: name] in the context. |
| format | string | no | prompt_ready (default) renders the XML+Markdown context block; json returns structured ranked hits instead. |
| min_relevance | float 0..1 | no | Relevance floor (default 0.3). Hits below it are dropped. Higher = stricter, lower = more permissive. |
| min_top_relevance | float 0..1 | no | Floor for the top hit. If the best hit scores below this, the entire result set comes back empty. |
| target_tokens | integer 100..64000 | no | Soft token budget, i.e. the typical answer size (default 6000). Hits are admitted best-first up to this budget. This is the main knob for sizing a response. |
| max_tokens | integer 100..64000 | no | Hard ceiling (default 16000). Grill expands past target_tokens toward it only for tight multi-document clusters. If set lower than target_tokens, it is raised to target_tokens, so it cannot shrink a response on its own. |
| expand_tightness | float 0..1 | no | How aggressively the engine expands context around each hit. |
| retrieval_tier | string | no | Per-query retrieval tier: standard (fusion only) or advanced (adds a reranker). See Retrieval tiers. |
| premium | boolean | no | Legacy flag: true is equivalent to retrieval_tier: "advanced". Prefer retrieval_tier. |

Result count is bounded server-side by relevance and the token budget — there is no top_k parameter.
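A request body can be assembled and range-checked before it is sent. The sketch below is one possible shape, assuming a hypothetical helper `build_search_body`; the field names and numeric ranges come from the table above, and the server remains the authority on validation.

```python
from typing import Any


def build_search_body(query: str, **options: Any) -> dict[str, Any]:
    """Assemble a GrillSearchRequest body, dropping options left as None."""
    if not query:
        raise ValueError("query is required")
    body: dict[str, Any] = {"query": query}
    body.update({k: v for k, v in options.items() if v is not None})

    # Floats documented as 0..1.
    for key in ("min_relevance", "min_top_relevance", "expand_tightness"):
        if key in body and not 0.0 <= body[key] <= 1.0:
            raise ValueError(f"{key} must be within 0..1")
    # Token budgets documented as 100..64000.
    for key in ("target_tokens", "max_tokens"):
        if key in body and not 100 <= body[key] <= 64000:
            raise ValueError(f"{key} must be within 100..64000")
    if len(body.get("exclude_doc_ids", [])) > 100:
        raise ValueError("exclude_doc_ids accepts at most 100 ids")
    if "retrieval_tier" in body and body["retrieval_tier"] not in ("standard", "advanced"):
        raise ValueError("retrieval_tier must be 'standard' or 'advanced'")
    return body
```

For example, `build_search_body("operating margin", target_tokens=4000, meta_tags_any=["year:1982"])` yields a body that can be JSON-encoded and posted to /grill/search.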

Response (200 OK): RetrievalContext (default, format: "prompt_ready")

```json
{
  "context": "<context><doc id=\"\">…</doc></context>",
  "query": "How did operating margin change year over year?",
  "namespace": "project_docs_rag_4f2",
  "result_count": 4,
  "tokens_estimated": 5820,
  "results_dropped": 2,
  "detected_lang": "english",
  "mode": "advanced",
  "search_units": 1
}
```

context is the prompt-ready block (see RetrievalContext format for the grammar) and is the field most callers need. The sibling fields are metadata: result_count and results_dropped (how many hits were included and how many were dropped for the token budget), tokens_estimated (rendered size of context), detected_lang, mode (the retrieval tier actually used), and search_units (billing units). When return_assets is true and a cited doc has figures or tables, an assets object keyed by doc_id is also present; images are base64 data URIs, referenced as [IMAGE: name] inside context.
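Two small helpers cover the asset handling described above: finding the [IMAGE: name] references in context and decoding a base64 data URI into bytes. Both function names are hypothetical, and the exact structure of the entries under assets is not specified in this reference, so the sketch works on a single data URI string and leaves the lookup to the caller.

```python
import base64
import re

IMAGE_REF = re.compile(r"\[IMAGE: ([^\]]+)\]")


def image_refs(context: str) -> list[str]:
    """Return the names referenced as [IMAGE: name] in a context block."""
    return IMAGE_REF.findall(context)


def decode_data_uri(uri: str) -> tuple[str, bytes]:
    """Split a base64 data URI into (media type, raw bytes)."""
    header, sep, payload = uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(payload)
```

The decoded bytes can then be written to disk or passed to a multimodal model alongside the context string.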

With format: "json" the response is a SearchResponse instead: a results array of per-doc hits (doc_id, content, title, canonical_url, pages, score, scores.{ann,bm25,rrf,reranker}, …) plus the same query / namespace / result_count / detected_lang / results_dropped metadata — no rendered context.

Status codes

| Status | Meaning |
| --- | --- |
| 200 | OK. |
| 400 | Validation error or upstream Grill 400. |
| 401 | Missing/invalid token, or upstream 401. |
| 403 | Upstream Grill 403. |
| 404 | doc_filter references an unknown doc. |
| 502 | Other upstream Grill / proxy error. |
| 503 | Cannot reach the Grill backend. |
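One way a client might act on these codes is sketched below. The policy is an assumption, not something this reference prescribes: 502 and 503 are treated as transient and retried with exponential backoff, while 4xx codes are treated as request problems that a retry will not fix. The names `classify_status` and `backoff_delays` are hypothetical.

```python
RETRYABLE = {502, 503}            # upstream/proxy errors, backend unreachable
CALLER_ERRORS = {400, 401, 403, 404}


def classify_status(status: int) -> str:
    """Map a /grill/search status code to a coarse client action."""
    if status == 200:
        return "ok"
    if status in RETRYABLE:
        return "retry"
    if status in CALLER_ERRORS:
        return "fix_request"
    return "unexpected"


def backoff_delays(attempts: int, base: float = 1.0) -> list[float]:
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ..."""
    return [base * 2**i for i in range(attempts)]
```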

POST /grill/searchInDoc

Same shape as /grill/search, but doc_filter is required and must be non-empty. Use this when you want server-side enforcement that retrieval stays inside one document.

Request body: GrillSearchInDocRequest

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | yes | Natural-language query. |
| doc_filter | string | yes | Document id to restrict search to. |
| return_page_images | boolean | no | Deprecated / not available. No-op today (page screenshots will be served via a dedicated endpoint, not inline). |

It also accepts every optional field from /grill/search (doc_ids, exclude_doc_ids, meta_tags_any / meta_tags_all, the meta_int_* range filters, unencrypted_strings_match, return_assets, format, min_relevance, min_top_relevance, target_tokens, max_tokens, expand_tightness, retrieval_tier, and premium), with the same semantics.

Same as /grill/search: result count is bounded by relevance and the token budget; there is no top_k.

Response (200 OK): RetrievalContext (identical to /grill/search).

Status codes

| Status | Meaning |
| --- | --- |
| 200 | OK. |
| 400 | Missing query, missing/empty doc_filter, or upstream 400. |
| 401 | Missing/invalid token. |
| 403 | Upstream Grill 403. |
| 404 | Doc not found in this project's namespace. |
| 502 | Upstream / proxy error. |
| 503 | Grill backend unreachable. |

GET /grill/docs

List documents in the project's namespace.

Response (200 OK): ListDocsResponse

```json
{
  "namespace": "project_docs_rag_4f2",
  "total_documents": 3,
  "documents": [ DocInfo, … ]
}
```

| Field | Type | Description |
| --- | --- | --- |
| namespace | string | Project namespace identifier in the Grill backend. |
| total_documents | integer | Count of documents in the namespace. |
| documents | array | DocInfo entries (see below). |

Status codes

| Status | Meaning |
| --- | --- |
| 200 | OK. |
| 401 | Missing/invalid token. |
| 403 | Upstream Grill 403. |
| 404 | Upstream Grill 404. |
| 502 / 503 | Upstream issues. |

GET /grill/docs/{docId}

Fetch metadata for one document.

Path parameters

| Name | Type | Description |
| --- | --- | --- |
| docId | string | The exact doc_id returned by /grill/docs. |

Response (200 OK): DocInfo

| Field | Type | Description |
| --- | --- | --- |
| doc_id | string | Stable identifier within the namespace. |
| title | string | Detected title (filename fallback). |
| language | string | BCP-47 language code. |
| filename | string | Original ingest filename. |
| canonical_url | string | Citation URL stamped at ingest by Core (e.g. the resolved target of a short link / redirect when ingesting from a URL). Empty when not applicable. |
| pages | integer | Source page count. |
| chunkset_count | integer | Chunksets produced. |
| chunk_count | integer | Chunks produced. |
| image_count | integer | Detected figures/images. |
| table_count | integer | Detected tables. |
| ingested_at | string (RFC 3339) | Job completion time. |
| source_job_id | string | job_id of the originating /grill/ingest. |
| bm25_state | string | State of the document's BM25 (lexical) index. |
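A DocInfo payload can be narrowed into a typed record on the client. The sketch below keeps only a few fields and parses ingested_at into a timezone-aware datetime. `DocSummary` and `parse_doc_info` are hypothetical names, and treating every field except doc_id as optional is an assumption based on the DocInfo schema marking only doc_id as required.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DocSummary:
    doc_id: str
    title: str
    pages: int
    ingested_at: Optional[datetime]


def parse_doc_info(raw: dict) -> DocSummary:
    """Build a DocSummary from a decoded DocInfo JSON object."""
    ts = raw.get("ingested_at")
    # datetime.fromisoformat accepts a trailing "Z" only on Python 3.11+,
    # so normalize it to an explicit UTC offset first.
    parsed = datetime.fromisoformat(ts.replace("Z", "+00:00")) if ts else None
    return DocSummary(
        doc_id=raw["doc_id"],
        title=raw.get("title") or raw.get("filename", ""),
        pages=int(raw.get("pages", 0)),
        ingested_at=parsed,
    )
```

The same function can be mapped over the documents array of a ListDocsResponse.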

Status codes

| Status | Meaning |
| --- | --- |
| 200 | OK. |
| 400 | Empty docId. |
| 401 | Missing/invalid token. |
| 403 | Upstream Grill 403. |
| 404 | Doc not found in this namespace. |
| 502 / 503 | Upstream issues. |

DELETE /grill/docs/{docId}

Remove a document's vectors and storage from the namespace. Project, key, and other documents are unaffected.

Path parameters

| Name | Type | Description |
| --- | --- | --- |
| docId | string | Document to delete. |

Response (200 OK): DeleteDocResponse

| Field | Type | Description |
| --- | --- | --- |
| doc_id | string | Echoes the deleted docId. |
| vectors_deleted | integer | Vector count removed from the namespace. |
| storage_deleted | boolean | Whether stored bytes (assets, page images) were removed. |
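A delete call and a check of its response might look like the sketch below. The helper names are hypothetical, and percent-encoding the docId is a defensive assumption in case ids contain characters that are reserved in URL paths. The request is only built here; send it with `urllib.request.urlopen`.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.poma-ai.com/v3"  # base URL from this reference


def build_delete_request(token: str, doc_id: str) -> urllib.request.Request:
    """Build (but do not send) DELETE /grill/docs/{docId}."""
    if not doc_id:
        raise ValueError("docId must be non-empty (the API answers 400)")
    encoded = urllib.parse.quote(doc_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/grill/docs/{encoded}",
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )


def fully_deleted(response: dict, doc_id: str) -> bool:
    """True when the response echoes doc_id and reports storage removal."""
    return response.get("doc_id") == doc_id and response.get("storage_deleted") is True
```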

Status codes

| Status | Meaning |
| --- | --- |
| 200 | OK. |
| 400 | Empty docId. |
| 401 | Missing/invalid token. |
| 403 | Upstream Grill 403. |
| 404 | Doc not found in this namespace. |
| 502 / 503 | Upstream issues. |

Schemas (canonical shapes)

These are the same definitions used by the OpenAPI v3 spec. All are JSON.

GrillSearchRequest

```jsonc
{
  "query": "string (required)",
  "doc_filter": "string",         // restrict to one doc
  "doc_ids": ["string"],          // restrict to a set of docs
  "exclude_doc_ids": ["string"],  // max 100
  "meta_tags_any": ["string"],    // match any label (X-Labels)
  "meta_tags_all": ["string"],    // match all labels
  "meta_int_1_gte": 0, "meta_int_1_lte": 0,  // X-Meta-Int-1 range
  "meta_int_2_gte": 0, "meta_int_2_lte": 0,  // X-Meta-Int-2 range
  "unencrypted_strings_match": { "path": "legal/contracts/*" }, // glob on X-Unencrypted-Strings
  "return_assets": false,
  "return_page_images": false,    // deprecated, no-op
  "format": "prompt_ready",       // or "json" for structured hits
  "min_relevance": 0.3,           // relevance floor, 0..1 (default 0.3)
  "min_top_relevance": 0.0,       // floor for the top hit; below → empty set
  "target_tokens": 6000,          // soft budget / answer size (default 6000, 100..64000)
  "max_tokens": 16000,            // hard ceiling (default 16000, 100..64000)
  "expand_tightness": 0.0,        // context-expansion aggressiveness, 0..1
  "retrieval_tier": "standard",   // or "advanced" (reranker)
  "premium": false                // legacy: true == retrieval_tier "advanced"
}
```

Every field except query is optional. min_relevance tunes precision (higher = stricter); target_tokens sizes the answer; max_tokens is a ceiling that only matters for tight multi-document clusters (raised up to target_tokens if set lower). meta_tags_*, meta_int_*, and unencrypted_strings_match filter against metadata set at ingest (all combined via AND). retrieval_tier picks the reranking tier — see Retrieval tiers.
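The matching rules for unencrypted_strings_match (case-insensitive glob, all keys ANDed) can be emulated locally to test patterns before sending them. This is a client-side approximation only: `strings_match` is a hypothetical helper, a key absent from the document is assumed not to match, and Python's fnmatch also interprets [seq] character classes, which this reference does not mention.

```python
from fnmatch import fnmatchcase


def strings_match(stored: dict[str, str], patterns: dict[str, str]) -> bool:
    """Emulate unencrypted_strings_match against one document's metadata.

    stored   -- the X-Unencrypted-Strings object set at ingest
    patterns -- the key -> glob map sent at search time
    """
    # Keys and values are compared case-insensitively.
    lowered = {k.lower(): v.lower() for k, v in stored.items()}
    return all(
        key.lower() in lowered and fnmatchcase(lowered[key.lower()], pattern.lower())
        for key, pattern in patterns.items()
    )
```

A pattern without wildcards behaves as an exact match, so `{"dept": "legal"}` matches a document tagged `dept: Legal` but not one tagged `dept: legal-ops`.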

GrillSearchInDocRequest

Same fields as GrillSearchRequest, except both query and doc_filter are required and doc_filter must be non-empty.

RetrievalContext

```jsonc
{
  "context": "string (XML+Markdown — see RetrievalContext format page)",
  "query": "string",
  "namespace": "string",
  "result_count": 0,         // hits included in context
  "tokens_estimated": 0,     // rendered size of context
  "results_dropped": 0,      // hits dropped for the token budget
  "detected_lang": "string", // BM25 query language; null on low confidence
  "mode": "string",          // retrieval tier actually used
  "search_units": 0,         // billing units
  "assets": {                // present only when return_assets=true and assets exist
    "<doc_id>": { "images": [], "tables": [] }
  }
}
```

format: "json" instead returns a SearchResponse: { results: [{ doc_id, content, title, canonical_url, pages, chunk_indices, score, scores: { ann, bm25, rrf, reranker } }], query, namespace, result_count, detected_lang, results_dropped }.

DocInfo

```jsonc
{
  "doc_id": "string (required)",
  "title": "string",
  "language": "string",
  "filename": "string",
  "canonical_url": "string",   // citation URL, set by Core at ingest
  "pages": 0,
  "chunkset_count": 0,
  "chunk_count": 0,
  "image_count": 0,
  "table_count": 0,
  "ingested_at": "RFC 3339 timestamp",
  "source_job_id": "uuid",
  "bm25_state": "string"
}
```

ListDocsResponse

```jsonc
{
  "namespace": "string",
  "total_documents": 0,
  "documents": [ /* DocInfo */ ]
}
```

DeleteDocResponse

```jsonc
{
  "doc_id": "string",
  "vectors_deleted": 0,
  "storage_deleted": true
}
```

PublicJob (for /grill/ingest)

```jsonc
{
  "job_id": "uuid",
  "created_at": "RFC 3339 timestamp",
  "status": { "job_id": "…", "status": "pending|processing|done|failed", "code": 200 },
  "properties": { "file": { "filename": "string", "size": 0 }, "base_url": "string" }
}
```

See also