Retrieval tiers

Every Grill query runs through hybrid retrieval first: dense (cosine) and lexical (BM25) scores are fused into a single candidate ranking. On top of that, you choose a retrieval tier per query, which controls whether a reranker re-scores the candidates before the RetrievalContext is built.

The reranker is a managed Cohere model; you don't configure it. The tier is the only reranker setting you choose: standard for the cheapest, fastest path, and advanced when answer quality on hard or ambiguous queries is worth paying for.

The two tiers

Tier	What it does	When to use it	Billing
`standard`	Fusion only — no reranker. Returns the hybrid cosine + BM25 ranking directly.	Cheap, high-volume lookups where fusion is enough — most chat and knowledge-base queries, latency-sensitive paths.	1 query-credit per query, flat, regardless of how much context is returned.
`advanced`	Adds a Cohere `rerank-v4.0-pro` cross-encoder pass over a candidate pool that scales with your requested token budget.	When answer quality on hard or ambiguous queries matters — large namespaces, near-duplicate documents, "must not miss the one right passage" retrieval.	Per delivered ktoken — see below.

How a tier affects results

standard is the fusion ranking on its own. It's fast and cheap, and for well-chunked documents it's often all you need — the hybrid score already combines semantic and exact-term signals.
advanced adds a second-pass cross-encoder that sees the full query–candidate pair, not just vector similarity. This is where ambiguous or near-duplicate candidates get correctly ordered. The rerank pool scales with your requested token budget, so a larger answer reranks more deeply — a relevant passage that fusion ranked too low still has a chance to surface.
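
For intuition about what the fusion step behind both tiers does, here is a minimal sketch. This page does not say which fusion formula Grill uses, so the sketch assumes reciprocal rank fusion (RRF), a common way to merge a dense ranking with a BM25 ranking. The function name, the k=60 constant, and the document IDs are illustrative and not part of Grill's API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    ASSUMPTION: RRF stands in for Grill's unspecified fusion formula.
    Each list contributes 1 / (k + rank) for every document it contains.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_ranking = ["a", "b", "c"]    # hypothetical cosine-similarity order
lexical_ranking = ["b", "d", "a"]  # hypothetical BM25 order

fused = reciprocal_rank_fusion([dense_ranking, lexical_ranking])
print(fused)  # ['b', 'a', 'd', 'c']
```

Document b sits near the top of both lists, so it ends up ahead of a, which leads only the dense list. That combined signal is what standard returns directly and what advanced hands to the reranker.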

There's no separate "deeper" tier. Going deeper just means requesting a larger token budget with target_tokens; rerank depth and cost both follow the budget. The tier changes only the ranking quality, not the response shape: min_relevance, target_tokens, and max_tokens apply the same way regardless of tier. See Retrieval.

Query credits

standard is billed at a flat 1 query-credit per query, whatever the answer size.

advanced is billed per delivered ktoken (1,000 tokens of returned context), with a minimum of 50 credits per query:

credits = max(50, round(10 × delivered_ktokens))

Delivered answer	Credits
5,000 tokens (default)	50
15,000 tokens	150
500,000 tokens (max)	5,000
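
The formula and table above can be checked with a few lines of Python. This is a sketch rather than Grill's billing code: the function name is made up, and it assumes Python's built-in round (which rounds exact halves to the nearest even integer), since this page does not specify how ties are rounded.

```python
def advanced_credits(delivered_tokens: int) -> int:
    """Credits for one advanced query: max(50, round(10 x delivered_ktokens)).

    ASSUMPTION: tie-breaking follows Python's round(); the page does not
    state how exact halves are handled.
    """
    delivered_ktokens = delivered_tokens / 1000
    return max(50, round(10 * delivered_ktokens))


# Reproduce the rows of the table above.
for tokens in (5_000, 15_000, 500_000):
    print(f"{tokens:>7,} tokens -> {advanced_credits(tokens):,} credits")

# The 50-credit floor applies to small answers as well.
print(advanced_credits(100))  # 50
```

Because of the floor, any advanced answer of 5,000 tokens or fewer costs the same 50 credits.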

Reach for advanced selectively — on the queries where precision is worth it — rather than as a global default. A common pattern is standard for routine, high-volume traffic and advanced only when a query is known to be hard or high-stakes, paying in proportion to the context delivered.
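
To see why selective use matters, the sketch below compares daily credit totals for different traffic mixes. The traffic numbers are invented for illustration and the helper names are hypothetical. The per-query costs come from this section: 1 credit for standard, and max(50, round(10 × delivered_ktokens)) for advanced.

```python
def advanced_credits(delivered_tokens: int) -> int:
    """Per-query cost on advanced, using the formula from this section."""
    return max(50, round(10 * delivered_tokens / 1000))


def daily_credits(standard_queries: int, advanced_queries: int,
                  delivered_tokens: int) -> int:
    """Total credits for a day of mixed traffic.

    delivered_tokens is the answer size assumed for every advanced query;
    standard queries cost a flat 1 credit each regardless of size.
    """
    return standard_queries * 1 + advanced_queries * advanced_credits(delivered_tokens)


# Illustrative day: 10,000 queries, 5,000-token answers on advanced.
print(daily_credits(10_000, 0, 5_000))    # all standard:  10000
print(daily_credits(9_500, 500, 5_000))   # 5% advanced:   34500
print(daily_credits(0, 10_000, 5_000))    # all advanced:  500000
```

In this made-up mix, routing 5% of traffic to advanced roughly triples the bill, while making advanced the global default multiplies it by fifty.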

Request fields

retrieval_tier — "standard" or "advanced". (Legacy premium: true is equivalent to advanced.)
target_tokens — the requested answer budget. Default 5,000, range 100–500,000 (never exceeds max_tokens). On advanced, this drives both rerank depth and cost.
max_tokens — hard ceiling on the returned context. Default 15,000, range 100–500,000.
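
As a sketch of how these fields fit together, the helper below builds the tier-related part of a request and enforces the documented defaults and ranges on the client side. The function name is hypothetical and it does not call any Grill or SDK API. The page says target_tokens never exceeds max_tokens but not whether the service clamps or rejects a larger value, so clamping here is an assumption.

```python
VALID_TIERS = ("standard", "advanced")
TOKEN_MIN, TOKEN_MAX = 100, 500_000


def build_retrieval_options(retrieval_tier: str = "standard",
                            target_tokens: int = 5_000,
                            max_tokens: int = 15_000) -> dict:
    """Return a dict with the three tier-related request fields.

    Hypothetical client-side helper; only the field names, defaults,
    and ranges come from the documentation.
    """
    if retrieval_tier not in VALID_TIERS:
        raise ValueError(f"retrieval_tier must be one of {VALID_TIERS}")
    for name, value in (("target_tokens", target_tokens),
                        ("max_tokens", max_tokens)):
        if not TOKEN_MIN <= value <= TOKEN_MAX:
            raise ValueError(f"{name} must be between {TOKEN_MIN} and {TOKEN_MAX}")
    # ASSUMPTION: clamp rather than reject when the target exceeds the ceiling.
    target_tokens = min(target_tokens, max_tokens)
    return {
        "retrieval_tier": retrieval_tier,
        "target_tokens": target_tokens,
        "max_tokens": max_tokens,
    }


# A deeper advanced query: raise both the budget and the ceiling.
print(build_retrieval_options("advanced", target_tokens=20_000, max_tokens=30_000))
```

Raising target_tokens without also raising max_tokens has no effect past the ceiling, so a deeper advanced query usually needs both values increased.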

Next

Retrieval: request shape and the min_relevance / target_tokens / max_tokens knobs.
RetrievalContext format: the grammar of the returned block.
