
Retrieval tiers

Every Grill query runs through hybrid retrieval first: dense (cosine) and lexical (BM25) scores are fused into a single candidate ranking. On top of that, you choose a retrieval tier per query, which controls whether a reranker re-scores the candidates before the RetrievalContext is built.
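This page doesn't say how the two scores are combined. As an illustration only, the sketch below uses one common approach: min-max normalize each score list, then take a weighted sum. The function names and the `alpha` weight are hypothetical, not Grill's implementation.

```python
from __future__ import annotations


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so cosine and BM25 values are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates tied: treat them as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(dense: dict[str, float], lexical: dict[str, float], alpha: float = 0.5) -> list[str]:
    """Rank doc IDs by alpha * dense + (1 - alpha) * lexical (hypothetical weighting)."""
    d, lex = min_max(dense), min_max(lexical)
    # A doc missing from one retriever contributes 0 from that side.
    fused = {
        doc: alpha * d.get(doc, 0.0) + (1 - alpha) * lex.get(doc, 0.0)
        for doc in set(d) | set(lex)
    }
    # Highest fused score first; ties broken by doc ID for a deterministic order.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


dense = {"a": 0.9, "b": 0.8, "c": 0.1}     # cosine similarities
lexical = {"b": 12.0, "c": 3.0, "d": 7.5}  # BM25 scores
print(fuse(dense, lexical))  # ['b', 'a', 'd', 'c']
```

Here "b" ranks first because both signals rate it highly, while "a" and "d", each strong on only one signal, land below it.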

The reranker is a managed Cohere model, so there is nothing to configure on the model itself. You choose a tier per query: standard for the cheapest, fastest path, or advanced when answer quality on hard or ambiguous queries is worth paying for.

The two tiers

| Tier | What it does | When to use it | Billing |
|---|---|---|---|
| standard | Fusion only, no reranker. Returns the hybrid cosine + BM25 ranking directly. | Cheap, high-volume lookups where fusion is enough: most chat and knowledge-base queries, and latency-sensitive paths. | 1 query-credit per query, flat, regardless of how much context is returned. |
| advanced | Adds a Cohere rerank-v4.0-pro cross-encoder pass over a candidate pool that scales with your requested token budget. | When answer quality on hard or ambiguous queries matters: large namespaces, near-duplicate documents, "must not miss the one right passage" retrieval. | Per delivered ktoken (see Query credits below). |

How a tier affects results

  • standard is the fusion ranking on its own. It's fast and cheap, and for well-chunked documents it's often all you need — the hybrid score already combines semantic and exact-term signals.
  • advanced adds a second-pass cross-encoder that scores each query–candidate pair jointly, rather than relying on vector similarity. This is where ambiguous or near-duplicate candidates get ordered correctly. The rerank pool scales with your requested token budget, so a larger budget reranks more candidates, and a relevant passage that fusion ranked too low has a better chance to surface.
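The two-stage shape of advanced can be sketched as follows, assuming the rerank pool is the top `pool_size` candidates of the fused ranking. The managed Cohere reranker isn't called here: `overlap_score` is a deliberately crude, hypothetical stand-in for the cross-encoder's query–passage score, and `rerank` and `pool_size` are illustrative names, not Grill parameters.

```python
from typing import Callable, Dict, List, Sequence


def overlap_score(query: str, passage: str) -> float:
    """Toy stand-in for a cross-encoder: counts query words that appear in the passage."""
    return float(len(set(query.lower().split()) & set(passage.lower().split())))


def rerank(
    query: str,
    fused_ranking: Sequence[str],
    passages: Dict[str, str],
    score_pair: Callable[[str, str], float],
    pool_size: int,
) -> List[str]:
    """Re-score only the top `pool_size` fused candidates as (query, passage) pairs."""
    pool = list(fused_ranking[:pool_size])
    # sorted() stays stable with reverse=True, so equal scores keep their fusion order.
    return sorted(pool, key=lambda doc: score_pair(query, passages[doc]), reverse=True)


passages = {
    "p1": "monthly plans renew automatically",
    "p2": "pricing overview",
    "p3": "annual plans have a 30-day refund window",
}
fused = ["p1", "p2", "p3"]  # fusion ranked the right passage (p3) last
query = "refund window for annual plans"

print(rerank(query, fused, passages, overlap_score, pool_size=2))  # ['p1', 'p2']
print(rerank(query, fused, passages, overlap_score, pool_size=3))  # ['p3', 'p1', 'p2']
```

With the smaller pool, p3 never reaches the reranker; widening the pool, which on advanced follows the token budget, lets it surface to the top.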

There's no separate "deeper" tier. To go deeper, request a larger token budget with target_tokens; on advanced, rerank depth and cost both follow that budget. The tier changes only the ranking quality, not the response shape: min_relevance, target_tokens, and max_tokens apply the same way on either tier. See Retrieval.

Query credits

standard is billed at a flat 1 query-credit per query, whatever the answer size.

advanced is billed per delivered ktoken (1,000 tokens of returned context), with a 50-credit minimum:

credits = max(50, round(10 × delivered_ktokens))
| Delivered answer | Credits |
|---|---|
| 5,000 tokens (default) | 50 |
| 15,000 tokens | 150 |
| 100,000 tokens | 1,000 |
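For cost estimates, the billing rule can be written as a small helper. This is a sketch: `query_credits` is a hypothetical client-side function, and because the docs don't say how `round` breaks ties, it assumes halves round up.

```python
def query_credits(tier: str, delivered_tokens: int) -> int:
    """Estimate query-credits for one query from the documented billing rules."""
    if delivered_tokens < 0:
        raise ValueError("delivered_tokens must be non-negative")
    if tier == "standard":
        # Flat rate, independent of answer size.
        return 1
    if tier == "advanced":
        # 10 credits per ktoken is 1 credit per 100 tokens.
        # (tokens + 50) // 100 rounds to the nearest credit, halves up (assumed).
        return max(50, (delivered_tokens + 50) // 100)
    raise ValueError(f"unknown retrieval tier: {tier!r}")


for tokens in (5_000, 15_000, 100_000):
    print(tokens, query_credits("advanced", tokens))
# 5000 50
# 15000 150
# 100000 1000
```

Because of the floor, an advanced answer shorter than 5,000 tokens still costs 50 credits, while a standard query costs 1 credit at any size.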

Reach for advanced selectively — on the queries where precision is worth it — rather than as a global default. A common pattern is standard for routine, high-volume traffic and advanced only when a query is known to be hard or high-stakes, paying in proportion to the context delivered.
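To see why a mixed policy pays off, here is a rough estimate for a made-up workload: 10,000 routine queries plus 200 hard ones that each deliver 15,000 tokens. The helper applies the credit formula above under the same assumption that halves round up; the function names and workload numbers are hypothetical.

```python
from typing import List


def advanced_credits(delivered_tokens: int) -> int:
    # max(50, round(10 x delivered_ktokens)), with halves rounded up (assumed).
    return max(50, (delivered_tokens + 50) // 100)


def workload_credits(standard_queries: int, advanced_tokens: List[int]) -> int:
    """standard costs 1 credit per query; each advanced answer is billed by size."""
    return standard_queries + sum(advanced_credits(t) for t in advanced_tokens)


# Routine traffic on standard, only the hard queries on advanced.
mixed = workload_credits(10_000, [15_000] * 200)
# Everything on advanced: routine queries at the default 5,000-token budget.
all_advanced = workload_credits(0, [5_000] * 10_000 + [15_000] * 200)
print(mixed, all_advanced)  # 40000 530000
```

For the same 10,200 queries, routing only the hard ones to advanced uses roughly 13 times fewer credits than sending everything there.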

Request fields

  • retrieval_tier: "standard" or "advanced". (Legacy premium: true is equivalent to advanced.)
  • target_tokens: the requested answer budget. Default 5,000, maximum 500,000. On advanced, this drives both rerank depth and cost.
  • max_tokens: hard ceiling on the returned context. Default 15,000.
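A client-side helper that assembles these fields with the documented defaults and the 500,000-token cap on target_tokens might look like the sketch below. The `query` key, both function names, and treating min_relevance as optional are assumptions for illustration; check Retrieval for the actual request schema and endpoint.

```python
from __future__ import annotations

from typing import Any

DEFAULT_TARGET_TOKENS = 5_000
MAX_TARGET_TOKENS = 500_000
DEFAULT_MAX_TOKENS = 15_000
TIERS = ("standard", "advanced")


def build_retrieval_request(
    query: str,
    retrieval_tier: str = "standard",
    target_tokens: int = DEFAULT_TARGET_TOKENS,
    max_tokens: int = DEFAULT_MAX_TOKENS,
    min_relevance: float | None = None,
) -> dict[str, Any]:
    """Build a request body from the documented field names (hypothetical helper)."""
    if retrieval_tier not in TIERS:
        raise ValueError(f"retrieval_tier must be one of {TIERS}")
    if not 0 < target_tokens <= MAX_TARGET_TOKENS:
        raise ValueError(f"target_tokens must be in 1..{MAX_TARGET_TOKENS}")
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    body: dict[str, Any] = {
        "query": query,  # hypothetical field name for the query text
        "retrieval_tier": retrieval_tier,
        "target_tokens": target_tokens,
        "max_tokens": max_tokens,
    }
    if min_relevance is not None:
        body["min_relevance"] = min_relevance
    return body


def upgrade_legacy_body(body: dict[str, Any]) -> dict[str, Any]:
    """Drop legacy `premium`; if it was true, set the equivalent retrieval_tier."""
    upgraded = dict(body)
    if upgraded.pop("premium", False) is True:
        upgraded["retrieval_tier"] = "advanced"
    return upgraded


print(build_retrieval_request("refund policy for annual plans", "advanced", target_tokens=15_000))
print(upgrade_legacy_body({"query": "refund policy", "premium": True}))
```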
