Retrieval tiers
Every Grill query runs through hybrid retrieval first: dense (cosine) and lexical (BM25) scores are fused into a single candidate ranking. On top of that, you choose a retrieval tier per query, which controls whether a reranker re-scores the candidates before the RetrievalContext is built.
The reranker is a managed Cohere model; you don't configure it. Picking a tier is the only knob — standard for the cheapest, fastest path, and advanced when answer quality on hard or ambiguous queries is worth paying for.
The two tiers
| Tier | What it does | When to use it | Billing |
|---|---|---|---|
| standard | Fusion only, no reranker. Returns the hybrid cosine + BM25 ranking directly. | Cheap, high-volume lookups where fusion is enough: most chat and knowledge-base queries, latency-sensitive paths. | 1 query-credit per query, flat, regardless of how much context is returned. |
| advanced | Adds a Cohere rerank-v4.0-pro cross-encoder pass over a candidate pool that scales with your requested token budget. | When answer quality on hard or ambiguous queries matters: large namespaces, near-duplicate documents, "must not miss the one right passage" retrieval. | Per delivered ktoken; see Query credits below. |
How a tier affects results
- standard is the fusion ranking on its own. It's fast and cheap, and for well-chunked documents it's often all you need, because the hybrid score already combines semantic and exact-term signals.
- advanced adds a second-pass cross-encoder that sees the full query–candidate pair, not just vector similarity. This is where ambiguous or near-duplicate candidates get correctly ordered. The rerank pool scales with your requested token budget, so a larger answer reranks more deeply, and a relevant passage that fusion ranked too low still has a chance to surface.
There's no separate "deeper" tier. Going deeper just means requesting a larger token budget with target_tokens: rerank depth and cost both follow the budget. The tier only changes the ranking quality, not the response shape — min_relevance, target_tokens, and max_tokens apply the same way regardless of tier. See Retrieval.
Query credits
standard is billed at a flat 1 query-credit per query, whatever the answer size.
advanced is billed per delivered ktoken, in proportion to the context returned:
credits = max(50, round(10 × delivered_ktokens))

| Delivered answer | Credits |
|---|---|
| 5,000 tokens (default) | 50 |
| 15,000 tokens | 150 |
| 100,000 tokens | 1,000 |
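The billing rules above can be sketched as a small helper. This is an illustrative calculation, not an official client: the function name `query_credits` is hypothetical, and the rounding mode for fractional results is an assumption (Python's built-in `round`, which rounds halves to the nearest even integer), since the formula does not specify one.

```python
def query_credits(tier: str, delivered_tokens: int) -> int:
    """Estimate query-credits for one query (hypothetical helper).

    standard: flat 1 credit, whatever the answer size.
    advanced: max(50, round(10 * delivered_ktokens)).
    """
    if tier == "standard":
        return 1
    if tier == "advanced":
        delivered_ktokens = delivered_tokens / 1000
        # Assumption: Python's round() stands in for the unspecified rounding mode.
        return max(50, round(10 * delivered_ktokens))
    raise ValueError(f"unknown retrieval tier: {tier!r}")


# Reproduce the rows of the table above.
for tokens in (5_000, 15_000, 100_000):
    print(tokens, query_credits("advanced", tokens))
```

Note the floor: an advanced query that delivers fewer than 5,000 tokens is still billed the 50-credit minimum.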
Reach for advanced selectively — on the queries where precision is worth it — rather than as a global default. A common pattern is standard for routine, high-volume traffic and advanced only when a query is known to be hard or high-stakes, paying in proportion to the context delivered.
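To see why selective use matters, here is a rough cost comparison for a made-up workload. The traffic numbers (1,000 queries, 50 of them hard) are purely illustrative; the per-query prices are the flat standard rate and the advanced rate at the default 5,000-token answer from the table above.

```python
STANDARD_CREDITS = 1      # flat rate per standard query
ADVANCED_CREDITS_5K = 50  # advanced query delivering the default 5,000 tokens


def workload_credits(total_queries: int, hard_queries: int, selective: bool) -> int:
    """Total credits for a workload (hypothetical helper, illustrative numbers)."""
    if not selective:
        # advanced as a global default: every query pays the advanced rate
        return total_queries * ADVANCED_CREDITS_5K
    routine = total_queries - hard_queries
    return routine * STANDARD_CREDITS + hard_queries * ADVANCED_CREDITS_5K


all_advanced = workload_credits(1_000, 50, selective=False)  # 1,000 * 50 = 50,000
selective = workload_credits(1_000, 50, selective=True)      # 950 * 1 + 50 * 50 = 3,450
print(all_advanced, selective)
```

In this example, routing only the hard queries to advanced costs about 7% of what a global advanced default would.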
Request fields
- retrieval_tier: "standard" or "advanced". (Legacy premium: true is equivalent to advanced.)
- target_tokens: the requested answer budget. Default 5,000, up to 500,000. On advanced, this drives both rerank depth and cost.
- max_tokens: hard ceiling on the returned context. Default 15,000.
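As a minimal sketch, the fields above could be assembled and checked client-side like this. The helper names `retrieval_options` and `upgrade_legacy` are hypothetical, as is the assumption that target_tokens must be positive. The field names, defaults, and the 500,000 upper bound come from the list above; how the options are sent (endpoint, other request fields) is not covered here.

```python
from typing import Any

VALID_TIERS = {"standard", "advanced"}
MAX_TARGET_TOKENS = 500_000


def retrieval_options(
    retrieval_tier: str = "standard",
    target_tokens: int = 5_000,
    max_tokens: int = 15_000,
) -> dict[str, Any]:
    """Build the tier-related request fields (hypothetical helper)."""
    if retrieval_tier not in VALID_TIERS:
        raise ValueError(f"retrieval_tier must be one of {sorted(VALID_TIERS)}")
    # Assumption: the budget must be positive; only the upper bound is documented.
    if not 0 < target_tokens <= MAX_TARGET_TOKENS:
        raise ValueError("target_tokens must be between 1 and 500,000")
    return {
        "retrieval_tier": retrieval_tier,
        "target_tokens": target_tokens,
        "max_tokens": max_tokens,
    }


def upgrade_legacy(options: dict[str, Any]) -> dict[str, Any]:
    """Rewrite legacy premium: true as retrieval_tier: "advanced"."""
    if options.get("premium") is not True:
        return dict(options)  # nothing to rewrite; return a copy
    upgraded = {k: v for k, v in options.items() if k != "premium"}
    upgraded["retrieval_tier"] = "advanced"
    return upgraded


print(retrieval_options("advanced", target_tokens=15_000))
print(upgrade_legacy({"premium": True, "target_tokens": 20_000}))
```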
Next
- Retrieval: request shape and the min_relevance / target_tokens / max_tokens knobs.
- RetrievalContext format: the grammar of the returned block.