An LLM’s context window is often compared to its “working memory.” The analogy is helpful for understanding what the context window does, and potentially misleading about what a bigger one delivers in practice.
In theory, a much larger context window would solve many of the most pressing issues facing LLMs. For example, a supersized context window should greatly reduce hallucinations, because the model could “remember” more of the information it ingests. Fewer gaps in the LLM’s knowledge would mean fewer opportunities for it to invent incorrect information to fill those gaps.
But it’s a bit more complicated in practice.
Larger context windows require more computing power, and a lot more: in a standard transformer, the cost of attention grows quadratically with the length of the input. In other words, if the context window doubles in size, the LLM needs roughly four times as much compute to process the information in it.
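To see where the quadratic term comes from, here is a minimal sketch of scaled dot-product attention in Python with NumPy. The sequence lengths and dimensions are illustrative, not taken from any particular model. The score matrix holds one entry for every pair of tokens, so both its size and the work needed to fill it grow with the square of the sequence length.

```python
import numpy as np

def attention_pairs(n_tokens: int, d_model: int = 64) -> int:
    """Naive scaled dot-product attention over a random sequence.

    Returns the number of entries in the attention score matrix,
    which is where the quadratic cost lives."""
    rng = np.random.default_rng(0)
    Q = rng.standard_normal((n_tokens, d_model))
    K = rng.standard_normal((n_tokens, d_model))
    V = rng.standard_normal((n_tokens, d_model))

    # Every token attends to every other token: an n x n score matrix.
    scores = Q @ K.T / np.sqrt(d_model)             # shape: (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    _ = weights @ V                                 # weighted sum of values

    return scores.size                              # n * n pairwise scores

print(attention_pairs(1_000))  # 1,000,000 entries
print(attention_pairs(2_000))  # 4,000,000 entries: 2x tokens, 4x work
```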
In addition to higher compute costs, larger context windows haven’t reliably improved accuracy in real-world use. Perhaps the most notorious example is the Australian lawyer who was recently stripped of his ability to practice as a principal lawyer after submitting court documents riddled with nonexistent AI-generated citations. Even though the output came from “reputable” AI-powered legal software, it was still plagued by hallucinations.
Here’s where the “working memory” analogy is unintentionally apt. The larger the context window, the more information sits in the middle of it. And in real-world use cases, LLMs have a tendency to skim over those details, much like a human reader confronted with a solid page of text. Whether you’re a robot or a meatsack, it’s easy to get lost in the middle.
Engineers use the term “bathtub curve” to describe the failure rates of the products they build: high at the start of a product’s life, low through the middle, high again at the end. The mental image, with the ends of the tub clearly visible and the middle entirely submerged, is also useful for understanding how information gets lost in even the biggest context windows.
LLMs might read the first few sentences carefully, but soon their eyes glaze over. Upon reaching the end of the page, their attention may perk up again, but their comprehension of everything in the middle stays hazy. As a result, the model remains prone to introducing incorrect information into its outputs.
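You can watch this dip for yourself with a “needle in a haystack” probe: bury a known fact at different depths in a wall of filler text, ask the model to retrieve it, and track accuracy by position. The sketch below is a rough harness with one big assumption flagged up front: query_llm is a hypothetical stand-in for whatever model client you actually use.

```python
def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call.

    Swap in your provider's SDK here; this stub is an assumption,
    not a real API."""
    raise NotImplementedError("wire up a real LLM client")

FILLER = "The quick brown fox jumps over the lazy dog."
NEEDLE = "The secret passphrase is 'marzipan-42'."
QUESTION = "\n\nWhat is the secret passphrase? Reply with the passphrase only."

def run_probe(context_sentences: int = 400, trials: int = 10) -> dict:
    """Measure retrieval accuracy with the needle buried at varying depths."""
    results = {}
    for depth in (0.0, 0.25, 0.5, 0.75, 1.0):  # fraction of the way in
        hits = 0
        for _ in range(trials):  # repeat to average out sampling noise
            sentences = [FILLER] * context_sentences
            sentences.insert(int(depth * context_sentences), NEEDLE)
            prompt = " ".join(sentences) + QUESTION
            if "marzipan-42" in query_llm(prompt):
                hits += 1
        results[depth] = hits / trials
    return results

# On many long-context models, the expected shape is bathtub-like:
# strong recall near depth 0.0 and 1.0, with a sag around 0.5.
```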