An LLM’s context window is often compared to its “working memory.” This analogy is both helpful (for understanding how the context window works) and potentially misleading (for setting expectations about what a bigger one can do in practice).
Theoretically, a much larger context window would solve many of the most pressing issues currently facing LLMs. For example: a supersized context window could be expected to greatly reduce hallucinations, since it would enable an LLM to “remember” more of the information it ingests. As a result, there would be fewer gaps in the LLM’s knowledge base and fewer opportunities for it to invent incorrect information to fill those gaps.
But it’s a bit more complicated in practice.
Larger context windows require more computing power. A lot more, in fact: because self-attention compares every token in the context with every other token, compute requirements grow quadratically with input length. In other words, if the context window doubles in size, the LLM needs roughly four times as much compute to process the information in it.
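To make that scaling concrete, here is a minimal sketch in plain Python. The 8,000-token baseline and the sample context lengths are hypothetical placeholders, and only the attention portion of the workload grows this way; production systems layer many optimizations on top, so treat the output as relative proportions rather than real costs.

```python
# Rough illustration of quadratic attention scaling.
# The 8,000-token baseline and the sample context lengths are hypothetical;
# only the self-attention component of the workload grows this way.

def relative_attention_cost(context_length: int, baseline: int = 8_000) -> float:
    """Attention compute relative to a baseline context length (n^2 scaling)."""
    return (context_length / baseline) ** 2

for tokens in (8_000, 16_000, 32_000, 128_000):
    print(f"{tokens:>7} tokens -> {relative_attention_cost(tokens):.0f}x the attention compute")
```

Doubling from 8,000 to 16,000 tokens prints a 4x figure, and jumping to 128,000 tokens prints 256x: the quadratic blowup described above.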
Higher compute costs aren’t the only problem: larger context windows haven’t always led to improved accuracy in real-world use cases. Perhaps the most notorious example is the case of an Australian lawyer who was recently stripped of his ability to practice as a principal lawyer after submitting court documents riddled with nonexistent AI-generated citations. Despite being produced by “reputable” AI-powered legal software, the output was still plagued by hallucinations.
Here’s where the “working memory” analogy is unintentionally apt. The larger the context window, the more information sits in the middle of it. And in real-world use cases, LLMs have a tendency to skim over those details, much like a human reader confronted with a solid page of text. Whether you’re a robot or a meatsack, it’s easy to get lost in the middle.
The term “bathtub curve” has long been popular among reliability engineers for describing the failure rates of the products they build. The same mental image, with the two ends of the tub clearly visible and the middle entirely submerged, is also useful for understanding how information gets lost even in the biggest context windows.
LLMs might read the first few sentences carefully, but soon their eyes glaze over. Upon reaching the end of the page, their attention may perk up again, but their comprehension of everything in the middle remains hazy. As a result, they remain prone to introducing incorrect information into their outputs.
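A simple way to see this effect for yourself is a “needle in a haystack” test: bury a single fact at different depths in a long block of filler text, then ask the model to retrieve it. The sketch below is a minimal, model-agnostic version; ask_model, the filler sentence and the passphrase are all hypothetical placeholders, with the actual model call left for you to fill in.

```python
# Minimal "needle in a haystack" sketch: hide one fact at varying depths in
# filler text, then check whether the model recalls it. Every string and name
# here is illustrative; ask_model() must be replaced with a real LLM call.

FILLER = "The committee reviewed the quarterly figures without comment. " * 40
NEEDLE = "The secret passphrase is 'blue pelican'."
QUESTION = "What is the secret passphrase mentioned in the document?"

def build_context(depth: float, paragraphs: int = 50) -> str:
    """Insert the needle at a relative depth (0.0 = very start, 1.0 = very end)."""
    chunks = [FILLER] * paragraphs
    chunks.insert(round(depth * paragraphs), NEEDLE)
    return "\n\n".join(chunks)

def ask_model(context: str, question: str) -> str:
    # Placeholder: swap in a call to your LLM provider of choice.
    return ""

for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    answer = ask_model(build_context(depth), QUESTION)
    print(f"needle depth {depth:.2f} -> retrieved: {'blue pelican' in answer.lower()}")
```

Run against a real model over a long enough context, retrieval scores tend to trace the bathtub shape described above: strong at the very start and very end, weakest somewhere in the middle.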