Tuesday, January 20, 2026

Over-Searching in Search-Augmented Large Language Models


Search-augmented large language models (LLMs) excel at knowledge-intensive tasks by integrating external retrieval.
However, they often over-search – unnecessarily invoking the search tool even when it does not improve response quality,
which leads to computational inefficiency and hallucinations from incorporating irrelevant context. In this work, we conduct a
systematic analysis of over-searching across multiple dimensions, including query types, model categories, retrieval
conditions, and multi-turn conversations. Our findings reveal: (i) search generally improves answer accuracy on answerable
queries but harms abstention on unanswerable ones; (ii) over-searching is more pronounced in advanced reasoning models
and deep research systems, is exacerbated by noisy retrieval, and compounds across turns in multi-turn conversations; and
(iii) the composition of retrieved evidence is critical, as the presence of negative evidence improves abstention. To quantify
over-searching, we introduce Tokens Per Correctness (TPC), an evaluation metric that captures the performance-cost
trade-off for search-augmented LLMs. Finally, we investigate mitigation approaches at both the query and retrieval levels
and release the OverSearchQA benchmark to foster continued research into efficient search-augmented LLMs.
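For intuition, here is a minimal Python sketch of how a metric like TPC could be computed. The abstract does not give the exact formula, so the definition assumed here (total tokens consumed divided by the number of correct responses), the `Response` record, and the token accounting are illustrative assumptions, not the paper's implementation:

```python
from dataclasses import dataclass

@dataclass
class Response:
    tokens_used: int   # total tokens consumed for this query (generation + retrieved context; assumed accounting)
    correct: bool      # answer judged correct (or correctly abstained)

def tokens_per_correctness(responses: list[Response]) -> float:
    """Assumed TPC: total tokens spent divided by number of correct answers.
    Lower TPC means fewer tokens are spent per unit of correctness."""
    total_tokens = sum(r.tokens_used for r in responses)
    num_correct = sum(r.correct for r in responses)
    if num_correct == 0:
        return float("inf")  # nothing correct: cost per correctness is unbounded
    return total_tokens / num_correct

# Hypothetical example: both models get 2 of 3 queries right, but the
# over-searching model pays far more tokens for the same correctness.
efficient = [Response(200, True), Response(220, True), Response(250, False)]
oversearch = [Response(900, True), Response(1100, True), Response(1300, False)]
print(tokens_per_correctness(efficient))   # 335.0
print(tokens_per_correctness(oversearch))  # 1650.0
```

The point of a ratio like this is that accuracy alone cannot expose over-searching: two systems with identical accuracy can differ sharply in token cost, and TPC makes that performance-cost trade-off visible in a single number.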
