Tuesday, January 20, 2026

Over-Searching in Search-Augmented Large Language Models


Search-augmented large language models (LLMs) excel at knowledge-intensive tasks by integrating external retrieval.
However, they often over-search – unnecessarily invoking the search tool even when it does not improve response quality,
which leads to computational inefficiency and hallucinations from incorporating irrelevant context. In this work, we conduct a
systematic analysis of over-searching across multiple dimensions, including query types, model categories, retrieval
conditions, and multi-turn conversations. Our findings reveal: (i) search generally improves answer accuracy on answerable
queries but harms abstention on unanswerable ones; (ii) over-searching is more pronounced in advanced reasoning models
and deep research systems, is exacerbated by noisy retrieval, and compounds across turns in multi-turn conversations; and
(iii) the composition of retrieved evidence is critical, as the presence of negative evidence improves abstention. To quantify
over-searching, we introduce Tokens Per Correctness (TPC), an evaluation metric that captures the performance-cost
trade-off for search-augmented LLMs. Finally, we investigate mitigation approaches at both the query and retrieval levels
and release the OverSearchQA benchmark to foster continued research into efficient search-augmented LLMs.
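For intuition, here is a minimal Python sketch of how a metric like TPC could be computed. The abstract does not give the exact formula, so the definition assumed here (total tokens consumed divided by the number of correct responses), the `Response` record, and the token accounting are illustrative assumptions, not the paper's implementation:

```python
from dataclasses import dataclass

@dataclass
class Response:
    tokens_used: int   # total tokens consumed for this query (generation + retrieved context; assumed accounting)
    correct: bool      # answer judged correct (or correctly abstained)

def tokens_per_correctness(responses: list[Response]) -> float:
    """Assumed TPC: total tokens spent divided by number of correct answers.
    Lower TPC means fewer tokens are spent per unit of correctness."""
    total_tokens = sum(r.tokens_used for r in responses)
    num_correct = sum(r.correct for r in responses)
    if num_correct == 0:
        return float("inf")  # nothing correct: cost per correctness is unbounded
    return total_tokens / num_correct

# Hypothetical example: both models get 2 of 3 queries right, but the
# over-searching model pays far more tokens for the same correctness.
efficient = [Response(200, True), Response(220, True), Response(250, False)]
oversearch = [Response(900, True), Response(1100, True), Response(1300, False)]
print(tokens_per_correctness(efficient))   # 335.0
print(tokens_per_correctness(oversearch))  # 1650.0
```

The point of a ratio like this is that accuracy alone cannot expose over-searching: two systems with identical accuracy can differ sharply in token cost, and TPC makes that performance-cost trade-off visible in a single number.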
