Reasoning has become a central paradigm for large language models (LLMs), consistently boosting accuracy across diverse benchmarks. Yet its suitability for precision-sensitive tasks remains unclear. We present the first systematic study of reasoning for classification tasks under strict low false positive rate (FPR) regimes. Our analysis covers two tasks (safety detection and hallucination detection), evaluated in both fine-tuned and zero-shot settings, using standard LLMs and Large Reasoning Models (LRMs). Our results reveal a clear trade-off: Think On (reasoning-augmented) generation improves overall accuracy but underperforms at the low-FPR thresholds essential for practical use. In contrast, Think Off (no reasoning during inference) dominates in these precision-sensitive regimes, with Think On prevailing only when higher FPRs are acceptable. In addition, we find that token-based scoring substantially outperforms self-verbalized confidence for precision-sensitive deployments. Finally, a simple ensemble of the two modes recovers the strengths of each. Taken together, our findings position reasoning as a double-edged tool: beneficial for overall accuracy, but often ill-suited for applications requiring strict precision.
- ‡ Equal contribution
- † University of Maryland, College Park
- ** Work done while at Apple
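To make the scoring distinction concrete, the sketch below contrasts token-based scoring (reading the label's probability directly from the model's next-token distribution) with a simple average ensemble of the two inference modes. This is an illustrative sketch only, not the paper's implementation: the model checkpoint, the " unsafe"/" safe" label tokens, and the averaging rule for the ensemble are all assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint; any instruction-tuned causal LM would do.
MODEL = "meta-llama/Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
model.eval()

def token_based_score(prompt: str) -> float:
    """Token-based scoring: read the detector's confidence directly from the
    next-token distribution instead of asking the model to verbalize a number."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]
    probs = torch.softmax(next_token_logits, dim=-1)
    # Assumed label tokens for a safety detector; map your own labels
    # ("Yes"/"No", "unsafe"/"safe", ...) to their actual tokenizer ids.
    pos_id = tokenizer.encode(" unsafe", add_special_tokens=False)[0]
    neg_id = tokenizer.encode(" safe", add_special_tokens=False)[0]
    # Renormalize over the two label tokens to get a score in [0, 1];
    # the operating threshold is then chosen to meet a target FPR.
    return (probs[pos_id] / (probs[pos_id] + probs[neg_id])).item()

def ensemble_score(score_think_on: float, score_think_off: float) -> float:
    """Combine scores from the two inference modes (with and without a
    reasoning trace). The abstract only says "a simple ensemble", so a
    plain average is an assumption, not necessarily the paper's exact rule."""
    return 0.5 * (score_think_on + score_think_off)
```

In a low-FPR deployment, the decision threshold on these scores would be calibrated on held-out data to the target false positive rate before comparing the two modes.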