Thursday, March 12, 2026

Learning to Reason for Hallucination Span Detection


Large language models (LLMs) often generate hallucinations, unsupported content that undermines reliability. While most prior work frames hallucination detection as a binary task, many real-world applications require identifying hallucinated spans, which is a multi-step decision-making process. This naturally raises the question of whether explicit reasoning can help with the complex task of detecting hallucination spans. To answer this question, we first evaluate pretrained models with and without Chain-of-Thought (CoT) reasoning, and show that CoT reasoning has the potential to generate at least one correct answer when sampled multiple times. Motivated by this, we propose RL4HS, a reinforcement learning framework that incentivizes reasoning with a span-level reward function. RL4HS builds on Group Relative Policy Optimization and introduces Class-Aware Policy Optimization to mitigate the reward imbalance issue. Experiments on the RAGTruth benchmark (summarization, question answering, data-to-text) show that RL4HS surpasses pretrained reasoning models and supervised fine-tuning, demonstrating the necessity of reinforcement learning with span-level rewards for detecting hallucination spans.
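The abstract does not spell out the span-level reward, but a common choice for span tasks is an overlap-based F1 score between predicted and annotated spans. The sketch below, a hypothetical `span_f1_reward` function in Python, illustrates what such a reward could look like; it is an assumption for illustration, not the paper's actual implementation.

```python
def span_f1_reward(pred_spans, gold_spans):
    """Character-overlap F1 between predicted and gold (start, end) spans.

    A minimal sketch of a span-level reward, assuming the reward compares
    predicted hallucination spans against annotated gold spans; RL4HS's
    exact formulation is not given in this abstract.
    """
    pred_chars = set()
    for start, end in pred_spans:
        pred_chars.update(range(start, end))
    gold_chars = set()
    for start, end in gold_spans:
        gold_chars.update(range(start, end))

    if not pred_chars and not gold_chars:
        return 1.0  # both empty: model correctly predicts "no hallucination"
    if not pred_chars or not gold_chars:
        return 0.0  # one side empty: no overlap possible

    overlap = len(pred_chars & gold_chars)
    precision = overlap / len(pred_chars)
    recall = overlap / len(gold_chars)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Example: a predicted span partially overlapping the gold span earns ~0.67.
print(span_f1_reward([(10, 25)], [(15, 30)]))
```

A reward like this is zero for most rollouts that miss the gold spans entirely, which is one way the reward imbalance the paper's Class-Aware Policy Optimization targets could arise.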
