Saturday, March 21, 2026

This example begins with a chi-square but ends with a lesson on how even well-written prompts can lead to hallucinations.


A research study counted how often ChatGPT made up citations for three different categories of psychological disorders (binge eating, body dysmorphic, and major depressive). They used a chi-square to determine if rates of made-up citations differed by disorder (they do).

If ever there was an article that belonged on this blog, this is it. You can use it in your stats class as an example of chi-square and/or as a warning to students if you ask them to perform literature reviews for your class.

The original paper, Influence of topic familiarity and prompt specificity on citation fabrication in mental health research using large language models: Experimental Study, was published in December 2025 and summarized by PsyPost shortly after publication.

What the researchers did:

What the researchers found:

Results: Across the 6 reviews, GPT-4o generated 176 citations; 35 (19.9%) were fabricated. Among the 141 real citations, 64 (45.4%) contained errors, most frequently incorrect or invalid digital object identifiers. Fabrication rates differed significantly by disorder (χ²(2)=13.7; P=.001), with higher rates for binge eating disorder (17/60, 28%) and body dysmorphic disorder (14/48, 29%) than for major depressive disorder (4/68, 6%). While fabrication did not differ overall by review type, stratified analyses showed higher fabrication for specialized versus general reviews of binge eating disorder (11/24, 46% vs 6/36, 17%; P=.01). Accuracy rates also varied by disorder (χ²(2)=11.6; P=.003), being lowest for body dysmorphic disorder (20/34, 59%) and highest for major depressive disorder (41/64, 64%). Accuracy rates differed by review type within some disorders, including higher accuracy for general reviews of major depressive disorder (26/34, 77% vs 15/30, 50%; P=.03).

How to use in class:

1. This is a good chi-square results section. They shared the test value and the p value, of course, but I like how they shared the different rates of inaccuracy as absolute data and percentages throughout. Chi-squares can be tricky to present in text (versus a table) and the authors did a great job here.

2. If you are talking to your students about proper use of AI: These researchers shared their exact prompts in their supplemental material. This demonstrates a) proper, ethical citations of prompts when using AI in research and that b) the well-written prompts still resulted in bogus data.
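
If you want students to check the fabrication-rate test themselves, the counts quoted in the Results are enough to rebuild the table. Below is a minimal sketch in plain Python, assuming those counts form the full 3 x 2 table (fabricated vs. real citations for each disorder). The function and variable names are my own, not the authors', and I don't know what software they actually used. The p value shortcut in the code only works because this test has 2 degrees of freedom.

```python
import math

# Counts taken from the quoted Results: (fabricated, total citations) per disorder.
counts = {
    "binge eating disorder": (17, 60),
    "body dysmorphic disorder": (14, 48),
    "major depressive disorder": (4, 68),
}


def chi_square_independence(table):
    """Pearson chi-square statistic and df for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of disorder and fabrication.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Each row is [fabricated, real] for one disorder.
observed = [[fab, total - fab] for fab, total in counts.values()]
chi2, df = chi_square_independence(observed)

# With exactly 2 degrees of freedom, the upper-tail probability is exp(-x / 2).
assert df == 2
p_value = math.exp(-chi2 / 2)

print(f"chi-square({df}) = {chi2:.1f}, p = {p_value:.3f}")
```

Students can compare the printed statistic and p value against the values in the Results, then swap in other counts from the abstract to see what happens.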
