I was asked to do something new at work: Given a data dump of unstructured text data, give us a detailed PDF report of insights about what customers are saying about our products this quarter.
So I wrote a clear prompt. Gave Claude a detailed set of instructions. Fed it the dataset. It gave me an output. I delivered it.
But when the stakeholder and I reviewed the deliverable in depth, we noticed some increasingly unsettling things.
Claude was confidently wrong.
Not wrong wrong, like hallucinating data from nowhere. More like… overconfident wrong. It would generate a quarterly insight report and say something like:
“Negative sentiment in the Dresses department increased 23% this quarter, indicating a significant shift in customer satisfaction that warrants immediate attention from the product team.”
Sounds great. Except that spike was driven almost entirely by a single popular item that launched mid-quarter with a known sizing defect. One product. Not the whole department.
Claude had no idea. And my prompt didn’t tell it to care.
A Quarterly Customer Review Report Skill
I’m going to walk you through a Claude skill I built that generates a quarterly customer sentiment report from unstructured product review text, delivered as a PDF to stakeholders.
Obviously, I won’t be sharing the actual dataset I analyzed at work. The dataset I’m using is the Women’s E-Commerce Clothing Reviews dataset from Kaggle (CC0 license). It contains 23,000 real, anonymized customer reviews across clothing departments (Tops, Dresses, Bottoms, Jackets, and more) with text, star ratings, and product metadata. References to the company in the reviews have been replaced with “retailer.”
The skill should:
- Read a filtered slice of reviews for the current quarter
- Group them by department
- Identify trends & concerns
- Write a professional summary PDF for the product leadership team
Here’s the original prompt:
You are a data analyst producing a quarterly customer sentiment report for a women’s clothing e-commerce retailer. Given this quarter’s customer reviews (including review text, star ratings, and department), write a professional stakeholder report that includes:
– An overall sentiment summary for the quarter
– Key themes by department (Tops, Dresses, Bottoms, Jackets)
– 2-3 standout insights from the review text
– A brief recommendation for the product team
Be professional and clear.
When you’re done with this task, please create a skill titled reviews-analysis and save your instructions in there.
What “Confidently Wrong” Actually Looks Like
Here’s an example of what Claude produced with the naive skill above, on a quarter where the Dresses department had an influx of negative reviews:
“Negative sentiment in the Dresses department increased significantly this quarter, with customers frequently citing fit and sizing issues. This suggests the retailer’s sizing standards may be drifting from customer expectations, a trend that, if unaddressed, could erode brand loyalty in this key category.”
The real explanation? One dress (a single SKU) launched in Week 7 with a batch quality issue. The reviews were almost entirely about that one item. The rest of the Dresses department was performing fine.
Claude didn’t necessarily invent anything. It just had no context for why the pattern existed. And without that context, it did what LLMs do: it filled the gap with the most plausible-sounding narrative.
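If your data does carry a product identifier, a quick concentration check before prompting can surface this kind of single-item spike. The snippet below is a minimal sketch, not part of the skill itself: the field names (`product_id`, `department`, `rating`), the sample rows, and the "2 stars or fewer counts as negative" cutoff are all assumptions made for illustration.

```python
from collections import Counter

def top_product_share(reviews, department, negative_max_rating=2):
    """Return (product_id, share) for the product that accounts for the
    largest share of negative reviews in one department.

    Assumption: each review is a dict with hypothetical keys
    'product_id', 'department', and 'rating' (1-5 stars).
    """
    negatives = [
        r["product_id"]
        for r in reviews
        if r["department"] == department and r["rating"] <= negative_max_rating
    ]
    if not negatives:
        return None, 0.0
    product_id, count = Counter(negatives).most_common(1)[0]
    return product_id, count / len(negatives)

# Made-up sample rows, for illustration only.
sample = [
    {"product_id": "D-101", "department": "Dresses", "rating": 1},
    {"product_id": "D-101", "department": "Dresses", "rating": 2},
    {"product_id": "D-101", "department": "Dresses", "rating": 1},
    {"product_id": "D-207", "department": "Dresses", "rating": 2},
    {"product_id": "D-207", "department": "Dresses", "rating": 5},
    {"product_id": "T-009", "department": "Tops", "rating": 1},
]

product, share = top_product_share(sample, "Dresses")
print(product, f"{share:.0%}")  # D-101 75%
```

When one product carries most of a department's negative reviews, that number is worth passing to Claude alongside the review text, so the report can say so instead of guessing at a department-wide cause.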

The Fix: 4 Lines You MUST Include
Line 1: Tell Claude What Context It’s Missing
You do NOT have access to product launch calendars, inventory records, promotional campaigns, or individual SKU-level history. Do NOT attribute department-level trends to brand-wide causes. Report patterns you observe in the text; do not explain why they exist unless the reviews themselves make it unambiguous.
This single instruction eliminates a huge category of confident wrongness. Without it, Claude will always reach for a strategic narrative because that’s what a good analyst does, and Claude is trying to be a good analyst.
The problem is that a good analyst also knows what they don’t know. They say “We’re seeing elevated sizing complaints in Dresses this quarter. This may be isolated to a recent launch but we’d need SKU-level data to confirm.” Claude won’t say that unless you tell it to.
Line 2: Define What “Significant” Actually Means
Claude loves the word significant. It uses it all the time. And it almost never defines it.
Only flag a sentiment shift as “significant” if it represents a change of more than 15 percentage points in positive/negative ratio compared to the prior quarter, OR if a theme appears in more than 20% of reviews in a given department. For smaller signals, use language like “slight uptick” or “minor increase.” Do not use the word “notable” or “significant” for anything below these thresholds. Always report the actual numeric value for the shift alongside your claim.
You can adjust the 15-point and 20% thresholds to whatever makes sense for your data. The point is to anchor Claude’s language to something real.
Without this, Claude will call both a 3-review spike in complaints and a genuine 30-point sentiment drop “significant”. Your stakeholders will start to tune out. And when something truly significant happens, they won’t know it.
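To make the rule concrete, here is a minimal sketch of the same thresholds written as code. It is an illustration rather than anything the skill runs: the function name `describe_shift` is hypothetical, and I am assuming "positive/negative ratio" is measured as the share of negative reviews in a department.

```python
def describe_shift(prev_negative_share, curr_negative_share,
                   theme_share=0.0,
                   shift_threshold_pts=15.0, theme_threshold=0.20):
    """Label a quarter-over-quarter change using the thresholds from the prompt.

    Shares are fractions between 0 and 1. Returns (label, shift_in_points),
    so the number always travels with the word.
    """
    shift_pts = (curr_negative_share - prev_negative_share) * 100
    is_significant = (
        abs(shift_pts) > shift_threshold_pts or theme_share > theme_threshold
    )
    label = "significant" if is_significant else "minor"
    return label, round(shift_pts, 1)

print(describe_shift(0.10, 0.13))                    # ('minor', 3.0)
print(describe_shift(0.10, 0.40))                    # ('significant', 30.0)
print(describe_shift(0.10, 0.12, theme_share=0.25))  # ('significant', 2.0)
```

The third call shows the OR clause at work: the sentiment shift is small, but a theme appearing in 25% of reviews clears the 20% bar on its own.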
Line 3: Force a Confidence Qualifier on Every Insight
Before each insight, include a confidence label in brackets: [Data-Supported], [Possible], or [Speculative].
Use [Data-Supported] only when the insight follows directly from the review text provided. Use [Possible] when the insight is a reasonable inference from the text. Use [Speculative] when you are making assumptions about causes or context that are not present in the reviews themselves.
When I first added this line, I was expecting mostly [Data-Supported] tags. What I actually got was a mix of all three, which told me exactly how much Claude had been filling in gaps in my previous reports without me realizing it.
An example of what the output looks like after adding this line:

Now your stakeholders can see exactly what’s solid and what’s a guess. That’s a much more honest report.
Line 4: Require Claude to State the Limits of the Analysis
At the end of the report, include a section called “What This Report Cannot Tell You.” List 2-3 things that would be needed to draw stronger conclusions, for example, SKU-level review breakdowns, return rates, or repeat purchase data.
This line forces Claude to acknowledge the edges of its own analysis. And it gives your stakeholders a clear roadmap for what questions to investigate further, which is actually the most valuable thing an analyst can do.
Here’s the output:
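If you want to confirm that every insight actually carries one of the three labels, and see how they are distributed, a small check like the one below can help. It is a sketch under one assumption: that the insights have already been pulled out of the report as one string per insight. The sample insights are invented for illustration.

```python
import re
from collections import Counter

ALLOWED_LABELS = ("Data-Supported", "Possible", "Speculative")
LABEL_RE = re.compile(
    r"^\[(" + "|".join(re.escape(label) for label in ALLOWED_LABELS) + r")\]\s+\S"
)

def audit_labels(insights):
    """Count confidence labels and collect insights that lack a valid one."""
    counts = Counter()
    unlabeled = []
    for line in insights:
        text = line.strip()
        if not text:
            continue
        match = LABEL_RE.match(text)
        if match:
            counts[match.group(1)] += 1
        else:
            unlabeled.append(text)
    return counts, unlabeled

# Invented sample insights, for illustration only.
insights = [
    "[Data-Supported] Sizing complaints appear in 24% of Dresses reviews.",
    "[Possible] Complaints may be concentrated in a small number of items.",
    "[Speculative] A recent launch could explain the increase.",
    "Customers love the fabric quality in Tops.",
]

counts, unlabeled = audit_labels(insights)
print(dict(counts))
print(unlabeled)
```

A report where nearly everything is tagged [Speculative], or where labels are missing, is a signal to revisit the prompt before the PDF goes out.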

How to Use Claude to Refine the Skill
Writing a skill once isn’t enough. You need to test it and improve it the same way you’d iterate on a model.
Step 1: Run the skill on known examples.
Filter the dataset to a time window where you already know what happened. (A quarter with a product recall, a seasonal promotion, a period with unusually high return rates, and so on.) See what Claude says. Does it use the word “significant” correctly? Does it state data/statistics where it should?
Step 2: Feed Claude its own output and ask it to audit.
Claude is good at catching its own overconfidence when you explicitly ask it to look for it.
Here is a quarterly customer sentiment report generated by an AI analyst. Review every insight in this report and flag any that:
– Make causal claims without direct evidence in the review text
– Use words like “significant” or “notable” without justification
– Attribute individual product issues to brand-wide trends
– Assume context not present in the dataset (launch calendars, inventory, purchase history)
For each flagged item, suggest a revised version that is more appropriately hedged.
Step 3: Add a clause for each failure you find.
Every time Claude produces a report with a clearly wrong or overconfident insight, ask it to add a new constraint to your skill. Over time, your skill pretty much becomes a record of everything Claude gets wrong.
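One of those checks is cheap enough to run locally before you even send the audit prompt. The sketch below flags sentences that use “significant” or “notable” (or their adverb forms) without any number in the same sentence. It is a rough heuristic I am assuming for illustration, with a naive sentence splitter and invented sample text, and it is no substitute for the audit itself.

```python
import re

SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")
STRONG_WORD_RE = re.compile(r"\b(significant\w*|notabl\w*)\b", re.IGNORECASE)
NUMBER_RE = re.compile(r"\d")

def unsupported_strong_claims(report_text):
    """Return sentences that say 'significant'/'notable' but cite no number."""
    sentences = [s for s in SENTENCE_SPLIT_RE.split(report_text.strip()) if s]
    return [
        s for s in sentences
        if STRONG_WORD_RE.search(s) and not NUMBER_RE.search(s)
    ]

# Invented report text, for illustration only.
report = (
    "Negative sentiment in Dresses increased significantly this quarter. "
    "Sizing complaints appeared in 24% of Dresses reviews, a significant share. "
    "Tops reviews were stable."
)

for sentence in unsupported_strong_claims(report):
    print("FLAG:", sentence)
```

On this sample only the first sentence is flagged: the second backs its claim with a figure, and the third makes no strong claim at all.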
A Word of Caution
Adding constraints to your skill can sometimes make Claude produce an output where every single sentence ends with “…though more data would be needed to confirm this.”
That’s not useful either.
The goal is calibrated confidence where the strength of Claude’s language matches the strength of the evidence. If you notice Claude becoming overly wishy-washy, you can add a counterbalancing constraint:
Do not over-qualify every statement. If a pattern appears clearly and consistently across many reviews, state it plainly and include references to the data behind the pattern. Reserve qualifiers for genuinely uncertain or speculative claims.
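To get a rough sense of whether a report has tipped into over-qualification, you can measure how many of its sentences contain a hedge. This is a minimal sketch: the hedge phrase list is my own assumption, the sample text is invented, and what counts as "too many" is something you would need to calibrate on your own reports.

```python
import re

SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")
# Hypothetical hedge list; extend it with the phrases you actually see.
HEDGE_RE = re.compile(
    r"\b(may|might|could|possibly|perhaps|would be needed|cannot confirm)\b",
    re.IGNORECASE,
)

def hedge_ratio(report_text):
    """Return the fraction of sentences that contain at least one hedge phrase."""
    sentences = [s for s in SENTENCE_SPLIT_RE.split(report_text.strip()) if s]
    if not sentences:
        return 0.0
    hedged = sum(1 for s in sentences if HEDGE_RE.search(s))
    return hedged / len(sentences)

# Invented report text, for illustration only.
report = (
    "Sizing complaints appear in 24% of Dresses reviews. "
    "This may be isolated to one product. "
    "Fabric quality is praised across Tops. "
    "More data would be needed to confirm the cause."
)

print(f"{hedge_ratio(report):.0%} of sentences are hedged")  # 50% of sentences are hedged
```

Tracking this number across iterations of the skill gives you an early warning when a new constraint has pushed Claude from calibrated to wishy-washy.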
Conclusion
Claude is impressive at producing professional-looking reports, which can sometimes be the problem.
The polish hides the overconfidence. Your stakeholders see clean formatting and authoritative language, and they assume the insights are solid even when they’re not.
The 4 lines I’ve walked through here don’t make Claude less capable. They make it more honest. And in a reporting context, honest is more valuable than impressive.
