I was asked to do something new at work: given a data dump of unstructured text data, produce a detailed PDF report of insights about what customers are saying about our products this quarter.
So I wrote a clear prompt. Gave Claude a detailed set of instructions. Fed it the dataset. It gave me an output. I delivered it.
But when the stakeholder and I reviewed the deliverable in depth, we noticed some increasingly unsettling things.
Claude was confidently wrong.
Not wrong wrong, like hallucinating facts from nowhere. More like… overconfident wrong. It would generate a quarterly insight report and say something like:
“Negative sentiment in the Dresses department increased 23% this quarter, indicating a significant shift in customer satisfaction that warrants immediate attention from the product team.”
Sounds great. Except that spike was driven almost entirely by a single popular item that launched mid-quarter with a known sizing defect. One product. Not the whole department.
Claude had no idea. And my prompt didn’t tell it to care.
A Quarterly Customer Review Report Skill
I’m going to walk you through a Claude skill I built that generates a quarterly customer sentiment report from unstructured product review text, delivered as a PDF to stakeholders.
Obviously, I won’t be sharing the actual dataset I analyzed at work. The dataset I’m using is the Women’s E-Commerce Clothing Reviews dataset from Kaggle (CC0 license). It contains roughly 23,000 real, anonymized customer reviews across clothing departments (Tops, Dresses, Bottoms, Jackets, and more) with text, star ratings, and product metadata. References to the company in the reviews have been replaced with “retailer.”
The skill should:
- Read a filtered slice of reviews for the current quarter
- Group them by department
- Identify trends & concerns
- Write a professional summary PDF for the product leadership team
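The first two bullets can be sketched in plain Python before any prompt is involved. This is a minimal sketch, not the code behind the skill: the `Review` fields, the `quarter` label, and the rule that 1 or 2 stars counts as a negative review are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Review:
    # Hypothetical record layout; real column names may differ.
    product_id: str
    department: str
    rating: int   # star rating, 1 to 5
    text: str
    quarter: str  # e.g. "2025-Q3", assumed to be attached upstream


def group_by_department(reviews, quarter):
    """Return {department: [reviews]} for a single quarter."""
    grouped = defaultdict(list)
    for review in reviews:
        if review.quarter == quarter:
            grouped[review.department].append(review)
    return dict(grouped)


def negative_share(reviews, max_negative_rating=2):
    """Fraction of reviews rated at or below the (assumed) negative cutoff."""
    if not reviews:
        return 0.0
    negative = sum(1 for r in reviews if r.rating <= max_negative_rating)
    return negative / len(reviews)
```

Computing numbers like `negative_share` outside the model means the report can quote figures that were counted rather than estimated from reading the text.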
Here’s the original prompt:
You are a data analyst producing a quarterly customer sentiment report for a women’s clothing e-commerce retailer. Given this quarter’s customer reviews (including review text, star ratings, and department), write a professional stakeholder report that includes:
– An overall sentiment summary for the quarter
– Key themes by department (Tops, Dresses, Bottoms, Jackets)
– 2-3 standout insights from the review text
– A brief recommendation for the product team
Be professional and clear.
When you are done with this task, please create a skill titled reviews-analysis and save your instructions in there.
What “Confidently Wrong” Actually Looks Like
Here’s an example of what Claude produced with the naive skill above, on a quarter where the Dresses department had an influx of negative reviews:
“Negative sentiment in the Dresses department increased significantly this quarter, with customers frequently citing fit and sizing issues. This suggests the retailer’s sizing standards may be drifting from customer expectations, a trend that, if unaddressed, could erode brand loyalty in this key category.”
The real explanation? One dress (a single SKU) launched in Week 7 with a batch quality issue. The reviews were almost entirely about that one item. The rest of the Dresses department was performing fine.
Claude didn’t necessarily invent anything. It just had no context for why the pattern existed. And without that context, it did what LLMs do: it filled the gap with the most plausible-sounding narrative.
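When product IDs are available, the single-SKU situation described above can be detected before the reviews ever reach the model. The sketch below is a complementary check, not part of the prompt fix in this article; the tuple layout and the 0.5 threshold are assumptions chosen for illustration.

```python
from collections import Counter


def top_product_share(negative_reviews):
    """Return (product_id, share) for the product with the most negative reviews.

    negative_reviews: list of (product_id, review_text) tuples, already
    filtered to one department and to negative reviews only (assumed layout).
    """
    if not negative_reviews:
        return None, 0.0
    counts = Counter(product_id for product_id, _ in negative_reviews)
    product_id, count = counts.most_common(1)[0]
    return product_id, count / len(negative_reviews)


def concentration_note(negative_reviews, threshold=0.5):
    """Build a context line for the prompt when one product dominates."""
    product_id, share = top_product_share(negative_reviews)
    if product_id is None or share < threshold:
        return None
    return (
        f"Note: {share:.0%} of negative reviews in this department "
        f"refer to product {product_id}."
    )
```

A note like this, pasted alongside the reviews, gives the model the one piece of context it was missing in the Dresses example.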

The Fix: 4 Lines You MUST Include
Line 1: Tell Claude What Context It’s Missing
You do NOT have access to product launch calendars, inventory data, promotional campaigns, or individual SKU-level history. Do NOT attribute department-level trends to brand-wide causes. Report patterns you observe in the text; do not explain why they exist unless the reviews themselves make it unambiguous.
This single instruction eliminates a large class of confident wrongness. Without it, Claude will always reach for a strategic narrative because that’s what a good analyst does, and Claude is trying to be a good analyst.
The problem is that a good analyst also knows what they don’t know. They say “We’re seeing elevated sizing complaints in Dresses this quarter. This may be isolated to a recent launch but we’d need SKU-level data to confirm.” Claude won’t say that unless you tell it to.
Line 2: Define What “Significant” Actually Means
Claude loves the word significant. It uses it all the time. And it almost never defines it.
Only flag a sentiment shift as “significant” if it represents a change of more than 15 percentage points in positive/negative ratio compared to the prior quarter, OR if a theme appears in more than 20% of reviews in a given department. For smaller signals, use language like “slight uptick” or “minor increase.” Do not use the word “notable” or “significant” for anything below these thresholds. Always report the actual numeric value for the shift alongside your claim.
You can adjust the 15-point and 20% thresholds to whatever makes sense for your data. The point is to anchor Claude’s language to something real.
Without this, Claude will call both a 3-review spike in complaints and a genuine 30-point sentiment drop “significant”. Your stakeholders will start to tune out. And when something actually significant happens, they won’t know it.
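The threshold rule is simple enough to write down as code, which also makes it easy to compute the numbers yourself and hand them to Claude. This is a rough sketch under one assumption: the "positive/negative ratio" is represented here as the share of negative reviews (0.0 to 1.0), which is only one possible reading of the instruction. The function name and parameters are hypothetical.

```python
def shift_label(prev_negative_share, curr_negative_share, theme_share=0.0,
                shift_threshold_pts=15.0, theme_threshold=0.20):
    """Classify a quarter-over-quarter change using the two thresholds.

    Returns (label, shift_pts), where shift_pts is the change in negative
    share expressed in percentage points, rounded to one decimal place.
    """
    shift_pts = round((curr_negative_share - prev_negative_share) * 100, 1)
    is_significant = (
        abs(shift_pts) > shift_threshold_pts  # more than 15 points either way
        or theme_share > theme_threshold      # theme in more than 20% of reviews
    )
    label = "significant" if is_significant else "minor"
    return label, shift_pts
```

For example, a move from 10% to 30% negative reviews is a 20-point shift and clears the bar, while a move from 10% to 13% does not.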
Line 3: Force a Confidence Qualifier on Every Insight
Before each insight, include a confidence label in brackets: [Data-Supported], [Possible], or [Speculative].
Use [Data-Supported] only when the insight follows directly from the review text provided. Use [Possible] when the insight is a reasonable inference from the text. Use [Speculative] when you are making assumptions about causes or context that are not present in the reviews themselves.
When I first added this line, I was expecting mostly [Data-Supported] tags. What I actually got was a mix of all three, which told me exactly how much Claude had been filling in gaps in my earlier reports without me realizing it.
An example of what the output looks like after adding this line:

Now your stakeholders can see exactly what’s solid and what’s a guess. That’s a much more honest report.
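Because the labels follow a fixed bracket format, it is cheap to check that Claude actually applied them. A minimal sketch, assuming each insight arrives as one line of text; the function names are hypothetical and the label set is copied from the instruction above.

```python
import re

# The three labels from the prompt instruction, expected at the start of a line.
LABEL_PATTERN = re.compile(r"^\[(Data-Supported|Possible|Speculative)\]\s+\S")


def unlabeled_insights(insight_lines):
    """Return the non-empty insight lines that do not start with a label."""
    return [
        line for line in insight_lines
        if line.strip() and not LABEL_PATTERN.match(line.strip())
    ]


def label_counts(insight_lines):
    """Count how many insights carry each confidence label."""
    counts = {"Data-Supported": 0, "Possible": 0, "Speculative": 0}
    for line in insight_lines:
        match = LABEL_PATTERN.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts
```

Tracking `label_counts` across quarters is one way to see how much of each report rests on inference rather than on the review text.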
Line 4: Require Claude to State the Limits of the Analysis
At the end of the report, include a section called “What This Report Can’t Tell You.” List 2-3 things that would be needed to draw stronger conclusions, for example, SKU-level review breakdowns, return rates, or repeat purchase data.
This line forces Claude to acknowledge the edges of its own analysis. And it gives your stakeholders a clear roadmap for what questions to investigate further, which is actually the most valuable thing an analyst can do.
Here’s the output:

How to Use Claude to Refine the Skill
Writing a skill once isn’t enough. You need to test it and improve it the same way you’d iterate on a model.
Step 1: Run the skill on known examples.
Filter the dataset to a time window where you already know what happened (a quarter with a product recall, a seasonal promotion, a period with unusually high return rates, etc.). See what Claude says. Does it use the word “significant” correctly? Does it state facts/statistics where it should?
Step 2: Feed Claude its own output and ask it to audit.
Claude is good at catching its own overconfidence when you explicitly ask it to look for it.
Here is a quarterly customer sentiment report generated by an AI analyst. Review every insight in this report and flag any that:
– Make causal claims without direct evidence in the review text
– Use words like “significant” or “notable” without justification
– Attribute individual product issues to brand-wide trends
– Assume context not present in the dataset (launch calendars, inventory, purchase history)
For each flagged item, suggest a revised version that is more appropriately hedged.
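One of the audit criteria, flagged words used without justification, can also be pre-screened locally before spending a model call. This is a crude heuristic and not a substitute for the audit prompt: it treats "contains a digit" as a stand-in for "justified", which is an assumption, and it says nothing about causal claims or missing context.

```python
import re

FLAG_WORDS = re.compile(
    r"\b(significant|significantly|notable|notably)\b", re.IGNORECASE
)
HAS_NUMBER = re.compile(r"\d")


def unjustified_claims(report_text):
    """Return sentences that use a flagged word but contain no number."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", report_text.strip())
    return [
        sentence for sentence in sentences
        if FLAG_WORDS.search(sentence) and not HAS_NUMBER.search(sentence)
    ]
```

Sentences returned by `unjustified_claims` are good candidates to paste into the audit prompt first.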
Step 3: Add a clause for each failure you find.
Every time Claude produces a report with a clearly wrong or overconfident insight, ask it to add a new constraint to your skill. Over time, your skill pretty much becomes a record of everything Claude gets wrong.
A Word of Caution
Adding constraints to your skill can sometimes make Claude produce an output where every single sentence ends with “…though more data would be needed to confirm this.”
That’s not useful either.
The goal is calibrated confidence, where the strength of Claude’s language matches the strength of the evidence. If you find Claude becoming overly wishy-washy, you can add a counterbalancing constraint:
Do not over-qualify every statement. If a pattern appears clearly and consistently across many reviews, state it plainly and include references to the data behind the pattern. Reserve qualifiers for genuinely uncertain or speculative claims.
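A rough way to notice over-qualification is to measure how many sentences in a report contain a hedge. The sketch below is only illustrative: the hedge list is an assumption, the sentence split is naive, and a word like "May" used as a month would be miscounted.

```python
import re

# Illustrative hedge phrases; extend or trim for your own reports.
HEDGES = re.compile(
    r"\b(may|might|could|possibly|would be needed|to confirm)\b", re.IGNORECASE
)


def hedge_ratio(report_text):
    """Fraction of sentences that contain at least one hedge phrase."""
    sentences = [
        s for s in re.split(r"(?<=[.!?])\s+", report_text.strip()) if s
    ]
    if not sentences:
        return 0.0
    hedged = sum(1 for s in sentences if HEDGES.search(s))
    return hedged / len(sentences)
```

A ratio close to 1.0 suggests the counterbalancing constraint is needed; what counts as too high is a judgment call for your own reports.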
Conclusion
Claude is impressive at generating professional-looking reports, which can sometimes be the problem.
The polish hides the overconfidence. Your stakeholders see clean formatting and authoritative language, and they assume the insights are solid even when they’re not.
The 4 lines I’ve walked through here don’t make Claude less capable. They make it more honest. And in a reporting context, honest is more valuable than impressive.
Read more about what other use cases Claude is good for, including building dashboards, debugging, and writing documentation:
→ 3 Claude Skills Every Data Scientist Needs in 2026
