The LLM Looked Smart. The Metrics Disagreed
In early 2025, a company faced backlash after offering digital bank accounts to individuals without credit. Although an AI tool was implemented to classify complaints, it struggled with recall, missing a significant number of issues. The author ultimately found that manual labeling was necessary to accurately assess the situation and improve the product's reception.
- ▪The company aimed to expand its product by offering digital bank accounts to those without credit.
- ▪Public backlash emerged as users complained about account approvals, leading to lower brand affinity.
- ▪An AI classifier showed high precision but low recall, missing many complaints.
- ▪The author manually labeled complaints to gain a clearer understanding of the issues.
- ▪Tuning the AI prompts became increasingly complex and raised concerns about overfitting.
Opening excerpt (first ~120 words) tap to expand
The LLM Looked Smart. The Metrics Disagreed This case from early 2025 is an interesting reminder that, even in this brave new world where models are increasingly commoditized through transformers and LLMs, the old concepts of data science still stubbornly refuse to die. Back then, Will Bank had one clear mission: credit. That was about to change. I was hired to help expand the product by offering free digital bank accounts, even to people who wouldn’t qualify for credit. I already touched on part of this story in “An Approval Model That Finally Got Approved”, but this chapter came with an entirely different headache. Among many operational issues, there was one problem loud enough to echo through every metrics dashboard: public brand backlash.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Rio.