Stop defaulting to the most powerful model for rule-based tasks


My default when starting a new AI integration is to reach for the most capable model available, which on this project meant the most capable model in the client’s purchased tier. More capability should mean better results. That logic feels obvious until it breaks something.

In this case, it nearly broke a customer-facing chatbot before we pushed it to production.

The hallucination hiding in plain sight

We were building a customer service bot for a tour operator. The knowledge base was explicit: the 3-day, 2-night package does not include airline tickets.

When a customer asked, “Does this tour include airline tickets?”, the model read the knowledge base and then reasoned its way past it. It knew that all-inclusive tours typically include flights, so it answered: “This tour includes round-trip airline tickets.”

Dead wrong, and completely contrary to the actual policy. Had we not caught it during testing, a customer could have shown up at the airport without a ticket. The model was too capable for this specific job.

Why high capability becomes a liability here

Powerful models tend to fill gaps. When the knowledge base did not resolve a question cleanly, this one answered as if it had, drawing on general world knowledge about how tours usually work rather than stopping at what the document actually said. For creative writing or debugging complex code, that tendency is useful. For a customer service bot operating from a fixed knowledge base, it is the failure mode: the model’s instinct to help becomes the thing that breaks compliance.

Customer service bots don’t need creativity. They need to stay inside the lines — answer from the knowledge base, follow the rules, and stop there.

I switched to a smaller model with the same knowledge base and the same prompt. The bot responded: “This tour does not include airline tickets. Please book your own tickets or contact us if you need help with a separate booking.” Done. No fabrication, no embellishment.
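One way to make that comparison repeatable is a small policy regression check that runs the same questions against whichever model is under evaluation. The sketch below is illustrative only and not the setup used on this project: `PolicyCase`, `failed_cases`, and the two stub functions are hypothetical names, and the stubs simply replay the two answers quoted above instead of calling a real model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class PolicyCase:
    """One knowledge-base fact the bot must never contradict (hypothetical structure)."""
    question: str
    must_contain: str       # phrase the answer has to include
    must_not_contain: str   # phrase that signals a fabricated answer


def failed_cases(ask_bot: Callable[[str], str], cases: List[PolicyCase]) -> List[PolicyCase]:
    """Return every case whose answer misses the required phrase or includes the forbidden one."""
    failures = []
    for case in cases:
        answer = ask_bot(case.question).lower()
        has_required = case.must_contain.lower() in answer
        has_forbidden = case.must_not_contain.lower() in answer
        if not has_required or has_forbidden:
            failures.append(case)
    return failures


CASES = [
    PolicyCase(
        question="Does this tour include airline tickets?",
        must_contain="does not include airline tickets",
        must_not_contain="includes round-trip airline tickets",
    ),
]


# Stand-ins for real model calls: each replays an answer quoted in the article.
def larger_model_stub(question: str) -> str:
    return "This tour includes round-trip airline tickets."


def smaller_model_stub(question: str) -> str:
    return (
        "This tour does not include airline tickets. Please book your own "
        "tickets or contact us if you need help with a separate booking."
    )


if __name__ == "__main__":
    print(len(failed_cases(larger_model_stub, CASES)))   # 1 failing case
    print(len(failed_cases(smaller_model_stub, CASES)))  # 0 failing cases
```

Substring matching is deliberately crude. It only catches contradictions of facts someone has written down as cases, so it complements manual testing and does not replace it.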

Matching the model to the actual job

This reshaped how I think about model selection. The rough split I use now: scripted customer service, knowledge base Q&A, rule-based classification, and schema extraction all call for a smaller, more predictable model where stability matters more than reasoning range. Debugging, content generation, and multi-factor analysis are where a larger model earns its place, because the ability to infer and synthesize is the whole point.
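The split above can be written down as an explicit lookup so the choice is made once per task type and not by habit. This is a minimal sketch under stated assumptions: the `Tier` enum, the task-type keys, and `pick_tier` are hypothetical names invented for illustration, and the tiers are abstract labels, not specific models.

```python
from enum import Enum


class Tier(Enum):
    SMALL = "small"   # smaller, more predictable model
    LARGE = "large"   # larger model with more reasoning range


# Hypothetical task-type keys mirroring the split described in the text.
TASK_TIERS = {
    "scripted_customer_service": Tier.SMALL,
    "knowledge_base_qa": Tier.SMALL,
    "rule_based_classification": Tier.SMALL,
    "schema_extraction": Tier.SMALL,
    "debugging": Tier.LARGE,
    "content_generation": Tier.LARGE,
    "multi_factor_analysis": Tier.LARGE,
}


def pick_tier(task_type: str) -> Tier:
    """Look up the tier for a task type; unknown types raise instead of defaulting."""
    try:
        return TASK_TIERS[task_type]
    except KeyError:
        raise ValueError(
            f"Unclassified task type: {task_type!r}; decide its tier explicitly"
        ) from None


if __name__ == "__main__":
    print(pick_tier("knowledge_base_qa"))  # Tier.SMALL
    print(pick_tier("debugging"))          # Tier.LARGE
```

Raising on an unknown task type is a deliberate choice in this sketch: a silent fallback to the largest tier would reintroduce the default the article argues against.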

Defaulting to the most powerful model available creates risk that does not surface until something goes wrong in production. The right question is not which model is most capable — it is which model is most appropriate for what this task actually requires.

That is the engineering decision.
