Enterprise AI is becoming a model-routing problem

The Uber/Claude Code budget story was a governance warning. The next layer down is inference economics: if DeepSeek-class models can handle large volumes of enterprise work at a fraction of frontier-model cost, then model routing becomes a board-level control surface.

2026-05-30 · 6 min read · AI economics, AI governance, model routing, enterprise AI, DeepSeek, inference

The next layer under the Uber story

The Uber/Claude Code story was not really about one company overspending on one tool. It was a governance warning. Usage can scale faster than discipline if the organization opens access before it has budget ownership, review patterns, and clear workflow boundaries.

A CNBC segment with Cohere CEO Aidan Gomez points to the next layer down. Even if an enterprise learns the governance lesson, it still has to confront the economics of inference itself. The question is no longer whether to use AI. The question is how much of the work actually requires frontier-model pricing.

That is where the buying equation changes. Once companies move from proofs of concept into production, inference stops being a novelty expense and becomes an operational architecture decision.

The arithmetic is what forces the issue

The arithmetic has gotten sharper since the CNBC segment aired. As of late May 2026, Claude Opus 4.8 costs $5 per million input tokens and $25 per million output tokens, according to Anthropic's pricing page. DeepSeek V4 Pro, now at permanently reduced pricing after DeepSeek's 75% promo, costs $0.435 per million input tokens and $0.87 per million output tokens. That is an 11.5× gap on input, a 28.7× gap on output, and roughly an 18× gap blended for typical enterprise workloads (the figure you get at about three input tokens per output token). At those ratios, a budget that buys one month of Opus covers roughly a year to a year and a half of the same volume on DeepSeek V4 Pro.
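The ratios in the paragraph above can be reproduced with a few lines of arithmetic. This is a minimal sketch using the per-million-token prices quoted in this section. The 3:1 input-to-output token mix is an assumption chosen for illustration, not a measured enterprise figure, and the function names are hypothetical.

```python
# Prices in USD per million tokens, as quoted in the paragraph above.
OPUS = {"input": 5.00, "output": 25.00}
DEEPSEEK = {"input": 0.435, "output": 0.87}


def blended_price(prices, input_share):
    """Average price per million tokens for a given share of input tokens."""
    return prices["input"] * input_share + prices["output"] * (1 - input_share)


def cost_ratios(input_share=0.75):
    """Return (input, output, blended) price ratios, premium over budget model.

    input_share=0.75 is an assumed 3:1 input-to-output mix, not a measurement.
    """
    return (
        OPUS["input"] / DEEPSEEK["input"],
        OPUS["output"] / DEEPSEEK["output"],
        blended_price(OPUS, input_share) / blended_price(DEEPSEEK, input_share),
    )


if __name__ == "__main__":
    input_ratio, output_ratio, blended_ratio = cost_ratios()
    print(f"input gap:   {input_ratio:.1f}x")
    print(f"output gap:  {output_ratio:.1f}x")
    print(f"blended gap: {blended_ratio:.1f}x")
```

Under that assumed mix the blended gap lands near 18×. A more output-heavy workload pulls the blended figure toward the output ratio, and a more input-heavy one pulls it toward the input ratio, so it is worth rerunning with your own token mix before quoting a single number.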

That matters because most enterprise workloads are not moonshot reasoning tasks. A large share of real usage is summarization, extraction, classification, coding assistance, workflow augmentation, and repetitive internal operations. Many of those jobs do not need the most expensive model in the market every time.

Once CFOs begin examining spend at production scale, the old habit of routing everything through the premium model starts to look less like sophistication and more like procurement laziness.

The market is splitting by trust, not just capability

The interesting implication is not that Chinese open models will simply replace OpenAI or Anthropic. It is that the market is splitting. One side is driven by cost and volume: organizations willing to use very cheap, very capable open models for lower-sensitivity workloads. The other side is driven by trust: regulated industries, critical infrastructure, and security-constrained environments that will pay a premium for Western or private deployments.

That second market is exactly where Cohere, Nvidia, and similar players are moving. Their bet is not just 'better model.' It is secure deployment, constrained-compute efficiency, and the ability to run useful systems on-prem or in tightly controlled environments.

In other words, premium pricing does not disappear. It narrows. The moat is no longer 'we are the smartest lab in the world.' The moat becomes 'we can satisfy your trust, provenance, deployment, and liability requirements without blowing up your cost structure.'

On-prem inference changes the privacy argument

This is where the DeepSeek point becomes strategically important. For many corporations, the appeal is not only that Chinese open models are cheaper. It is that open-weight or privately deployable systems create a different privacy posture. If the workload is sensitive, the company can choose to run a capable model on its own hardware instead of sending everything to a third-party frontier API.

That does not dissolve the governance problem. It changes its shape. Once the model is inside your own walls, you own more of the logging, access control, model provenance, output review, and cyber-risk surface. You also lose the illusion that the hosted provider is carrying the full burden of safety for you.

But for many enterprises, that is still an attractive trade. They would rather govern a model they control than remain permanently exposed to a cloud-only cost and privacy structure they do not control.

What teams should do

The right response is not to declare frontier models overpriced or to declare cheap models good enough for everything. It is to build a routing strategy that treats model choice as a governance and architecture problem.

  • separate high-trust, regulated, or code-sensitive workloads from bulk lower-sensitivity workflows
  • measure model spend by task class and business outcome, not just by user or team
  • define when frontier reasoning is actually worth the premium and when smaller or cheaper models are sufficient
  • evaluate private deployment options for sensitive internal workflows instead of assuming every task belongs in a hosted API
  • treat model routing as a standing operating policy rather than an ad hoc engineer preference
  • review the liability surface of self-hosted and open models instead of treating them as a simple cost hack
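To make the checklist above concrete, here is a minimal sketch of what a standing routing policy might look like in code. Everything in it is hypothetical: the tier names, the `Task` fields, and the rule that sensitive work always goes to a private deployment are illustrative assumptions, not a recommended or complete policy. It also shows spend rolled up by task class rather than by user or team.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Hypothetical deployment tiers, named for illustration only.
    PRIVATE = "private deployment"
    FRONTIER = "hosted frontier model"
    BUDGET = "hosted low-cost open model"


@dataclass(frozen=True)
class Task:
    task_class: str                 # e.g. "summarization", "code_review"
    sensitive: bool                 # regulated, high-trust, or code-sensitive
    needs_frontier_reasoning: bool  # judged per task class, not per engineer


def route(task: Task) -> Tier:
    """Pick a tier; sensitivity is checked before capability or cost."""
    if task.sensitive:
        return Tier.PRIVATE
    if task.needs_frontier_reasoning:
        return Tier.FRONTIER
    return Tier.BUDGET


def spend_by_task_class(usage):
    """Sum cost per task class from (task_class, cost_usd) records."""
    totals = {}
    for task_class, cost_usd in usage:
        totals[task_class] = totals.get(task_class, 0.0) + cost_usd
    return totals


if __name__ == "__main__":
    print(route(Task("summarization", False, False)).value)
    print(spend_by_task_class([("summarization", 1.5), ("extraction", 2.0)]))
```

The point of writing the policy down as code is that the ordering of the checks becomes explicit and reviewable. A real policy would likely need more tiers, per-jurisdiction rules, and a liability review of each self-hosted option before it could be relied on.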

The bottom line

The Uber story showed that AI adoption can outrun governance. The DeepSeek and Cohere story shows that inference economics are now forcing a second discipline: enterprises need explicit model-routing strategies. The future enterprise stack is not one premium model used everywhere. It is a governed mix of frontier models, cheaper open models, and private deployments chosen according to cost, trust, and consequence.

Talk it through

Need help translating the lesson into operating discipline?

If you want to turn this into a budget, review, or rollout pattern that actually survives contact with the team, Luis can help.

luis@yugenadvisors.com