Toward Data-Driven Multi-Model Enterprise AI
A pattern emerges time and again in artificial intelligence infrastructure: a new problem appears not as a sudden shock, but as an accumulation of quiet, difficult, and consequential engineering constraints, growing just fast enough to be cumbersome yet not painful enough to force an architectural rethink. That is, until someone reframes the problem not as a limitation of the models, but as a limitation of the software abstraction layers that support them.
Not Diamond is one such company doing that reframing. Tomás Hernando Kofman, Tze-Yang Tung, and Jeffrey Akiki are building the critical infrastructure for the next wave of enterprise AI: not another foundation model, but the unifying layer that makes heterogeneous model ecosystems viable.
From Model Monocultures to Model Markets
When we look at the past few years of AI deployment, what's notable isn't just the rise of foundation models, but the tendency for companies to overfit to a single model provider. The appeal is straightforward: a single API, fewer integration paths, and a sense of control. But as the ecosystem matures and competitive models (and their versions) proliferate, from OpenAI's GPT to Anthropic's Claude Sonnet to Meta's Llama to proprietary vertical models in healthcare and finance, this monoculture begins to break down. Companies start to ask different questions. Which model is cheapest? Fastest? Easiest to deploy under regulatory constraints? Best at reasoning vs. summarization?
The answer, increasingly, is not “one model” but “a fleet.” This is what Not Diamond enables—first with intelligent routing, and now with Prompt Adaptation. Their vision is a world where models become modular, swappable, and optimizable components within a larger orchestration layer. Not Diamond abstracts over foundation models the way Kubernetes did over physical machines.
Current demand for AI infrastructure has already begun shifting toward intelligent, adaptive orchestration that selects models based on context, and toward agents that manage prompts across models and providers. This shift is driven by the pragmatic forces that eventually bend architectures: cost, uptime, latency, and reliability.
Prompt Engineering at Scale Is a Hidden Cost Center
At small scale, prompting may begin as an art: a skill learned through trial, error, and intuition. But when you multiply this task across dozens of models, hundreds of use cases, and thousands of edge cases, it becomes infrastructure debt.
This is the pain point Not Diamond is solving with Prompt Adaptation. The tool is (deceptively) simple: given a prompt and some labeled data, it rewrites that prompt for any other model to maximize accuracy. It does so through automated search over prompt space, combined with iterative evaluation of candidate prompts against the labeled data. In practice, this automates and improves on what engineering teams are doing manually: re-tuning prompts each time a new model is trialed or swapped, a process that becomes increasingly unsustainable as the number of models grows.
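As a rough illustration of the pattern (our sketch, not Not Diamond's actual implementation), the loop below shows what automated prompt search against labeled data can look like: propose candidate rewrites of a seed prompt for the target model, score each against the evaluation set, and keep the best performer. The `rewrite_prompt` and `call_model` helpers are hypothetical placeholders for a rewriting strategy and a model API call.

```python
from typing import Callable

def adapt_prompt(
    seed_prompt: str,
    eval_set: list[tuple[str, str]],        # (input, expected_output) pairs
    rewrite_prompt: Callable[[str], str],   # hypothetical: proposes a candidate rewrite
    call_model: Callable[[str, str], str],  # hypothetical: (prompt, input) -> model output
    n_candidates: int = 20,
) -> tuple[str, float]:
    """Greedy search over candidate prompts: keep whichever scores best
    on the labeled evaluation set. A minimal sketch, not a real product."""

    def score(prompt: str) -> float:
        # Fraction of evaluation examples the target model gets right.
        hits = sum(
            call_model(prompt, x).strip() == y.strip()
            for x, y in eval_set
        )
        return hits / len(eval_set)

    best_prompt, best_score = seed_prompt, score(seed_prompt)
    for _ in range(n_candidates):
        candidate = rewrite_prompt(best_prompt)  # propose a variation of the current best
        candidate_score = score(candidate)
        if candidate_score > best_score:         # keep only improvements
            best_prompt, best_score = candidate, candidate_score
    return best_prompt, best_score
```

Real systems are far more sophisticated (structured search strategies, held-out validation, task-specific metrics), but the shape of the loop (propose, evaluate, keep) is what replaces weeks of manual re-tuning.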
The downstream effect is dramatic. In early deployments with large enterprises such as SAP, Prompt Adaptation has cut the prompt engineering process from weeks to hours. And as AI adoption accelerates, the overhead of managing multiple models will only increase. Companies cannot afford to throw endless human hours at the problem; it's inefficient, costly, and doesn't scale.
A Layer Designed for Entropy
There's a tendency in infrastructure deployment to look for uniformity, predictability, and control. But the AI stack is drifting in the opposite direction, toward entropy. New models appear weekly, open-weight alternatives grow stronger, and fine-tuned vertical specialists outpace generalists in key domains.
Routing and adaptation become not just features but requirements. The question is no longer “How do I use GPT-4 well?” but “How do I structure my system so that switching from GPT-4 to Claude, or from Claude to a fine-tuned Llama, takes minutes, not months?”
Not Diamond's system design reflects this: their routing infrastructure doesn't privilege any one provider. It optimizes across models based on configurable, user-defined metrics: accuracy, latency, token cost, carbon footprint, or anything else a team can measure. It has been operational with some of the largest enterprises for nine months, serving over 100,000 users, and consistently outperforms individual foundation models on major benchmarks. Prompt Adaptation does not aim for a 'universal' prompt but for a performant, model-specific one. Together, these form a toolkit for adapting to change, both in models and in the incentives around them.
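To make the idea of metric-driven routing concrete, here is a minimal sketch under our own assumptions (this is not Not Diamond's API): each candidate model carries measured properties, the team supplies weights expressing what it cares about, and the router picks the model with the best weighted score. The model names and numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ModelProfile:
    name: str
    accuracy: float            # measured on the team's own eval set (0-1)
    latency_ms: float          # median response time
    cost_per_1k_tokens: float  # provider pricing in dollars

def route(models: list[ModelProfile], weights: dict[str, float]) -> ModelProfile:
    """Pick the model with the best weighted score.
    Lower-is-better metrics (latency, cost) enter with negative weights."""
    def score(m: ModelProfile) -> float:
        return (
            weights.get("accuracy", 0.0) * m.accuracy
            + weights.get("latency_ms", 0.0) * m.latency_ms
            + weights.get("cost_per_1k_tokens", 0.0) * m.cost_per_1k_tokens
        )
    return max(models, key=score)

# Hypothetical example: favor accuracy, penalize latency and cost.
candidates = [
    ModelProfile("model-a", accuracy=0.91, latency_ms=800, cost_per_1k_tokens=0.03),
    ModelProfile("model-b", accuracy=0.86, latency_ms=300, cost_per_1k_tokens=0.004),
]
chosen = route(candidates, {"accuracy": 10.0, "latency_ms": -0.001, "cost_per_1k_tokens": -20.0})
```

The point is not the scoring function itself but that the objective is owned by the team, not the provider: change the weights and the fleet reconfigures around whatever the business actually values.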
The Economics of Adaptation
From a purely financial perspective, Not Diamond's value proposition maps cleanly onto measurable outcomes. Tasks like retrieval-augmented generation (RAG), text-to-SQL conversion, or contract analysis are not optional in enterprise settings; they are operational workflows. Today, improving model performance on these tasks often comes from trial-and-error prompt engineering, fine-tuning, or vendor switching.
What Prompt Adaptation enables is a new axis of optimization: the ability to redeploy the same workflow across models while maintaining performance and reducing cost. This decouples workload logic from model-specific quirks, much as containerization decoupled app logic from infrastructure quirks. In internal benchmarks, Prompt Adaptation has yielded performance improvements ranging from 5% to 60% on enterprise tasks. Time-to-deployment drops from weeks to hours. This changes the economics of experimentation—it reduces the switching cost across models and allows companies to exploit price/performance arbitrage as the model market evolves. Not Diamond reduces the friction of change.
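One way to picture that decoupling, as a hedged sketch rather than Not Diamond's actual interface: the workflow (say, text-to-SQL) is written once against an abstract model client, and the adapted, model-specific prompt is injected as configuration. Switching providers then means swapping two values, not rewriting the pipeline. The client and prompt names below are hypothetical.

```python
from typing import Protocol

class ModelClient(Protocol):
    def complete(self, prompt: str) -> str: ...

def text_to_sql(question: str, schema: str, client: ModelClient, prompt_template: str) -> str:
    """Workflow logic written once; the model client and its adapted
    prompt template are supplied at deployment time."""
    prompt = prompt_template.format(schema=schema, question=question)
    return client.complete(prompt)

# Switching models means swapping the client and its adapted prompt,
# not rewriting the workflow (names are hypothetical):
# sql = text_to_sql(q, schema, client=claude_client, prompt_template=claude_adapted_prompt)
# sql = text_to_sql(q, schema, client=llama_client,  prompt_template=llama_adapted_prompt)
```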
We believe this is a precursor to the commoditization of model APIs and the rise of orchestration-as-strategy.
From Horizontal Tools to AI Assembly Lines
Today’s AI tooling either aims too high (autonomous agents) or too low (helper scripts and wrappers). We believe Not Diamond strikes the right balance: infrastructure that assembles other AI systems. The dual components—routing and prompt adaptation—enable modular workflows.
Enterprises can now compose, test, and optimize across different models, providers, and input formats. In this way, Not Diamond becomes a strategic layer not because it’s smarter than the models it routes, but because it makes them swappable.
The Defensibility of Invisible Infrastructure and the “Meta-Stack”
Not Diamond's vision and first-mover advantage in enterprise multi-model AI orchestration are matched only by their technical execution.
First, the real challenge wasn’t building a prompt adaptation system. It was making one that works reliably across dozens of idiosyncratic model APIs and use cases, at scale and with measurable impact. That requires data, tuning infrastructure, and partnerships—everything Tomás, Tze-Yang, Jeffrey and the team brought together in record time.
Second is market timing. As of mid-2025, multi-model frameworks have become real enough to drive infrastructure demand, at the point on the S-curve where orchestration moves from optimization to necessity. Not Diamond is ahead not because of a single insight, but because they built for this inevitability while others are still framing it as edge-case complexity.
We’ve seen this before: computing environments diversify, tooling lags, then someone builds a meta-stack—a control plane that abstracts, optimizes, and arbitrates across lower layers. In cloud computing, it was Kubernetes. In data engineering, it was dbt and Airflow. In AI, we believe it could be Not Diamond.
In investing in Not Diamond alongside SAP.iO Fund, IBM, and others, we are not betting on a specific model or modality. We are betting on entropy, and on the infrastructure needed to turn it into leverage.