Five questions that will tell you whether a vendor is selling real engineering or a wrapper around a model anyone can use. Plus the answers that should end the conversation.
The AI vendor landscape right now is genuinely hard to navigate. There are real companies doing real engineering, and there are companies that are one API call away from being a spreadsheet formula. From the outside, they often look identical.
I get asked about this constantly. Founders send me vendor decks. They paste pitch emails into my DMs. The question is always some version of: “Is this real?”
Here are five questions I’d ask every AI vendor before signing a contract. You don’t need a technical background to ask them. You just need to listen carefully to the answers.
The first question: what models are you actually running, and what happens when they change? This is the first thing I ask, and the answer tells me a lot.
Real engineering shops know exactly what models they’re running. They can tell you whether they’re using a foundation model (GPT-4o, Claude 3.5, Gemini 1.5) or something they’ve fine-tuned themselves. They have an opinion about why they made that choice. They’ve thought about what happens when the next version drops.
The bad answer: vague language about “proprietary AI” or “advanced models” without specifics. That’s not a moat. That’s marketing. Any company with an API key can say their product is powered by advanced AI.
The good answer: specific model names, a clear reason for the choice (“we fine-tuned Llama 3 on our specific vertical because the domain language is too specialized for a general model”), and an explicit policy for how they handle model updates (“we validate new models against our benchmark suite before upgrading”).
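To make that concrete, here is a minimal sketch of what a pinned-model policy with a gated upgrade check can look like. The model string, the baseline score, and the call_model callable are illustrative assumptions, not any particular vendor's setup.

```python
"""Minimal sketch of a pinned-model policy with a gated upgrade check.

The model string, the baseline score, and the call_model callable are
illustrative placeholders, not any particular vendor's real setup.
"""
from typing import Callable

PINNED_MODEL = "gpt-4o-2024-08-06"  # an exact version string, never "latest"
BASELINE_SCORE = 0.92               # what the pinned model scores on the benchmark suite


def evaluate(call_model: Callable[[str, str], str], model: str, benchmark: list[dict]) -> float:
    """Fraction of benchmark cases the model answers exactly right."""
    correct = sum(
        1 for case in benchmark if call_model(model, case["input"]) == case["expected"]
    )
    return correct / len(benchmark)


def should_upgrade(call_model: Callable[[str, str], str], candidate: str, benchmark: list[dict]) -> bool:
    """Adopt a new model only if it matches or beats the pinned model's score."""
    return evaluate(call_model, candidate, benchmark) >= BASELINE_SCORE
```

The details will vary from vendor to vendor; what matters is that “we validate new models against our benchmark suite before upgrading” corresponds to an actual, checkable step rather than a slogan.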
If they can’t answer this question clearly, they’re a wrapper. Which might still be fine for your use case — but you should know that going in, and you should be paying wrapper prices, not engineering prices.
The second question: how do you measure whether the system actually works? This is the question that separates real ML engineers from people who set up an API connection and called it a product.
Every serious AI team has an evaluation framework. They have test sets. They have metrics they track over time. They can tell you what their accuracy is on a held-out benchmark, and they can tell you what happens to that number when they make a change.
The bad answer: “Our customers tell us when it’s wrong” or “We monitor for errors.” That’s not evaluation — that’s waiting for failure reports. It means they have no systematic way to know whether the system is improving or degrading.
The good answer: specific metrics (“we track precision and recall on our classification task, and we run a regression test against 500 held-out examples before every deployment”), a description of how they build and maintain that test set, and ideally a willingness to share benchmark results with you.
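For a sense of what that answer implies, here is a minimal sketch of a pre-deployment regression gate, assuming a binary classification task. The classify stub, the metric floors, and the test-set format are placeholders for illustration; the shape of the check is the point.

```python
"""Sketch of a pre-deployment regression gate over a held-out test set.

The classify callable, the metric floors, and the test-set format are
illustrative placeholders, not any particular vendor's pipeline.
"""
from typing import Callable


def precision_recall(predictions: list[bool], labels: list[bool]) -> tuple[float, float]:
    """Precision and recall for a binary classification task."""
    true_pos = sum(1 for p, l in zip(predictions, labels) if p and l)
    pred_pos = sum(predictions)
    actual_pos = sum(labels)
    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


def regression_gate(
    classify: Callable[[str], bool],
    held_out: list[dict],
    min_precision: float = 0.90,
    min_recall: float = 0.85,
) -> bool:
    """Return True only if the new build clears the agreed metric floors."""
    predictions = [classify(case["input"]) for case in held_out]
    labels = [case["label"] for case in held_out]
    precision, recall = precision_recall(predictions, labels)
    return precision >= min_precision and recall >= min_recall
```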
If they can’t describe their evaluation process, they can’t tell you whether their product actually works. That’s a serious problem.
The third question: what happens when it fails? Every AI system makes mistakes. The question is not whether yours will (it will), but whether the vendor has thought seriously about what happens when it does.
There are three things to look for here: failure mode design, observability, and escalation paths.
Failure mode design: Does the system know when it’s uncertain? Can it abstain or flag low-confidence outputs for human review? Or does it always output something, even when it’s guessing? (There’s a short sketch of what this can look like after the list.)
Observability: Can you see what the system is doing? Are there logs you can access? Can you audit a decision after the fact?
Escalation paths: When something goes wrong at 2am, who do you call? What’s the SLA? What does remediation look like?
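Here is the sketch promised above: a minimal example of a confidence gate that abstains, flags for human review, and logs every decision so it can be audited later. The 0.80 threshold, the field names, and the logging setup are assumptions; the shape is what to look for.

```python
"""Sketch of a confidence gate: answer, or abstain and flag for human review.

The 0.80 threshold and the record fields are illustrative; the point is that
low-confidence outputs get routed to a person, and every decision is logged.
"""
import json
import logging
from dataclasses import asdict, dataclass

logger = logging.getLogger("decisions")


@dataclass
class Decision:
    answer: str | None   # None means the system abstained
    confidence: float
    needs_review: bool


def gate(answer: str, confidence: float, threshold: float = 0.80) -> Decision:
    """Pass the answer through only when the model is confident enough."""
    if confidence >= threshold:
        decision = Decision(answer=answer, confidence=confidence, needs_review=False)
    else:
        decision = Decision(answer=None, confidence=confidence, needs_review=True)
    # Log every decision so it can be audited after the fact (the observability point above).
    logger.info(json.dumps(asdict(decision)))
    return decision
```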
The bad answer: “We have very high accuracy” (without defining the metric or the benchmark), followed by vague reassurances about reliability.
The good answer: a clear description of how the system signals uncertainty, access to logs and audit trails, and a specific, contractual escalation process.
The fourth question: who owns the data, and what happens to it? This is not a technical question, but it’s load-bearing.
With AI systems, data ownership questions have real consequences. If you’re feeding customer data, proprietary processes, or regulated information into a vendor’s platform, you need to know:
Does your data get used to train their models, or to improve the product for other customers?
Where is it stored, and who at the vendor (or their subprocessors) can access it?
How long is it retained, and can you have it deleted on request?
What happens to it when the contract ends?
The bad answer: buried GDPR language in the terms of service that you’re expected to read yourself.
The good answer: clear, direct answers to all four questions above, without hesitation, in plain English.
If a vendor can’t answer these questions clearly in a sales call, they haven’t thought about them carefully. That means your legal and compliance team will be figuring it out retroactively, after you’re already locked in.
The fifth question: can I talk to a customer who is running this in production? It’s the oldest sales due diligence question in the world, and it applies equally well to AI vendors.
A reference customer who is running the system in production — not in a pilot, not in a POC, but actually in production with real volume — is the most credible evidence a vendor can give you. Their story will include the things the sales deck leaves out: the integration challenges, the accuracy in the real world (not on the demo data), the support experience when something broke.
The bad answer: a case study PDF. Those are written by the vendor.
The also-bad answer: “We’d be happy to connect you with some customers” — and then they never do, or the reference they offer is a free-trial user who has been live for two weeks.
The good answer: a specific name, a warm introduction, and a call that the vendor isn’t on. Ask that customer: “What’s the one thing you wish you’d known before you signed?”
If you step back, these five questions are all versions of the same question: Has this vendor actually shipped this in production, at scale, and do they understand what they built?
A wrapper shop can’t answer questions 1 and 2. A company that hasn’t thought about risk can’t answer question 3. A company that treats data as an asset rather than a liability can’t answer question 4. And a company without a real customer base can’t answer question 5.
That’s not to say every wrapper is bad. Sometimes a well-designed wrapper is exactly what you need, and paying for someone else’s integration work is worth the premium. But you should make that choice with your eyes open.
If you’re in the middle of evaluating vendors and you want a second opinion, book a free intro call. Fifteen minutes, and I can usually tell you which category a vendor falls into after reading their deck.
Book a free intro call. No pitch — just a direct conversation about where AI fits (or doesn’t) in your business right now.