The most common thing I hear on intro calls is some version of: “We have a bunch of AI ideas. We just don’t know where to start.”

That’s not a technology problem. It’s a prioritization problem. And it’s completely solvable with a structured approach.

I’ve run this framework on every Assessment I’ve done. It doesn’t require a PhD. It doesn’t require a data science team. It requires about two hours with your leadership team and a willingness to be honest about what your business actually does.

Here are the five questions.


Question 1: Where does a person make a judgment call more than ten times a day?

Most people start AI prioritization by asking “what can AI do?” That’s the wrong direction. Start with your operations instead.

Walk through your business process. Find the places where a human being is making a repeated, structured judgment: flagging a support ticket, scoring a lead, categorizing a document, deciding whether an order is fraudulent. Not creative judgment — routine judgment. The kind of thing where the person doing it wishes they could be doing something else.

Those spots are candidates. Everything else is noise.

The reason this question works is that AI systems are learning functions. They learn to replicate patterns. If a person is making that call ten times a day, you have training data, you have a clear success condition, and you have a meaningful volume of work to automate. If someone makes that call once a week with deep contextual reasoning, you're not building AI. You're building a very expensive consultant.

A company I worked with last year had a three-person team reviewing incoming applications for a grant program. Each reviewer was reading the same five fields and making a go/no-go decision. Two hundred applications a week. That’s a textbook AI use case. We had a working classifier in eight weeks.
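For a feel of what that kind of build involves, here's a minimal sketch of a prompt-based go/no-go classifier. To be clear about what's assumed: the field names, the screening criteria, and the model are placeholders, not the client's actual system, and the OpenAI SDK appears only because it's a convenient example of calling a foundation model.

```python
# Minimal sketch of a prompt-based go/no-go triage classifier.
# Field names, criteria, and the model are illustrative placeholders.
# Assumes the OpenAI Python SDK (v1) and an API key in the environment.
from openai import OpenAI

client = OpenAI()

FIELDS = ["org_name", "annual_budget", "project_summary", "region", "amount_requested"]

def review_application(application: dict) -> str:
    """Return 'go' or 'no-go' for one grant application."""
    details = "\n".join(f"{f}: {application.get(f, 'missing')}" for f in FIELDS)
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; use whatever model you've actually evaluated
        messages=[
            {"role": "system", "content": (
                "You screen grant applications. Reply with exactly 'go' or 'no-go'. "
                "Say 'go' only if the project summary matches the program's focus "
                "and the amount requested is within the published range."
            )},
            {"role": "user", "content": details},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower()
```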

Question 2: What does a wrong answer actually cost?

This is where founders get surprised. Not every automation is worth building, even if it’s technically easy.

There are two types of cost to think about:

Direct cost. If the AI gets it wrong, what happens? A misclassified support ticket gets routed wrong — that’s a frustrated customer and a support rep doing cleanup. A wrongly flagged fraud transaction blocks a legitimate purchase — that might cost a sale and a customer relationship. A missed safety issue in a manufacturing process could be a liability event.

Recovery cost. How hard is it to catch and fix the error? A routing mistake is easy to fix. A wrong recommendation that a user acted on is harder. A compliance failure is very hard.

You want to build in spaces where the error cost is low OR the error rate you can achieve is low enough that the total error cost is still acceptable. I usually sketch this as a 2x2: error cost vs. error rate. Start in the low-low quadrant.
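If it helps to make that 2x2 concrete, here's a rough sketch of the sorting. Every number in it is made up; the cutoffs for what counts as "low" are a judgment call you make with your own team.

```python
# Sketch of the error-cost vs. error-rate 2x2. All numbers are illustrative;
# the "low" cutoffs are judgment calls, not industry standards.
LOW_COST = 100    # dollars per wrong answer
LOW_RATE = 0.05   # expected fraction of wrong answers

candidates = [
    # (use case, estimated cost of one wrong answer in $, expected error rate)
    ("route support tickets",     15,   0.08),
    ("flag fraudulent orders",    250,  0.03),
    ("summarize compliance docs", 5000, 0.05),
]

for name, cost, rate in candidates:
    quadrant = (
        ("low" if cost <= LOW_COST else "high") + " cost / "
        + ("low" if rate <= LOW_RATE else "high") + " rate"
    )
    print(f"{name}: {quadrant}, ~${cost * rate:.2f} expected cost per decision")
```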

The point isn’t to avoid all risk. It’s to go in with your eyes open. I’ve seen companies spend six months building an AI decision system for a context where a wrong answer is a regulatory violation. That’s a bad trade.

Question 3: Do you have at least 500 labeled examples?

This is the question that kills the most ideas, and it should.

Modern foundation models (GPT-4, Claude, Gemini, the open-source equivalents) are remarkably capable out of the box. But “out of the box” means they’re generalists. To make them good at your specific task, you need fine-tuning data, evaluation data, or at least enough examples to write good prompts.

The number 500 isn’t magic. It’s a sanity check. If you can’t identify 500 examples of the thing you want the AI to do — 500 support tickets that were correctly routed, 500 documents that were correctly classified, 500 decisions that were made correctly — then you’re building a system you can’t evaluate. And a system you can’t evaluate is not a product. It’s a prayer.
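To make "a system you can evaluate" concrete: once the labeled examples exist, the evaluation loop itself can be very simple. A minimal sketch, assuming you've held back a set of past decisions and have some classify function to test (the grant-screening sketch above, say):

```python
# Minimal evaluation loop over labeled examples. Assumes `labeled_examples`
# is a list of (input, correct_label) pairs set aside before building, and
# `classify` is whatever system you're testing (model, prompt, or rules).
def evaluate(classify, labeled_examples):
    results = []
    for item, correct_label in labeled_examples:
        predicted = classify(item)
        results.append(predicted == correct_label)
    return sum(results) / len(results)

# Example usage: 500 past go/no-go decisions pulled from the program's records.
# accuracy = evaluate(review_application, held_out_decisions)
# print(f"Agreement with past human decisions: {accuracy:.1%}")
```

Plain accuracy is only a starting point; if one outcome is much rarer than the other, you'll also want to look at precision and recall on the rare class.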

If you’re below 500, that’s not the end. It means your first phase is data collection, not AI building. That’s a perfectly reasonable outcome. Some of my clients spend the first six weeks of an engagement just logging decisions that were previously made in someone’s head. That’s valuable work.
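That logging doesn't need to be sophisticated. Here's a sketch of the kind of thing I mean; the column names are placeholders for whatever your reviewers actually look at when they make the call.

```python
# Minimal decision log: capture what a person looked at and the call they made,
# so you can build an evaluation set later. Column names are placeholders.
import csv
import os
from datetime import datetime, timezone

LOG_PATH = "decision_log.csv"
FIELDS = ["timestamp", "reviewer", "item_id", "inputs_seen", "decision", "notes"]

def log_decision(reviewer, item_id, inputs_seen, decision, notes=""):
    """Append one human decision to a running CSV log."""
    write_header = not os.path.exists(LOG_PATH)
    with open(LOG_PATH, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "reviewer": reviewer,
            "item_id": item_id,
            "inputs_seen": inputs_seen,
            "decision": decision,
            "notes": notes,
        })
```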

Question 4: Is this a core operation or a peripheral one?

There’s a meaningful difference between automating something that touches your core value proposition and automating a back-office process.

Automating how you generate invoices is low-risk. Automating how you underwrite a loan — if you’re a lender — is high-risk, high-leverage, and probably regulated. Automating how you write marketing copy is medium-everything.

I’m not saying don’t touch core operations. I’m saying go in with higher rigor when you do. The closer you are to the thing that makes or loses you money, the more important it is to have human review, explainability, and a rollback plan.

For a first build, I usually recommend starting one step removed from the core. Automate the triage, not the decision. Automate the summary, not the advice. Get comfortable with the technology and your ability to evaluate it before you put it in the critical path.

Question 5: If this worked perfectly, how would you know?

This is the question most teams skip, and skipping it is how you end up six months in with a system you can’t evaluate.

Before you build anything, define what success looks like in a way you can measure. Not “it helps our support team” — that’s not a measurement. More like: “Support tickets are resolved 30% faster, as measured by ticket-open duration, and CSAT scores stay flat or improve.”
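Once the target is phrased that way, checking it is a few lines of arithmetic. A rough sketch, with placeholder numbers standing in for your own before-and-after data:

```python
# Sketch of checking "30% faster resolution, CSAT flat or better".
# All values are placeholders for your own baseline and pilot data.
def mean(xs):
    return sum(xs) / len(xs)

baseline_hours = [26.0, 31.5, 18.2, 44.0, 22.3]  # ticket-open durations before
pilot_hours    = [17.1, 20.4, 12.9, 30.5, 15.8]  # ticket-open durations after
baseline_csat  = [4.2, 4.5, 4.1]                 # CSAT scores before
pilot_csat     = [4.3, 4.4, 4.2]                 # CSAT scores after

speedup = 1 - mean(pilot_hours) / mean(baseline_hours)
print(f"Resolution time improvement: {speedup:.0%}")            # target: >= 30%
print(f"CSAT held: {mean(pilot_csat) >= mean(baseline_csat)}")  # target: flat or up
```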

If you can’t define the metric, you can’t build the evaluation set. And if you can’t build the evaluation set, you can’t tell whether the model is working or not. Which means you’re flying blind when you deploy.

This question also surfaces misaligned expectations early. I’ve been in rooms where one stakeholder thinks success means full automation and another thinks it means a decision-support tool. Those are different projects. Better to find that out now.


How to use the framework

These five questions are not a checklist. They're a sorting function. You're trying to find the AI idea in your pipeline that scores best across all five (there's a rough scoring sketch after the list):

  1. High-frequency, routine judgment calls
  2. Low-to-moderate error cost
  3. Enough data to evaluate
  4. Not your most critical operation (for your first build)
  5. Clear, measurable success criteria
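Here's the kind of rough scoring I mean. The ideas, the 1-to-5 scores, and the equal weighting are all placeholders; deciding the real scores is the two-hour conversation with your leadership team, not something a script does for you.

```python
# Sketch of the sorting function: score each idea 1-5 on the five questions,
# then rank by total. Ideas and scores below are illustrative only.
CRITERIA = ["frequency", "error tolerance", "data", "distance from core", "measurability"]

ideas = {
    # scores follow CRITERIA order; higher is better on every axis
    "triage inbound support tickets": [5, 4, 5, 4, 5],
    "auto-approve loan applications": [4, 1, 3, 1, 4],
    "draft first-pass marketing copy": [3, 4, 2, 3, 2],
}

for name, scores in sorted(ideas.items(), key=lambda kv: sum(kv[1]), reverse=True):
    breakdown = ", ".join(f"{c}={s}" for c, s in zip(CRITERIA, scores))
    print(f"{sum(scores):>2}  {name}  ({breakdown})")
```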

The ideas that score well on all five are your starting point. The ideas that fail on question 3 or 5 point to table-stakes problems, missing data or missing metrics, that you need to fix before you build anything.

I work through these questions with every founder I sit across from. They usually take about two hours with a leadership team. Sometimes an idea that sounded exciting dies in the first question. Sometimes a boring back-office process turns out to be the highest-leverage thing a company can automate.

The point is not to find the flashiest use case. It’s to find the one that will actually work — and to know why before you write a line of code.

If you want to run through this framework for your own business, book a free intro call. It’s fifteen minutes, no pitch, and you’ll leave with at least one prioritized candidate for a first build.
