How to write the JD, screen for real ability versus resume keywords, run the technical interview without being technical, and avoid the three candidate profiles that look great on paper but fail in practice.
Hiring your first ML engineer is one of the highest-stakes decisions you’ll make as an AI-forward company. Get it right and you have someone who can translate between the technical and business worlds, make realistic promises, and ship something that actually works. Get it wrong and you end up with a very expensive experiment that doesn’t produce results.
I’ve been on both sides of this. I’ve helped founders hire ML engineers, and I’ve been hired as one. Here’s what I’d do if I were in your seat.
Most ML engineer JDs are written by founders who copy from other JDs, or by HR teams who treat ML like any other software engineering role. The result is a list of framework keywords that doesn’t actually describe what the person will be doing.
Start with the work, not the credentials.
Write a paragraph that describes a typical week in the role. What problems will this person be solving? What data will they be working with? Who will they collaborate with? What does success look like in the first six months?
Then work backward to requirements. If the work is fine-tuning transformer models on your domain data, “experience with PEFT or LoRA methods” is a meaningful requirement. If the work is primarily building data pipelines and evaluation frameworks, “experience with distributed training” is not — and listing it will filter out the right candidates.
A few things to avoid in the JD:
Don’t list every framework in existence. If you ask for experience in PyTorch, TensorFlow, JAX, Keras, scikit-learn, Spark, dbt, and Kubernetes, you’re not describing a role — you’re fishing. Serious candidates read that list and conclude you don’t know what you’re building.
Don’t conflate ML engineer, data scientist, and AI researcher. They’re different jobs. An ML engineer builds and ships systems. A data scientist analyzes data and builds models for internal insight. An AI researcher advances the state of the art. Most early-stage companies need an ML engineer. Very few need a researcher.
Don’t require a PhD unless you need a PhD. The honest signal a PhD sends is “this person can read papers and formulate novel approaches to unsolved problems.” For most production ML work, strong engineering judgment and experience with real data are more valuable. PhD requirements narrow your pool and don’t necessarily improve the signal.
Resumes are nearly useless for ML engineer screening. A resume that lists GPT, fine-tuning, RAG, and LangChain tells you that the person knows what words to put on a resume in 2026 — nothing more.
There are two things I look for at the resume/portfolio stage:
Evidence of shipped systems. Not research papers, not Kaggle competitions, not personal projects that are one notebook on GitHub. Evidence that this person has taken something from an idea to production. That means a model that serves real traffic, a pipeline that processes real data, a system that a real user depends on. The artifact matters less than the story: “I built X, it does Y, Z people or processes depend on it.”
Evidence of evaluation discipline. A senior ML engineer will always have thought about how to measure whether their system works. In their portfolio, they’ll describe accuracy metrics, eval sets, offline vs. online evaluation strategies. If someone can’t describe how they evaluated their work, they probably haven’t done serious production ML.
For the initial screen, I’d have a quick phone call focused on one question: “Tell me about the most complex ML system you’ve built and shipped. What was the hardest part, and how did you know it was working?” Listen for specificity, honest acknowledgment of what went wrong, and a clear description of the evaluation methodology.
This is the part that trips founders up. You’re not an ML engineer — how do you evaluate one?
The answer is: don’t try to evaluate technical depth yourself. Design the interview to reveal competence without requiring you to be a domain expert.
Have a technical advisor do the deep technical screen. If you don’t have one in-house, bring in someone from your network for a few hours. Pay them for their time. This is worth it. The cost of a bad hire far exceeds the cost of a few hours of advisory time.
Ask for a work sample. Give candidates a take-home problem that mirrors your actual work. Not a whiteboard algorithm question — a problem with real (anonymized) data from your context. Ask them to document their approach, show their code, describe their evaluation, and explain their results. This reveals how they think about ambiguous problems, how they communicate their reasoning, and whether their instincts are calibrated.
Evaluate the non-technical signals yourself. In your interview round, focus on things you can assess without domain expertise: how clearly they explain technical decisions to someone who isn’t technical, how honestly they talk about what went wrong on past projects, and whether the promises they make sound realistic.
There are three profiles that interview well and disappoint in practice. Learn to recognize them.
The researcher who can’t ship. Strong papers, impressive citations, maybe a PhD from a great program. Their technical knowledge is real. But they’ve spent their career optimizing for academic outcomes, not product outcomes. They measure success in SOTA benchmarks, not in whether the system works for your specific users. When they hit a real-world constraint — your data is messy, your compute is limited, your deadline is fixed — they don’t know how to adapt.
How to spot it: Ask about the gap between a model’s benchmark performance and its production performance. Ask what they’d do if the best-performing approach was too slow to deploy. If they’ve never had to make those tradeoffs, they haven’t built production systems.
The demo builder. They’ve built impressive demos. Their GitHub is full of projects. They know every new tool in the ecosystem. But demos and production systems are different things. Demos don’t handle edge cases, don’t degrade gracefully, and don’t have to stay running when you’re not watching.
How to spot it: Ask about the last system they maintained in production for more than six months. Ask what broke and how they fixed it. Maintenance experience reveals whether they’ve actually lived with a system’s consequences.
The “AI” person. They know a lot about AI as a field. They read the papers, they follow the discourse, they can discuss GPT-4 vs. Claude vs. Gemini in depth. But knowing about AI and being able to build AI systems are different skills. The former is strategic knowledge; the latter is engineering craft.
How to spot it: The work sample is the best filter. Ask them to build something, and see how they approach a real problem with real constraints.
The right first ML hire for most startups is someone who can do three things: build data pipelines, train and evaluate models, and deploy and monitor production systems. That’s an unusual combination. It maps most closely to the “ML engineer” title at a tech company, or to a strong senior data scientist who has done production work.
You probably don’t need a specialist yet. You need a generalist who can move fast and adapt.
If you’re in the process of defining this role and want a gut check on your JD or interview process, book a free intro call. I’ve seen this hire go wrong from both sides of the table, and I’m happy to save you a few months.
Book a free intro call. No pitch — just a direct conversation about where AI fits (or doesn’t) in your business right now.