Mastering AI Evals: The missing skill in AI product management | Hamel Husain

Published: 23 Dec 2025

In this podcast episode, Amir Rezaei interviews @hamelhusain7140 (Hamel Husain), an ML engineer with 25 years of experience and creator of AI Evals for Engineers & PMs, about the concept of evals in AI, focusing on their importance in improving AI systems. Hamel explains the difference between product evals and general evals, emphasizing the need for systematic evaluation methods over vague assessments. The discussion covers error analysis, the role of vibe checks, and how to create effective evals for chatbots. Hamel also introduces the concept of using LLMs as judges in evaluations and the significance of modular pipelines in testing new models. The episode concludes with insights on the role of product managers in AI development and the importance of looking at data to drive improvements.

Hamel Husain: https://www.linkedin.com/in/hamelhusain/
Course discount 30% off: https://maven.com/parlance-labs/evals?promoCode=amir-product-founder
Spotify Episode: https://open.spotify.com/episode/3rSqc9O4RSkJplivQBma01?si=SAb3NX0JTFSfpAArj3A3Lg
Episode Post: https://product-founder.com/mastering-ai-evals-the-missing-skill-in-ai-product-management/

Chapters:
00:00 Introduction to Evals and AI Performance
02:33 Understanding Product Evals vs. General Evals
05:39 The Importance of Error Analysis in AI
08:13 Vibe Checks vs. Systematic Evaluation
10:44 Creating Effective Evals for Chatbots
13:10 Using LLMs as Judges in Evaluations
15:56 Modular Pipelines and Testing New Models
18:40 The Role of Product Managers in AI Development
21:11 Tools and Resources for Learning Evals
23:35 The Three Gulfs Model in AI Evaluation
26:11 Final Thoughts and Course Offerings


#AI #evals #erroranalysis #productmanagement #chatbotevaluation #LLMperformancemetrics #machinelearning #systematicevaluation #dataanalysis