Lenny's Podcast · September 25, 2025
Why AI evals are the hottest new skill for product builders | Hamel Husain
with Hamel Husain & Shreya Shankar
Learn why AI evaluation skills are becoming critical for product builders working with LLMs and generative AI systems. Hamel Husain and Shreya Shankar reveal practical frameworks for measuring LLM performance, including error analysis through open coding, LLM-as-judge evaluation techniques, and how to build reliable AI evals that scale. Discover why traditional product requirements are being replaced by evaluation prompts, and get actionable techniques for systematic data analysis that takes just 30 minutes per week to maintain. Essential listening for anyone building AI-powered products who wants to move beyond guesswork to systematic quality measurement.
Featured insight
Evals are the new PRDs - your LLM judge prompts become detailed product requirements that run automatically and constantly. They specify exactly how your AI should behave in different scenarios, derived from actual user data rather than hypothetical requirements. — Lenny Rachitsky
Best for: AI Product Managers building LLM-powered features, Technical Founders scaling generative AI products, ML Engineers responsible for production AI systems