Lenny's Podcast · September 25, 2025

Why AI evals are the hottest new skill for product builders | Hamel Husain

with Hamel Husain & Shreya Shankar

Learn why AI evaluation skills are becoming critical for product builders working with LLMs and generative AI systems. Hamel Husain and Shreya Shankar reveal practical frameworks for measuring LLM performance, including error analysis through open coding, LLM-as-judge evaluation techniques, and how to build reliable AI evals that scale. Discover why traditional product requirements are being replaced by evaluation prompts, and get actionable techniques for systematic data analysis that takes just 30 minutes per week to maintain. Essential listening for anyone building AI-powered products who wants to move beyond guesswork to systematic quality measurement.

Featured insight

Evals are the new PRDs - your LLM judge prompts become detailed product requirements that run automatically and constantly. They specify exactly how your AI should behave in different scenarios, derived from actual user data rather than hypothetical requirements. — Lenny Rachitsky

Best for: AI Product Managers building LLM-powered features, Technical Founders scaling generative AI products, ML Engineers responsible for production AI systems
