Evals as PRD
"If the model is the product, then the eval is the product requirement document." - Brendan Foody
What It Is
Evals (evaluations) are the AI equivalent of product requirement documents (PRDs). Just as a PRD defines what success looks like for a software product, an eval defines what success looks like for an AI model. The framing puts AI development in product terms: labs and AI companies need clear, measurable definitions of the capabilities they want before they can effectively improve their models.
The eval serves multiple purposes simultaneously: it's the specification document that tells researchers what to build, the benchmark that measures progress, and the sales collateral that demonstrates capabilities to customers. As Brendan puts it, "Evals are what you give to researchers to show them what they should be building, but they're also the way that you demonstrate the efficacy of capabilities."
How It Works
An eval consists of:
- Success criteria - A rubric or test that defines what "good" looks like for a specific capability
- Domain expertise - Expert knowledge encoded into the measurement (e.g., a lawyer defining what good contract redlining looks like)
- Scalable verification - A way to apply the criteria automatically and repeatedly (for example, with an AI grader), so the eval can score enough outputs to support reinforcement learning
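The three parts above can be sketched as a small data structure. This is only an illustration under stated assumptions: the `Criterion` and `Eval` names, the weights, and the keyword checks are all hypothetical, and the keyword checks stand in for the expert judgment (or AI grader) a real eval would use.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Criterion:
    """One line of the rubric: what to look for and how much it matters."""
    name: str
    weight: float
    check: Callable[[str], bool]  # hypothetical stand-in for expert judgment


@dataclass(frozen=True)
class Eval:
    """Success criteria for one capability, applied the same way every time."""
    capability: str
    criteria: Tuple[Criterion, ...]

    def score(self, output: str) -> float:
        """Return the weighted share of criteria the output satisfies (0.0 to 1.0)."""
        total = sum(c.weight for c in self.criteria)
        earned = sum(c.weight for c in self.criteria if c.check(output))
        return earned / total


# Hypothetical rubric for contract redlining; a lawyer would supply the real one.
redlining_eval = Eval(
    capability="contract redlining",
    criteria=(
        Criterion("flags liability exposure", 2.0, lambda out: "liability" in out.lower()),
        Criterion("proposes replacement language", 1.0, lambda out: "replace with" in out.lower()),
        Criterion("references the clause", 1.0, lambda out: "clause" in out.lower()),
    ),
)

sample = "Clause 7: liability is uncapped. Replace with a cap equal to fees paid."
print(redlining_eval.score(sample))  # 1.0
```

Keeping the rubric as data rather than prose is what makes it repeatable: the same criteria can be applied to every output without a person re-reading the rubric each time.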
The eval functions at two levels:
- As benchmark: Measures whether the model has the capability
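- As training signal: Rewards model trajectories that achieve the capability, with the eval's grader taking over the scoring role that human raters play in RLHF (the move from RLHF toward RLAIF)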
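A minimal sketch of how one scoring function can serve both levels. Everything here is an assumption made for illustration: the `toy_scorer`, the 0.8 pass threshold, and the mean-centered reward (a simple baseline chosen for the example, not something the episode describes). Real training pipelines involve far more than this.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[str], float]


def toy_scorer(output: str) -> float:
    """Hypothetical grader: fraction of required phrases present in the output."""
    required = ("liability", "replace with")
    hits = sum(1 for phrase in required if phrase in output.lower())
    return hits / len(required)


def benchmark(scorer: Scorer, outputs: Sequence[str], threshold: float = 0.8) -> float:
    """Benchmark level: share of outputs whose score meets the threshold."""
    if not outputs:
        raise ValueError("need at least one output to benchmark")
    passed = sum(1 for out in outputs if scorer(out) >= threshold)
    return passed / len(outputs)


def rewards(scorer: Scorer, trajectories: Sequence[str]) -> List[float]:
    """Training-signal level: each trajectory's score relative to the group mean."""
    if not trajectories:
        raise ValueError("need at least one trajectory to reward")
    raw = [scorer(t) for t in trajectories]
    mean = sum(raw) / len(raw)
    return [r - mean for r in raw]


samples = ["liability; replace with cap", "liability noted", "ok"]
print(benchmark(toy_scorer, samples))
print(rewards(toy_scorer, samples))  # [0.5, 0.0, -0.5]
```

The benchmark collapses the scores into one number for reporting, while the reward keeps a per-sample signal that says which attempts were better than average.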
How to Apply It
For AI companies and enterprises adopting AI:
- Identify your value chain - What are the core outputs your business produces?
- Define the rubric - What does excellent look like? What criteria distinguish good from great?
- Make it measurable - Create systematic ways to score outputs against your rubric
- Iterate on the eval - The quality of your eval sets the ceiling on the quality of your AI application, so revise the rubric as you find cases it scores wrongly
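One way to make "iterate on the eval" concrete is to compare the eval's pass/fail verdicts with verdicts from a domain expert on the same outputs, then review the cases where they differ. This is a sketch of that idea, not a method described in the episode; the function names and the sample labels are hypothetical.

```python
from typing import List, Sequence


def agreement_rate(grader: Sequence[bool], expert: Sequence[bool]) -> float:
    """Share of outputs where the eval's verdict matches the expert's verdict."""
    if len(grader) != len(expert) or not grader:
        raise ValueError("label lists must be non-empty and the same length")
    matches = sum(1 for g, e in zip(grader, expert) if g == e)
    return matches / len(grader)


def disagreements(grader: Sequence[bool], expert: Sequence[bool]) -> List[int]:
    """Indexes of outputs to review when deciding how to revise the rubric."""
    return [i for i, (g, e) in enumerate(zip(grader, expert)) if g != e]


# Hypothetical verdicts on four outputs (True = passes the rubric).
grader_labels = [True, True, False, True]
expert_labels = [True, False, False, True]

print(agreement_rate(grader_labels, expert_labels))  # 0.75
print(disagreements(grader_labels, expert_labels))   # [1]
```

Each disagreement is a prompt to ask whether the rubric is missing a criterion, weighting one badly, or measuring something the expert does not actually care about.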
For product teams working with AI:
- Write the eval before building - Like working backwards from a PRD, define success first
- Engage domain experts - The eval quality depends on capturing expert judgment
- Use evals as specs - Share evals with AI teams/vendors as your requirements
- Treat evals as marketing - Your eval demonstrates your understanding of quality
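Writing the eval first and using it as a spec can look like the sketch below: the cases and the acceptance bar are fixed before any system exists, and every candidate is then run against the same spec. The `EvalSpec` structure, the phrase-matching check, and the two stub vendors are hypothetical, chosen only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    must_mention: Tuple[str, ...]  # hypothetical stand-in for an expert rubric


@dataclass(frozen=True)
class EvalSpec:
    cases: Tuple[EvalCase, ...]
    min_pass_rate: float  # acceptance bar agreed before building


def case_passes(case: EvalCase, output: str) -> bool:
    """A case passes when the output mentions every required phrase."""
    return all(term.lower() in output.lower() for term in case.must_mention)


def pass_rate(spec: EvalSpec, system: Callable[[str], str]) -> float:
    """Run the candidate system on every case and return the share that pass."""
    passed = sum(1 for case in spec.cases if case_passes(case, system(case.prompt)))
    return passed / len(spec.cases)


def accepts(spec: EvalSpec, system: Callable[[str], str]) -> bool:
    """True when the candidate meets the bar set in the spec."""
    return pass_rate(spec, system) >= spec.min_pass_rate


# The spec is written first; nothing below it needs to exist yet.
SPEC = EvalSpec(
    cases=(
        EvalCase("Summarize the refund policy.", ("30 days", "receipt")),
        EvalCase("Explain the warranty.", ("one year",)),
    ),
    min_pass_rate=1.0,
)


def vendor_a(prompt: str) -> str:
    """Stub candidate that answers both prompts."""
    answers = {
        "Summarize the refund policy.": "Refunds are accepted within 30 days with a receipt.",
        "Explain the warranty.": "The warranty lasts one year.",
    }
    return answers.get(prompt, "")


def vendor_b(prompt: str) -> str:
    """Stub candidate that deflects every prompt."""
    return "Please contact support."


print(accepts(SPEC, vendor_a))  # True
print(accepts(SPEC, vendor_b))  # False
```

Because the spec is independent of any one system, the same object can be handed to an internal team or an outside vendor as the requirement, and reused later to compare them.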
When to Use It
- When deploying AI to automate any workflow
- When evaluating AI vendors or models for your use case
- When building AI-powered features
- When trying to improve AI performance in a specific domain
- When communicating AI requirements to technical teams
Source
- Guest: Brendan Foody
- Episode: "Why experts writing AI evals is creating the fastest-growing companies in history"
- Key Discussion: (00:06:39) - The core framing of evals as PRDs
- Additional Context: (00:07:39) - How enterprises should think about evals
Related Frameworks
- Working Backwards / PR-FAQ - Start with desired outcomes
- Problem-First Approach - Define the problem before the solution