Taste in Post-Training

"There's an art to post training. It's not purely a science. When you are deciding what kind of model you're trying to create and what it's good at, there's this notion of taste and sophistication." - Edwin Chen

What It Is

Taste in Post-Training is the recognition that training AI models involves countless subjective decisions that reflect the values, priorities, and aesthetic sensibilities of the teams making them. These choices (what data to include, what behaviors to reward, what quality standards to enforce) are not purely technical. They require taste, judgment, and a clear vision of what the model should become.

This framework explains why different AI models from different labs develop distinct "personalities" and capabilities, even when using similar architectures. The values of the companies and the taste of their researchers get encoded into the models themselves.

How It Works

The Infinite Choice Problem: Post-training involves countless decisions with no objectively "right" answer:

Human data vs. synthetic data ratios
Which capabilities to prioritize (coding vs. writing vs. reasoning)
What "quality" means for different outputs
Whether to optimize for benchmarks or for real-world tasks
Visual design preferences for generated content
Tone and personality of responses

The Poetry Example: When training a model to write poetry, a low-taste approach checks boxes: "Is this a poem? Does it contain eight lines? Does it contain the word 'moon'?"

A high-taste approach asks: "Is this Nobel Prize-winning poetry? Is it full of subtle imagery? Does it surprise you and target your heart? Does it teach you something about the nature of moonlight?"
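
The contrast between the two approaches can be sketched in code. This is an illustration only, not a method described in the episode: the function name checkbox_score, the TasteRating class, the four rating dimensions, the 1-5 scale, and the equal-weight average are all hypothetical choices made for the example. The point it tries to show is that the checkbox questions can be automated, while the high-taste questions need a human judge to supply the ratings.

```python
from dataclasses import dataclass


def checkbox_score(poem: str) -> int:
    """Low-taste rubric: count how many mechanical boxes the poem ticks.

    "Is this a poem?" is left out because it cannot be checked mechanically.
    """
    lines = [ln for ln in poem.splitlines() if ln.strip()]
    checks = [
        len(lines) == 8,         # "Does it contain eight lines?"
        "moon" in poem.lower(),  # "Does it contain the word 'moon'?"
    ]
    return sum(checks)


@dataclass
class TasteRating:
    """Hypothetical 1-5 ratings supplied by an expert human reviewer."""
    imagery: int         # "Is it full of subtle imagery?"
    surprise: int        # "Does it surprise you?"
    emotional_pull: int  # "Does it target your heart?"
    insight: int         # "Does it teach you something about moonlight?"

    def score(self) -> float:
        # Equal weighting is an assumption; a real rubric might not average at all.
        values = [self.imagery, self.surprise, self.emotional_pull, self.insight]
        return sum(values) / len(values)


# A flat, repetitive poem passes every checkbox...
flat_poem = "\n".join(["The moon is in the sky tonight."] * 8)
boxes_ticked = checkbox_score(flat_poem)

# ...while a reviewer applying the high-taste questions would rate it poorly.
reviewer_rating = TasteRating(imagery=1, surprise=1, emotional_pull=1, insight=1)
```

Here the same poem ticks both boxes yet earns the lowest possible reviewer rating, which is the gap the poetry example describes.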

Taste Propagation: The taste of the people designing the training data, writing the rubrics, and evaluating outputs shapes what the model learns to produce. "Certain frontier labs, the ones with more taste and sophistication, they will realize that [quality] doesn't reduce to this six set of checkboxes and they'll consider all of these kind of implicit, very subtle qualities instead."

How to Apply It

Define quality deeply - Go beyond checkbox compliance. For any output type, articulate what "excellent" looks like in nuanced, multidimensional terms.
Hire for taste - The people designing training data and evaluations need sophisticated judgment, not just technical skills. "Types of people who could literally spend 10 hours digging through a dataset, and playing around with models."
Choose your trade-offs explicitly - Acknowledge that optimizing for benchmarks may hurt real-world performance. Decide which matters more.
Think like a product designer - What do you want users to experience? What emotions should the model evoke? What behaviors should it encourage or discourage?
Resist the metrics trap - Easy-to-measure metrics (like benchmark scores) can crowd out harder-to-measure but more important qualities.

When to Use It

When designing AI training data and evaluation criteria
When making product decisions about AI behavior and personality
When evaluating AI output quality
When building any product where "quality" is subjective and multidimensional

The Differentiation Effect

This framework predicts increasing differentiation between AI models over time:

"Over the past year, I've realized that the values that the companies have will shape the model... In the same way that when Google builds a search engine, it's very different from how Facebook would build a search engine, which is very different from how Apple would build a search engine. They all have their own principles and values and things that they're trying to achieve in the world that shape all the products that they're going to build. And in the same way, all the [AI labs] will start behaving very differently too."

Source

Guest: Edwin Chen
Episode: "The $1B AI company training ChatGPT, Claude & Gemini on the path to responsible AGI"
Key Discussion: (00:15:31 - 00:17:09) - The art vs. science of post-training and how taste shapes model capabilities

Related Frameworks

Design Tenets - Decision-making tools that resolve recurring debates
Opinionated Software Design - Building products with baked-in best practices
Culture is Product - Companies build two products—one for customers and one for teams