Trajectory Analysis
"Sometimes even though the model reaches the correct answer, it does so in all these crazy ways... Sometimes you want models to get to the correct answer by reflecting on what it did. Sometimes you want it to get at the correct answer by just one-shotting it. And if you ignore all of that, it's just like... missing a lot of the information that you could be teaching a model to do." - Edwin Chen
What It Is
Trajectory Analysis is the practice of evaluating not just outcomes but the entire path taken to reach them. In AI training, this means examining every step a model takes—not just whether it arrived at the right answer. The same principle applies to evaluating teams, products, and processes: how you achieve results contains crucial information that outcome-only measurement misses.
A model (or person, or team) might reach the correct destination through luck, brute force, or elegant reasoning. Each path tells you something different about underlying capability and future reliability.
How It Works
The Hidden Information in Trajectories: When you only evaluate outcomes, you miss:
- Efficiency - Did it take 3 steps or 50 random attempts?
- Reliability - Was success reproducible or lucky?
- Reasoning quality - Did it understand why the answer was right?
- Learning signals - What intermediate failures revealed capability gaps?
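As a rough illustration of the signals listed above, the sketch below summarizes a recorded trajectory into a few numbers. The `Step` record, the `summarize` helper, and the two example trajectories are hypothetical, invented here for illustration; they are not taken from the episode or from any particular training system.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One intermediate step in a trajectory (hypothetical structure)."""
    action: str      # what was attempted
    succeeded: bool  # whether this intermediate step worked


def summarize(steps: list[Step], solved: bool) -> dict:
    """Summarize a trajectory instead of keeping only the final outcome."""
    failed = [s.action for s in steps if not s.succeeded]
    return {
        "solved": solved,                 # the outcome-only view
        "num_steps": len(steps),          # efficiency
        "failed_steps": len(failed),      # how much trial and error occurred
        "failure_rate": len(failed) / len(steps) if steps else 0.0,
        "failed_actions": failed,         # learning signals: where it stumbled
    }


# Two made-up trajectories that both end in a correct answer.
direct = [Step("parse input", True), Step("compute answer", True), Step("verify", True)]
flailing = [Step(f"guess {i}", False) for i in range(49)] + [Step("guess 49", True)]

direct_summary = summarize(direct, solved=True)
flailing_summary = summarize(flailing, solved=True)
```

Both summaries report `solved` as true, but the step counts and failure rates differ sharply, which is exactly the information an outcome-only check would discard. Reliability is not covered here; measuring it would require repeating the same task several times and comparing runs.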
The Reward-Hacking Problem: Models (and people) can appear successful while gaming the system:
"Sometimes [the model] just does things very inefficiently or it almost reward-hacks a way to get at the correct answer... It may have tried 50 different times and failed, but eventually it just kind of randomly lands on a correct number."
A pure outcome focus would reward this; trajectory analysis exposes it.
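The contrast can be sketched in code. The two scoring functions below are assumptions made for illustration, and the linear penalty of 0.02 per failed attempt is an arbitrary choice, not a method described in the episode.

```python
def outcome_reward(correct: bool) -> float:
    """Outcome-only scoring: ignores how the answer was reached."""
    return 1.0 if correct else 0.0


def trajectory_reward(correct: bool, failed_attempts: int, penalty: float = 0.02) -> float:
    """Trajectory-aware scoring (illustrative): discount correct answers
    that were reached only after many failed attempts."""
    if not correct:
        return 0.0
    return max(0.0, 1.0 - penalty * failed_attempts)


# A one-shot solution versus one that landed on the answer after 49 failures.
one_shot_outcome = outcome_reward(True)
lucky_outcome = outcome_reward(True)
one_shot_trajectory = trajectory_reward(True, failed_attempts=0)
lucky_trajectory = trajectory_reward(True, failed_attempts=49)
```

Under outcome-only scoring the two runs are indistinguishable; under the trajectory-aware score the run that stumbled onto the answer is rewarded far less. A real reward design would need more care, since a fixed penalty could also discourage legitimate exploration or reflection.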
Long Trajectories Amplify the Problem: "If all you're doing is checking whether or not the model reaches the final answer, it's like there's all this information about how the model behaved in the intermediate step that's missing." The longer and more complex the task, the more information lives in the journey.
How to Apply It
Log the journey - Capture not just final outcomes but intermediate steps, decision points, and pivots along the way.
Distinguish efficiency from effectiveness - Reaching a goal inefficiently may indicate fragile capability that won't generalize.
Value process quality - Was the result reached through sound reasoning or through trial and error? Either can be acceptable, but which one you want depends on the context.
Look for trajectory patterns - Consistent patterns across multiple tasks reveal underlying capability (or its absence) better than isolated outcomes.
Design for trajectory observation - Structure work to make the path visible, not just the destination.
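A minimal sketch of the first and fourth practices above (logging the journey and looking for patterns across tasks) might look like the following. The `TrajectoryLog` class, the event kinds, and the `pattern_counts` helper are all hypothetical names chosen for this example.

```python
import json
import time
from collections import defaultdict


class TrajectoryLog:
    """Records intermediate events for one task (hypothetical helper)."""

    def __init__(self, task_id: str):
        self.task_id = task_id
        self.events: list[dict] = []

    def record(self, kind: str, detail: str) -> None:
        # kind is a free-form label such as "step", "decision", or "pivot"
        self.events.append({"t": time.time(), "kind": kind, "detail": detail})

    def to_json(self) -> str:
        # Serialize the whole path so it can be reviewed later
        return json.dumps({"task_id": self.task_id, "events": self.events})


def pattern_counts(logs: list[TrajectoryLog]) -> dict[str, int]:
    """Count in how many tasks each kind of event appears at least once."""
    counts: dict[str, int] = defaultdict(int)
    for log in logs:
        for kind in {event["kind"] for event in log.events}:
            counts[kind] += 1
    return dict(counts)


# Two made-up tasks: one proceeds directly, one changes approach midway.
log_a = TrajectoryLog("task-a")
log_a.record("step", "outlined plan")
log_a.record("step", "executed plan")

log_b = TrajectoryLog("task-b")
log_b.record("step", "tried first approach")
log_b.record("pivot", "abandoned first approach")
log_b.record("step", "tried second approach")

counts = pattern_counts([log_a, log_b])
```

Counting event kinds per task rather than per event keeps one unusually long trajectory from dominating the picture, which matters when the goal is to spot patterns that recur across tasks.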
When to Use It
- Evaluating AI model performance
- Assessing team or individual capability
- Understanding why experiments succeeded or failed
- Designing evaluation frameworks for complex tasks
- Post-mortems and retrospectives
Applications Beyond AI
The principle extends to any domain where how you achieve results matters:
Product Development: A team that ships successfully through heroic crunch teaches you something different than a team that ships through excellent planning. Both reached the goal; the trajectories predict different future outcomes.
Learning and Skill Development: Someone who passes a test through deep understanding has developed different capabilities from someone who passes through memorization, even when their scores are identical.
Business Metrics: Revenue can come from sustainable growth, a one-time windfall, or unsustainable discounting. The number is the same, but the trajectories are very different.
Source
- Guest: Edwin Chen
- Episode: "The $1B AI company training ChatGPT, Claude & Gemini on the path to responsible AGI"
- Key Discussion: (00:39:55 - 00:41:03) - Why paying attention to trajectories in AI training is critical
Related Frameworks
- RL Environments for Learning - Simulations that enable trajectory observation
- Shortening Feedback Loops - Find intermediate signals to learn faster
- Series of Small Decisions - Success is the compound result of many small decisions