For years, the goal of A/B testing has been simple: find a “winner” to ship. But for a company the size of Spotify, which relies on constant innovation to serve hundreds of millions of users, this approach was too narrow. As Confidence, the music giant’s experimentation platform, scaled to support hundreds of teams (from about 40 regularly experimenting teams in 2018 to nearly 300 by the end of 2022), the company’s focus evolved from merely boosting experiment velocity to maximizing experiment quality and learning.
Rethinking Experiment Success
Spotify realized that true value often lies not in what works, but in understanding what doesn’t. As the company put it, “A successful experiment yields enough valid information to inform product decisions, not just those that find ‘winners.’” This broader definition led to the development of the Experiments with Learning (EwL) metric, which counts an experiment as successful when it is both valid (correctly implemented) and decision-ready. A decision-ready result is a clear outcome of one of three kinds:
- Success: A key metric improves (a win).
- Regression detected: Metrics worsen, signaling that the change should be aborted rather than shipped.
- Neutral but informative: No effect is detected, and the test had enough statistical power to make that conclusion reliable.
By including regressions and conclusive neutral results as valuable “learnings,” Spotify shifted its cultural emphasis from chasing wins to mitigating risk.
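The three decision-ready outcomes can be sketched as a small classifier. This is a minimal illustration, not Spotify's implementation: the `ExperimentResult` fields, the `Outcome` names, and the choice to let a detected regression take precedence over an improvement are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    REGRESSION = "regression detected"
    NEUTRAL_INFORMATIVE = "neutral but informative"
    NO_LEARNING = "no learning"


@dataclass
class ExperimentResult:
    # Hypothetical fields; a real platform would derive these from its analysis.
    valid: bool                    # experiment was implemented and run correctly
    significant_improvement: bool  # a key metric improved
    significant_regression: bool   # a metric got significantly worse
    sufficiently_powered: bool     # enough power to trust a "no effect" result


def classify(result: ExperimentResult) -> Outcome:
    """Map an experiment result to one of the outcome categories."""
    if not result.valid:
        # An invalid experiment cannot inform a decision, whatever it shows.
        return Outcome.NO_LEARNING
    if result.significant_regression:
        # Assumption: a regression overrides any improvement seen elsewhere.
        return Outcome.REGRESSION
    if result.significant_improvement:
        return Outcome.SUCCESS
    if result.sufficiently_powered:
        return Outcome.NEUTRAL_INFORMATIVE
    # No significant effect and too little power: inconclusive.
    return Outcome.NO_LEARNING


def is_learning(result: ExperimentResult) -> bool:
    """True if the experiment would count toward the learning rate."""
    return classify(result) is not Outcome.NO_LEARNING
```

Under these assumptions, an underpowered test with no significant movement is the only valid experiment that fails to count as a learning.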
The Power of Learning Rate
The difference this reframing makes is dramatic. Across Spotify R&D, the average win rate (experiments with positive results) stands at roughly 12%. However, their learning rate (EwL rate) is significantly higher, averaging 64%. This stark gap highlights the core message: most value comes from discovering what not to ship. Without this learning, the company would risk launching far more harmful or neutral changes than helpful ones.
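To make the two rates concrete, the sketch below computes both from a list of outcome labels. The labels and the batch of 100 experiments are invented for illustration and are not Spotify data; only the overall 12% and 64% figures come from the article, and the split between regressions and informative neutrals is arbitrary.

```python
from collections import Counter


def win_and_learning_rate(outcomes):
    """Return (win_rate, learning_rate) for a list of outcome labels.

    Labels are hypothetical: "success", "regression",
    "neutral_informative", and "no_learning".
    """
    if not outcomes:
        raise ValueError("need at least one experiment")
    counts = Counter(outcomes)
    total = len(outcomes)
    win_rate = counts["success"] / total
    # Every outcome except "no_learning" counts as a learning.
    learning_rate = (total - counts["no_learning"]) / total
    return win_rate, learning_rate


# Made-up batch chosen only to match the article's aggregate rates.
example = (
    ["success"] * 12
    + ["regression"] * 20
    + ["neutral_informative"] * 32
    + ["no_learning"] * 36
)
win_rate, learning_rate = win_and_learning_rate(example)
print(f"win rate: {win_rate:.0%}, learning rate: {learning_rate:.0%}")
```

The gap between the two numbers is made up entirely of regressions and informative neutral results, which is the point the metric is designed to surface.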
The EwL framework is now a crucial strategic tool. It helps the platform insights team, including key contributors like Lizzie Eardley, Caroline Thordenberg, and Johan Rydberg, manage innovation and guide investment. It also addresses the practical challenge of limited testing capacity, a constraint acknowledged by figures like Mark Zuckerberg, by optimizing how testing bandwidth is used across high-traffic app surfaces.
Fostering a Learning Culture
Ultimately, the EwL metric drives continuous improvement. It signals to platform owners exactly where issues lie: is a team running too many underpowered experiments, or are there technical integration challenges in the platform? This feedback loop helps Confidence evolve, for example, by refining the sample size calculator to ensure tests have a high chance of yielding clear answers. For product development, the lesson is clear: progress is measured by learning velocity, not just by wins.
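For a sense of what a sample size calculator does, here is a minimal sketch using the standard normal-approximation formula for comparing two proportions. It is a textbook calculation, not the Confidence calculator; the function name, parameters, and default significance level and power are assumptions chosen for the example.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline, min_detectable_effect, alpha=0.05, power=0.8):
    """Approximate users needed per group for a two-sided two-proportion test.

    baseline: conversion rate in the control group (between 0 and 1)
    min_detectable_effect: absolute lift the test should be able to detect
    """
    p1 = baseline
    p2 = baseline + min_detectable_effect
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("rates must lie strictly between 0 and 1")
    if min_detectable_effect == 0:
        raise ValueError("min_detectable_effect must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


# Halving the detectable effect roughly quadruples the required sample.
print(sample_size_per_group(0.10, 0.01))
print(sample_size_per_group(0.10, 0.005))
```

A test run with fewer users than this is the kind of underpowered experiment that ends up neutral without being informative.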
