Celebration of Errors (CoE) is a practice that can turn individual failures into business success and, more importantly, a psychologically safe workplace. When you create a psychologically safe environment for employees, you set yourself up for business success by reducing problem avoidance, accelerating troubleshooting, and increasing innovation. Taking this approach to errors demonstrates a leader’s acceptance that people need to make mistakes in order to improve, so that the business can achieve ever-greater goals.
Ward Vuillemot, CTO and CPO of Real Self, brought the CoE practice to his team on his very first day in the role. At first, people found it annoying and even awkward, but now they seek out the opportunity to take this approach to learning from mistakes. Vuillemot didn’t state his plan to build a psychologically safe team, but now, over a year later, he hears it from employees on a regular basis, and sees the results in reduced error rates and shortened resolution periods.
Processes Over People
This approach might sound counterintuitive from a people-first leader or culture. But the point is that people do not make intentional mistakes. With very few exceptions, everyone wants to succeed and satisfy their teammates, managers, and customers. And so when mistakes happen, the priority is fixing the process rather than trying to ‘fix’ the person involved.
The CoE process – originally known as a Correction of Error process at Amazon – is a way to deeply understand errors without blaming or threatening anyone involved. By taking this impersonal approach, it’s much easier to get to the true root cause and find a solution. Further, employees are more likely to come forward when they find mistakes if they know they won’t be embarrassed or demoted.
The First Step Toward A CoE: Remove The Blame
When mistakes happen, there is often a reflexive drive to blame someone. But blame leads to shame, which is not an optimal state for learning. Employees who fear repercussions attempt to hide their errors, and so miss the opportunity to resolve the error, learn from it, and co-create a new process that ensures the same error won’t happen again. No lessons are learned, and the mistakes recur.
Creating a blame-free environment allows people to be open with themselves and others so they can own their inevitable failures and accept the need to repair them. A former colleague of Ward’s, an alum of Spotify, Adobe, Microsoft, and now CTO in his own right, Kevin Goldsmith, says “Figure out how to fail effectively is a superpower at organizations I've been at. Versus others that haven’t, and are still punishing failure. It really destroys all innovation.”
The CoE Template
A CoE is an actual document that a team member or small group writes up, before or after a team conversation about the incident. There are three parts to the document, which in total need only be a page or two in length. It should be kept on record for future reference. At Real Self, Ward and his team connect the action items from each CoE to their internal tracking systems to ensure accountability. In his words, “We say what we do, and we do what we say.”
1. Impact of the Error
The first step is to address factually – not emotionally – what the impact of the error was on the business, and also on the customer. This portion of a CoE should be 3-5 sentences at most, and can be sent out to the entire company as an update on the incident.
Ward warns of the “common mistake to conflate why it happened with why it matters. People tend to be very long-winded and write volumes here.” The language should be deductive, based on specific facts and information, rather than inductive, drawing conclusions or trying to explain what happened. The process of writing this section also helps employees learn to report factually and concisely, which can be helpful in other written or verbal presentations.
Even more important than teaching employees to create executive summaries, the CoE helps build empathy for customers and the business itself. For employees who don’t interact regularly with the end user, or with the business’s profitability, it’s valuable to recognize the impact of their work, and of their eventual mistakes, on these stakeholder groups.
Ultimately, this section of a CoE describes why the error matters (impact on customers and the business), not why it happened.
2. Resolutions
This is the most important section of the CoE document. The team gets together for a brainstorming session and identifies both a short-term and a long-term resolution. In the short term, the fire has to be put out, whatever that means in the case at hand: for example, getting the site back up and running. Ward also encourages his team to look for ways to ensure that a system, not a human, observes future failure. He explains: “Finding mistakes is a waste of people’s time, since computers can do it infinitely better.”
In the long term, the resolution should ensure that this same problem doesn’t happen again. Long-term solutions generally involve process or policy changes. That said, Ward emphasizes that it’s important to avoid talking about lofty solutions like, “we should re-architect the system.” In his words, “Either re-architect the system (put it into plan) or don't talk about it.”
Ward tells his teams, “ALWAYS be the first to the fire. It's never good to be told your area is on fire by another team - it erodes trust.” And so the goal of the Resolutions section of the CoE is to set up systems that enable the team to 1) instrument, 2) monitor, and 3) alert when they detect an issue. He explains, “Sometimes we cannot ensure a fire won’t happen again, but we can ensure that 1) we alerted quickly, and 2) we were alerted by a tireless bot rather than a human.”
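As a rough illustration of the instrument, monitor, and alert steps, here is a minimal Python sketch. It is not Real Self’s actual tooling: the `ErrorRateMonitor` class, the 5% threshold, the 20-request minimum, and the `notify` callback are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ErrorRateMonitor:
    """Hypothetical monitor: counts request outcomes and alerts once past a threshold."""
    threshold: float                 # assumed alerting threshold, e.g. 0.05 = 5% errors
    notify: Callable[[str], None]    # assumed alert channel (pager, chat bot, email)
    min_samples: int = 20            # assumed minimum, to avoid alerting on a handful of requests
    total: int = 0
    errors: int = 0
    alerted: bool = False

    def record(self, ok: bool) -> None:
        # 1) Instrument: every request reports its outcome.
        self.total += 1
        if not ok:
            self.errors += 1
        self.check()

    def error_rate(self) -> float:
        return self.errors / self.total if self.total else 0.0

    def check(self) -> None:
        # 2) Monitor: compare the observed error rate with the threshold.
        if self.alerted or self.total < self.min_samples:
            return
        if self.error_rate() > self.threshold:
            # 3) Alert: the system, not a person, raises the alarm (once).
            self.alerted = True
            self.notify(
                f"error rate {self.error_rate():.0%} exceeds {self.threshold:.0%}"
            )


# Usage: collect alerts in a list instead of paging anyone.
messages = []
monitor = ErrorRateMonitor(threshold=0.05, notify=messages.append)
for i in range(100):
    monitor.record(ok=(i % 10 != 0))  # every tenth request fails
```

In this sketch the alert fires a single time, as soon as enough requests have been seen, so the team hears about the fire from the bot before another team has to point it out.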
3. Root Cause
Ward calls this portion the ‘show your work’ section of the CoE report, and references it only briefly unless he doesn’t find the Resolutions to be credible. The idea here is to get to the bottom of what really went wrong. It can be easy to find a bandaid that will resolve the problem for now, without really digging into the underlying cause.
The 5 Whys game is an excellent tool to ensure the team really addresses every aspect connected to the actual incident. This is as simple as asking ‘why’ something happened, then asking why that was the case, and so on. Dig deeper at least five times, so you really get at why the mistake occurred and can remedy that root cause.
A practical example is a customer delivery that was received late. The answer to the first “why” might be as simple as bad weather along the route. The second “why” would reveal that the new shipping partner is ground-only and so can’t avoid even minor weather delays. Asking “why” again would show that the new shipping partner is part of cost savings to increase margin. And answering “why” once more would bring up the fact that customer expectations for shipping haven’t been adjusted on the site to reflect the new vendor’s capacity.
This is a proper cause-and-effect analysis, pointing to the actual areas needing improvement. Perhaps customer expectations for shipping times need to be adjusted based on the new vendor, up-charges for faster shipping could be offered, or a new vendor should be explored to reduce shipping times. By digging deep with the five “whys,” the team can identify the actual root cause and put measures in place to fix the problem at that level. This approach will resolve the main problem as well as prevent it from recurring.
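For teams that keep their CoEs alongside their tracking systems, the three-part template above could be captured as a small record. The following Python sketch is only one possible shape: the `CoE` and `ActionItem` names, the `ticket_id` field, and the short-term action item are assumptions for illustration, while the chain of “whys” and the long-term fix come from the late-delivery example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActionItem:
    description: str
    horizon: str          # "short-term" or "long-term"
    ticket_id: str = ""   # hypothetical link to an internal tracking system


@dataclass
class CoE:
    impact: str                                               # 1. why the error matters
    resolutions: List[ActionItem] = field(default_factory=list)  # 2. short- and long-term fixes
    whys: List[str] = field(default_factory=list)             # 3. root cause, one answer per "why"

    def root_cause(self) -> str:
        # The last answer in the chain is the deepest cause found so far.
        return self.whys[-1] if self.whys else ""

    def missing_parts(self) -> List[str]:
        # List the template sections that still need to be filled in.
        missing = []
        if not self.impact.strip():
            missing.append("impact")
        for horizon in ("short-term", "long-term"):
            if not any(item.horizon == horizon for item in self.resolutions):
                missing.append(f"{horizon} resolution")
        if not self.whys:
            missing.append("root cause")
        return missing


coe = CoE(
    impact="A customer delivery was received late.",
    resolutions=[
        # Illustrative short-term fix, not taken from the article.
        ActionItem("Notify the affected customer about the delay", "short-term"),
        ActionItem("Adjust shipping expectations on the site for the new vendor", "long-term"),
    ],
    whys=[
        "Bad weather along the route delayed the shipment.",
        "The new shipping partner is ground-only and cannot avoid weather delays.",
        "The new shipping partner is part of cost savings to increase margin.",
        "Customer expectations for shipping were not adjusted on the site.",
    ],
)
```

Here `coe.root_cause()` returns the last answer in the chain and `coe.missing_parts()` returns an empty list, which is a simple way to check that a write-up covers all three sections before it is filed.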
When To Do A CoE
Ward uses CoEs in three cases:
- If you’ve seen the same mistake occur a few times over.
- If it’s a really bad mistake that you can’t afford to have happen again.
- If it’s a basic mistake that the team shouldn’t be making any longer.
In any of these cases, it is helpful to explicitly address the mistake, identify the root cause, and get the team to co-create a solution to avoid it going forward.
The final example is drawn from a lesson that Ward learned skiing with his father. His father told him that if you’re not falling, you’re probably not skiing hard enough! So celebrate the falls, and ensure you’re always seeking out more challenging terrain that leads you to fall. Eventually you’ll notice that you graduate from green circle beginner trails, to blue square intermediates, to black diamonds, and eventually to backcountry tree skiing! The same goes for a team: once you have learned from mistakes and fixed them at the root cause level, you have to try new, harder, better things in order to bump into new mistakes and further the team’s learning.
Outsized Return On Investment
CoEs build the psychological safety required for an innovative team, which is in itself a powerful payoff. But the practice also has other learning benefits for employees, including how to:
- summarize events for leaders across the organization;
- have empathy for the customer and business bottom line, even among those employees who aren’t directly involved with users or sales and finance;
- be responsible and accountable to the business beyond your immediate tasks.
Celebrating errors with the CoE practice requires a mindset shift: recognizing that people are almost always doing their best, and that blame for errors should be assigned to a process (or the lack of one). CoEs focus on the error at a process level, and on collaborating to design constructive steps to fix the inevitable errors in a growing business.
For fast-growing companies that will inevitably make errors, as well as people-first companies that want to empower employees to take initiative, CoEs are a powerful way to build the psychological safety and accountability required.
Hear more wisdom from Ward as he shares his experience of 'going first', as part of his deep and authentic commitment to leading purposefully. Learn more about my new book, Going First: Find the Courage to lead Purposefully and Inspire Action, here.