Failure Models and NUMB3RS. How do you get from a backwards look to real-time learning?

If you’ve seen the television show NUMB3RS you may have the impression that it is possible to mathematically predict complex situations. Gather some NUMB3RS (data, information, etc.), apply a relevant mathematical model, calculate the next occurrence of the event and catch the perpetrator in the act. All you need is the application of some fiendishly clever mathematics, and a genius with a chalkboard.

Brilliant! I want to apply some of this method to learning from failure.

Recent posts have been about ‘how I learnt from failure’ and how some organisations, like Honda, are successfully using failure to innovate. Extending my logic, I thought that if I can better understand how failure works, there may be a way of approaching it so that I can extract the helpful learning (and make positive changes) before things go horribly wrong. Simple, just like in NUMB3RS.

The result has been a bit of bewilderment on my part. There’s a lot of mathematics about failure on the internet. There are even some ‘chalkboard’ videos, but I’m no Dr Sheldon Cooper (although my wife claims I have behavioural similarities) and I’ve struggled. There is good news though: I’ve found three other useful approaches: Swiss Cheese, The Incident Pit and the Timeline of Inevitable Failure (sounds like an Indiana Jones film).

The Swiss Cheese Model of Failure in Systems. The basic idea here is that human systems are like slices of Swiss cheese stacked side by side. Slices of Swiss cheese typically have holes in them. In the model the holes represent points of failure. Failures can be latent (built into the system, where they can remain dormant for a long time) or active, the latter most often being unsafe acts by humans.

In the Swiss cheese model, a catastrophic failure occurs when the latent and active failure holes line up across the slices of cheese. The model has been used widely in aviation and healthcare and seems to be a useful way of illustrating how apparently unrelated, often small, errors in different parts of a system can join up to create a major failure.

The Duke University Medical Centre has a good explanation of the Swiss Cheese Model (picture source) on their Patient Safety – Quality Improvement pages, along with a good general section on ‘Error’. If you fancy a technical appraisal, this 2006 report, ‘Revisiting the Swiss Cheese Model of Accidents’, from the European Organisation for the Safety of Air Navigation is more interesting than you might imagine. Thank you to Failure Dynamics for letting me know about this model.
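As a rough illustration (not from Reason’s original work), the stacked-defences idea can be sketched as a tiny simulation. It makes the simplifying assumptions that each layer’s ‘holes’ are independent and occur with a fixed probability, which real latent failures are not, but it shows why each extra defensive layer matters:

```python
import random

def trajectory_breaches(layers, hole_prob, rng):
    """One hazard trajectory: it only causes an accident if it
    passes through a hole in every defensive layer."""
    return all(rng.random() < hole_prob for _ in range(layers))

def accident_rate(layers, hole_prob, trials=100_000, seed=42):
    """Estimate the fraction of trajectories where all holes line up."""
    rng = random.Random(seed)
    hits = sum(trajectory_breaches(layers, hole_prob, rng) for _ in range(trials))
    return hits / trials

# With holes in 20% of each slice, the breach rate falls roughly
# geometrically as slices are added (~0.2, ~0.008, ~0.0003).
for n in (1, 3, 5):
    print(n, accident_rate(n, 0.2))
```

The point the sketch makes is the same one the diagram makes: defence in depth works because an accident needs every layer to fail at once, which is also why quietly losing one layer is so dangerous.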

The Incident Pit. This represents my first formal encounter with failure and risk management. The model goes back to 1973 when the concept was presented at the British Sub Aqua Club Diving Officers Conference. It was developed from the evidence gathered from numerous diving accidents (including fatalities) and was part of the diver training I went through in the 1980s.

The key point is that, due to the hazardous nature of underwater swimming (diving), minor incidents are always happening, caused by the environment, equipment or humans. The idea is that you stay constantly aware, so that you recognise when these minor incidents are occurring and rectify them before they escalate into something more serious. Taking action early (learning from small failures) prevents you ending up dead at the bottom of the incident pit. There is something in this approach which fits with my objective of learning early from failure.

This was highly effective as a learning device back in the 1980s and, to be honest, it was at the front of my mind when I abandoned my sea swim in August 2013. This example from Swanage Coast Guard suggests it is still very much in use.

The Timeline of Inevitable Failure. I must thank Matt from Complex Care Wales for sharing this model he has developed. I hope Matt will publish some more detail about this soon. (Larger scale version at the end of the post.)

The model is self-explanatory; I’ve shown it to a few people who’ve grasped the ideas instantly.

The key points for me (a bit like the Incident Pit):

  • There are always errors or failures occurring in day to day processes.
  • The more complicated or complex these activities, the more they occur.

The best way to approach these is to have triggers that allow you to recognise the errors, and then you do something about them before they escalate. These are the places where you can most usefully learn, and it seems to fit in with the idea of ‘fast intelligent failure’.
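The trigger idea can be sketched in code. This is a hypothetical illustration (the class, names and thresholds are mine, not part of Matt’s model): a simple early-warning counter that escalates when too many minor incidents occur within a short window, rather than waiting for a major failure:

```python
from collections import deque
import time

class IncidentTrigger:
    """Hypothetical early-warning trigger: if `threshold` or more minor
    incidents occur within `window` seconds, escalate for review before
    they compound into something serious."""

    def __init__(self, threshold=3, window=60.0):
        self.threshold = threshold
        self.window = window
        self.events = deque()

    def record(self, now=None):
        """Record a minor incident; return True if it is time to escalate."""
        now = time.monotonic() if now is None else now
        self.events.append(now)
        # Discard incidents that have aged out of the window.
        while self.events and now - self.events[0] > self.window:
            self.events.popleft()
        return len(self.events) >= self.threshold

trigger = IncidentTrigger(threshold=3, window=60.0)
print(trigger.record(now=0.0))    # False: one incident, carry on
print(trigger.record(now=10.0))   # False: two incidents
print(trigger.record(now=20.0))   # True: three in a minute, escalate
```

The design choice mirrors the Incident Pit: single incidents are normal and expected, but a cluster of them in a short period is the signal to stop, learn and intervene while the cost is still small.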

Back to NUMB3RS. Well, there aren’t many numbers in these models. What they offer instead are different graphical explanations of how the failure process works. What is useful for me is that they give an indication of where it might be possible to ‘get in early’ and do some effective learning from failure: moving from the ‘backwards look’ towards some ‘real-time’ learning. More on this in the next post.

So, what’s the PONT?

  1. Learning from failure as early as possible (fast intelligent failure) is preferable to a major inquiry when something big goes wrong.
  2. The graphical explanations of various failure models assist in identifying where it might be best to look for learning from failure.
  3. Numbers (or even NUMB3RS) aren’t always the answer.

Picture Sources & Useful Links:

Swiss Cheese Model: Duke University Medical Centre

The Incident Pit: Wikipedia article

NUMB3RS picture:

Timeline of Inevitable Failure – Complex Care Wales main site:



7 Responses

  1. James Reason explains a deep truth about error with the lovely, simple Swiss Cheese model. But I like Professor Charles Vincent and the way he translates that into patient safety, predominantly using systems thinking. Before you can understand a failure you have to understand the context, and when you understand the context, failure is never the result of the person getting blamed. Axe-wielding baby killers aside! I haven’t seen the Pit before, but I’ve personally seen the consequences of good people trying to do simple things under extreme pressure. It’s easy to follow the policy on a beautifully idealised middle class, sunny day surrounded by God’s finest leadership types. Typically errors don’t happen there!

    My own approach was developed when studying SUDI: sudden unexpected death in infancy. It’s simply not good enough to just turn up and count the dead. If Reason’s cheese and Vincent’s psychology are right, there had to be a way for people to consciously recognise the worm in the cheese. The model has “human or independent judgement” to hint that situational awareness is absolutely key to finding failure before it gets dangerous. In other words, complex living systems will always fail, so instead of trying to make them failsafe, it is much more useful to make them safe to fail. And therefore when they do fail, find it fast and fix it quick. If you can’t fix it, limit it and escalate out of normal process, with resources prepared but only deployed in proximity to failure conditions. Then, when critical, all normal rules of work are suspended to protect a few experienced people to experiment at the edge of chaos, where learning emerges in the shadow of serendipity. If you’re going to fail, you may as well do it properly and learn as much as possible in the moment. As Dave Snowden says, when you send in a crisis team, make sure the innovation team gets there before them!
