In our experience, we can regularly achieve a significant (2x, 3x, 4x) improvement in operational performance: for example, doubling output, halving delivery or development timeframes, or reducing defects or scrap by 75%. This article explains what we typically find and why this magnitude of potential upside exists in most organisations - if you know where to look.
Consider a simplified breakdown of a typical production / operations system. Each element presents its own opportunities to improve performance:
This article explores each of these aspects in turn. Note that this list is not exhaustive – the intent is merely to demonstrate the vast amount of potential performance upside we see in most organisations.
There are three main opportunity areas:
a. Elimination of failure demand
Failure demand is demand generated by the failure of an upstream process; if we address the root cause(s) of those issues, we can ‘turn off’ this demand. Failure demand represents a poor customer experience and consumes capacity to serve. In our experience, it can exceed 80% of total demand in some teams. While this is predominantly a service industry concept, it is often relevant in the administrative functions of any large organisation, e.g. the call centre, IT help desk and/or HR department of a manufacturing company.
A call centre analysis for a large Australian health insurer revealed that the third most common call type was ‘I’d like to follow up on a claim I made previously’. This call type made up more than 10% of total call volumes, equivalent to nearly 100,000 6-minute calls per annum; total failure demand was 36% of all calls. By moving upstream and fixing the process so that all claims were paid within a day of receipt, we were able to practically eliminate these calls, in addition to achieving significant benefits in the claims process itself.
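To make the scale of that concrete, a back-of-the-envelope calculation helps. The sketch below uses the rough figures from the case; the annual on-phone hours per agent is an illustrative assumption, not client data:

```python
# Sizing the capacity consumed by a single failure-demand call type.
calls_per_annum = 100_000          # 'follow up on my claim' calls (from the case)
handle_time_min = 6                # average handle time per call (from the case)
productive_hours_per_fte = 1_600   # assumed annual on-phone hours per agent

hours_consumed = calls_per_annum * handle_time_min / 60     # 10,000 hours/yr
fte_equivalent = hours_consumed / productive_hours_per_fte  # ~6.3 FTE

print(f"{hours_consumed:,.0f} hours/yr = {fte_equivalent:.1f} FTE")
```

That is roughly half a dozen full-time agents serving demand that fixing the upstream claims process ‘turns off’ entirely.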
b. Demand migration
Channelling demand to cheaper / preferred channels – e.g. to online self-service / mobile
Our client ran a collections call centre in the US, offering customers four ways to pay outstanding debts. The cost-to-serve ranged from near zero (self-service) to several dollars (phone payments). Similarly, depending on the chosen payment method, the transaction cost for the customer ranged from zero to several tens of dollars. The online (self-service) payment option was free for both the business and the customer, but the webpage and call scripts did not explain the cost of each option, nor was the ‘free-free’ option the first one presented. A simple change to improve communication and highlight the best option for the customer created win/win dynamics, shifted the choice of payment options and reduced both call volumes and handling times.
c. Smoothing of demand
This includes opportunities to smooth volumes (soften peaks and/or fill in troughs) and/or better match workforce capacity profiles. The more ‘unevenness’ there is in demand, the harder it is for operations to achieve a good combination of service and cost through the cycle, mainly because capacity is relatively fixed in the short term. This leads to periods of over- and under-capacity, which in turn (depending on the management response) drives various issues, including cost (idle capacity, or overtime), deterioration of service (backlogs, queues, shortages), inventory accumulation and overburden.
In manufacturing, this ‘smoothing’ can be (artificially) achieved by implementing a heijunka system (‘heijunka’ is Japanese for ‘levelling’). It uses a finished goods inventory supermarket that acts as a low-pass filter (i.e. absorbs peaks and troughs in demand), combined with a ‘heijunka board’ that mimics the ‘perfect customer’, releasing kanbans back into production in a controlled, level and regular manner to trigger replenishment. Responding to this levelled ‘demand signal’ (as distinct from actual demand) has numerous benefits and is the key to running a good kanban system.
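A minimal simulation illustrates the mechanism. In the sketch below (all numbers invented for illustration), real demand is uneven, but production only ever sees the fixed, level release rate; the supermarket inventory flexes to absorb the difference:

```python
# Toy heijunka model: uneven demand is served from a finished goods
# 'supermarket', while production replenishes at a fixed, level rate.
import random

random.seed(42)
LEVEL_RATE = 10    # units released to production each period (the levelled
                   # 'demand signal' from the heijunka board)
inventory = 30     # supermarket stock, sized to absorb demand variation

for period in range(10):
    demand = random.randint(4, 16)     # uneven real demand (mean ~10)
    shipped = min(demand, inventory)   # the supermarket absorbs the peak...
    inventory += LEVEL_RATE - shipped  # ...production never sees it
    print(f"period {period}: demand={demand:2d} shipped={shipped:2d} "
          f"inventory={inventory:2d}")
```

The buffer does the ‘shock absorbing’, so the production line can run at a steady, repeatable pace.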
In service environments, you can’t hold a buffer of finished ‘goods’ (services) as the service is delivered in real time and/or is custom made for each individual (e.g. assessing an insurance claim cannot happen before the damage occurs and the claim is lodged). However, it is still possible to apply some of the same principles and achieve similar benefits.
The concept of ‘waste elimination’ is well known in manufacturing, codified by Toyota in developing the Toyota Production System (TPS) and later branded as ‘Lean manufacturing’ by Womack, Jones and Roos in ‘The Machine That Changed the World’. Taiichi Ohno, regarded as the father of TPS, talked about three wastes, one of which was the ‘waste of non-value adding activities’ (‘muda’ in Japanese). Its sub-categories are easily remembered using the acronym TIMWOODS: Transport, Inventory, Motion, Waiting, Overproduction, Overprocessing, Defects (rework) and Skills. (The other two wastes are ‘unevenness’ and ‘overburden’, as mentioned above.)
At an activity level (i.e. the actual execution of the work), the critical wastes are motion, overprocessing, defects and skills (see ‘Opportunities to improve the flow of work’ for a discussion of the other wastes). Techniques like ‘5S’ can be used to improve the layout of the workstation, putting high-frequency items in defined locations close at hand to minimise motion and searching. Done effectively, eliminating these forms of waste reduces the time it takes to complete the work (i.e. the cycle time).
When asked to optimise a popular Sydney bar, we used Point-of-Sale (POS) data to understand the mix of drinks served and then mapped the movement required for the bartender to produce (‘manufacture’) each drink. We found a material number of instances where high-frequency items were stored some distance from ‘home base’, while rarely used items were stored close at hand. By reorganising the storage of equipment (e.g. glasses) and materials (e.g. gin), we were able to eliminate 56 City2Surfs of bartender movement (walking) in the front bar per annum (~780km).
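The underlying arithmetic is straightforward: multiply each drink’s annual volume by the distance walked to make it, before and after re-slotting. The sketch below uses invented placeholder drinks, volumes and distances, not the bar’s actual data:

```python
# Estimating annual bartender travel from POS volumes and walking distances.
annual_volumes = {"gin_tonic": 40_000, "espresso_martini": 15_000, "beer": 60_000}

# metres walked from 'home base' per drink, before vs after re-slotting
metres_before = {"gin_tonic": 12.0, "espresso_martini": 20.0, "beer": 2.0}
metres_after  = {"gin_tonic": 3.0,  "espresso_martini": 8.0,  "beer": 2.0}

def annual_km(metres_per_drink):
    return sum(annual_volumes[d] * m for d, m in metres_per_drink.items()) / 1000

saved = annual_km(metres_before) - annual_km(metres_after)
print(f"~{saved:,.0f} km of walking eliminated per annum")  # 540 km here
```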
The equivalent wastes are probably even more common (and/or more readily accepted) in service industries, even though the ‘workstation’ is a desk. Consider the amount of time spent navigating to and searching for files, switching between screens / applications, copying and pasting, and scrolling because information is presented (or required) in a suboptimal sequence. There has been a lot of effort in recent years to streamline these activities via robotic process automation, but this is often just ‘automating the waste’, with trade-offs and limitations. In many cases, it would be better to reengineer the process to eliminate as much of this waste as possible and then assess the business case for automation (a big topic for another article).
Another, often overlooked, opportunity is the skill of the individual completing the work – or more specifically, skilled people doing unskilled work. It’s very common to hear people in service industries promoting process improvements that reduce the number of handoffs – as if handoffs are inherently bad (they are not). The skill requirement for each ‘package’ of work is defined by the skill required to complete its most complex sub-component. If we bundle too much together, we almost invariably introduce a greater range of skill requirements, and it follows that the skilled people in the process will spend a greater proportion of their time doing unskilled work. Repeatability also suffers as the work duration increases – it becomes almost impossible for an individual to maintain a consistent method, let alone for multiple people to complete the work the same way.
We completed a process walkthrough with executives of a US financial services business, where we all watched an underwriter complete his part of the assessment process. At the end of nearly two hours of observation, I asked the executive team whether they had seen anything they actually wanted their skilled underwriter doing. The answer was ‘no’ – none of the activities observed required his level of skill.
Work (as described above) can be completed by a person, a machine, or a combination of both. First, consider an individual working an 8-hour day. There will be a proportion of that day where the person is not ‘on the tools’. This time includes meetings, interruptions, breaks, training and other similar activities. The remaining ‘on tools’ time can then be broken down into work that generates revenue for the business (or equivalent) and work that’s more internally directed, e.g. generating reports, projects, etc. (there’s an appropriate level for these activities, but go too far and we’re ‘doing business with ourselves’). Time spent on the ‘revenue generating’ work can be further broken down to ‘work performed at standard' (i.e. how long it should take to do the work following the right method) and additional time required (speed or efficiency loss). There will also be instances where the same work gets done more than once due to errors or omissions, i.e. rework.
If we consider the time represented by ‘work completed at standard’ and divide that by the paid time worked (e.g. 8 hours), then it’s unrealistic to expect productivity scores close to (or even exceeding) 100% - and yet this is what we see reported by most businesses. A figure close to 100% suggests very little upside and tends to create a management mindset where the only way to improve performance is to adopt new technology/systems to reduce work effort per unit (standard times).
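A worked example (with invented numbers, not drawn from any client) shows why an honestly measured figure lands well below 100%:

```python
# Decomposing a paid 8-hour day into the layers described above.
paid_hours    = 8.0
off_tools     = 1.5   # meetings, interruptions, breaks, training
internal_work = 0.8   # reports, projects ('doing business with ourselves')
speed_loss    = 1.0   # time beyond standard on revenue-generating work
rework        = 0.7   # the same work done more than once

work_at_standard = paid_hours - off_tools - internal_work - speed_loss - rework
productivity = work_at_standard / paid_hours
print(f"work at standard: {work_at_standard:.1f}h -> productivity {productivity:.0%}")
# -> work at standard: 4.0h -> productivity 50%
```

Each layer looks individually plausible, yet together they halve the day – which is why results like the one below should not be surprising.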
A global financial services business was reporting productivity levels of ~100% (and sometimes higher). When we removed the measurement distortions, our result was lower than 50% on average, and much lower for certain individuals and teams. Once these losses were exposed, we were able to upskill leaders to constructively and systematically eliminate the root causes. Average productivity uplifts of ~20% were achieved across AU, US and UK operations.
The same dynamic is common in highly automated manufacturing processes. Procter & Gamble ‘phase 4’ plants are the best in the ‘P&G world’. Spend a couple of days in one of these plants and it’s likely you won’t even see a micro-stop – the production lines are that reliable. One of the key metrics they focus on is Overall Equipment Effectiveness (OEE), and as you would expect they achieve very high percentages – sometimes >90%. However, step outside the P&G world and people will quote similar OEE performance levels even though you’ve seen several production lines stop over the course of a short walk through their factory.
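OEE is the product of three factors – availability, performance and quality – which is precisely why genuinely high scores are so demanding: the losses compound. A sketch with invented numbers (the helper function is just for illustration):

```python
# OEE = Availability x Performance x Quality (the standard definition).
def oee(availability, performance, quality):
    return availability * performance * quality

# A highly reliable plant: even small losses in each factor compound.
print(f"reliable plant: {oee(0.97, 0.96, 0.99):.0%}")  # ~92%
# A typical plant with visible stops: modest-looking factor losses.
print(f"typical plant:  {oee(0.85, 0.80, 0.95):.0%}")  # ~65%
```

A plant quoting >90% OEE while lines visibly stop is almost certainly measuring (or defining) something differently.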
Just as there are good and bad ways to measure productivity, there are good and bad ways to use that data – particularly when it comes to assessing the people aspects of performance. Team and individual engagement will rapidly increase if this information is used in a constructive manner that develops people and builds a positive culture of highlighting and solving problems as part of daily work. The opposite occurs if you head down the path of 'command and control'.
A large Australian bank implemented a new loan origination system expecting to achieve a significant improvement in performance. A large consulting firm was part of the implementation program and deployed new metrics, visual management, a new operating rhythm and a leadership upskill program. Performance was unchanged. When we were given an opportunity to help, we introduced (at a superficial level) the same elements – new metrics, visual management, a new operating rhythm and a leadership upskill program. The result was a productivity uplift of >80% across ~600 FTE achieved over 3 months. What made the difference was a deeper understanding of the process dynamics and the 'why' underpinning the solution elements, enabling subtle design changes, nuance in how the elements were combined, and being able to transfer that understanding to frontline leaders so that they made better decisions and were more effective in developing their teams.
Often, proper measurement of productivity also reveals the profound impact of rework and poor inbound quality. In service industries, rework rates often exceed 50%, blocking the flow of work and eroding productivity. Where information is missing and/or inaccurate, the channels by which it is subsequently obtained are often informal and unstructured (e.g. email, phone). This masks the problem (most rework is ‘under the radar’) and means that structured feedback loops are not in place to support root cause problem solving (the Plan-Do-Check-Act cycle). Hence high rework levels persist indefinitely, driving costs up and blowing out turnaround times.
A big-4 AU bank acknowledged that some rework occurred in their business lending application process, but didn’t consider it a material opportunity. When measured properly, it was found that 95% of applications were reworked, 43% were reworked 4x or more, that the average handling time for reworked applications was >150% higher, and that the time to decision was >89% longer (and more variable) than for non-reworked applications.
The utility of handoffs was discussed above: they break the work down into more repeatable sub-components and match a higher proportion of the work to the skill level of the person completing the task (handoffs can also be necessary for risk management purposes, e.g. separation of duties). However, even where handoffs are desirable, we typically find substantial decoupling between steps, with very significant queues and time delays between them – i.e. the handoffs are not well designed or managed. Here are two examples:
We worked with an iconic Australian sporting goods manufacturer to increase the output of their premium product. The diagram below shows the production flow for the key component – queue time is shown in red and work time in green. The graph is predominantly red, i.e. queue time accounts for the vast majority of the throughput time. This is a visual way to illustrate why it takes ~38 days to complete less than 15 minutes of work.
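One way to summarise a red-green plot in a single number is flow efficiency: the share of elapsed time the item is actually being worked on. Using the figures above (and treating the 38 days as elapsed calendar time, an assumption):

```python
# Flow efficiency = value-adding work time / total throughput time.
work_time_min   = 15                      # ~15 minutes of actual work
throughput_days = 38                      # ~38 days end to end
throughput_min  = throughput_days * 24 * 60

flow_efficiency = work_time_min / throughput_min
print(f"flow efficiency = {flow_efficiency:.3%}")  # ~0.027%
```

In other words, for every minute of work, the component spent roughly 3,600 minutes waiting.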
Linking steps to form cells was the first step towards a significant reduction in inventory levels. For example, at the back end of this process, a new cell design reduced inventory levels by >90% and compressed the throughput time from weeks to minutes. Output per day increased ~100% and output per person by more than 200%.
This is an example from personal loan origination for a big-4 AU bank. Again, you can see that for the majority of the time the work is in the system, it’s not being worked on. In this particular instance, the throughput time was extremely important: the longer the bank delayed saying ‘yes’ or ‘no’ to a personal loan, the less likely they were to convert. Applicants tend to apply to more than one institution, and if a competitor beats you to ‘yes’, you miss out and your effort is wasted. We were subsequently able to reduce the throughput time by ~80%, which increased the conversion rate by ~150%.
Note that these ‘red-green’ plots are a more visual version of the classic Value Stream Map ‘saw tooth’, and the title of this article, ‘Learning to See’, is a homage to the classic book on Value Stream Mapping by Mike Rother and John Shook (highly recommended – very manufacturing focused, but the principles apply to any end-to-end process).
These inventories (the ‘red’) build up because we tolerate extended periods of faster process steps feeding slower ones. By the same logic, we should expect to see starvation (zero inventory and idle capacity) after the slowest (bottleneck) step – yet this is almost never the case, particularly in service industries. Why? Bottlenecks do move around as product mix, staffing levels and productivity change, but more fundamentally: the faster steps ‘pace’ to the slower one(s); there is so much inventory in the system that there is no accountability between steps; and flow is not actively managed, because people work in their silos and to flawed KPIs such as Service Level Agreements (SLAs).
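Little’s Law makes the queue/time relationship precise: average work-in-process equals the arrival rate multiplied by the average throughput time. A sketch with invented numbers:

```python
# Little's Law: WIP = arrival rate x average throughput time.
arrival_rate_per_day = 200   # items entering the process per day
throughput_time_days = 10    # average end-to-end time per item

wip = arrival_rate_per_day * throughput_time_days
print(f"average work-in-process: {wip:,} items")  # 2,000 items in queues

# The converse is the lever: with arrivals fixed, cutting WIP (the queues)
# cuts throughput time proportionally - e.g. 80% less WIP -> 80% faster.
```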
These massive queues have a variety of negative performance impacts including lower productivity (‘pacing’ and/or idle capacity), lack of bottleneck awareness and management (which needs to be understood to improve overall system capacity), exposure to defects, time delays (poor responsiveness and customer service), additional material handling, consumption of floorspace and higher levels of working capital.
Managers are trying to do the best job they can. However, they typically see performance through the lens of incomplete, distorted and/or flawed metrics (not a solid foundation for success) and then make what they consider to be optimal decisions, based on their mental model of the system and what has been successful for them in the past. They also have limited opportunities to coordinate with their peers across other parts of the organisation (e.g. to share resources, or forecast changes). The result is current state performance.
To change performance, we need to provide better information, in better forums, with a better operating cadence to promote collaboration and coordination. We also need to improve people’s understanding of the system dynamics, so that they make different decisions, and then demonstrate to them (with data) that those decisions lead to better outcomes. It’s also prudent to provide some initial support (‘hold their hand’) through more challenging periods/events as they arise, so they don’t revert to old thinking under stress, and to build their confidence that the new logic is robust and superior across the spectrum of operational changes faced. The result is sustained improvement – no rational manager will revert to the old logic once they have proven to themselves that there is a better way.
One final point of note is the criticality of understanding which problems to solve first for maximum impact. Even the most rudimentary diagnostic (for example interviewing people in the process) will probably find 100+ problems / opportunities for improvement. However, we are dealing with complex systems, everything affects something else, some problems matter more than others and we have limited resources, so we must choose our focus carefully. The ingredients of the ‘secret sauce’ are generally isolating the highest impact areas and then combining several solution elements to create dynamics that shift the performance of the entire system.
As a graduate engineer at Robert Bosch making automotive electronics, I found a way to reduce the cycle time of the surface mount (placement) machine on a particular variant by ~25%. What was the impact of this improvement? If the placement machine had not been the bottleneck, I would simply have made a fast (automated) machine slightly faster – it would have waited longer each cycle for the slower machine upstream or downstream, i.e. no benefit. However, it was the bottleneck (one reason I was focused on it), and it remained the bottleneck after the improvement. The other key factor was that this was the highest volume variant of that product – it consumed ~50% of the line’s capacity (the other reason I was looking at it). Therefore, the overall impact on the line was a >12% capacity uplift (50% x 25%).
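The logic generalises. As a rough check (the helper below is illustrative and mirrors the arithmetic above, a conservative approximation):

```python
# A local cycle-time gain only lifts line capacity if the step is (and stays)
# the bottleneck, scaled by the share of line time the variant consumes.
def line_capacity_uplift(is_bottleneck, variant_share, cycle_time_reduction):
    if not is_bottleneck:
        return 0.0  # a faster non-bottleneck just waits longer each cycle
    return variant_share * cycle_time_reduction

print(f"{line_capacity_uplift(True, 0.50, 0.25):.1%}")   # 12.5% line uplift
print(f"{line_capacity_uplift(False, 0.50, 0.25):.1%}")  # 0.0% - no benefit
```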
The intent of this article was to cover some of the typical improvement opportunities we see, as evidence for the headline claim that significant improvements (2x, 3x, 4x) are often possible simply through changes to processes, enhancements to measurement and/or uplifting operational maturity and management capability. The other evidence is the results we have delivered for clients.
You’ll note very little mention of technology solutions and/or new systems. People typically blame their ‘systems’ for current state performance, but one of the main reasons for the high failure rate of technology programs is the failure to get the process right first (a topic for another article).
Key takeaways: