Edition 19: Security's eternal prioritisation problem
What if the task I deprioritised leads to a breach that blows everything up? This is a question that's gone through every Security leader's mind. This edition provides a way to try and address that.
Recent research by Datadog said, “Only 3 percent of critical vulnerabilities are worth prioritizing”. This confirmed my belief that most scanner output should be used as a starting point for understanding the security posture of your products and not as a means of creating a laundry list of bugs to fix. What to fix and what to ignore requires thorough prioritization.
While deciding what to remediate is hard, it is just the tip of the iceberg. Managing security programs requires constantly making decisions on what initiatives to drop and what to pick up. Do we prioritize building a SOC team or onboarding a software composition analysis tool? Should we invest in performing more manual, high-quality penetration tests or onboard a scanner that works at scale? Is cost optimization more important or risk reduction? None of these questions have a single right answer. However, given time and money are limited resources, Security teams have to make a choice.
Hypothesis
Incorrect prioritization has seen and unseen effects on building an effective security program. Every security team has an implicit framework on how to do this. However, implicit decisions make it harder to get buy-in and build a feedback loop. To counter this, Security teams should explicitly define a prioritization framework, which helps explain the choices made by the team.
The solution
Prioritization is not a Security-specific problem. It has troubled leaders forever. So, instead of inventing a new framework, let’s use an existing, popular one: the Eisenhower Matrix, also called the urgent-important matrix.
The idea is simple: Draw a 2x2 matrix with increasing importance on one axis and decreasing urgency on the other. Place each task that you plan to do in one of the quadrants. Respond to urgent, important tasks first. Drop the less important, less urgent tasks.
Let’s apply a Security lens to each of the quadrants:
Crises (urgent and important): Managing crises is a core part of Security that we cannot wish away. So, when there is a crisis (e.g.: responding to an incident), it has to take top priority (P0), no questions asked. This is like taking a painkiller when you are sick. It may not solve the underlying problem, but it is necessary to keep things going. However, working only on managing crises leads to burnout and is impossible to scale.
Maturity initiatives (not urgent but important): If the first quadrant was a painkiller, this quadrant is about building a healthy lifestyle that reduces the chances of falling sick. There is no such thing as perfect security (just like there is no 100% immunity to diseases). However, it’s critical to build systems that reduce the likelihood of crises and increase the responsiveness to security incidents. The goal of “maturity” should be to reduce the number of crises and interruptions you have to deal with.
Interruptions (urgent but not important): Dealing with interruptions quickly helps unblock crises and maturity initiatives. However, too many interruptions reduce the resources you have left to focus on maturity initiatives. Wherever possible, managing interruptions should be delegated (easier said than done). A good example is fixing unstable testing environments. While this can block security testing (a key initiative), it does not have to be solved by Security. Security can convince DevOps/Platform teams to solve this problem, as it affects more than just Security. Another example is processes that add inefficiencies to systems (e.g.: engineering managers asking “How many findings do we have open?”). It’s an important question, but answering it should really be delegated to automation. Building Jira dashboards that can be consumed by EMs will eliminate the need for this activity.
Distractions (neither urgent nor important): Any initiative that does not resolve a crisis, advance a maturity goal, or clear an interruption is a distraction. There are no exceptions to this. We often find excited, well-intentioned engineers who get inspired by a conference talk or a newsletter edition (the irony is not lost on me :P) and want to implement something quickly. In most cases, this is a distraction and should be avoided.
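For teams that track their backlog in a script or spreadsheet export, the four quadrants above can be sketched in a few lines of Python. This is a minimal illustration, not a tool recommendation: the names Task, Quadrant, and classify, and the sample backlog entries, are made up for this example, and it assumes someone has already judged each task as urgent and/or important.

```python
from dataclasses import dataclass
from enum import Enum


class Quadrant(Enum):
    CRISIS = "crisis"              # urgent and important
    MATURITY = "maturity"          # not urgent but important
    INTERRUPTION = "interruption"  # urgent but not important
    DISTRACTION = "distraction"    # neither urgent nor important


@dataclass
class Task:
    # Hypothetical record; the urgent/important flags are human judgments.
    name: str
    urgent: bool
    important: bool


def classify(task: Task) -> Quadrant:
    """Map a task's urgent/important flags to its quadrant."""
    if task.urgent and task.important:
        return Quadrant.CRISIS
    if task.important:
        return Quadrant.MATURITY
    if task.urgent:
        return Quadrant.INTERRUPTION
    return Quadrant.DISTRACTION


# Illustrative backlog based on the examples in this edition.
backlog = [
    Task("Respond to an incident", urgent=True, important=True),
    Task("Automate security assessments", urgent=False, important=True),
    Task("Answer 'How many findings do we have open?'", urgent=True, important=False),
    Task("Try the tool from last week's conference talk", urgent=False, important=False),
]

for task in backlog:
    print(f"{classify(task).value:>12}: {task.name}")
```

The code is the easy part. The hard part is agreeing on the two flags, which is what the questions below are about.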
While the framework works well in theory, there are a few obvious questions that come to mind:
How do you know what is important?: This is an important question and requires a separate series of posts to address. In short, what’s important should be predetermined as part of a planning exercise. This could be OKRs or AOPs or whatever else works in your org. There is an even deeper question of what you add to your OKRs. That depends on your business risks (different for a fintech vs. an e-commerce company), current risk posture (where do you stand today?), and your appetite for risk (how much risk are you willing to live with?).
How do you know what is urgent?: This is simpler. Anything that has already led to, or can lead to, the compromise of your systems is urgent. Anything that blocks progress on maturity initiatives is also urgent.
Given fixed resources, how do we decide how many resources to spend on each quadrant?: While quantifying this is hard, here’s a framework that can help:
Managing crises is non-negotiable. As many resources as are needed to handle them must be allocated. While predicting crises is hard, you must have a sense of how many crises your organization deals with on a regular basis and allocate the necessary resources. For a given scope, this number should ideally go down over time.
Next, resources should be allocated to work on maturity initiatives that help reduce crises and interruptions (e.g.: automated assessments, SOAR, etc.). By performing these tasks, the number of crisis tasks goes down, which means more bandwidth is available for maturity initiatives. This is a virtuous circle we should aspire to get into.
Finally, allocate resources to remove interruptions. There will be pressure to prioritize this over maturity initiatives and it’s OK to give in at times. If this happens often, prioritize maturity tasks which can reduce the number of interruptions you face (e.g.: if you get too many ad-hoc requests for data, build a dashboard that provides on-demand data).
Ignore all distractions.
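The allocation order above can be written as a small sketch. Everything here is an assumption for illustration: the function name allocate, the unit (say, person-days per quarter), and the sample numbers are hypothetical. It also simplifies one thing: crises are capped at total capacity, whereas in practice a crisis larger than your capacity means finding more capacity.

```python
def allocate(capacity: int, crisis_need: int, maturity_want: int,
             interruption_want: int) -> dict[str, int]:
    """Hand out a fixed capacity in strict priority order.

    Crises are funded first, then maturity initiatives, then
    interruptions. Distractions always get zero.
    """
    remaining = capacity
    plan: dict[str, int] = {}
    requests = [
        ("crises", crisis_need),
        ("maturity", maturity_want),
        ("interruptions", interruption_want),
    ]
    for quadrant, requested in requests:
        granted = min(requested, remaining)  # never exceed what is left
        plan[quadrant] = granted
        remaining -= granted
    plan["distractions"] = 0
    return plan


# Hypothetical quarter: 100 person-days of capacity.
print(allocate(100, crisis_need=30, maturity_want=50, interruption_want=40))
```

With these sample numbers, interruptions receive only what is left after crises and maturity initiatives are funded. If crisis_need shrinks over time, the freed capacity flows to the later quadrants, which is the virtuous circle described above.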
Not all resources are fungible. Some can only respond and others can only build. How do we manage that? This is tricky. If your Security team is filled with specialists who can only do incident response, there is little chance of completing maturity initiatives. FWIW - this isn’t necessarily a bad design. Many teams hire specialists for each role and keep the teams static. They manage changing circumstances by downsizing some teams and hiring elsewhere (e.g.: Have too many SIEM alerts? Outsource SOC work to a 3rd party until you can tune your engines. Once done, you don’t renew the SOC contract).
This is essentially a culture decision. You can fill your team with full-stack security engineers with fungible skills (who can switch between automation and penetration testing and incident response) too. This will help you allocate the right amount of resources for each quadrant efficiently. However, the learning curve for each new initiative may be high. Alternatively, you can fill your team with specialists and change the team composition as your needs change.
Do initiatives in each quadrant require different program management techniques? Yes. Speed is important in managing crises and interruptions (“urgent” tasks). Also, there is no way to precisely know how many crises or interruptions you will face. Frameworks like Kanban are better for managing such initiatives. For maturity initiatives, steady progress is more important than speed. Sprints are a better way to handle them. Your mileage may vary on the exact frameworks to use, but it’s important to make sure that we don’t apply the same success criteria and program management techniques to all initiatives.
Are all distractions bad? Shouldn’t we sometimes respond to changing trends instead of sticking to plans in a rigid manner? While this is true, teams with no operational excellence tend to overestimate how useful a new trend will be. There’s also the fact that humans are attracted to shiny new objects. Unless security leaders can clearly articulate why a new trend is path-breaking, it’s best to treat it as a distraction. One way to enforce this discipline is to require a change of defined goals in order to incorporate distractions as maturity initiatives. For instance, if an engineer feels ChatGPT can be used to improve the security of 3rd party components used, then revise your annual goals to add this initiative. Also, decide which initiative to drop from the current list to accommodate this shiny new request. If you cannot find an initiative that can be dropped, then drop the shiny new object. Such rigor will ensure conscious decisions are made on trade-offs.
That’s it for today! Are there other frameworks you use to prioritize security tasks? How do you prepare your team to respond to changing needs? Tell me more! You can drop me a message on Twitter, LinkedIn, or email. If you find this newsletter useful, share it with a friend, or colleague, or on your social media feed.