The New Metrics of Security Operations: Tracking [R]evolutionary Improvement In Efficiency and Effectiveness with Automated Detection & Response (ADR)

Tuesday, September 26, 2017
Fidelis and the New Metrics of Security Operations

CISOs have become leaders in their businesses rather than just experts in their departments. As a result, they face three new requirements:

  1. To educate their peers on the scope, scale, severity and solutions for cybersecurity and how emerging threats affect each aspect of the business.
  2. To elevate the cybersecurity discussion out of the trenches of speeds, feeds and fingerprints.
  3. To report on evolving metrics that impact the bottom line of the business to facilitate rapid decision making by the rest of their executive peers.


In 2016, the SANS Institute found that 71% of organizations do not have regular metrics for, or even measure, incident response performance, process and effectiveness.

Without metrics there is no objective way to determine progress. Fortunately, a new approach to security operations optimization and automation enables metrics that drive results for the business. The new, unified approach of Automated Detection & Response – or ADR for short – impacts not only security posture but the bottom line of the business as well.

A purpose-built ADR platform enables new metrics for the Security Operations Leader. The seven metrics listed below are some, but not all, of the new metrics that emerge when ADR is implemented:

  1. Cost Per Incident 
    CPI can be measured as the time per incident * the average hourly rate for a Tier 1 analyst (~$30/hour for a fully loaded FTE @$45K/year). To get a baseline, run that formula through your IR playbook for each phase of a response: detection, the decision to escalate and investigate, response determination, and response and remediation execution. Then run it again with an ADR platform in place, in a POC or even as a table-top exercise. A further extension of this metric involves the empowerment of Tier 1 and 2 analysts. When Tier 1 and 2 analysts are empowered with an ADR platform to perform or augment the work of a Tier 3 analyst (a very scarce resource!), substantial savings can be quantified.
  2. Cost Per Workflow
    Review, investigation and response workflows are both personnel- and technology-dependent. Automation reduces personnel and technology dependencies. Reducing technology dependencies decreases personnel maintenance requirements. Thus, automation impacts personnel cost, technology cost, and maintenance cost. Leaders will see that entire steps of their workflows can be reduced or eliminated completely, delivering massive acceleration, huge savings and large efficiency boosts as teams focus on validated and conclusive incidents rather than wasting time on wild goose chases.
  3. Automatic Detection vs Manual Detection
    Establish a baseline by determining the ratio of detections your security stack produces to the combined number of human detections you receive. To figure out the human detections, add together the number of staff detections (e.g. an employee recognizes that their machine is malfunctioning or an IT Admin recognizes that a system is performing in unusual ways), the number of external detections (e.g. the number of times you get a call from the FBI/IT Admins) and the number of detections your Security Operations staff created by manually synthesizing data from your security stack and SIEM. This will give you a sense of the efficiency of your current system. With ADR you can expect the ratio to tilt substantially toward the automation side of the equation, which means substantially better security operations efficiency.
  4. Percent Investigation vs Volume
    By measuring investigations against alert volume, you can get a sense of what might be slipping through the cracks and creating risk. With an ADR system you should expect to see a shrinking gap and massive improvement. For example, suppose an organization typically performs 3 investigations for every 100 alerts (3/100 or 3%). It then implements an ADR platform that sees a 10% alert-to-conclusion rate (100 alerts become 10 conclusions) and performs an additional 2 investigations (5/10 or 50%). The investigation rate rises from 3% to 50%, an increase of more than 1,500% in security operations effectiveness.
  5. Ratio of Investigation to Response
    This metric shows how many of the items that were investigated led to a response workflow that ran through to completion. The ratio indicates where Security Operations teams may be wasting time. If an investigation is started and then abandoned due to lack of context, insight or actionable intelligence, then time and resources are not only wasted; the team also pays a huge opportunity cost in lost time and focus on threats and attacks that are actionable. Organizations that implement an ADR platform should expect to see investigations converge toward responses, since more investigations are against validated conclusions rather than merely suspected attacks.
  6. Rate of Validation
    This metric measures the time it takes to make a decision. Analysis paralysis and Security Operations uncertainty increase dwell time and risk the spread of an attack. They also take time away from investigating and responding to other attacks or compromises that may be happening at the same time. By measuring the decision rate both before and after implementing an ADR platform, the Security Operations team is able to demonstrate nimbleness and increase response capacity without adding scarce people resources.
  7. Remediation Response vs Reimage
    This metric measures business disruption. Disrupted business means substantially higher cost from delays, lost productivity or even liability to third parties. The more surgical and remote responses that are enabled by the ADR platform, the fewer “big hammer” fixes of reimaging an end-user’s endpoint have to happen. That means less business disruption and inconvenience for employees. Business disruption can be quantified based on the staff role, affected device role and length of time for a response. Taking someone’s laptop for a day to reimage it is an inconvenience. Taking down a payment processing server is a substantial disruption – even when hot backups and clustered failovers are part of the solution.
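
Several of the metrics above reduce to simple arithmetic. The following is a minimal Python sketch of three of them, not a feature of any particular ADR platform. The function names, the phase names and the sample counts are hypothetical illustrations; only the ~$30/hour rate and the 3/100 versus 5/10 example come from the text above.

```python
from typing import Mapping

# The post's approximate fully loaded Tier 1 analyst rate.
HOURLY_RATE_USD = 30.0


def cost_per_incident(hours_per_phase: Mapping[str, float],
                      hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Metric 1: total analyst hours across the IR phases times the hourly rate."""
    return sum(hours_per_phase.values()) * hourly_rate


def automatic_to_human_ratio(automatic: int, staff: int,
                             external: int, manual_soc: int) -> float:
    """Metric 3: detections from the security stack per human detection.

    Human detections = staff + external + manual Security Operations detections.
    """
    human = staff + external + manual_soc
    return automatic / human if human else float("inf")


def investigation_rate(investigations: int, alerts: int) -> float:
    """Metric 4: share of alerts (or conclusions) that get investigated."""
    return investigations / alerts if alerts else 0.0


def percent_increase(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100.0


# Hypothetical baseline: hours spent in each playbook phase for one incident.
baseline_hours = {
    "detection": 0.5,
    "escalation_decision": 1.0,
    "response_determination": 2.0,
    "remediation": 1.5,
}
baseline_cpi = cost_per_incident(baseline_hours)  # 5.0 hours * $30 = $150

# Hypothetical detection counts for one reporting period.
detection_ratio = automatic_to_human_ratio(
    automatic=60, staff=15, external=5, manual_soc=20
)

# The worked example from metric 4: 3 of 100 before, 5 of 10 after.
before_rate = investigation_rate(3, 100)
after_rate = investigation_rate(5, 10)
improvement = percent_increase(before_rate, after_rate)

print(f"Baseline CPI: ${baseline_cpi:.2f}")
print(f"Automatic-to-human detection ratio: {detection_ratio:.2f}")
print(f"Investigation rate improvement: {improvement:.0f}%")
```

Running the same functions on numbers gathered before and after an ADR proof of concept gives a like-for-like comparison for each metric.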

These are just some of the new metrics that can be tracked by security operations teams to better equip the business they serve. What is missing from this list? What have you found that works well in your organization? Tweet us at @FidelisCyber and let us know!

- Billy Cripe, VP of Marketing