Cyber Security Operations Center Measurement Indicators

Omar Alanezi
17/6/2023


Introduction:

In managing a SOC, the saying "You can't improve what you don't measure" holds true.


As cybersecurity threats continue to evolve, SOC teams are tasked with monitoring, analyzing, and sometimes responding to and mitigating cybersecurity incidents. The SOC team serves as the "first line of defense" against security incidents and events. During the monitoring and analysis phase, SOC analysts are often overwhelmed by alert triggers, incident detections, incident response work and, finally, A LOT of false positives, which leads to alert fatigue. Many organizations today struggle to optimize their SOC performance because they lack insight into their operational metrics and KPIs. This is where tracking SOC operational metrics and KPIs can play a crucial role in improving SOC efficiency. While the goal is to reduce some KPIs, such as Time to Response & Time to Detect, optimizing the other metrics is just as important for measuring the team's operational performance. In this article, we will explore the importance of tracking & calculating SOC metrics & KPIs.

SOC Metrics & Key Performance Indicators – KPIs:

Keep your SOC analysts close, and your KPIs closer.

While there are many SOC Metrics and KPIs that can be measured, the most crucial ones for optimizing SOC operations and achieving operational KPIs are Time to Detect (TTD), Time to Response (TTR), False Positive Rate (FPR), Incident Closure Rate, Staff Utilization Rate, and Knowledge Management. SOC managers, Managed Security Service Providers (MSSPs), and cybersecurity decision makers can utilize these KPIs to not only optimize SOC operations but also to justify investments in cybersecurity infrastructure. By measuring and monitoring these key metrics, they can gain valuable insights into the effectiveness and efficiency of their SOC operations. Armed with this data, they can make informed decisions about how to improve their cybersecurity posture and secure their organizations against cyber threats.



Let's take a closer look at each of these metrics and KPIs and provide a clear definition of each one:

1-Time To Detect (TTD): measures the time it takes the SOC team to detect an incident.

a.Mean Time to Detect (MTTD): the average time it takes to detect an incident.

2-Time To Response (TTR): measures the time it takes the SOC team to respond to an incident.

a.Mean Time to Response (MTTR): the average time it takes to respond to an incident.

3-False Positive Rate: measures the percentage of security alerts that are false positives.

4-Incident Closure Rate: measures the percentage of incidents closed within a given timeframe.

5-Staff utilization: measures the percentage of time that SOC analysts are actively working on security incidents.

6-Knowledge management: the process of sharing and managing knowledge and information within your organization. It includes arming your team with the training needed to do the job and applying best practices for incident response, threat intelligence and threat hunting.

"The Easy Math":

1-Time to Detect

2-Time to Response:

When it comes to SOC operational metrics, there is nothing more important than your ability to detect & respond to cybersecurity incidents. But have you asked yourself how these metrics are calculated? What do these numbers logically represent, and how do you translate them into business language to justify your investment in a Security Operations Center? In this section we will learn how to calculate TTD, MTTD, TTR and MTTR. Even though most SIEMs perform these calculations for you, it is still important to know how they are done and, most importantly, what they mean and how to turn them into inputs for decision making.

Let's assume that 4 alarms were triggered, and that your SOC team validated these alarms and opened a case for each one in the SIEM solution's case management system, with their information as below:
Simply put, Time to Detect is the difference between the case creation time and the alarm trigger time; at case creation you are aware that the alarm is an incident and a case has been opened for further investigation. Time to Response can be calculated as the difference between the mitigation time and the case creation time. Sometimes it is calculated using the case closure time rather than the mitigation time.

TTD= Case Creation Time – Alarm Trigger Time

TTR= Incident Mitigation Time – Case Creation Time.
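
The two formulas above, and the means derived from them, can be sketched in a few lines of Python. The four case records below are hypothetical values made up for illustration (they are not the article's data), and the field names alarm, created and mitigated are assumptions rather than fields of any particular SIEM.

```python
from datetime import datetime
from statistics import mean

# Hypothetical case records (illustrative values only).
cases = [
    {"alarm": "2023-06-01 09:00", "created": "2023-06-01 09:12", "mitigated": "2023-06-01 10:02"},
    {"alarm": "2023-06-01 11:30", "created": "2023-06-01 11:38", "mitigated": "2023-06-01 12:08"},
    {"alarm": "2023-06-02 14:05", "created": "2023-06-02 14:25", "mitigated": "2023-06-02 16:25"},
    {"alarm": "2023-06-03 08:45", "created": "2023-06-03 08:49", "mitigated": "2023-06-03 09:29"},
]

FMT = "%Y-%m-%d %H:%M"


def minutes_between(start: str, end: str) -> float:
    """Return the elapsed time from start to end, in minutes."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 60


# TTD = Case Creation Time - Alarm Trigger Time
ttd = [minutes_between(c["alarm"], c["created"]) for c in cases]
# TTR = Incident Mitigation Time - Case Creation Time
ttr = [minutes_between(c["created"], c["mitigated"]) for c in cases]

# MTTD and MTTR are the averages over all cases in the period.
mttd = mean(ttd)
mttr = mean(ttr)

print(f"MTTD: {mttd:.1f} min, MTTR: {mttr:.1f} min")
```

If your SOC measures TTR against the case closure time instead of the mitigation time, only the second timestamp passed to minutes_between changes.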
As a SOC manager, monitoring MTTD and MTTR is essential to measure the effectiveness of your SOC in detecting, responding to and mitigating security incidents. These metrics help you identify gaps in your security infrastructure and processes. MTTD and MTTR can also be used to set and measure performance against a predefined Service Level Agreement (SLA). The lower the MTTD, the sooner the SOC can start responding to a security incident. The question to ask is how to lower MTTD and MTTR:


Detection:

-What kind of detection security tools are being used?

-Are the detection rules up-to-date and effective?

-Are there any false positives or false negatives impacting detection?

Response:

-Is there a clear and well-defined IR process in place?

-Are there sufficient resources and staff available to respond to incidents?

-Are the IR team members well trained and equipped?

-Are there playbooks and response plans in place for different incidents?



1-False Positive Rate = Total number of false positive alerts / total number of alerts generated by the SIEM over a specific period (a day or a week).

For example, if your SIEM generated 209 alerts during a week, and 50 of those alerts were determined to be false positives, the false positive rate would be calculated as follows:

False Positive Rate = (# of False positive Alerts / Total Number of Alerts) x 100

False Positive Rate = (50/209) x 100

False Positive Rate = 23.92%
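
The same arithmetic can be checked with a minimal Python sketch. The function name false_positive_rate is hypothetical, chosen only for this illustration; the inputs are the 50 and 209 from the example above.

```python
def false_positive_rate(false_positives: int, total_alerts: int) -> float:
    """Percentage of alerts in a period that turned out to be false positives."""
    if total_alerts == 0:
        raise ValueError("no alerts were generated in this period")
    return false_positives / total_alerts * 100


fpr = false_positive_rate(50, 209)
print(f"False Positive Rate = {fpr:.2f}%")
```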

What does this indicate?

This indicates that 23.92% of your alerts are false positives. The higher the number, the more fine-tuning work is needed on your security controls, SIEM use cases, and threat intelligence feeds that may be outdated. Reducing this metric should be a priority, because it gives your analysts more time to investigate real incidents and alerts.

2-Incident Closure Rate = Total number of incidents successfully resolved / total number of incidents identified

For example, if your SOC identified 17 security incidents during a week/month/year and successfully resolved 13 of them, the incident closure rate would be calculated as follows:

Incident Closure Rate = (# of resolved incidents/ total # of incidents) x 100

Incident Closure Rate = (13/17) x 100

Incident Closure Rate = 76.47 %.
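
As a rough sketch, the closure rate can also be computed directly from a list of incident statuses exported from a case management system. The status labels "resolved" and "open" and the function name closure_rate are assumptions made for this illustration; the counts match the 13 of 17 example above.

```python
# Hypothetical status export: 13 resolved incidents and 4 still open.
statuses = ["resolved"] * 13 + ["open"] * 4


def closure_rate(incident_statuses: list[str]) -> float:
    """Percentage of identified incidents that were successfully resolved."""
    if not incident_statuses:
        raise ValueError("no incidents were identified in this period")
    resolved = sum(1 for status in incident_statuses if status == "resolved")
    return resolved / len(incident_statuses) * 100


rate = closure_rate(statuses)
print(f"Incident Closure Rate = {rate:.2f}%")
```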

What does this indicate?

This indicates that 76.47% of your incidents were successfully resolved. It is important to note the difference between Incident Closure Rate and TTR: Incident Closure Rate gives you a broader view of the overall effectiveness of the SOC in responding to and resolving security incidents, while TTR gives you an indication of the time it takes your SOC team to respond to an incident.



3-Staff Utilization Rate = Total amount of time SOC analysts are actively engaged in activities / total available time for all analysts.

For example, if you have 5 SOC analysts, and each analyst is available for 40 hours per week, the total available time would be 200 hours (5 analysts x 40 hrs). Let's assume that the total amount of time spent actively engaged in activities during the week is 165 hours; the staff utilization rate would be calculated as follows:

Staff Utilization Rate = (total active time / total available time) x 100

Staff Utilization Rate = (165 / 200) x 100

Staff Utilization Rate = 82.5 %

What does this indicate?

This indicates that SOC analysts spend 82.5% of their available time actively engaged in security monitoring and incident response activities. This is an important indicator for SOC managers: a rate this high may not be sustainable in the long run, and it can cause staff burnout and reduced quality of work. At the end of the day, we are human! A good Staff Utilization Rate falls between 70% and 80%, which leaves enough flexibility and maintains a healthy work environment.
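
A small Python sketch of the same calculation, with a check against the 70% to 80% range mentioned above. The function name and the wording of the verdict strings are hypothetical, and the range is the article's rule of thumb, not a formal standard.

```python
# Target band taken from the rule of thumb in the text above.
TARGET_LOW, TARGET_HIGH = 70.0, 80.0


def staff_utilization(active_hours: float, analysts: int, hours_per_analyst: float) -> float:
    """Percentage of the team's available time spent actively working."""
    available_hours = analysts * hours_per_analyst
    if available_hours <= 0:
        raise ValueError("available time must be positive")
    return active_hours * 100 / available_hours


rate = staff_utilization(active_hours=165, analysts=5, hours_per_analyst=40)

if rate > TARGET_HIGH:
    verdict = "above the target band (risk of burnout)"
elif rate < TARGET_LOW:
    verdict = "below the target band (spare capacity)"
else:
    verdict = "within the target band"

print(f"Staff Utilization Rate = {rate:.1f}% -> {verdict}")
```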

4-Knowledge management: this is a process rather than a metric. SOC managers should have a defined knowledge management process, which includes documenting security incidents and their resolutions, maintaining up-to-date cybersecurity policies and procedures, and providing ongoing training to SOC staff to ensure they are equipped with the necessary skills and knowledge to detect, respond to, and prevent cybersecurity threats. As the saying goes, "There is no more profitable investment than investing in your human resources."



Optimize your KPIs:

1-Prioritize alerting: categorize alerts based on their severity and potential impact on the organization, and notify the SOC team of the highest priority alerts first. This ensures that critical alerts are addressed quickly, reducing the risk of major incidents.

2-Improve SIEM rules: ensure that SIEM rules are optimized to detect relevant threats. This can involve creating new rules, modifying existing ones or adopting a detection framework such as MITRE ATT&CK.

3-Expand Visibility: Periodically evaluate your current visibility and identify gaps. Remember, you can't detect what you can't see!

4-Automate: Use automation to reduce the time it takes to analyze and respond to alerts. This includes automating investigations, threat hunting & response.

a.Arguably, threat hunting can't be fully automated as human intuition and creativity are often required to uncover new or previously unknown threats.

5-Train, Train, Train: Training is the key element in optimizing your SOC KPIs. All your KPIs depend heavily on your team's ability to DETECT & RESPOND to cybersecurity incidents. Without proper training and knowledge transfer, you will not be able to optimize your SOC KPIs.

6-Improve Staff Utilization: proper staffing, workflow optimization & automation to identify and eliminate any unnecessary or redundant tasks will help you improve your Staff Utilization Rate KPI.
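
To make the first item in the list above (prioritizing alerts) more concrete, here is a minimal Python sketch of a triage queue ordered by severity and then by potential impact. The Alert class, the severity labels, the 1 to 5 asset criticality scale and the sample alerts are all hypothetical; a real SIEM will have its own fields and scoring.

```python
from dataclasses import dataclass

# Lower rank means higher priority. The labels are assumptions for this sketch.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Alert:
    name: str
    severity: str
    asset_criticality: int  # hypothetical scale: 1 (low) to 5 (business critical)


def triage_order(alerts: list[Alert]) -> list[Alert]:
    """Sort by severity first, then by the criticality of the affected asset."""
    return sorted(alerts, key=lambda a: (SEVERITY_RANK[a.severity], -a.asset_criticality))


queue = triage_order([
    Alert("Port scan from internal host", "low", 2),
    Alert("Ransomware behaviour on file server", "critical", 5),
    Alert("Multiple failed admin logins", "high", 4),
    Alert("Malware beacon from workstation", "high", 2),
])

for alert in queue:
    print(f"{alert.severity:<8} {alert.name}")
```

Using impact as a tie-breaker means that two alerts of equal severity are worked in order of how important the affected asset is to the organization.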

Conclusion:

In conclusion, monitoring and optimizing SOC KPIs such as MTTD, MTTR, false positive rate, incident closure rate, staff utilization, and knowledge management is essential to ensure effective incident response and maintain a strong security posture. By calculating and analyzing these metrics, SOC managers can identify areas for improvement, optimize processes and workflows, and allocate resources efficiently. Implementing best practices such as automation, continuous training, and collaboration between teams can help reduce MTTD and MTTR and improve incident response time. By continuously monitoring and improving SOC KPIs, organizations can enhance their security capabilities and better protect against cyber threats.
