Search | arXiv e-print repository

Automated CVE Analysis for Threat Prioritization and Impact Prediction

Authors: Ehsan Aghaei, Ehab Al-Shaer, Waseem Shadid, Xi Niu

Abstract: The Common Vulnerabilities and Exposures (CVE) are pivotal information for proactive cybersecurity measures, including service patching, security hardening, and more. However, CVEs typically offer low-level, product-oriented descriptions of publicly disclosed cybersecurity vulnerabilities, often lacking the essential attack semantic information required for comprehensive weakness characterization and threat impact estimation. This critical insight is essential for CVE prioritization and the identification of potential countermeasures, particularly when dealing with a large number of CVEs. Current industry practices involve manual evaluation of CVEs to assess their attack severities using the Common Vulnerability Scoring System (CVSS) and mapping them to Common Weakness Enumeration (CWE) for potential mitigation identification. Unfortunately, this manual analysis presents a major bottleneck in the vulnerability analysis process, leading to slowdowns in proactive cybersecurity efforts and the potential for inaccuracies due to human errors. In this research, we introduce our novel predictive model and tool (called CVEDrill) which revolutionizes CVE analysis and threat prioritization. CVEDrill accurately estimates the CVSS vector for precise threat mitigation and priority ranking and seamlessly automates the classification of CVEs into the appropriate CWE hierarchy classes. By harnessing CVEDrill, organizations can now implement cybersecurity countermeasure mitigation with unparalleled accuracy and timeliness, surpassing in this domain the capabilities of state-of-the-art tools like ChatGPT.

Submitted 6 September, 2023; originally announced September 2023.
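
The CVEDrill abstract above refers to estimating a CVSS vector and using it for priority ranking. As a minimal sketch of how a vector becomes a rankable number, the following computes the CVSS v3.1 base score from a vector string using the weights and rounding rule of the public CVSS v3.1 specification. It is not CVEDrill's code; the function names `roundup` and `base_score` are chosen here for illustration, and the sketch assumes a well-formed vector with all eight base metrics present.

```python
import math

# CVSS v3.1 base metric weights (from the public specification).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on whether Scope is changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place using the v3.1 integer-based rule."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a vector such as 'CVSS:3.1/AV:N/AC:L/...'."""
    parts = dict(item.split(":") for item in vector.split("/"))
    changed = parts["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[parts["PR"]]

    # Impact sub-score.
    iss = 1 - (
        (1 - WEIGHTS["C"][parts["C"]])
        * (1 - WEIGHTS["I"][parts["I"]])
        * (1 - WEIGHTS["A"][parts["A"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    # Exploitability sub-score.
    exploitability = (
        8.22
        * WEIGHTS["AV"][parts["AV"]]
        * WEIGHTS["AC"][parts["AC"]]
        * pr
        * WEIGHTS["UI"][parts["UI"]]
    )

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N"))  # 6.1
```

Sorting a list of CVEs by `base_score` of their predicted vectors would give one simple priority ordering; any real deployment would likely combine this with environmental context.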

arXiv:2309.02785 [pdf, ps, other]

CVE-driven Attack Technique Prediction with Semantic Information Extraction and a Domain-specific Language Model

Authors: Ehsan Aghaei, Ehab Al-Shaer

Abstract: This paper addresses a critical challenge in cybersecurity: the gap between vulnerability information represented by Common Vulnerabilities and Exposures (CVEs) and the resulting cyberattack actions. CVEs provide insights into vulnerabilities, but often lack details on potential threat actions (tactics, techniques, and procedures, or TTPs) within the ATT&CK framework. This gap hinders accurate CVE categorization and proactive countermeasure initiation. The paper introduces the TTPpredictor tool, which uses innovative techniques to analyze CVE descriptions and infer plausible TTP attacks resulting from CVE exploitation. TTPpredictor overcomes challenges posed by limited labeled data and semantic disparities between CVE and TTP descriptions. It initially extracts threat actions from unstructured cyber threat reports using Semantic Role Labeling (SRL) techniques. These actions, along with their contextual attributes, are correlated with MITRE's attack functionality classes. This automated correlation facilitates the creation of labeled data, essential for categorizing novel threat actions into threat functionality classes and TTPs. The paper presents an empirical assessment, demonstrating TTPpredictor's effectiveness with accuracy rates of approximately 98% and F1-scores ranging from 95% to 98% in precise CVE classification to ATT&CK techniques. TTPpredictor outperforms state-of-the-art language model tools like ChatGPT. Overall, this paper offers a robust solution for linking CVEs to potential attack techniques, enhancing cybersecurity practitioners' ability to proactively identify and mitigate threats.

Submitted 6 September, 2023; originally announced September 2023.

arXiv:2204.02685 [pdf, other]

SecureBERT: A Domain-Specific Language Model for Cybersecurity

Authors: Ehsan Aghaei, Xi Niu, Waseem Shadid, Ehab Al-Shaer

Abstract: Natural Language Processing (NLP) has recently gained wide attention in cybersecurity, particularly in Cyber Threat Intelligence (CTI) and cyber automation. Increased connection and automation have revolutionized the world's economic and cultural infrastructures, while they have introduced risks in terms of cyber attacks. CTI is information that helps cybersecurity analysts make intelligent security decisions; it is often delivered in the form of natural language text, which must be transformed to a machine-readable format through an automated procedure before it can be used for automated security measures. This paper proposes SecureBERT, a cybersecurity language model capable of capturing text connotations in cybersecurity text (e.g., CTI) and therefore successful in automation for many critical cybersecurity tasks that would otherwise rely on human expertise and time-consuming manual efforts. SecureBERT has been trained using a large corpus of cybersecurity text. To make SecureBERT effective not just in retaining general English understanding, but also when applied to text with cybersecurity implications, we developed a customized tokenizer as well as a method to alter pre-trained weights. SecureBERT is evaluated using the standard Masked Language Model (MLM) test as well as two additional standard NLP tasks. Our evaluation studies show that SecureBERT (https://github.com/ehsanaghaei/SecureBERT) outperforms existing similar models, confirming its capability for solving crucial NLP tasks in cybersecurity.

Submitted 20 October, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

Comments: This is the initial draft of this work and it may contain errors and typos. The revised version has already been submitted to a venue

arXiv:2104.08994 [pdf, other]

Constraints Satisfiability Driven Reinforcement Learning for Autonomous Cyber Defense

Authors: Ashutosh Dutta, Ehab Al-Shaer, Samrat Chatterjee

Abstract: With the increasing system complexity and attack sophistication, the necessity of autonomous cyber defense becomes evident for cyber and cyber-physical systems (CPSs). Many existing frameworks in the current state-of-the-art either rely on static models with unrealistic assumptions, or fail to satisfy the system safety and security requirements. In this paper, we present a new hybrid autonomous agent architecture that aims to optimize and verify defense policies of reinforcement learning (RL) by incorporating constraints verification (using satisfiability modulo theory (SMT)) into the agent's decision loop. The incorporation of SMT not only ensures the satisfiability of safety and security requirements, but also provides constant feedback to steer the RL decision-making toward safe and effective actions. This approach is critically needed for CPSs that exhibit high risk due to safety or security violations. Our evaluation of the presented approach in a simulated CPS environment shows that the agent learns the optimal policy fast and defeats diversified attack strategies in 99% of cases.

Submitted 18 April, 2021; originally announced April 2021.

Comments: 11 pages
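
The abstract above describes placing constraint verification inside the RL agent's decision loop, so that unsafe actions are rejected and the rejection is fed back to the learner. The sketch below illustrates only that control flow. It is an assumption-laden toy: plain Python predicates stand in for the SMT solver, and the names `select_safe_action`, the example state, actions, and constraint are all hypothetical rather than taken from the paper.

```python
def select_safe_action(q_values, state, constraints):
    """Pick the highest-valued action that passes every constraint.

    q_values:    dict mapping action name -> estimated value
    constraints: dict mapping constraint name -> predicate(state, action)
                 (a stand-in for an SMT satisfiability check)
    Returns (best_action_or_None, violations), where violations maps each
    rejected action to the names of the constraints it failed. A learner
    could use that mapping as negative feedback.
    """
    violations = {}
    safe_actions = []
    for action in q_values:
        failed = [
            name for name, check in constraints.items()
            if not check(state, action)
        ]
        if failed:
            violations[action] = failed
        else:
            safe_actions.append(action)
    best = max(safe_actions, key=q_values.get) if safe_actions else None
    return best, violations


# Hypothetical example: the greedy action is unsafe in the current state.
state = {"pressure": 90}
q_values = {"shutdown_pump": 0.9, "isolate_host": 0.7, "monitor": 0.1}
constraints = {
    "no_pump_shutdown_at_high_pressure":
        lambda s, a: not (a == "shutdown_pump" and s["pressure"] > 80),
}

action, violations = select_safe_action(q_values, state, constraints)
print(action)      # isolate_host
print(violations)  # {'shutdown_pump': ['no_pump_shutdown_at_high_pressure']}
```

Swapping the predicates for calls to an actual SMT solver would keep the same interface; the paper's own formulation of constraints and feedback may differ.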

arXiv:2102.11498 [pdf, other]

V2W-BERT: A Framework for Effective Hierarchical Multiclass Classification of Software Vulnerabilities

Authors: Siddhartha Shankar Das, Edoardo Serra, Mahantesh Halappanavar, Alex Pothen, Ehab Al-Shaer

Abstract: Weaknesses in computer systems such as faults, bugs and errors in the architecture, design or implementation of software provide vulnerabilities that can be exploited by attackers to compromise the security of a system. Common Weakness Enumerations (CWE) are a hierarchically designed dictionary of software weaknesses that provide a means to understand software flaws, potential impact of their exploitation, and means to mitigate these flaws. Common Vulnerabilities and Exposures (CVE) are brief low-level descriptions that uniquely identify vulnerabilities in a specific product or protocol. Classifying or mapping of CVEs to CWEs provides a means to understand the impact and mitigate the vulnerabilities. Since manual mapping of CVEs is not a viable option, automated approaches are desirable but challenging. We present a novel Transformer-based learning framework (V2W-BERT) in this paper. By using ideas from natural language processing, link prediction and transfer learning, our method outperforms previous approaches not only for CWE instances with abundant data to train, but also rare CWE classes with little or no data to train. Our approach also shows significant improvements in using historical data to predict links for future instances of CVEs, and therefore, provides a viable approach for practical applications. Using data from MITRE and National Vulnerability Database, we achieve up to 97% prediction accuracy for randomly partitioned data and up to 94% prediction accuracy in temporally partitioned data. We believe that our work will influence the design of better methods and training models, as well as applications to solve increasingly harder problems in cybersecurity.

Submitted 23 February, 2021; originally announced February 2021.

Comments: Under submission to KDD 2021 Applied Data Science Track

arXiv:2009.11501 [pdf, other]

doi 10.1007/978-3-030-63086-7_2

ThreatZoom: CVE2CWE using Hierarchical Neural Network

Authors: Ehsan Aghaei, Waseem Shadid, Ehab Al-Shaer

Abstract: The Common Vulnerabilities and Exposures (CVE) represent standard means for sharing publicly known information security vulnerabilities. One or more CVEs are grouped into the Common Weakness Enumeration (CWE) classes for the purpose of understanding the software or configuration flaws and potential impacts enabled by these vulnerabilities and identifying means to detect or prevent exploitation. As the CVE-to-CWE classification is mostly performed manually by domain experts, thousands of critical and new CVEs remain unclassified, yet they are unpatchable. This significantly limits the utility of CVEs and slows down proactive threat mitigation. This paper presents the first automatic tool to classify CVEs to CWEs. ThreatZoom uses a novel learning algorithm that employs an adaptive hierarchical neural network which adjusts its weights based on text analytic scores and classification errors. It automatically estimates the CWE classes corresponding to a CVE instance using both statistical and semantic features extracted from the description of a CVE. This tool is rigorously tested on various datasets provided by MITRE and the National Vulnerability Database (NVD). The accuracy of classifying CVE instances to their correct CWE classes is 92% (fine-grain) and 94% (coarse-grain) for the NVD dataset, and 75% (fine-grain) and 90% (coarse-grain) for the MITRE dataset, despite the small corpus.

Submitted 24 September, 2020; originally announced September 2020.

Comments: This is an accepted paper at EAI SecureComm 2020, 16th EAI International Conference on Security and Privacy in Communication Networks

Journal ref: EAI SecureComm 2020, 16th EAI International Conference on Security and Privacy in Communication Networks
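
The ThreatZoom abstract above reports separate fine-grain and coarse-grain accuracies over a class hierarchy. The sketch below shows one way such a pair of numbers can be computed, assuming that "fine-grain" means an exact class match and "coarse-grain" means a match of top-level ancestors; the abstract does not spell out its definitions, so this reading is an assumption. The hierarchy and labels (`W-1`, `W-1.1`, and so on) are hypothetical placeholders, not real CWE identifiers.

```python
# Toy hierarchy: child -> parent (None marks a top-level class).
PARENT = {
    "W-1": None,
    "W-2": None,
    "W-1.1": "W-1",
    "W-1.2": "W-1",
    "W-2.1": "W-2",
}


def root_of(label):
    """Walk parent links up to the top-level ancestor of a class."""
    while PARENT[label] is not None:
        label = PARENT[label]
    return label


def accuracies(true_labels, predicted_labels):
    """Return (fine_grain, coarse_grain) accuracy as fractions in [0, 1]."""
    pairs = list(zip(true_labels, predicted_labels))
    fine = sum(t == p for t, p in pairs) / len(pairs)
    coarse = sum(root_of(t) == root_of(p) for t, p in pairs) / len(pairs)
    return fine, coarse


true_labels = ["W-1.1", "W-1.2", "W-2.1", "W-1.1"]
predicted = ["W-1.1", "W-1.1", "W-2.1", "W-2.1"]
print(accuracies(true_labels, predicted))  # (0.5, 0.75)
```

The second prediction is wrong at the fine level but lands under the correct top-level class, which is why the coarse figure is higher than the fine one.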

arXiv:2004.09662 [pdf, other]

The Panacea Threat Intelligence and Active Defense Platform

Authors: Adam Dalton, Ehsan Aghaei, Ehab Al-Shaer, Archna Bhatia, Esteban Castillo, Zhuo Cheng, Sreekar Dhaduvai, Qi Duan, Md Mazharul Islam, Younes Karimi, Amir Masoumzadeh, Brodie Mather, Sashank Santhanam, Samira Shaikh, Tomek Strzalkowski, Bonnie J. Dorr

Abstract: We describe Panacea, a system that supports natural language processing (NLP) components for active defenses against social engineering attacks. We deploy a pipeline of human language technology, including Ask and Framing Detection, Named Entity Recognition, Dialogue Engineering, and Stylometry. Panacea processes modern message formats through a plug-in architecture to accommodate innovative approaches for message analysis, knowledge representation and dialogue generation. The novelty of the Panacea system is that it uses NLP for cyber defense and engages the attacker using bots to elicit evidence to attribute to the attacker and to waste the attacker's time and resources.

Submitted 20 April, 2020; originally announced April 2020.

Comments: Accepted at STOC

arXiv:1907.00540 [pdf, other]

A Formal Approach for Efficient Navigation Management of Hybrid Electric Vehicles on Long Trips

Authors: Mohammad Ashiqur Rahman, Md Hasan Shahriar, Ehab Al-Shaer, Quanyan Zhu

Abstract: Plug-in Hybrid Electric Vehicles (PHEVs) are gaining popularity due to their economic efficiency as well as their contribution to green management. PHEVs allow the driver to use electric power exclusively for driving and then switch to gasoline as needed. The more gasoline a vehicle uses, the higher the cost of the trip. However, a PHEV cannot last for a long period on stored electricity without being recharged. Thus, it needs frequent recharging compared to traditional gasoline-powered vehicles. Moreover, the battery recharging time is usually long, which leads to longer delays on a trip. Therefore, it is necessary to provide a flexible navigation management scheme along with an efficient recharging schedule, which allows the driver to choose an optimal route based on the fuel-cost and time-to-destination constraints. In this paper, we present a formal model to solve this PHEV navigation management problem. The model is solved to provide a driver with a comprehensive routing plan including the potential recharging and refueling points that satisfy the given requirements, particularly the maximum fuel cost and the maximum trip time. In addition, we propose a price-based navigation control technique to achieve better load balance for the traffic system. Evaluation results show that the proposed formal models can be solved efficiently even with large road networks.

Submitted 1 July, 2019; originally announced July 2019.

MSC Class: 68Q60

arXiv:1812.03966 [pdf, other]

IoTC2: A Formal Method Approach for Detecting Conflicts in Large Scale IoT Systems

Authors: Abdullah Al Farooq, Ehab Al-Shaer, Thomas Moyer, Krishna Kant

Abstract: Internet of Things (IoT) has become a common paradigm for different domains such as health care, transportation infrastructure, smart home, smart shopping, and e-commerce. With its interoperable functionality, it is now possible to connect all domains of IoT together for providing competent services to the users. Because numerous IoT devices can connect and communicate at the same time, there can be events that trigger conflicting actions to an actuator or an environmental feature. However, there have been very few research efforts made to detect conflicting situations in IoT systems using formal methods. This paper provides a formal method approach, IoT Conflict Checker (IoTC2), to ensure safety of controller and actuators' behavior with respect to conflicts. Any policy violation results in detection of the conflicts. We defined the safety policies for controller, actions, and triggering events and implemented them in Prolog to prove the logical completeness and soundness. In addition, we have implemented the detection policies in the Matlab Simulink environment with its built-in Model Verification blocks. We created a smart home environment in Simulink and showed how the conflicts affect actions and corresponding features. We have also evaluated the scalability, efficiency, and accuracy of our method in the simulated environment.

Submitted 10 December, 2018; originally announced December 2018.
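
The IoTC2 abstract above describes events that trigger conflicting actions on the same actuator. As a rough illustration of that idea only, the sketch below flags rule sets in which one trigger drives one actuator toward more than one distinct action. The paper's own implementation uses Prolog and Simulink with a richer policy set; the rule format `(trigger, actuator, action)`, the function name `find_conflicts`, and the example rules are hypothetical.

```python
from collections import defaultdict


def find_conflicts(rules):
    """Group rules by (trigger, actuator) and report differing actions.

    rules: iterable of (trigger, actuator, action) tuples.
    Returns a dict mapping (trigger, actuator) -> sorted list of the
    conflicting actions, for every pair commanded in more than one way.
    """
    actions_by_key = defaultdict(set)
    for trigger, actuator, action in rules:
        actions_by_key[(trigger, actuator)].add(action)
    return {
        key: sorted(actions)
        for key, actions in actions_by_key.items()
        if len(actions) > 1
    }


# Hypothetical smart-home rules.
rules = [
    ("smoke_detected", "window", "open"),    # ventilation rule
    ("smoke_detected", "window", "close"),   # security lockdown rule
    ("smoke_detected", "alarm", "on"),
    ("motion_detected", "light", "on"),
]

print(find_conflicts(rules))
# {('smoke_detected', 'window'): ['close', 'open']}
```

This catches only same-trigger conflicts; conflicts arising from different events that happen to fire together would need a model of event co-occurrence.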

arXiv:1412.3359 [pdf, ps, other]

On DDoS Attack Related Minimum Cut Problems

Authors: Qi Duan, Haadi Jafarian, Ehab Al-Shaer, Jinhui Xu

Abstract: In this paper, we study two important extensions of the classical minimum cut problem, called the Connectivity Preserving Minimum Cut (CPMC) problem and the Threshold Minimum Cut (TMC) problem, which have important applications in large-scale DDoS attacks. In the CPMC problem, a minimum cut is sought to separate a source from a destination node while preserving the connectivity between the source and its partner node(s). The CPMC problem also has important applications in many other areas such as emergency responding, image processing, pattern recognition, and medical sciences. In the TMC problem, a minimum cut is sought to isolate a target node from a threshold number of partner nodes. The TMC problem is an important special case of the network inhibition problem and has important applications in network security. We show that the general CPMC problem cannot be approximated within $\log n$ unless $NP=P$ has quasi-polynomial algorithms. We also show that a special case of the two-group CPMC problem in planar graphs can be solved in polynomial time. The corollary of this result is that the network diversion problem in planar graphs is in $P$, a previously open problem. We show that the threshold minimum node cut (TMNC) problem can be approximated within ratio $O(\sqrt{n})$ and the threshold minimum edge cut problem (TMEC) can be approximated within ratio $O(\log^2{n})$. We also answer another long-standing open problem: the hardness of the network inhibition problem and network interdiction problem. We show that both of them cannot be approximated within any constant ratio unless $NP \nsubseteq \cap_{\delta>0} BPTIME(2^{n^\delta})$.

Submitted 17 April, 2015; v1 submitted 10 December, 2014; originally announced December 2014.
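
The abstract above builds on the classical minimum cut problem. For readers who want a concrete baseline, the sketch below computes a classical minimum s-t cut via the max-flow/min-cut theorem using the Edmonds-Karp algorithm (breadth-first augmenting paths). It does not implement the CPMC or TMC variants studied in the paper; the function name `min_cut` and the example graph are chosen here for illustration.

```python
from collections import deque


def min_cut(capacity, source, sink):
    """Return (cut_value, cut_edges) for a directed graph.

    capacity: dict of dicts, capacity[u][v] = capacity of edge u -> v.
    """
    # Residual graph: copy capacities and add zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    def bfs_parents():
        """Breadth-first search over edges with remaining capacity."""
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    flow = 0
    while True:
        parents = bfs_parents()
        if sink not in parents:
            break  # no augmenting path left: the flow is maximum
        # Find the bottleneck capacity along the path found by BFS.
        bottleneck = float("inf")
        v = sink
        while parents[v] is not None:
            bottleneck = min(bottleneck, residual[parents[v]][v])
            v = parents[v]
        # Push the bottleneck amount along the path.
        v = sink
        while parents[v] is not None:
            u = parents[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

    # Nodes still reachable from the source form the source side of the cut.
    reachable = set(parents)
    cut_edges = sorted(
        (u, v)
        for u in reachable
        for v in capacity.get(u, {})
        if v not in reachable
    )
    return flow, cut_edges


graph = {
    "s": {"a": 3, "b": 2},
    "a": {"t": 2},
    "b": {"t": 3},
    "t": {},
}
print(min_cut(graph, "s", "t"))  # (4, [('a', 't'), ('s', 'b')])
```

The cut value equals the maximum flow, and the returned edges are the original edges leaving the set of nodes still reachable from the source in the final residual graph.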

Showing 1–10 of 10 results for author: Al-Shaer, E