Search | arXiv e-print repository

Actionable Cyber Threat Intelligence using Knowledge Graphs and Large Language Models

Authors: Romy Fieblinger, Md Tanvirul Alam, Nidhi Rastogi

Abstract: Cyber threats are constantly evolving. Extracting actionable insights from unstructured Cyber Threat Intelligence (CTI) data is essential to guide cybersecurity decisions. Increasingly, organizations like Microsoft, Trend Micro, and CrowdStrike are using generative AI to facilitate CTI extraction. This paper addresses the challenge of automating the extraction of actionable CTI using advancements in Large Language Models (LLMs) and Knowledge Graphs (KGs). We explore the application of state-of-the-art open-source LLMs, including the Llama 2 series, Mistral 7B Instruct, and Zephyr for extracting meaningful triples from CTI texts. Our methodology evaluates techniques such as prompt engineering, the guidance framework, and fine-tuning to optimize information extraction and structuring. The extracted data is then utilized to construct a KG, offering a structured and queryable representation of threat intelligence. Experimental results demonstrate the effectiveness of our approach in extracting relevant information, with guidance and fine-tuning showing superior performance over prompt engineering. However, while our methods prove effective in small-scale tests, applying LLMs to large-scale data for KG construction and Link Prediction presents ongoing challenges.

Submitted 30 June, 2024; originally announced July 2024.

Comments: 6th Workshop on Attackers and Cyber-Crime Operations, 12 pages, 1 figure, 9 tables

arXiv:2406.07599

CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence

Authors: Md Tanvirul Alam, Dipkamal Bhusal, Le Nguyen, Nidhi Rastogi

Abstract: Cyber threat intelligence (CTI) is crucial in today's cybersecurity landscape, providing essential insights to understand and mitigate the ever-evolving cyber threats. The recent rise of Large Language Models (LLMs) has shown potential in this domain, but concerns about their reliability, accuracy, and hallucinations persist. While existing benchmarks provide general evaluations of LLMs, there are no benchmarks that address the practical and applied aspects of CTI-specific tasks. To bridge this gap, we introduce CTIBench, a benchmark designed to assess LLMs' performance in CTI applications. CTIBench includes multiple datasets focused on evaluating knowledge acquired by LLMs in the cyber-threat landscape. Our evaluation of several state-of-the-art models on these tasks provides insights into their strengths and weaknesses in CTI contexts, contributing to a better understanding of LLM capabilities in CTI.

Submitted 24 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

arXiv:2405.20441

SECURE: Benchmarking Generative Large Language Models for Cybersecurity Advisory

Authors: Dipkamal Bhusal, Md Tanvirul Alam, Le Nguyen, Ashim Mahara, Zachary Lightcap, Rodney Frazier, Romy Fieblinger, Grace Long Torales, Nidhi Rastogi

Abstract: Large Language Models (LLMs) have demonstrated potential in cybersecurity applications but have also caused lower confidence due to problems like hallucinations and a lack of truthfulness. Existing benchmarks provide general evaluations but do not sufficiently address the practical and applied aspects of LLM performance in cybersecurity-specific tasks. To address this gap, we introduce SECURE (Security Extraction, Understanding & Reasoning Evaluation), a benchmark designed to assess LLMs' performance in realistic cybersecurity scenarios. SECURE includes six datasets focussed on the Industrial Control System sector to evaluate knowledge extraction, understanding, and reasoning based on industry-standard sources. Our study evaluates seven state-of-the-art models on these tasks, providing insights into their strengths and weaknesses in cybersecurity contexts, and offers recommendations for improving LLMs' reliability as cyber advisory tools.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2404.10789

doi 10.1109/EuroSP60621.2024.00010

PASA: Attack Agnostic Unsupervised Adversarial Detection using Prediction & Attribution Sensitivity Analysis

Authors: Dipkamal Bhusal, Md Tanvirul Alam, Monish K. Veerabhadran, Michael Clifford, Sara Rampazzi, Nidhi Rastogi

Abstract: Deep neural networks for classification are vulnerable to adversarial attacks, where small perturbations to input samples lead to incorrect predictions. This susceptibility, combined with the black-box nature of such networks, limits their adoption in critical applications like autonomous driving. Feature-attribution-based explanation methods provide relevance of input features for model predictions on input samples, thus explaining model decisions. However, we observe that both model predictions and feature attributions for input samples are sensitive to noise. We develop a practical method for this characteristic of model prediction and feature attribution to detect adversarial samples. Our method, PASA, requires the computation of two test statistics using model prediction and feature attribution and can reliably detect adversarial samples using thresholds learned from benign samples. We validate our lightweight approach by evaluating the performance of PASA on varying strengths of FGSM, PGD, BIM, and CW attacks on multiple image and non-image datasets. On average, we outperform state-of-the-art statistical unsupervised adversarial detectors on CIFAR-10 and ImageNet by 14% and 35% ROC-AUC scores, respectively. Moreover, our approach demonstrates competitive performance even when an adversary is aware of the defense mechanism.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 9th IEEE European Symposium on Security and Privacy

arXiv:2401.12790

MORPH: Towards Automated Concept Drift Adaptation for Malware Detection

Authors: Md Tanvirul Alam, Romy Fieblinger, Ashim Mahara, Nidhi Rastogi

Abstract: Concept drift is a significant challenge for malware detection, as the performance of trained machine learning models degrades over time, rendering them impractical. While prior research in malware concept drift adaptation has primarily focused on active learning, which involves selecting representative samples to update the model, self-training has emerged as a promising approach to mitigate concept drift. Self-training involves retraining the model using pseudo labels to adapt to shifting data distributions. In this research, we propose MORPH -- an effective pseudo-label-based concept drift adaptation method specifically designed for neural networks. Through extensive experimental analysis of Android and Windows malware datasets, we demonstrate the efficacy of our approach in mitigating the impact of concept drift. Our method offers the advantage of reducing annotation efforts when combined with active learning. Furthermore, our method significantly improves over existing works in automated concept drift adaptation for malware detection.

Submitted 23 January, 2024; originally announced January 2024.

arXiv:2311.01247

Emergent (In)Security of Multi-Cloud Environments

Authors: Morgan Reece, Theodore Lander Jr., Sudip Mittal, Nidhi Rastogi, Josiah Dykstra, Andy Sampson

Abstract: As organizations increasingly use cloud services to host their IT infrastructure, there is a need to share data among these cloud hosted services and systems. A majority of IT organizations have workloads spread across different cloud service providers, growing their multi-cloud environments. When an organization grows their multi-cloud environment, the threat vectors and vulnerabilities for their cloud systems and services grow as well. The increase in the number of attack vectors creates a challenge of how to prioritize mitigations and countermeasures to best defend a multi-cloud environment against attacks. Utilizing multiple industry standard risk analysis tools, we conducted an analysis of multi-cloud threat vectors enabling calculation and prioritization for the identified mitigations and countermeasures. The prioritizations from the analysis showed that authentication and architecture are the highest risk areas of threat vectors. Armed with this data, IT managers are able to more appropriately budget cybersecurity expenditure to implement the most impactful mitigations and countermeasures.

Submitted 2 November, 2023; originally announced November 2023.

Journal ref: 39th ACM Annual Computer Security Applications Conference 2023 (ACM ACSAC 2023)

arXiv:2306.01862

Systemic Risk and Vulnerability Analysis of Multi-cloud Environments

Authors: Morgan Reece, Theodore Edward Lander Jr., Matthew Stoffolano, Andy Sampson, Josiah Dykstra, Sudip Mittal, Nidhi Rastogi

Abstract: With the increasing use of multi-cloud environments, security professionals face challenges in configuration, management, and integration due to uneven security capabilities and features among providers. As a result, a fragmented approach toward security has been observed, leading to new attack vectors and potential vulnerabilities. Other research has focused on single-cloud platforms or specific applications of multi-cloud environments. Therefore, there is a need for a holistic security and vulnerability assessment and defense strategy that applies to multi-cloud platforms. We perform a risk and vulnerability analysis to identify attack vectors from software, hardware, and the network, as well as interoperability security issues in multi-cloud environments. Applying the STRIDE and DREAD threat modeling methods, we present an analysis of the ecosystem across six attack vectors: cloud architecture, APIs, authentication, automation, management differences, and cybersecurity legislation. We quantitatively determine and rank the threats in multi-cloud environments and suggest mitigation strategies.

Submitted 7 July, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

Comments: 27 pages, 9 figures
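
The abstract above mentions ranking threats quantitatively with DREAD. As a rough illustration only, the sketch below uses the common DREAD convention of averaging five ratings (damage, reproducibility, exploitability, affected users, discoverability); the abstract does not say how the paper computes its scores, and the threat names and rating values here are made-up placeholders, not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DreadRating:
    """Five DREAD ratings; a 1-10 scale is assumed for this illustration."""
    damage: int
    reproducibility: int
    exploitability: int
    affected_users: int
    discoverability: int

    def score(self) -> float:
        # Common convention: the DREAD score is the mean of the five ratings.
        parts = (
            self.damage,
            self.reproducibility,
            self.exploitability,
            self.affected_users,
            self.discoverability,
        )
        return sum(parts) / len(parts)


def rank(threats: dict[str, DreadRating]) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from highest to lowest score."""
    scored = [(name, rating.score()) for name, rating in threats.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical placeholder threats and ratings, not taken from the paper.
ratings = {
    "hypothetical_threat_b": DreadRating(4, 5, 3, 6, 2),
    "hypothetical_threat_a": DreadRating(8, 6, 7, 9, 5),
}
ranked = rank(ratings)
for name, score in ranked:
    print(f"{name}: {score:.1f}")
```

Averaging keeps every category equally weighted; a weighted variant would only need a different `score` method.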

arXiv:2211.01753

Looking Beyond IoCs: Automatically Extracting Attack Patterns from External CTI

Authors: Md Tanvirul Alam, Dipkamal Bhusal, Youngja Park, Nidhi Rastogi

Abstract: Public and commercial organizations extensively share cyberthreat intelligence (CTI) to prepare systems to defend against existing and emerging cyberattacks. However, traditional CTI has primarily focused on tracking known threat indicators such as IP addresses and domain names, which may not provide long-term value in defending against evolving attacks. To address this challenge, we propose to use more robust threat intelligence signals called attack patterns. LADDER is a knowledge extraction framework that can extract text-based attack patterns from CTI reports at scale. The framework characterizes attack patterns by capturing the phases of an attack in Android and enterprise networks and systematically maps them to the MITRE ATT&CK pattern framework. LADDER can be used by security analysts to determine the presence of attack vectors related to existing and emerging threats, enabling them to prepare defenses proactively. We also present several use cases to demonstrate the application of LADDER in real-world scenarios. Finally, we provide a new, open-access benchmark malware dataset to train future cyberthreat intelligence models.

Submitted 11 July, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

arXiv:2210.17376

doi 10.1145/3600160.3600193

SoK: Modeling Explainability in Security Analytics for Interpretability, Trustworthiness, and Usability

Authors: Dipkamal Bhusal, Rosalyn Shin, Ajay Ashok Shewale, Monish Kumar Manikya Veerabhadran, Michael Clifford, Sara Rampazzi, Nidhi Rastogi

Abstract: Interpretability, trustworthiness, and usability are key considerations in high-stake security applications, especially when utilizing deep learning models. While these models are known for their high accuracy, they behave as black boxes in which identifying important features and factors that led to a classification or a prediction is difficult. This can lead to uncertainty and distrust, especially when an incorrect prediction results in severe consequences. Thus, explanation methods aim to provide insights into the inner working of deep learning models. However, most explanation methods provide inconsistent explanations, have low fidelity, and are susceptible to adversarial manipulation, which can reduce model trustworthiness. This paper provides a comprehensive analysis of explainable methods and demonstrates their efficacy in three distinct security applications: anomaly detection using system logs, malware prediction, and detection of adversarial images. Our quantitative and qualitative analysis reveals serious limitations and concerns in state-of-the-art explanation methods in all three applications. We show that explanation methods for security applications necessitate distinct characteristics, such as stability, fidelity, robustness, and usability, among others, which we outline as the prerequisites for trustworthy explanation methods.

Submitted 12 June, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

Comments: 12 pages, 4 figures

arXiv:2209.05440

Bias Impact Analysis of AI in Consumer Mobile Health Technologies: Legal, Technical, and Policy

Authors: Kristine Gloria, Nidhi Rastogi, Stevie DeGroff

Abstract: Today's large-scale algorithmic and automated deployment of decision-making systems threatens to exclude marginalized communities. Thus, the emergent danger comes from the effectiveness and the propensity of such systems to replicate, reinforce, or amplify harmful existing discriminatory acts. Algorithmic bias exposes a deeply entrenched encoding of a range of unwanted biases that can have profound real-world effects that manifest in domains from employment, to housing, to healthcare. The last decade of research and examples on these effects further underscores the need to examine any claim of a value-neutral technology. This work examines the intersection of algorithmic bias in consumer mobile health technologies (mHealth). We include mHealth, a term used to describe mobile technology and associated sensors to provide healthcare solutions through patient journeys. We also include mental and behavioral health (mental and physiological) as part of our study. Furthermore, we explore to what extent current mechanisms (legal, technical, and/or normative) help mitigate potential risks associated with unwanted bias in intelligent systems that make up the mHealth domain. We provide additional guidance on the role and responsibilities technologists and policymakers have to ensure that such systems empower patients equitably.

Submitted 28 August, 2022; originally announced September 2022.

arXiv:2204.05754

CyNER: A Python Library for Cybersecurity Named Entity Recognition

Authors: Md Tanvirul Alam, Dipkamal Bhusal, Youngja Park, Nidhi Rastogi

Abstract: Open Cyber threat intelligence (OpenCTI) information is available in an unstructured format from heterogeneous sources on the Internet. We present CyNER, an open-source Python library for cybersecurity named entity recognition (NER). CyNER combines transformer-based models for extracting cybersecurity-related entities, heuristics for extracting different indicators of compromise, and publicly available NER models for generic entity types. We provide models trained on a diverse corpus that users can readily use. Events are described as classes in previous research, MALOnt2.0 (Christian et al., 2021) and MALOnt (Rastogi et al., 2020), and together extract a wide range of malware attack details from a threat intelligence corpus. The user can combine predictions from multiple different approaches to suit their needs. The library is made publicly available.

Submitted 8 April, 2022; originally announced April 2022.

arXiv:2203.02121

Adversarial Patterns: Building Robust Android Malware Classifiers

Authors: Dipkamal Bhusal, Nidhi Rastogi

Abstract: Machine learning models are increasingly being adopted across various fields, such as medicine, business, autonomous vehicles, and cybersecurity, to analyze vast amounts of data, detect patterns, and make predictions or recommendations. In the field of cybersecurity, these models have made significant improvements in malware detection. However, despite their ability to understand complex patterns from unstructured data, these models are susceptible to adversarial attacks that perform slight modifications in malware samples, leading to misclassification from malignant to benign. Numerous defense approaches have been proposed to either detect such adversarial attacks or improve model robustness. These approaches have resulted in a multitude of attack and defense techniques and the emergence of a field known as 'adversarial machine learning.' In this survey paper, we provide a comprehensive review of adversarial machine learning in the context of Android malware classifiers. Android is the most widely used operating system globally and is an easy target for malicious agents. The paper first presents an extensive background on Android malware classifiers, followed by an examination of the latest advancements in adversarial attacks and defenses. Finally, the paper provides guidelines for designing robust malware classifiers and outlines research directions for the future.

Submitted 12 April, 2024; v1 submitted 3 March, 2022; originally announced March 2022.

Comments: survey

arXiv:2203.00150

Explaining RADAR features for detecting spoofing attacks in Connected Autonomous Vehicles

Authors: Nidhi Rastogi, Sara Rampazzi, Michael Clifford, Miriam Heller, Matthew Bishop, Karl Levitt

Abstract: Connected autonomous vehicles (CAVs) are anticipated to have built-in AI systems for defending against cyberattacks. Machine learning (ML) models form the basis of many such AI systems. These models are notorious for acting like black boxes, transforming inputs into solutions with great accuracy, but no explanations support their decisions. Explanations are needed to communicate model performance, make decisions transparent, and establish trust in the models with stakeholders. Explanations can also indicate when humans must take control, for instance, when the ML model makes low confidence decisions or offers multiple or ambiguous alternatives. Explanations also provide evidence for post-incident forensic analysis. Research on explainable ML to security problems is limited, and more so concerning CAVs. This paper surfaces a critical yet under-researched sensor data uncertainty problem for training ML attack detection models, especially in highly mobile and risk-averse platforms such as autonomous vehicles. We present a model that explains certainty and uncertainty in sensor input -- a missing characteristic in data collection. We hypothesize that model explanation is inaccurate for a given system without explainable input data quality. We estimate uncertainty and mass functions for features in radar sensor data and incorporate them into the training model through experimental evaluation. The mass function allows the classifier to categorize all spoofed inputs accurately with an incorrect class label.

Submitted 28 February, 2022; originally announced March 2022.

Comments: Accepted at the AAAI 2022 Workshop on Explainable Agency in Artificial Intelligence Workshop, Virtual. 8 pages, 3 Figures, 4 tables

MSC Class: 68M25; 60B11; 68T05

arXiv:2109.01544

Ontology-driven Knowledge Graph for Android Malware

Authors: Ryan Christian, Sharmishtha Dutta, Youngja Park, Nidhi Rastogi

Abstract: We present MalONT2.0 -- an ontology for malware threat intelligence (Rastogi et al., 2020). New classes (attack patterns, infrastructural resources to enable attacks, malware analysis to incorporate static analysis, and dynamic analysis of binaries) and relations have been added following a broadened scope of core competency questions. MalONT2.0 allows researchers to extensively capture all requisite classes and relations that gather semantic and syntactic characteristics of an Android malware attack. This ontology forms the basis for the malware threat intelligence knowledge graph, MalKG, which we exemplify using three different, non-overlapping demonstrations. Malware features have been extracted from CTI reports on Android threat intelligence shared on the Internet and written in the form of unstructured text. Some of these sources are blogs, threat intelligence reports, tweets, and news articles. The smallest unit of information that captures malware features is written as triples comprising head and tail entities, each connected with a relation. In the poster and demonstration, we discuss MalONT2.0, MalKG, as well as the dynamically growing knowledge graph, TINKER.

Submitted 3 September, 2021; originally announced September 2021.

Comments: 3 pages, 5 figures
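
The abstract above describes the smallest unit of a malware knowledge graph as a triple: a head entity and a tail entity connected by a relation. The sketch below is one minimal way to picture that structure in memory. The `Triple` and `TripleStore` names, the relation labels, and the entity names are all hypothetical illustrations and are not taken from MalONT2.0, MalKG, or TINKER.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One (head, relation, tail) fact."""
    head: str
    relation: str
    tail: str


class TripleStore:
    """Tiny in-memory triple store indexed by head entity (illustrative only)."""

    def __init__(self) -> None:
        self._by_head: dict[str, list[Triple]] = defaultdict(list)

    def add(self, head: str, relation: str, tail: str) -> None:
        self._by_head[head].append(Triple(head, relation, tail))

    def tails(self, head: str, relation: str) -> list[str]:
        # Return every tail linked to `head` by `relation`, in insertion order.
        return [t.tail for t in self._by_head.get(head, []) if t.relation == relation]


# Hypothetical entities and relations, invented for this example.
store = TripleStore()
store.add("ExampleMalware", "targets", "Android")
store.add("ExampleMalware", "uses", "ExampleAttackPattern")
store.add("ExampleMalware", "uses", "ExampleInfrastructure")
print(store.tails("ExampleMalware", "uses"))
```

Indexing by head entity makes "what does this malware use or target" a single lookup; a real knowledge graph would typically also index by tail and relation.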

arXiv:2102.05600

DANTE: Predicting Insider Threat using LSTM on system logs

Authors: Nidhi Rastogi, Qicheng Ma

Abstract: Insider threat is one of the most pernicious threat vectors to information and communication technologies (ICT) across the world due to the elevated level of trust and access that an insider is afforded. This type of threat can stem from both malicious users with a motive as well as negligent users who inadvertently reveal details about trade secrets, company information, or even access information to malignant players. In this paper, we propose a novel approach that uses system logs to detect insider behavior using a special recurrent neural network (RNN) model. Ground truth is established using DANTE and used as the baseline for identifying anomalous behavior. For this, system logs are modeled as a natural language sequence and patterns are extracted from these sequences. We create workflows of sequences of actions that follow a natural language logic and control flow. These flows are assigned various categories of behaviors - malignant or benign. Any deviation from these sequences indicates the presence of a threat. We further classify threats into one of the five categories provided in the CERT insider threat dataset. Through experimental evaluation, we show that the proposed model can achieve 99% prediction accuracy.

Submitted 10 February, 2021; originally announced February 2021.

Comments: 6 pages

arXiv:2102.05583

doi 10.13140/RG.2.2.27340.95367

Malware Knowledge Graph Generation

Authors: Sharmishtha Dutta, Nidhi Rastogi, Destin Yee, Chuqiao Gu, Qicheng Ma

Abstract: Cyber threat and attack intelligence information are available in non-standard format from heterogeneous sources. Comprehending them and utilizing them for threat intelligence extraction requires engaging security experts. Knowledge graphs enable converting this unstructured information from heterogeneous sources into a structured representation of data and factual knowledge for several downstream tasks such as predicting missing information and future threat trends. Existing large-scale knowledge graphs mainly focus on general classes of entities and relationships between them. Open-source knowledge graphs for the security domain do not exist. To fill this gap, we've built TINKER - a knowledge graph for threat intelligence (Threat INtelligence KnowlEdge gRaph). TINKER is generated using RDF triples describing entities and relations from tokenized unstructured natural language text from 83 threat reports published between 2006-2021. We built TINKER using classes and properties defined by open-source malware ontology and using hand-annotated RDF triples. We also discuss ongoing research and challenges faced while creating TINKER.

Submitted 10 February, 2021; originally announced February 2021.

Comments: 5 pages

arXiv:2102.05571

TINKER: A framework for Open source Cyberthreat Intelligence

Authors: Nidhi Rastogi, Sharmishtha Dutta, Mohammed J. Zaki, Alex Gittens, Charu Aggarwal

Abstract: Threat intelligence on malware attacks and campaigns is increasingly being shared with other security experts for a cost or for free. Other security analysts use this intelligence to inform them of indicators of compromise, attack techniques, and preventative actions. Security analysts prepare threat analysis reports after investigating an attack, an emerging cyber threat, or a recently discovered vulnerability. Collectively known as cyber threat intelligence (CTI), the reports are typically in an unstructured format and, therefore, challenging to integrate seamlessly into existing intrusion detection systems. This paper proposes a framework that uses the aggregated CTI for analysis and defense at scale. The information is extracted and stored in a structured format using knowledge graphs such that the semantics of the threat intelligence can be preserved and shared at scale with other security analysts. Specifically, we propose the first semi-supervised open-source knowledge graph-based framework, TINKER, to capture cyber threat information and its context. Following TINKER, we generate a Cyberthreat Intelligence Knowledge Graph (CTI-KG) and demonstrate the usage using different use cases.

Submitted 19 January, 2023; v1 submitted 10 February, 2021; originally announced February 2021.

Comments: 9 pages

arXiv:2006.11446 [pdf, other]

doi 10.13140/RG.2.2.16426.64962

MALOnt: An Ontology for Malware Threat Intelligence

Authors: Nidhi Rastogi, Sharmishtha Dutta, Mohammed J. Zaki, Alex Gittens, Charu Aggarwal

Abstract: Malware threat intelligence uncovers deep information about malware, threat actors and their tactics, Indicators of Compromise (IoC), and vulnerabilities in different platforms from scattered threat sources. This collective information can guide decision making in cyber defense applications utilized by security operation centers (SoCs). In this paper, we introduce an open-source malware ontology, MALOnt, that allows the structured extraction of information and knowledge graph generation, especially for threat intelligence. The knowledge graph that uses MALOnt is instantiated from a corpus comprising hundreds of annotated malware threat reports. The knowledge graph enables the analysis, detection, classification, and attribution of cyber threats caused by malware. We also demonstrate the annotation process using MALOnt on exemplar threat intelligence reports. A work in progress, this research is part of a larger effort towards auto-generation of knowledge graphs (KGs) for gathering malware threat intelligence from heterogeneous online resources.

Submitted 19 June, 2020; originally announced June 2020.

arXiv:2004.00071 [pdf, ps, other]

Personal Health Knowledge Graphs for Patients

Authors: Nidhi Rastogi, Mohammed J. Zaki

Abstract: Existing patient data analytics platforms fail to incorporate information that has context, is personal, and topical to patients. For a recommendation system to give a suitable response to a query or to derive meaningful insights from patient data, it should consider personal information about the patient's health history, including but not limited to their preferences, locations, and life choices that are currently applicable to them. In this review paper, we critique existing literature in this space and also discuss the various research challenges that come with designing, building, and operationalizing a personal health knowledge graph (PHKG) for patients.

Submitted 7 May, 2020; v1 submitted 31 March, 2020; originally announced April 2020.

Comments: 3 pages, workshop paper

ACM Class: I.2.4

arXiv:1904.12138 [pdf]

Exploring Information Centrality for Intrusion Detection in Large Networks

Authors: Nidhi Rastogi

Abstract: Modern networked systems are constantly under threat from systemic attacks. There has been a massive upsurge in the number of devices connected to a network as well as the associated traffic volume. This has intensified the need to better understand all possible attack vectors during system design and implementation. Further, it has increased the need to mine large data sets, the analysis of which has become a daunting task. It is critical to scale monitoring infrastructures to match this need, but this is a difficult goal for small and medium organizations. Hence, there is a need to propose novel approaches that address the big data problem in security. Information Centrality (IC) labels network nodes with better vantage points for detecting network-based anomalies as central nodes and uses them for detecting a category of attacks called systemic attacks. The main idea is that since these central nodes already see a lot of information flowing through the network, they are in a good position to detect anomalies before other nodes. This research first dives into the importance of using graphs in understanding the topology and information flow. We then introduce the usage of information centrality, a centrality-based index, to reduce data collection in existing communication networks. Using IC-identified central nodes can accelerate outlier detection when armed with a suitable anomaly detection technique. We also come up with a more efficient way to compute information centrality for large networks.
Finally, we demonstrate that central nodes detect anomalous behavior much faster than other non-central nodes, given the anomalous behavior is systemic in nature.

Submitted 12 June, 2020; v1 submitted 27 April, 2019; originally announced April 2019.

Comments: 14 pages, 4 figures, 18th Annual Security Conference

ACM Class: D.4.6; E.3

Journal ref: In Proceedings of the Annual Information Institute Conference, March 26-28, 2018. Las Vegas, USA. ISBN: 978-1-935160-19-9

arXiv:1701.06828 [pdf]

Security and Privacy of performing Data Analytics in the cloud - A three-way handshake of Technology, Policy, and Management

Authors: Nidhi Rastogi, Marie Joan Kristine Gloria, James Hendler

Abstract: The cloud platform came into existence primarily to accelerate IT delivery and to promote innovation. To this point, it has largely met the expectations of technologists, businesses and customers. The service aspect of this technology has paved the road for a faster set up of infrastructure and related goals for both startups and established organizations. This has further led to quicker delivery of many user-friendly applications to the market while proving to be a commercially viable option to companies with limited resources. On the technology front, the creation and adoption of this ecosystem has allowed easy collection of massive data from various sources at one place, where the place is sometimes referred to as just the cloud. Efficient data mining can be performed on raw data to extract potentially useful information, which was not possible at this scale before. Targeted advertising is a common example that can help businesses. Despite these promising offerings, concerns around security and privacy of user information suppressed wider acceptance and an all-encompassing deployment of the cloud platform. In this paper, we discuss security and privacy concerns that occur due to data exchanging hands between a cloud service provider (CSP) and the primary cloud user - the data collector, from the content generator. We offer solutions that encompass technology, policy and sound management of the cloud service, asserting that this approach has the potential to provide a holistic solution.

Submitted 24 January, 2017; originally announced January 2017.

Comments: 28 pages, 3 figures, Journal of Information Privacy

ACM Class: C.2.0; D.4.6; K.4.1; C.2.4

Journal ref: Journal of Information Policy 5 (2015): 129-154

arXiv:1701.06823 [pdf, other]

Graph Analytics for anomaly detection in homogeneous wireless networks - A Simulation Approach

Authors: Nidhi Rastogi, James Hendler

Abstract: In the Internet of Things (IoT), devices are exposed to various kinds of attacks when connected to the Internet. An attack detection mechanism that understands the limitations of these severely resource-constrained devices is necessary. This is important since current approaches are either customized for wireless networks or for the conventional Internet with heavy data transmission. Also, the detection mechanism need not always be as sophisticated. Simply signaling that an attack is taking place may be enough in some situations, for example in NIDS using anomaly detection. In graph networks, central nodes are the nodes that bear the most influence in the network. The purpose of this research is to explore experimentally the relationship between the behavior of central nodes and anomaly detection when an attack spreads through a network. As a result, we propose a novel anomaly detection approach using this unique methodology, which has been unexplored so far in communication networks. Also, in the experiment, we identify the presence of an attack originating and propagating throughout a network of IoT devices using our methodology.

Submitted 24 January, 2017; originally announced January 2017.

Comments: 5 pages, 4 figures, ICCWS

ACM Class: C.2.0; D.4.6

arXiv:1701.06817 [pdf]

WhatsApp security and role of metadata in preserving privacy

Authors: Nidhi Rastogi, James Hendler

Abstract: WhatsApp messenger is arguably the most popular mobile app available on all smart-phones. Over one billion people worldwide use it for free messaging, calling, and media sharing. In April 2016, WhatsApp switched to a default end-to-end encrypted service. This means that all messages (SMS), phone calls, videos, audios, and any other form of information exchanged cannot be read by any unauthorized entity since WhatsApp. In this paper we analyze the WhatsApp messaging platform and critique its security architecture along with a focus on its privacy preservation mechanisms. We report that the Signal Protocol, which forms the basis of WhatsApp end-to-end encryption, does offer forward secrecy and protection against MITM attacks to a large extent. Finally, we argue that simply encrypting the end-to-end channel cannot preserve privacy. The metadata can reveal just enough information to show connections between people, their patterns, and personal information. This paper elaborates on the security architecture of WhatsApp and performs an analysis on the various protocols used. This enlightens us on the status quo of the app security and what further measures can be used to fill existing gaps without compromising the usability. We start by describing the following: (i) important concepts that need to be understood to properly understand security, (ii) the security architecture, (iii) security evaluation, (iv) followed by a summary of our work.
Some of the important concepts that we cover in this paper before evaluating the architecture are end-to-end encryption (E2EE), the Signal Protocol, and Curve25519. The description of the security architecture covers key management, end-to-end encryption in WhatsApp, Authentication Mechanism, Message Exchange, and finally the security evaluation. We then cover the importance of metadata and the role it plays in conserving privacy with respect to WhatsApp.

Submitted 24 January, 2017; originally announced January 2017.

Comments: 8 pages, 2 figures

ACM Class: C.2.0; D.4.6
