-
LLM-Enhanced Static Analysis for Precise Identification of Vulnerable OSS Versions
Authors:
Yiran Cheng,
Lwin Khin Shar,
Ting Zhang,
Shouguo Yang,
Chaopeng Dong,
David Lo,
Shichao Lv,
Zhiqiang Shi,
Limin Sun
Abstract:
Open-source software (OSS) has experienced a surge in popularity, attributed to its collaborative development model and cost-effective nature. However, the adoption of specific software versions in development projects may introduce security risks when these versions bring along vulnerabilities. Current methods of identifying vulnerable versions typically analyze and trace the code involved in vul…
▽ More
Open-source software (OSS) has experienced a surge in popularity, attributed to its collaborative development model and cost-effective nature. However, the adoption of specific software versions in development projects may introduce security risks when these versions bring along vulnerabilities. Current methods of identifying vulnerable versions typically analyze and trace the code involved in vulnerability patches using static analysis with pre-defined rules. They then use syntactic-level code clone detection to identify the vulnerable versions. These methods are hindered by imprecisions due to (1) the inclusion of vulnerability-irrelevant code in the analysis and (2) the inadequacy of syntactic-level code clone detection. This paper presents Vercation, an approach designed to identify vulnerable versions of OSS written in C/C++. Vercation combines program slicing with a Large Language Model (LLM) to identify vulnerability-relevant code from vulnerability patches. It then backtraces historical commits to gather previous modifications of identified vulnerability-relevant code. We propose semantic-level code clone detection to compare the differences between pre-modification and post-modification code, thereby locating the vulnerability-introducing commit (vic) and enabling to identify the vulnerable versions between the patch commit and the vic. We curate a dataset linking 74 OSS vulnerabilities and 1013 versions to evaluate Vercation. On this dataset, our approach achieves the F1 score of 92.4%, outperforming current state-of-the-art methods. More importantly, Vercation detected 134 incorrect vulnerable OSS versions in NVD reports.
△ Less
Submitted 14 August, 2024;
originally announced August 2024.
-
The Price of Prompting: Profiling Energy Use in Large Language Models Inference
Authors:
Erik Johannes Husom,
Arda Goknil,
Lwin Khin Shar,
Sagar Sen
Abstract:
In the rapidly evolving realm of artificial intelligence, deploying large language models (LLMs) poses increasingly pressing computational and environmental challenges. This paper introduces MELODI - Monitoring Energy Levels and Optimization for Data-driven Inference - a multifaceted framework crafted to monitor and analyze the energy consumed during LLM inference processes. MELODI enables detaile…
▽ More
In the rapidly evolving realm of artificial intelligence, deploying large language models (LLMs) poses increasingly pressing computational and environmental challenges. This paper introduces MELODI - Monitoring Energy Levels and Optimization for Data-driven Inference - a multifaceted framework crafted to monitor and analyze the energy consumed during LLM inference processes. MELODI enables detailed observations of power consumption dynamics and facilitates the creation of a comprehensive dataset reflective of energy efficiency across varied deployment scenarios. The dataset, generated using MELODI, encompasses a broad spectrum of LLM deployment frameworks, multiple language models, and extensive prompt datasets, enabling a comparative analysis of energy use. Using the dataset, we investigate how prompt attributes, including length and complexity, correlate with energy expenditure. Our findings indicate substantial disparities in energy efficiency, suggesting ample scope for optimization and adoption of sustainable measures in LLM deployment. Our contribution lies not only in the MELODI framework but also in the novel dataset, a resource that can be expanded by other researchers. Thus, MELODI is a foundational tool and dataset for advancing research into energy-conscious LLM deployment, steering the field toward a more sustainable future.
△ Less
Submitted 4 July, 2024;
originally announced July 2024.
-
Security Modelling for Cyber-Physical Systems: A Systematic Literature Review
Authors:
Shaofei Huang,
Christopher M. Poskitt,
Lwin Khin Shar
Abstract:
Cyber-physical systems (CPS) are at the intersection of digital technology and engineering domains, rendering them high-value targets of sophisticated and well-funded cybersecurity threat actors. Prominent cybersecurity attacks on CPS have brought attention to the vulnerability of these systems, and the soft underbelly of critical infrastructure reliant on CPS. Security modelling for CPS is an imp…
▽ More
Cyber-physical systems (CPS) are at the intersection of digital technology and engineering domains, rendering them high-value targets of sophisticated and well-funded cybersecurity threat actors. Prominent cybersecurity attacks on CPS have brought attention to the vulnerability of these systems, and the soft underbelly of critical infrastructure reliant on CPS. Security modelling for CPS is an important mechanism to systematically identify and assess vulnerabilities, threats, and risks throughout system lifecycles, and to ultimately ensure system resilience, safety, and reliability. This literature review delves into state-of-the-art research in CPS security modelling, encompassing both threat and attack modelling. While these terms are sometimes used interchangeably, they are different concepts. This article elaborates on the differences between threat and attack modelling, examining their implications for CPS security. A systematic search yielded 428 articles, from which 15 were selected and categorised into three clusters: those focused on threat modelling methods, attack modelling methods, and literature reviews. Specifically, we sought to examine what security modelling methods exist today, and how they address real-world cybersecurity threats and CPS-specific attacker capabilities throughout the lifecycle of CPS, which typically span longer durations compared to traditional IT systems. This article also highlights several limitations in existing research, wherein security models adopt simplistic approaches that do not adequately consider the dynamic, multi-layer, multi-path, and multi-agent characteristics of real-world cyber-physical attacks.
△ Less
Submitted 11 April, 2024;
originally announced April 2024.
-
Decentralized Multimedia Data Sharing in IoV: A Learning-based Equilibrium of Supply and Demand
Authors:
Jiani Fan,
Minrui Xu,
Jiale Guo,
Lwin Khin Shar,
Jiawen Kang,
Dusit Niyato,
Kwok-Yan Lam
Abstract:
The Internet of Vehicles (IoV) has great potential to transform transportation systems by enhancing road safety, reducing traffic congestion, and improving user experience through onboard infotainment applications. Decentralized data sharing can improve security, privacy, reliability, and facilitate infotainment data sharing in IoVs. However, decentralized data sharing may not achieve the expected…
▽ More
The Internet of Vehicles (IoV) has great potential to transform transportation systems by enhancing road safety, reducing traffic congestion, and improving user experience through onboard infotainment applications. Decentralized data sharing can improve security, privacy, reliability, and facilitate infotainment data sharing in IoVs. However, decentralized data sharing may not achieve the expected efficiency if there are IoV users who only want to consume the shared data but are not willing to contribute their own data to the community, resulting in incomplete information observed by other vehicles and infrastructure, which can introduce additional transmission latency. Therefore, in this article, by modeling the data sharing ecosystem as a data trading market, we propose a decentralized data-sharing incentive mechanism based on multi-intelligent reinforcement learning to learn the supply-demand balance in markets and minimize transmission latency. Our proposed mechanism takes into account the dynamic nature of IoV markets, which can experience frequent fluctuations in supply and demand. We propose a time-sensitive Key-Policy Attribute-Based Encryption (KP-ABE) mechanism coupled with Named Data Networking (NDN) to protect data in IoVs, which adds a layer of security to our proposed solution. Additionally, we design a decentralized market for efficient data sharing in IoVs, where continuous double auctions are adopted. The proposed mechanism based on multi-agent deep reinforcement learning can learn the supply-demand equilibrium in markets, thus improving the efficiency and sustainability of markets. Theoretical analysis and experimental results show that our proposed learning-based incentive mechanism outperforms baselines by 10% in determining the equilibrium of supply and demand while reducing transmission latency by 20%.
△ Less
Submitted 29 March, 2024;
originally announced March 2024.
-
Differentiated Security Architecture for Secure and Efficient Infotainment Data Communication in IoV Networks
Authors:
Jiani Fan,
Lwin Khin Shar,
Jiale Guo,
Wenzhuo Yang,
Dusit Niyato,
Kwok-Yan Lam
Abstract:
This paper aims to provide differentiated security protection for infotainment data communication in Internet-of-Vehicle (IoV) networks. The IoV is a network of vehicles that uses various sensors, software, built-in hardware, and communication technologies to enable information exchange between pedestrians, cars, and urban infrastructure. Negligence on the security of infotainment data communicati…
▽ More
This paper aims to provide differentiated security protection for infotainment data communication in Internet-of-Vehicle (IoV) networks. The IoV is a network of vehicles that uses various sensors, software, built-in hardware, and communication technologies to enable information exchange between pedestrians, cars, and urban infrastructure. Negligence on the security of infotainment data communication in IoV networks can unintentionally open an easy access point for social engineering attacks. The attacker can spread false information about traffic conditions, mislead drivers in their directions, and interfere with traffic management. Such attacks can also cause distractions to the driver, which has a potential implication for the safety of driving. The existing literature on IoV communication and network security focuses mainly on generic solutions. In a heterogeneous communication network where different types of communication coexist, we can improve the efficiency of security solutions by considering the different security and efficiency requirements of data communications. Hence, we propose a differentiated security mechanism for protecting infotainment data communication in IoV networks. In particular, we first classify data communication in the IoV network, examine the security focus of each data communication, and then develop a differentiated security architecture to provide security protection on a file-to-file basis. Our architecture leverages Named Data Networking (NDN) so that infotainment files can be efficiently circulated throughout the network where any node can own a copy of the file, thus improving the hit ratio for user file requests. In addition, we propose a time-sensitive Key-Policy Attribute-Based Encryption (KP-ABE) scheme for sharing subscription-based infotainment data...
△ Less
Submitted 29 March, 2024;
originally announced March 2024.
-
XSS for the Masses: Integrating Security in a Web Programming Course using a Security Scanner
Authors:
Lwin Khin Shar,
Christopher M. Poskitt,
Kyong Jin Shim,
Li Ying Leonard Wong
Abstract:
Cybersecurity education is considered an important part of undergraduate computing curricula, but many institutions teach it only in dedicated courses or tracks. This optionality risks students graduating with limited exposure to secure coding practices that are expected in industry. An alternative approach is to integrate cybersecurity concepts across non-security courses, so as to expose student…
▽ More
Cybersecurity education is considered an important part of undergraduate computing curricula, but many institutions teach it only in dedicated courses or tracks. This optionality risks students graduating with limited exposure to secure coding practices that are expected in industry. An alternative approach is to integrate cybersecurity concepts across non-security courses, so as to expose students to the interplay between security and other sub-areas of computing. In this paper, we report on our experience of applying the security integration approach to an undergraduate web programming course. In particular, we added a practical introduction to secure coding, which highlighted the OWASP Top 10 vulnerabilities by example, and demonstrated how to identify them using out-of-the-box security scanner tools (e.g. ZAP). Furthermore, we incentivised students to utilise these tools in their own course projects by offering bonus marks. To assess the impact of this intervention, we scanned students' project code over the last three years, finding a reduction in the number of vulnerabilities. Finally, in focus groups and a survey, students shared that our intervention helped to raise awareness, but they also highlighted the importance of grading incentives and the need to teach security content earlier.
△ Less
Submitted 26 April, 2022;
originally announced April 2022.
-
AnFlo: Detecting Anomalous Sensitive Information Flows in Android Apps
Authors:
Biniam Fisseha Demissie,
Mariano Ceccato,
Lwin Khin Shar
Abstract:
Smartphone apps usually have access to sensitive user data such as contacts, geo-location, and account credentials and they might share such data to external entities through the Internet or with other apps. Confidentiality of user data could be breached if there are anomalies in the way sensitive data is handled by an app which is vulnerable or malicious. Existing approaches that detect anomalous…
▽ More
Smartphone apps usually have access to sensitive user data such as contacts, geo-location, and account credentials and they might share such data to external entities through the Internet or with other apps. Confidentiality of user data could be breached if there are anomalies in the way sensitive data is handled by an app which is vulnerable or malicious. Existing approaches that detect anomalous sensitive data flows have limitations in terms of accuracy because the definition of anomalous flows may differ for different apps with different functionalities; it is normal for "Health" apps to share heart rate information through the Internet but is anomalous for "Travel" apps.
In this paper, we propose a novel approach to detect anomalous sensitive data flows in Android apps, with improved accuracy. To achieve this objective, we first group trusted apps according to the topics inferred from their functional descriptions. We then learn sensitive information flows with respect to each group of trusted apps. For a given app under analysis, anomalies are identified by comparing sensitive information flows in the app against those flows learned from trusted apps grouped under the same topic. In the evaluation, information flow is learned from 11,796 trusted apps. We then checked for anomalies in 596 new (benign) apps and identified 2 previously-unknown vulnerable apps related to anomalous flows. We also analyzed 18 malware apps and found anomalies in 6 of them.
△ Less
Submitted 19 December, 2018;
originally announced December 2018.