-
Using Large Language Models in Public Transit Systems, San Antonio as a case study
Authors:
Ramya Jonnala,
Gongbo Liang,
Jeong Yang,
Izzat Alsmadi
Abstract:
The integration of large language models (LLMs) into public transit systems represents a significant advancement in urban transportation management and passenger experience. This study examines the impact of LLMs within San Antonio's public transit system, leveraging their capabilities in natural language processing, data analysis, and real-time communication. By utilizing GTFS and other public transportation information, the research highlights the transformative potential of LLMs in enhancing route planning, reducing wait times, and providing personalized travel assistance. Our case study is the city of San Antonio, as part of a project aiming to demonstrate how LLMs can optimize resource allocation, improve passenger satisfaction, and support decision-making processes in transit management. We evaluated LLM responses to questions related to both information retrieval and understanding. Ultimately, we believe that the adoption of LLMs in public transit systems can lead to more efficient, responsive, and user-friendly transportation networks, providing a model for other cities to follow.
Submitted 25 June, 2024;
originally announced July 2024.
-
Transforming Computer Security and Public Trust Through the Exploration of Fine-Tuning Large Language Models
Authors:
Garrett Crumrine,
Izzat Alsmadi,
Jesus Guerrero,
Yuvaraj Munian
Abstract:
Large language models (LLMs) have revolutionized how we interact with machines. However, this technological advancement has been paralleled by the emergence of "Mallas," malicious services operating underground that exploit LLMs for nefarious purposes. Such services create malware, phishing attacks, and deceptive websites, escalating the cybersecurity threat landscape. This paper delves into the proliferation of Mallas by examining the use of various pre-trained language models and their efficiency and vulnerabilities when misused. Building on a dataset from the Common Vulnerabilities and Exposures (CVE) program, it explores fine-tuning methodologies to generate code and explanatory text related to identified vulnerabilities. This research aims to shed light on the operational strategies and exploitation techniques of Mallas, leading to the development of more secure and trustworthy AI applications. The paper concludes by emphasizing the need for further research, enhanced safeguards, and ethical guidelines to mitigate the risks associated with the malicious application of LLMs.
Submitted 2 June, 2024;
originally announced June 2024.
-
Predicting Question Quality on StackOverflow with Neural Networks
Authors:
Mohammad Al-Ramahi,
Izzat Alsmadi,
Abdullah Wahbeh
Abstract:
The wealth of information available through the Internet and social media is unprecedented. Within computing fields, websites such as Stack Overflow are considered important sources for users seeking solutions to their computing and programming issues. However, like other social media platforms, Stack Overflow contains a mixture of relevant and irrelevant information. In this paper, we evaluated neural network models to predict the quality of questions on Stack Overflow, as an example of Question Answering (QA) communities. Our results demonstrate the effectiveness of neural network models compared to baseline machine learning models, achieving an accuracy of 80%. Furthermore, our findings indicate that the number of layers in the neural network model can significantly impact its performance.
Submitted 20 April, 2024;
originally announced April 2024.
-
A Review on Searchable Encryption Functionality and the Evaluation of Homomorphic Encryption
Authors:
Brian Kishiyama,
Izzat Alsmadi
Abstract:
Cloud Service Providers, such as Google Cloud Platform, Microsoft Azure, or Amazon Web Services, offer continuously evolving cloud services. It is a growing industry. Businesses, such as Netflix and PayPal, rely on the Cloud for data storage, computing power, and other services. For businesses, the cloud reduces costs, provides flexibility, and allows for growth. However, there are security and privacy concerns regarding the Cloud. Because Cloud services are accessed through the internet, hackers and attackers could possibly access the servers from anywhere. To protect data in the Cloud, it should be encrypted before it is uploaded, and it should be protected both in storage and in transit. On the other hand, data owners may need to access their encrypted data. It may also need to be altered, updated, deleted, read, searched, or shared with others. If data is decrypted in the Cloud, sensitive data is exposed and could be misused. One solution is to leave the data in its encrypted form and use Searchable Encryption (SE), which operates on encrypted data. The functionality of SE has improved since its inception, and research continues to explore ways to improve SE. This paper reviews the functionality of Searchable Encryption, mostly related to Cloud services, in the years 2019 to 2023, and evaluates one of its schemes, Fully Homomorphic Encryption. Overall, it seems that research is at the point where SE efficiency is increased as multiple functionalities are aggregated and tested.
Submitted 21 December, 2023;
originally announced December 2023.
-
Mutation-Based Adversarial Attacks on Neural Text Detectors
Authors:
Gongbo Liang,
Jesus Guerrero,
Izzat Alsmadi
Abstract:
Neural text detectors aim to determine the characteristics that distinguish neural (machine-generated) from human texts. To challenge such detectors, adversarial attacks can alter the statistical characteristics of the generated text, making the detection task increasingly difficult. Inspired by the advances of mutation analysis in software development and testing, in this paper, we propose character- and word-based mutation operators for generating adversarial samples to attack state-of-the-art natural text detectors. This falls under white-box adversarial attacks. In such attacks, attackers have access to the original text and create mutation instances based on this original text. The ultimate goal is to confuse machine learning models and classifiers and decrease their prediction accuracy.
Submitted 11 February, 2023;
originally announced February 2023.
-
Balanced Datasets for IoT IDS
Authors:
Alaa Alhowaide,
Izzat Alsmadi,
Jian Tang
Abstract:
As the Internet of Things (IoT) continues to grow, cyberattacks are becoming increasingly common. The security of IoT networks relies heavily on intrusion detection systems (IDSs). The development of an IDS that is accurate and efficient is a challenging task. This task is made harder by the absence of balanced datasets for training and testing the proposed IDS. In this study, four commonly used datasets are visualized and analyzed. Moreover, the study proposes a sampling algorithm that generates a sample that represents the original dataset. In addition, it proposes an algorithm to generate a balanced dataset. Researchers can use this paper as a starting point when investigating cybersecurity and machine learning. The proposed sampling algorithms showed reliability in generating representative and balanced samples from the NSL-KDD, UNSW-NB15, BotNetIoT-01, and BoTIoT datasets.
Submitted 15 December, 2022;
originally announced January 2023.
-
A Mutation-based Text Generation for Adversarial Machine Learning Applications
Authors:
Jesus Guerrero,
Gongbo Liang,
Izzat Alsmadi
Abstract:
Many natural language related applications involve text generation, created by humans or machines. While machines support humans in many of those applications, in a few others (e.g., adversarial machine learning, social bots, and trolls) machines try to impersonate humans. In this scope, we proposed and evaluated several mutation-based text generation approaches. Unlike machine-based generated text, mutation-based generated text needs human text samples as inputs. We showed examples of mutation operators, but this work can be extended in many aspects, such as proposing new text-based mutation operators based on the nature of the application.
Submitted 20 December, 2022;
originally announced December 2022.
-
Synthetic Text Detection: Systemic Literature Review
Authors:
Jesus Guerrero,
Izzat Alsmadi
Abstract:
Within the text analysis and processing fields, generated text attacks have been made easier to create than ever before. To combat these attacks, open-sourcing models and datasets has become a major trend for creating automated detection algorithms in defense of authenticity. For this purpose, synthetic text detection has become an increasingly viable topic of research. This review is written for the purpose of creating a snapshot of the state of current literature and easing the barrier to entry for future authors. Towards that goal, we identified a few research trends and challenges in this field.
Submitted 1 October, 2022;
originally announced October 2022.
-
Benchmark Assessment for DeepSpeed Optimization Library
Authors:
Gongbo Liang,
Izzat Alsmadi
Abstract:
Deep Learning (DL) models are widely used in machine learning due to their performance and ability to deal with large datasets while producing high accuracy and performance metrics. The size of such datasets and the complexity of DL models cause such models to be complex, consuming a large amount of resources and time to train. Many recent libraries and applications have been introduced to deal with DL complexity and efficiency issues. In this paper, we evaluated one example, the Microsoft DeepSpeed library, through classification tasks. DeepSpeed public sources reported classification performance metrics on the LeNet architecture. We extended this by evaluating the library on several modern neural network architectures, including convolutional neural networks (CNNs) and Vision Transformer (ViT). Results indicated that while DeepSpeed can make improvements in some of those cases, it has no or a negative impact on others.
Submitted 11 February, 2022;
originally announced February 2022.
-
Event Detection in Twitter: A Content and Time-Based Analysis
Authors:
Izzat Alsmadi,
Michael O'Brien
Abstract:
The detection of events from online social networks is a recent, evolving field that attracts researchers from across a spectrum of disciplines and domains. Here we report a time-series analysis for predicting events. In particular, we evaluated the frequency distribution of top n-grams of terms over time, focusing on two indicators: high-frequency n-grams over both short and long periods of time. Both indicators can refer to certain aspects of events as they evolve. To evaluate the model's accuracy in detecting events, we built and used a Twitter dataset of the most popular hashtags that surrounded the well-documented protests that occurred at the University of Missouri (Mizzou) in late 2015.
Submitted 18 October, 2021;
originally announced November 2021.
-
Adversarial Attacks and Defenses for Social Network Text Processing Applications: Techniques, Challenges and Future Research Directions
Authors:
Izzat Alsmadi,
Kashif Ahmad,
Mahmoud Nazzal,
Firoj Alam,
Ala Al-Fuqaha,
Abdallah Khreishah,
Abdulelah Algosaibi
Abstract:
The growing use of social media has led to the development of several Machine Learning (ML) and Natural Language Processing (NLP) tools to process the unprecedented amount of social media content to make actionable decisions. However, these ML and NLP algorithms have been widely shown to be vulnerable to adversarial attacks. These vulnerabilities allow adversaries to launch a diversified set of adversarial attacks on these algorithms in different applications of social media text processing. In this paper, we provide a comprehensive review of the main approaches for adversarial attacks and defenses in the context of social media applications with a particular focus on key challenges and future research directions. In detail, we cover literature on six key applications, namely (i) rumor detection, (ii) satire detection, (iii) clickbait and spam identification, (iv) hate speech detection, (v) misinformation detection, and (vi) sentiment analysis. We then highlight the concurrent and anticipated future research questions and provide recommendations and directions for future work.
Submitted 26 October, 2021;
originally announced October 2021.
-
An ontological analysis of misinformation in online social networks
Authors:
Izzat Alsmadi,
Iyad Alazzam,
Mohammad A. AlRamahi
Abstract:
The internet, Online Social Networks (OSNs), and smart phones enable users to create a tremendous amount of information. Users who search for general or specific knowledge these days face problems not of information scarcity but of misinformation. Misinformation nowadays can refer to a continuous spectrum from what can be seen as "facts" or "truth", if humans agree on the existence of such, to false information that everyone agrees is false. In this paper, we will look at this spectrum of information/misinformation and compare some of the major relevant concepts. While a few fact-checking websites exist to evaluate news articles or some of the popular claims people exchange, this can be seen as only a small effort in the mission to tag online information with its "proper" category or label.
Submitted 22 February, 2021;
originally announced February 2021.
-
Adversarial Machine Learning in Text Analysis and Generation
Authors:
Izzat Alsmadi
Abstract:
The research field of adversarial machine learning has witnessed significant interest in the last few years. A machine learner or model is secure if it can deliver its main objectives with acceptable accuracy, efficiency, etc., while at the same time resisting different types of, and attempts at, adversarial attacks. This paper focuses on studying aspects and research trends in adversarial machine learning, specifically in text analysis and generation. The paper summarizes the main research trends in the field, such as GAN algorithms, models, types of attacks, and defense against those attacks.
Submitted 13 January, 2021;
originally announced January 2021.
-
RBAC for Healthcare-Infrastructure and data storage
Authors:
Ramesh Narasimman,
Izzat Alsmadi
Abstract:
Role-based access control (RBAC) is the cornerstone of security for any modern organization. In this report, we defined a health-care access control structure based on RBAC. We used the Alloy formal logic modeling tool to model and validate system functions. We modeled the system's static and dynamic (temporal) behaviours. We focused on evaluating properties such as integrity, conformance, and progress.
Submitted 18 October, 2020;
originally announced October 2020.
-
Measuring device suitable for linear distances
Authors:
Zaid A. I. Alsmadi,
Ahmad B. B. Badry,
Irfan A. Badruddin,
T. M. Indra Mahlia
Abstract:
A measuring device is proposed for determining a linear dimension. The device comprises three associated longitudinally moving parts, one of which is a scale. The integer part of the device reading is taken from the standard millimeter or inch scale, and the fine measurement (smaller than the minimum scale division) is done by a setup of two sliders coupled to the device. The first slider includes measuring points, and the other slider includes a measuring line. The decimal part of the reading is taken in such a way that the measuring points reading is related to the fractions of the displacement between the graduated scale and the corresponding measuring line.
Submitted 24 November, 2014;
originally announced November 2014.
-
Annotations, Collaborative Tagging, and Searching Mathematics in E-Learning
Authors:
Iyad Abu Doush,
Faisal Alkhateeb,
Eslam Al Maghayreh,
Izzat Alsmadi,
Samer Samarah
Abstract:
This paper presents a new framework for adding semantics into e-learning systems. The proposed approach relies on two principles. The first principle is the automatic addition of semantic information when creating the mathematical contents. The second principle is the collaborative tagging and annotation of the e-learning contents and the use of an ontology to categorize the e-learning contents. The proposed system encodes the mathematical contents using presentation MathML with RDFa annotations. The system allows students to highlight and annotate specific parts of the e-learning contents. The objective is to add meaning into the e-learning contents, to add relationships between contents, and to create a framework to facilitate searching the contents. This semantic information can be used to answer semantic queries (e.g., SPARQL) to retrieve the information requested by a user. This work is implemented as embedded code in the Moodle e-learning system.
Submitted 8 November, 2012;
originally announced November 2012.
-
Indexing of Arabic documents automatically based on lexical analysis
Authors:
Abdulrahman Al Molijy,
Ismail Hmeidi,
Izzat Alsmadi
Abstract:
The continuous information explosion through the Internet and all information sources makes it necessary to perform all information processing activities automatically in a quick and reliable manner. In this paper, we proposed and implemented a method to automatically create an index for books written in the Arabic language. The process depends largely on text summarization and abstraction processes to collect main topics and statements in the book. The process is developed in terms of accuracy and performance, and results showed that this process can effectively replace the effort of manually indexing books and documents, a process that can be very useful in all information processing and retrieval applications.
Submitted 8 May, 2012;
originally announced May 2012.