Search | arXiv e-print repository

Using Large Language Models in Public Transit Systems, San Antonio as a case study

Authors: Ramya Jonnala, Gongbo Liang, Jeong Yang, Izzat Alsmadi

Abstract: The integration of large language models into public transit systems represents a significant advancement in urban transportation management and passenger experience. This study examines the impact of LLMs within San Antonio's public transit system, leveraging their capabilities in natural language processing, data analysis, and real-time communication. By utilizing GTFS and other public transportation information, the research highlights the transformative potential of LLMs in enhancing route planning, reducing wait times, and providing personalized travel assistance. Our case study is the city of San Antonio as part of a project aiming to demonstrate how LLMs can optimize resource allocation, improve passenger satisfaction, and support decision-making processes in transit management. We evaluated LLM responses to questions related to both information retrieval and understanding. Ultimately, we believe that the adoption of LLMs in public transit systems can lead to more efficient, responsive, and user-friendly transportation networks, providing a model for other cities to follow.

Submitted 25 June, 2024; originally announced July 2024.

arXiv:2406.00628 [pdf, other]

Transforming Computer Security and Public Trust Through the Exploration of Fine-Tuning Large Language Models

Authors: Garrett Crumrine, Izzat Alsmadi, Jesus Guerrero, Yuvaraj Munian

Abstract: Large language models (LLMs) have revolutionized how we interact with machines. However, this technological advancement has been paralleled by the emergence of "Mallas," malicious services operating underground that exploit LLMs for nefarious purposes. Such services create malware, phishing attacks, and deceptive websites, escalating the cyber security threats landscape. This paper delves into the proliferation of Mallas by examining the use of various pre-trained language models and their efficiency and vulnerabilities when misused. Building on a dataset from the Common Vulnerabilities and Exposures (CVE) program, it explores fine-tuning methodologies to generate code and explanatory text related to identified vulnerabilities. This research aims to shed light on the operational strategies and exploitation techniques of Mallas, leading to the development of more secure and trustworthy AI applications. The paper concludes by emphasizing the need for further research, enhanced safeguards, and ethical guidelines to mitigate the risks associated with the malicious application of LLMs.

Submitted 2 June, 2024; originally announced June 2024.

Comments: A preprint, 17 pages. 11 images

ACM Class: B.8.0; I.2.7; I.2.8; I.2.11; J.0; K.4.2; K.4.1

arXiv:2404.14449 [pdf]

Predicting Question Quality on StackOverflow with Neural Networks

Authors: Mohammad Al-Ramahi, Izzat Alsmadi, Abdullah Wahbeh

Abstract: The wealth of information available through the Internet and social media is unprecedented. Within computing fields, websites such as Stack Overflow are considered important sources for users seeking solutions to their computing and programming issues. However, like other social media platforms, Stack Overflow contains a mixture of relevant and irrelevant information. In this paper, we evaluated neural network models to predict the quality of questions on Stack Overflow, as an example of Question Answering (QA) communities. Our results demonstrate the effectiveness of neural network models compared to baseline machine learning models, achieving an accuracy of 80%. Furthermore, our findings indicate that the number of layers in the neural network model can significantly impact its performance.

Submitted 20 April, 2024; originally announced April 2024.

arXiv:2312.14434 [pdf, other]

A Review on Searchable Encryption Functionality and the Evaluation of Homomorphic Encryption

Authors: Brian Kishiyama, Izzat Alsmadi

Abstract: Cloud Service Providers, such as Google Cloud Platform, Microsoft Azure, or Amazon Web Services, offer continuously evolving cloud services. It is a growing industry. Businesses, such as Netflix and PayPal, rely on the Cloud for data storage, computing power, and other services. For businesses, the cloud reduces costs, provides flexibility, and allows for growth. However, there are security and privacy concerns regarding the Cloud. Because Cloud services are accessed through the internet, hackers and attackers could possibly access the servers from anywhere. To protect data in the Cloud, it should be encrypted before it is uploaded, and it should be protected both in storage and in transit. On the other hand, data owners may need to access their encrypted data. It may also need to be altered, updated, deleted, read, searched, or shared with others. If data is decrypted in the Cloud, sensitive data is exposed and could be misused. One solution is to leave the data in its encrypted form and use Searchable Encryption (SE), which operates on encrypted data. The functionality of SE has improved since its inception and research continues to explore ways to improve SE. This paper reviews the functionality of Searchable Encryption, mostly related to Cloud services, in the years 2019 to 2023, and evaluates one of its schemes, Fully Homomorphic Encryption. Overall, it seems that research is at the point where SE efficiency is increased as multiple functionalities are aggregated and tested.

Submitted 21 December, 2023; originally announced December 2023.

Comments: 15 pages

arXiv:2302.05794 [pdf, other]

Mutation-Based Adversarial Attacks on Neural Text Detectors

Authors: Gongbo Liang, Jesus Guerrero, Izzat Alsmadi

Abstract: Neural text detectors aim to decide the characteristics that distinguish neural (machine-generated) from human texts. To challenge such detectors, adversarial attacks can alter the statistical characteristics of the generated text, making the detection task more and more difficult. Inspired by the advances of mutation analysis in software development and testing, in this paper, we propose character- and word-based mutation operators for generating adversarial samples to attack state-of-the-art natural text detectors. This falls under white-box adversarial attacks. In such attacks, attackers have access to the original text and create mutation instances based on this original text. The ultimate goal is to confuse machine learning models and classifiers and decrease their prediction accuracy.

Submitted 11 February, 2023; originally announced February 2023.

arXiv:2301.04008 [pdf]

Balanced Datasets for IoT IDS

Authors: Alaa Alhowaide, Izzat Alsmadi, Jian Tang

Abstract: As the Internet of Things (IoT) continues to grow, cyberattacks are becoming increasingly common. The security of IoT networks relies heavily on intrusion detection systems (IDSs). The development of an IDS that is accurate and efficient is a challenging task. This task is made more challenging by the absence of balanced datasets for training and testing the proposed IDS. In this study, four commonly used datasets are visualized and analyzed. Moreover, the study proposes a sampling algorithm that generates a sample that represents the original dataset. In addition, it proposes an algorithm to generate a balanced dataset. Researchers can use this paper as a starting point when investigating cybersecurity and machine learning. The proposed sampling algorithms showed reliability in generating well-representing and balanced samples from NSL-KDD, UNSW-NB15, BotNetIoT-01, and BoTIoT datasets.

Submitted 15 December, 2022; originally announced January 2023.

arXiv:2212.11808 [pdf, ps, other]

A Mutation-based Text Generation for Adversarial Machine Learning Applications

Authors: Jesus Guerrero, Gongbo Liang, Izzat Alsmadi

Abstract: Many natural language related applications involve text generation, created by humans or machines. While machines support humans in many of those applications, in a few others (e.g., adversarial machine learning, social bots, and trolls) machines try to impersonate humans. In this scope, we proposed and evaluated several mutation-based text generation approaches. Unlike machine-based generated text, mutation-based generated text needs human text samples as inputs. We showed examples of mutation operators, but this work can be extended in many aspects, such as proposing new text-based mutation operators based on the nature of the application.

Submitted 20 December, 2022; originally announced December 2022.

arXiv:2210.06336 [pdf, other]

Synthetic Text Detection: Systemic Literature Review

Authors: Jesus Guerrero, Izzat Alsmadi

Abstract: Within the text analysis and processing fields, generated text attacks have been made easier to create than ever before. To combat these attacks, open sourcing models and datasets have become a major trend to create automated detection algorithms in defense of authenticity. For this purpose, synthetic text detection has become an increasingly viable topic of research. This review is written for the purpose of creating a snapshot of the state of current literature and easing the barrier to entry for future authors. Towards that goal, we identified a few research trends and challenges in this field.

Submitted 1 October, 2022; originally announced October 2022.

arXiv:2202.12831 [pdf, other]

Benchmark Assessment for DeepSpeed Optimization Library

Authors: Gongbo Liang, Izzat Alsmadi

Abstract: Deep Learning (DL) models are widely used in machine learning due to their performance and ability to deal with large datasets while producing high accuracy and performance metrics. The size of such datasets and the complexity of DL models cause such models to be complex, consuming large amounts of resources and time to train. Many recent libraries and applications are introduced to deal with DL complexity and efficiency issues. In this paper, we evaluated one example, the Microsoft DeepSpeed library, through classification tasks. DeepSpeed public sources reported classification performance metrics on the LeNet architecture. We extended this through evaluating the library on several modern neural network architectures, including convolutional neural networks (CNNs) and Vision Transformer (ViT). Results indicated that while DeepSpeed can make improvements in some of those cases, it has no or a negative impact on others.

Submitted 11 February, 2022; originally announced February 2022.

arXiv:2111.05274 [pdf, other]

Event Detection in Twitter: A Content and Time-Based Analysis

Authors: Izzat Alsmadi, Michael O'Brien

Abstract: The detection of events from online social networks is a recent, evolving field that attracts researchers from across a spectrum of disciplines and domains. Here we report a time-series analysis for predicting events. In particular, we evaluated the frequency distribution of top n-grams of terms over time, focusing on two indicators: high-frequency n-grams over both short and long periods of time. Both indicators can refer to certain aspects of events as they evolve. To evaluate the model's accuracy in detecting events, we built and used a Twitter dataset of the most popular hashtags that surrounded the well-documented protests that occurred at the University of Missouri (Mizzou) in late 2015.

Submitted 18 October, 2021; originally announced November 2021.

arXiv:2110.13980 [pdf, other]

Adversarial Attacks and Defenses for Social Network Text Processing Applications: Techniques, Challenges and Future Research Directions

Authors: Izzat Alsmadi, Kashif Ahmad, Mahmoud Nazzal, Firoj Alam, Ala Al-Fuqaha, Abdallah Khreishah, Abdulelah Algosaibi

Abstract: The growing use of social media has led to the development of several Machine Learning (ML) and Natural Language Processing (NLP) tools to process the unprecedented amount of social media content to make actionable decisions. However, these ML and NLP algorithms have been widely shown to be vulnerable to adversarial attacks. These vulnerabilities allow adversaries to launch a diversified set of adversarial attacks on these algorithms in different applications of social media text processing. In this paper, we provide a comprehensive review of the main approaches for adversarial attacks and defenses in the context of social media applications with a particular focus on key challenges and future research directions. In detail, we cover literature on six key applications, namely (i) rumors detection, (ii) satires detection, (iii) clickbait & spams identification, (iv) hate speech detection, (v) misinformation detection, and (vi) sentiment analysis. We then highlight the concurrent and anticipated future research questions and provide recommendations and directions for future work.

Submitted 26 October, 2021; originally announced October 2021.

Comments: 21 pages, 6 figures, 10 tables

arXiv:2102.11362 [pdf, other]

An ontological analysis of misinformation in online social networks

Authors: Izzat Alsmadi, Iyad Alazzam, Mohammad A. AlRamahi

Abstract: The internet, Online Social Networks (OSNs), and smart phones enable users to create a tremendous amount of information. Users who search for general or specific knowledge these days may face a problem not of information scarcity but of misinformation. Misinformation nowadays can refer to a continuous spectrum between what can be seen as "facts" or "truth", if humans agree on the existence of such, and false information that everyone agrees is false. In this paper, we will look at this spectrum of information/misinformation and compare some of the major relevant concepts. While a few fact-checking websites exist to evaluate news articles or some of the popular claims people exchange, this can be seen as only a small effort in the mission to tag online information with its "proper" category or label.

Submitted 22 February, 2021; originally announced February 2021.

arXiv:2101.08675 [pdf, other]

Adversarial Machine Learning in Text Analysis and Generation

Authors: Izzat Alsmadi

Abstract: The research field of adversarial machine learning has witnessed significant interest in the last few years. A machine learner or model is secure if it can deliver its main objectives with acceptable accuracy, efficiency, etc. while, at the same time, it can resist different types and/or attempts of adversarial attacks. This paper focuses on studying aspects and research trends in adversarial machine learning, specifically in text analysis and generation. The paper summarizes main research trends in the field such as GAN algorithms, models, types of attacks, and defense against those attacks.

Submitted 13 January, 2021; originally announced January 2021.

arXiv:2010.11096 [pdf, other]

RBAC for Healthcare-Infrastructure and data storage

Authors: Ramesh Narasimman, Izzat Alsmadi

Abstract: Role-based access control (RBAC) is the cornerstone of security for any modern organization. In this report, we defined a health-care access control structure based on RBAC. We used the Alloy formal logic modeling tool to model and validate system functions. We modeled the system's static and dynamic or temporal behaviours. We focused on evaluating properties such as integrity, conformance and progress.

Submitted 18 October, 2020; originally announced October 2020.

arXiv:1411.6611 [pdf]

Measuring device suitable for linear distances

Authors: Zaid A. I. Alsmadi, Ahmad B. B. Badry, Irfan A. Badruddin, T. M. Indra Mahlia

Abstract: A measuring device is proposed for determining a linear dimension. The device comprises three associated longitudinally moving parts, one of which is a scale. The integer part of the device reading is taken from the standard millimeter or inch scale, and the fine measurement (smaller than the minimum scale division) is done by a setup of two sliders coupled to the device. The first slider includes measuring points, and the other slider includes a measuring line. The decimal part of the reading is taken in such a way that the measuring points reading is related to the fractions of the displacement between the graduated scale and the corresponding measuring line.

Submitted 24 November, 2014; originally announced November 2014.

arXiv:1211.1780 [pdf]

Annotations, Collaborative Tagging, and Searching Mathematics in E-Learning

Authors: Iyad Abu Doush, Faisal Alkhateeb, Eslam Al Maghayreh, Izzat Alsmadi, Samer Samarah

Abstract: This paper presents a new framework for adding semantics to an e-learning system. The proposed approach relies on two principles. The first principle is the automatic addition of semantic information when creating the mathematical contents. The second principle is the collaborative tagging and annotation of the e-learning contents and the use of an ontology to categorize the e-learning contents. The proposed system encodes the mathematical contents using presentation MathML with RDFa annotations. The system allows students to highlight and annotate specific parts of the e-learning contents. The objective is to add meaning to the e-learning contents, to add relationships between contents, and to create a framework to facilitate searching the contents. This semantic information can be used to answer semantic queries (e.g., SPARQL) to retrieve the information a user requests. This work is implemented as embedded code in the Moodle e-learning system.

Submitted 8 November, 2012; originally announced November 2012.

arXiv:1205.1602 [pdf]

Indexing of Arabic documents automatically based on lexical analysis

Authors: Abdulrahman Al Molijy, Ismail Hmeidi, Izzat Alsmadi

Abstract: The continuous information explosion through the Internet and all information sources makes it necessary to perform all information processing activities automatically in a quick and reliable manner. In this paper, we proposed and implemented a method to automatically create an index for books written in the Arabic language. The process depends largely on text summarization and abstraction processes to collect main topics and statements in the book. The process is developed in terms of accuracy and performance, and results showed that this process can effectively replace the effort of manually indexing books and documents, a process that can be very useful in all information processing and retrieval applications.

Submitted 8 May, 2012; originally announced May 2012.

Showing 1–17 of 17 results for author: Alsmadi, I