Search | arXiv e-print repository

arXiv:2403.03832 [pdf]

Your device may know you better than you know yourself -- continuous authentication on novel dataset using machine learning

Authors: Pedro Gomes do Nascimento, Pidge Witiak, Tucker MacCallum, Zachary Winterfeldt, Rushit Dave

Abstract: This research aims to further understanding in the field of continuous authentication using behavioral biometrics. We contribute a novel dataset that encompasses the gesture data of 15 users playing Minecraft on a Samsung tablet, each for a duration of 15 minutes. Using this dataset, we employed machine learning (ML) binary classifiers, namely Random Forest (RF), K-Nearest Neighbors (KNN), and Support Vector Classifier (SVC), to determine the authenticity of specific user actions. Our most robust model was SVC, which achieved an average accuracy of approximately 90%, demonstrating that touch dynamics can effectively distinguish users. However, further studies are needed to make it a viable option for authentication systems.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.03828 [pdf]

From Clicks to Security: Investigating Continuous Authentication via Mouse Dynamics

Authors: Rushit Dave, Marcho Handoko, Ali Rashid, Cole Schoenbauer

Abstract: In the realm of computer security, efficient and reliable user authentication methods have become increasingly critical. This paper examines the potential of mouse movement dynamics as a consistent metric for continuous authentication. By analyzing user mouse movement patterns in two contrasting gaming scenarios, "Team Fortress" and "Poly Bridge", we investigate the distinctive behavioral patterns inherent in high-intensity and low-intensity UI interactions. The study extends beyond conventional methodologies by employing a range of machine learning models, selected to assess their effectiveness in capturing and interpreting the subtleties of user behavior as reflected in mouse movements. This multifaceted approach allows for a more nuanced and comprehensive understanding of user interaction patterns. Our findings reveal that mouse movement dynamics can serve as a reliable indicator for continuous user authentication. The machine learning models employed in this study demonstrate competent performance in user verification, marking an improvement over previous methods in this field. This research contributes to ongoing efforts to enhance computer security and highlights the potential of leveraging user behavior, specifically mouse dynamics, in developing robust authentication systems.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2402.10478 [pdf, other]

CodaMal: Contrastive Domain Adaptation for Malaria Detection in Low-Cost Microscopes

Authors: Ishan Rajendrakumar Dave, Tristan de Blegiers, Chen Chen, Mubarak Shah

Abstract: Malaria is a major health issue worldwide, and its diagnosis requires scalable solutions that work effectively with low-cost microscopes (LCM). Deep learning-based methods have shown success in computer-aided diagnosis from microscopic images. However, these methods need annotated images that show cells affected by malaria parasites and their life stages. Annotating images from LCM significantly increases the burden on medical experts compared to annotating images from high-cost microscopes (HCM). For this reason, a practical solution would be trained on HCM images and should generalize well to LCM images during testing. While earlier methods adopted a multi-stage learning process, they did not offer an end-to-end approach. In this work, we present an end-to-end learning framework named CodaMal (Contrastive Domain Adaptation for Malaria). To bridge the gap between HCM (training) and LCM (testing), we propose a domain adaptive contrastive loss. It reduces the domain shift by promoting similarity between the representation of each HCM image and that of its corresponding LCM image, without imposing an additional annotation burden. In addition, the training objective includes object detection objectives with carefully designed augmentations, ensuring accurate detection of malaria parasites.
On the publicly available large-scale M5-dataset, our proposed method shows a significant improvement of 16% over state-of-the-art methods in terms of mean average precision (mAP), provides a 21x speedup during inference, and requires only half as many learnable parameters as prior methods. Our code is publicly available.

Submitted 16 February, 2024; originally announced February 2024.

Comments: Under Review. Project Page: https://daveishan.github.io/codamal-webpage/

arXiv:2312.13008 [pdf, other]

No More Shortcuts: Realizing the Potential of Temporal Self-Supervision

Authors: Ishan Rajendrakumar Dave, Simon Jenni, Mubarak Shah

Abstract: Self-supervised approaches for video have shown impressive results in video understanding tasks. However, unlike early works that leverage temporal self-supervision, current state-of-the-art methods primarily rely on tasks from the image domain (e.g., contrastive learning) that do not explicitly promote the learning of temporal features. We identify two factors that limit existing temporal self-supervision: 1) tasks are too simple, resulting in saturated training performance, and 2) shortcuts based on local appearance statistics, which we uncover, hinder the learning of high-level features. To address these issues, we propose 1) a more challenging reformulation of temporal self-supervision as frame-level (rather than clip-level) recognition tasks and 2) an effective augmentation strategy to mitigate shortcuts. Our model extends a representation of single video frames, pre-trained through contrastive learning, with a transformer that we train through temporal self-supervision. We demonstrate experimentally that our more challenging frame-level task formulations and the removal of shortcuts drastically improve the quality of features learned through temporal self-supervision. The generalization capability of our self-supervised video method is evidenced by its state-of-the-art performance in a wide range of high-level semantic tasks, including video retrieval, action classification, and video attribute recognition (such as object and scene identification), as well as low-level temporal correspondence tasks like video object segmentation and pose tracking.
Additionally, we show that the video representations learned through our method exhibit increased robustness to input perturbations.

Submitted 20 December, 2023; originally announced December 2023.

Comments: AAAI 2024 (Main Technical Track)

arXiv:2310.10862 [pdf, other]

The Invisible Map: Visual-Inertial SLAM with Fiducial Markers for Smartphone-based Indoor Navigation

Authors: Paul Ruvolo, Ayush Chakraborty, Rucha Dave, Richard Li, Duncan Mazza, Xierui Shen, Raiyan Siddique, Krishna Suresh

Abstract: We present a system for creating building-scale, easily navigable 3D maps using mainstream smartphones. In our approach, we formulate the 3D-mapping problem as an instance of Graph SLAM and infer the position of both building landmarks (fiducial markers) and navigable paths through the environment (phone poses). Our results demonstrate the system's ability to create accurate 3D maps. Further, we highlight the importance of careful selection of mapping hyperparameters and provide a novel technique for tuning these hyperparameters to adapt our algorithm to new environments.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2308.13711 [pdf, other]

EventTransAct: A video transformer-based framework for Event-camera based action recognition

Authors: Tristan de Blegiers, Ishan Rajendrakumar Dave, Adeel Yousaf, Mubarak Shah

Abstract: Recognizing and comprehending human actions and gestures is a crucial perception requirement for robots to interact with humans and carry out tasks in diverse domains, including service robotics, healthcare, and manufacturing. Event cameras, with their ability to capture fast-moving objects at a high temporal resolution, offer new opportunities compared to standard action recognition in RGB videos. However, previous research on event camera action recognition has primarily focused on sensor-specific network architectures and image encoding, which may not be suitable for new sensors and limit the use of recent advancements in transformer-based architectures. In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which first acquires spatial embeddings per event frame and then applies a temporal self-attention mechanism. To better adapt the VTN to the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations. The proposed $\mathcal{L}_{EC}$ promotes learning fine-grained spatial cues in the spatial backbone of the VTN by contrasting temporally misaligned frames. We evaluate our method on real-world action recognition using the N-EPIC Kitchens dataset and achieve state-of-the-art results under both protocols: testing in seen kitchens (74.9% accuracy) and testing in unseen kitchens (42.43% and 46.66% accuracy).
Our approach also takes less computation time than competitive prior approaches, which demonstrates the potential of our framework EventTransAct for real-world applications of event-camera based action recognition. Project Page: https://tristandb8.github.io/EventTransAct_webpage/

Submitted 25 August, 2023; originally announced August 2023.

Comments: IROS 2023; The first two authors contributed equally

arXiv:2308.11072 [pdf, other]

TeD-SPAD: Temporal Distinctiveness for Self-supervised Privacy-preservation for video Anomaly Detection

Authors: Joseph Fioresi, Ishan Rajendrakumar Dave, Mubarak Shah

Abstract: Video anomaly detection (VAD) without human monitoring is a complex computer vision task that can have a positive impact on society if implemented successfully. While recent advances have made significant progress in solving this task, most existing approaches overlook a critical real-world concern: privacy. With the increasing popularity of artificial intelligence technologies, it becomes crucial to incorporate proper AI ethics into their development. Privacy leakage in VAD allows models to pick up and amplify unnecessary biases related to people's personal information, which may lead to undesirable decision making. In this paper, we propose TeD-SPAD, a privacy-aware video anomaly detection framework that destroys visual private information in a self-supervised manner. In particular, we propose the use of a temporally-distinct triplet loss to promote temporally discriminative features, which complements current weakly-supervised VAD methods. Using TeD-SPAD, we achieve a positive trade-off between privacy protection and anomaly detection utility on three popular weakly supervised VAD datasets: UCF-Crime, XD-Violence, and ShanghaiTech. Our proposed anonymization model reduces private attribute prediction by 32.25% while only reducing frame-level ROC AUC on the UCF-Crime anomaly detection dataset by 3.69%. Project Page: https://joefioresi718.github.io/TeD-SPAD_webpage/

Submitted 21 August, 2023; originally announced August 2023.

Comments: ICCV 2023

arXiv:2308.05563 [pdf]

Recent Advancements In The Field Of Deepfake Detection

Authors: Natalie Krueger, Mounika Vanamala, Rushit Dave

Abstract: A deepfake is a photo or video of a person whose image has been digitally altered or partially replaced with an image of someone else. Deepfakes have the potential to cause a variety of problems and are often used maliciously. A common usage is altering videos of prominent political figures and celebrities. These deepfakes can portray them making offensive, problematic, and/or untrue statements. Current deepfakes can be very realistic and, when used in this way, can spread panic and even influence elections and political opinions. There are many deepfake detection strategies currently in use, but finding the most comprehensive and universal method is critical. In this survey, we therefore address the problems of malicious deepfake creation and the lack of universal deepfake detection methods. Our objective is to survey and analyze a variety of current methods and advances in the field of deepfake detection.

Submitted 10 August, 2023; originally announced August 2023.

arXiv:2305.09482 [pdf]

Your Identity is Your Behavior -- Continuous User Authentication based on Machine Learning and Touch Dynamics

Authors: Brendan Pelto, Mounika Vanamala, Rushit Dave

Abstract: The aim of this research paper is to investigate continuous authentication with mobile touch dynamics using three different algorithms: Neural Network, Extreme Gradient Boosting, and Support Vector Machine. Mobile devices continue to grow in popularity worldwide; smartphone subscriptions have now surpassed 6 billion. Mobile touch dynamics refer to the distinct patterns of how a user interacts with their mobile device, including factors such as touch pressure, swipe speed, and touch duration. Continuous authentication refers to the process of continuously verifying a user's identity while they are using a device, rather than only at the initial login. This research used a dataset of touch dynamics collected from 40 subjects using the LG V30+. The participants played four mobile games (PUBG, Diep.io, Slither, and Minecraft) for 10 minutes each. The three algorithms were trained and tested on the extracted dataset, and their performance was evaluated based on metrics such as accuracy, precision, false negative rate, and false positive rate. The results showed that all three algorithms were able to effectively classify users based on their individual touch dynamics, with accuracy ranging from 80% to 95%. The Neural Network algorithm performed best, achieving the highest accuracy and precision scores, followed closely by XGBoost and SVC.
The data show that continuous authentication using mobile touch dynamics has the potential to be a useful method for enhancing security and reducing the risk of unauthorized access to personal devices. This research also notes the importance of choosing the correct algorithm for a given dataset and use case, as different algorithms may have varying levels of performance depending on the specific task.

Submitted 24 April, 2023; originally announced May 2023.

arXiv:2304.14504 [pdf]

Hybrid Deepfake Detection Utilizing MLP and LSTM

Authors: Jacob Mallet, Natalie Krueger, Mounika Vanamala, Rushit Dave

Abstract: Society's reliance on social media for authentic information has only grown over the past years, which has raised the potential consequences of the spread of misinformation. One increasingly popular method of deceiving users is the deepfake. Deepfakes, a product of recent technological advancements, enable nefarious online users to replace their own face with a computer-generated, synthetic face of any of numerous powerful members of society. Deepfake images and videos now provide the means to mimic important political and cultural figures to spread massive amounts of false information. Models that can detect these deepfakes to prevent the spread of misinformation are now of tremendous necessity. In this paper, we propose a new deepfake detection schema utilizing two deep learning algorithms: long short-term memory and multilayer perceptron. We evaluate our model using a publicly available dataset named 140k Real and Fake Faces to detect images altered by a deepfake, achieving accuracies as high as 74.7%.

Submitted 21 April, 2023; originally announced April 2023.

Comments: 5 Pages

arXiv:2304.02096 [pdf, other]

The CAMELS project: Expanding the galaxy formation model space with new ASTRID and 28-parameter TNG and SIMBA suites

Authors: Yueying Ni, Shy Genel, Daniel Anglés-Alcázar, Francisco Villaescusa-Navarro, Yongseok Jo, Simeon Bird, Tiziana Di Matteo, Rupert Croft, Nianyi Chen, Natalí S. M. de Santi, Matthew Gebhardt, Helen Shao, Shivam Pandey, Lars Hernquist, Romeel Dave

Abstract: We present CAMELS-ASTRID, the third suite of hydrodynamical simulations in the Cosmology and Astrophysics with MachinE Learning (CAMELS) project, along with new simulation sets that extend the model parameter space based on the previous frameworks of CAMELS-TNG and CAMELS-SIMBA, to provide broader training sets and testing grounds for machine-learning algorithms designed for cosmological studies. CAMELS-ASTRID employs the galaxy formation model following the ASTRID simulation and contains 2,124 hydrodynamic simulation runs that vary 3 cosmological parameters ($\Omega_m$, $\sigma_8$, $\Omega_b$) and 4 parameters controlling stellar and AGN feedback. Compared to the existing TNG and SIMBA simulation suites in CAMELS, the fiducial model of ASTRID features the mildest AGN feedback and predicts the least baryonic effect on the matter power spectrum. The training set of ASTRID covers a broader variation in the galaxy populations and the baryonic impact on the matter power spectrum compared to its TNG and SIMBA counterparts, which can make machine-learning models trained on the ASTRID suite exhibit better extrapolation performance when tested on other hydrodynamic simulation sets. We also introduce extension simulation sets in CAMELS that widely explore 28 parameters in the TNG and SIMBA models, demonstrating the enormity of the overall galaxy formation model parameter space and the complex non-linear interplay between cosmology and astrophysical processes.
With the new simulation suites, we show that building robust machine-learning models favors training and testing on the largest possible diversity of galaxy formation models. We also demonstrate that it is possible to train accurate neural networks to infer cosmological parameters using the high-dimensional TNG-SB28 simulation set.

Submitted 4 April, 2023; originally announced April 2023.

arXiv:2304.01908 [pdf]

Leveraging Deep Learning Approaches for Deepfake Detection: A Review

Authors: Aniruddha Tiwari, Rushit Dave, Mounika Vanamala

Abstract: Conspicuous progress in machine learning and deep learning has led to a surge of highly realistic fake media, often referred to as deepfakes. Deepfakes are fabricated media generated by sophisticated AI that are at times very difficult to distinguish from real media. Such media can be uploaded to various social media platforms, making it easy to spread them to the world and calling for an effective countermeasure. One promising countermeasure is deepfake detection. To address this threat, researchers have previously created models to detect deepfakes based on ML/DL techniques such as Convolutional Neural Networks. This paper explores different methodologies with the intention of achieving a cost-effective model with higher accuracy across different types of datasets, thereby addressing generalizability across datasets.

Submitted 4 April, 2023; originally announced April 2023.

arXiv:2303.16268 [pdf, other]

TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition

Authors: Ishan Rajendrakumar Dave, Mamshad Nayeem Rizve, Chen Chen, Mubarak Shah

Abstract: Semi-Supervised Learning can be more beneficial for the video domain than for images because of the higher annotation cost and dimensionality of video. Moreover, any video understanding task requires reasoning over both spatial and temporal dimensions. To learn both static and motion-related features for the semi-supervised action recognition task, existing methods rely on hard input inductive biases, such as using two modalities (RGB and optical flow) or two streams with different playback rates. Instead of utilizing unlabeled videos through diverse input streams, we rely on self-supervised video representations; specifically, we utilize temporally-invariant and temporally-distinctive representations. We observe that these representations complement each other depending on the nature of the action. Based on this observation, we propose a student-teacher semi-supervised learning framework, TimeBalance, in which we distill the knowledge from a temporally-invariant and a temporally-distinctive teacher. Depending on the nature of the unlabeled video, we dynamically combine the knowledge of these two teachers based on a novel temporal similarity-based reweighting scheme. Our method achieves state-of-the-art performance on three action recognition benchmarks: UCF101, HMDB51, and Kinetics400. Code: https://github.com/DAVEISHAN/TimeBalance

Submitted 28 March, 2023; originally announced March 2023.

Comments: CVPR-2023

arXiv:2303.14090 [pdf, ps, other]

Physics-informed neural networks in the recreation of hydrodynamic simulations from dark matter

Authors: Zhenyu Dai, Ben Moews, Ricardo Vilalta, Romeel Dave

Abstract: Physics-informed neural networks have emerged as a coherent framework for building predictive models that combine statistical patterns with domain knowledge. The underlying notion is to enrich the optimization loss function with known relationships to constrain the space of possible solutions. Hydrodynamic simulations are a core constituent of modern cosmology, but the required computations are both expensive and time-consuming. At the same time, the comparatively fast simulation of dark matter requires fewer resources, which has made machine learning algorithms for baryon inpainting an active area of research; here, recreating the scatter found in hydrodynamic simulations is an ongoing challenge. This paper presents the first application of physics-informed neural networks to baryon inpainting by combining advances in neural network architectures with physical constraints, injecting theory on baryon conversion efficiency into the model loss function. We also introduce a punitive prediction comparison based on the Kullback-Leibler divergence, which enforces scatter reproduction. By simultaneously extracting the complete set of baryonic properties for the Simba suite of cosmological simulations, our results demonstrate improved accuracy of baryonic predictions based on dark matter halo properties, successful recovery of the fundamental metallicity relation, and retrieval of scatter that traces the target simulation's distribution.

Submitted 19 October, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

arXiv:2302.10280 [pdf]

Deepfake Detection Analyzing Hybrid Dataset Utilizing CNN and SVM

Authors: Jacob Mallet, Laura Pryor, Rushit Dave, Mounika Vanamala

Abstract: Many individuals currently use social media as a major source of information. However, not all information shared online is true; even photos and videos can be doctored. Deepfakes have emerged alongside recent technological advancements and have allowed nefarious online users to replace one face with a computer-generated face of anyone they would like, including important political and cultural figures. Deepfakes are now a tool for spreading mass misinformation. There is now an immense need for models that can detect deepfakes and keep them from being spread as seemingly real images or videos. In this paper, we propose a new deepfake detection schema using two popular machine learning algorithms.

Submitted 26 January, 2023; originally announced February 2023.

arXiv:2302.05530 [pdf]

Machine Learning Based Approach to Recommend MITRE ATT&CK Framework for Software Requirements and Design Specifications

Authors: Nicholas Lasky, Benjamin Hallis, Mounika Vanamala, Rushit Dave, Jim Seliya

Abstract: Engineering more secure software has become a critical challenge in the cyber world. It is very important to develop methodologies, techniques, and tools for developing secure software. To develop secure software, software developers need to think like an attacker through mining software repositories, which aims to analyze and understand the data repositories related to software development. The main goal is to use these software repositories to support the decision-making process of software development. There are different vulnerability databases, such as Common Weakness Enumeration (CWE), Common Vulnerabilities and Exposures (CVE), and Common Attack Pattern Enumeration and Classification (CAPEC). We utilized the MITRE ATT&CK database. MITRE ATT&CK tactics and techniques have been used in various ways, but tools for utilizing these tactics and techniques in the early stages of the software development life cycle (SDLC) are lacking. In this paper, we use machine learning algorithms to map requirements to the MITRE ATT&CK database and determine the accuracy of each mapping depending on the data split.

Submitted 10 February, 2023; originally announced February 2023.

arXiv:2210.08423 [pdf, other]

TransVisDrone: Spatio-Temporal Transformer for Vision-based Drone-to-Drone Detection in Aerial Videos

Authors: Tushar Sangam, Ishan Rajendrakumar Dave, Waqas Sultani, Mubarak Shah

Abstract: Drone-to-drone detection using visual feeds has crucial applications, such as detecting drone collisions, detecting drone attacks, or coordinating flight with other drones. However, existing methods are computationally costly, follow non-end-to-end optimization, and have complex multi-stage pipelines, making them less suitable for real-time deployment on edge devices. In this work, we propose a simple yet effective framework, TransVisDrone, that provides an end-to-end solution with higher computational efficiency. We utilize the CSPDarkNet-53 network to learn object-related spatial features and the VideoSwin model to improve drone detection in challenging scenarios by learning spatio-temporal dependencies of drone motion. Our method achieves state-of-the-art performance on three challenging real-world datasets (Average Precision@0.5IOU): NPS 0.95, FLDrones 0.75, and AOT 0.80, along with higher throughput than previous methods. We also demonstrate its deployment capability on edge devices and its usefulness in detecting drone collisions (encounters). Project: https://tusharsangam.github.io/TransVisDrone-project-page/.

Submitted 25 August, 2023; v1 submitted 15 October, 2022; originally announced October 2022.

Comments: ICRA 2023

arXiv:2207.13648 [pdf]

Continuous User Authentication Using Machine Learning and Multi-Finger Mobile Touch Dynamics with a Novel Dataset

Authors: Zachary Deridder, Nyle Siddiqui, Thomas Reither, Rushit Dave, Brendan Pelto, Naeem Seliya, Mounika Vanamala

Abstract: As technology grows and evolves rapidly, it is increasingly clear that mobile devices are used for sensitive matters more than ever before. Continuous user authentication is needed because single-factor or multi-factor authentication may only validate a user initially, which does not help if an impostor can bypass this initial validation. The field of touch dynamics emerges as a clear way to non-intrusively collect data about a user and their behaviors in order to make important security-related decisions in real time. In this paper, we present a novel dataset consisting of tracking 25 users playing two mobile games, Snake.io and Minecraft, for 10 minutes each, along with their relevant gesture data. From this data, we ran two machine learning binary classifiers, Random Forest and K-Nearest Neighbor, to determine whether a sample of a particular user's actions was genuine. Our strongest model returned an average accuracy of roughly 93% for both games, showing that touch dynamics can effectively differentiate users and are a feasible consideration for authentication schemes. Our dataset is available at https://github.com/zderidder/MC-Snake-Results

Submitted 27 July, 2022; originally announced July 2022.

arXiv:2207.13644 [pdf]

Using Deep Learning to Detecting Deepfakes

Authors: Jacob Mallet, Rushit Dave, Naeem Seliya, Mounika Vanamala

Abstract: In recent years, social media has grown to become a major source of information for many online users. This has given rise to the spread of misinformation through deepfakes. Deepfakes are videos or images that replace one person's face with another, computer-generated face, often that of a more recognizable person in society. With recent advances in technology, a person with little technological experience can generate these videos. This enables them to mimic a powerful figure in society, such as a president or celebrity, creating the potential danger of spreading misinformation and other nefarious uses of deepfakes. To combat this online threat, researchers have developed models that are designed to detect deepfakes. This study looks at various deepfake detection models that use deep learning algorithms to combat this looming threat. This survey provides a comprehensive overview of the current state of deepfake detection models and the unique approaches many researchers take to solving this problem. The benefits, limitations, and suggestions for future work are discussed throughout this paper.

Submitted 27 July, 2022; originally announced July 2022.

arXiv:2207.00161 [pdf]

Mitigating Presentation Attack using DCGAN and Deep CNN

Authors: Nyle Siddiqui, Rushit Dave

Abstract: Biometric-based authentication now plays an essential role compared with conventional authentication systems; however, the risk of presentation attacks is rising accordingly. Our research aims to identify the areas where presentation attacks can be prevented even when adequate biometric image samples of users are limited. Our work focuses on generating photorealistic synthetic images from the real image sets by implementing a Deep Convolutional Generative Adversarial Network (DCGAN). We implemented temporal and spatial augmentation during fake image generation. Our work detects presentation attacks on facial and iris images using our deep CNN, inspired by VGGNet [1]. We applied these deep neural network techniques to three different biometric image datasets, namely MICHE-I [2], VISOB [3], and UBIPr [4]. The datasets used in this research contain images captured in both controlled and uncontrolled environments, at different resolutions and sizes. We obtained the best test accuracy of 97% on the UBIPr [4] iris dataset. For the MICHE-I [2] and VISOB [3] datasets, we achieved test accuracies of 95% and 96%, respectively.

Submitted 22 June, 2022; originally announced July 2022.

arXiv:2205.13646 [pdf]

doi 10.3390/make4020023

Machine and Deep Learning Applications to Mouse Dynamics for Continuous User Authentication

Authors: Nyle Siddiqui, Rushit Dave, Naeem Seliya, Mounika Vanamala

Abstract: Static authentication methods, like passwords, grow increasingly weak with advancements in technology and attack strategies. Continuous authentication has been proposed as a solution, in which users who have gained access to an account are still monitored in order to continuously verify that the user is not an impostor who has obtained the user's credentials. Mouse dynamics is the behavior of a user's mouse movements and is a biometric that has shown great promise for continuous authentication schemes. This article builds upon our previously published work by evaluating our dataset of 40 users using three machine learning and deep learning algorithms. Two evaluation scenarios are considered. First, binary classifiers are used for user authentication, with the top performer being a 1-dimensional convolutional neural network with a peak average test accuracy of 85.73% across the top 10 users. Second, multi-class classification is examined using an artificial neural network, which reaches a peak accuracy of 92.48%, the highest accuracy we have seen for any classifier on this dataset.

Submitted 26 May, 2022; originally announced May 2022.

Journal ref: Mach. Learn. Knowl. Extr. 2022

arXiv:2205.08371 [pdf]

Evaluation of a User Authentication Schema Using Behavioral Biometrics and Machine Learning

Authors: Laura Pryor, Jacob Mallet, Rushit Dave, Naeem Seliya, Mounika Vanamala, Evelyn Sowells Boone

Abstract: The amount of secure data being stored on mobile devices has grown immensely in recent years. However, the security measures protecting this data have stayed static, with few improvements made to address the vulnerabilities of current authentication methods such as physiological biometrics or passwords. As an alternative to these methods, behavioral biometrics has recently been researched as a solution to their vulnerabilities. In this study, we aim to contribute to the research on behavioral biometrics by creating and evaluating a user authentication scheme based on behavioral biometrics. The behavioral biometrics used in this study include touch dynamics and phone movement, and we evaluate the performance of different single-modal and multi-modal combinations of the two biometrics. Using two publicly available datasets, BioIdent and Hand Movement Orientation and Grasp (H-MOG), this study evaluates performance with seven common machine learning algorithms: Random Forest, Support Vector Machine, K-Nearest Neighbor, Naive Bayes, Logistic Regression, Multilayer Perceptron, and Long Short-Term Memory Recurrent Neural Networks, with accuracy rates reaching as high as 86%.

Submitted 7 May, 2022; originally announced May 2022.

arXiv:2204.13589 [pdf]

A Close Look into Human Activity Recognition Models using Deep Learning

Authors: Wei Zhong Tee, Rushit Dave, Naeem Seliya, Mounika Vanamala

Abstract: Human activity recognition using deep learning techniques has become increasingly popular because of its high effectiveness in recognizing complex tasks and its relatively low cost compared to more traditional machine learning techniques. This paper surveys state-of-the-art human activity recognition models that are based on deep learning architectures with layers containing Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), or a mix of more than one type in a hybrid system. The analysis outlines how the models are implemented to maximize their effectiveness and some of the potential limitations they face.

Submitted 26 April, 2022; originally announced April 2022.

arXiv:2204.09088 [pdf]

Exploration of Machine Learning Classification Models Used for Behavioral Biometrics Authentication

Authors: Sara Kokal, Laura Pryor, Rushit Dave

Abstract: Mobile devices have been manufactured and enhanced at growing rates in the past decades. While this growth has significantly expanded the capabilities of these devices, their security has been falling behind. This gap between the capability and the security of mobile devices is a significant problem because it puts the public's sensitive information at risk. Continuing previous work in this field, this study identifies key machine learning algorithms currently used in behavioral biometric mobile authentication schemes and aims to provide a comprehensive review of these algorithms when used with touch dynamics and phone movement. Throughout this paper, the benefits, limitations, and recommendations for future work are discussed.

Submitted 19 April, 2022; originally announced April 2022.

arXiv:2203.15205 [pdf, other]

SPAct: Self-supervised Privacy Preservation for Action Recognition

Authors: Ishan Rajendrakumar Dave, Chen Chen, Mubarak Shah

Abstract: Visual private information leakage is an emerging key issue for fast-growing applications of video understanding such as activity recognition. Existing approaches for mitigating privacy leakage in action recognition require privacy labels along with the action labels from the video dataset. However, annotating video frames with privacy labels is not feasible. Recent developments in self-supervised learning (SSL) have unlocked the untapped potential of unlabeled data. For the first time, we present a novel training framework that removes privacy information from input video in a self-supervised manner without requiring privacy labels. Our training framework consists of three main components: an anonymization function, a self-supervised privacy removal branch, and an action recognition branch. We train our framework using a minimax optimization strategy to minimize the action recognition cost function and maximize the privacy cost function through a contrastive self-supervised loss. Employing existing protocols of known action and privacy attributes, our framework achieves an action-privacy trade-off competitive with existing state-of-the-art supervised methods. In addition, we introduce a new protocol to evaluate the generalization of the learned anonymization function to novel action and privacy attributes, and show that our self-supervised framework outperforms existing supervised methods. Code available at: https://github.com/DAVEISHAN/SPAct

Submitted 28 March, 2022; originally announced March 2022.

Comments: CVPR-2022

arXiv:2202.02456 [pdf]

doi 10.1007/978-981-16-3246-4_52

Application of Machine Learning-Based Pattern Recognition in IoT Devices: Review

Authors: Zachary Menter, Wei Tee, Rushit Dave

Abstract: The Internet of Things (IoT) is a rapidly advancing area of technology that has quickly become more widespread in recent years. With greater numbers of everyday objects being connected to the Internet, many different innovations have been presented to make our everyday lives more straightforward. Pattern recognition is extremely prevalent in IoT devices because of the many applications and benefits that can come from it. A multitude of studies have been conducted with the intention of improving speed and accuracy, decreasing complexity, and reducing the overall processing power required by pattern recognition algorithms in IoT devices. Our review of the applications of different machine learning algorithms shows that results vary from case to case, but a general conclusion can be drawn that the optimal machine learning-based pattern recognition algorithms for IoT devices are support vector machine, k-nearest neighbor, and random forest.

Submitted 9 January, 2022; originally announced February 2022.

arXiv:2201.12705 [pdf]

A Robust Framework for Deep Learning Approaches to Facial Emotion Recognition and Evaluation

Authors: Nyle Siddiqui, Rushit Dave, Tyler Bauer, Thomas Reither, Dylan Black, Mitchell Hanson

Abstract: Facial emotion recognition (FER) is a vast and complex problem space within the domain of computer vision, and thus requires a universally accepted baseline method with which to evaluate proposed models. While test datasets have served this purpose in the academic sphere, real-world application and testing of such models lack any real comparison. Therefore, we propose a framework in which models developed for FER can be compared and contrasted against one another in a consistent, standardized fashion. A lightweight convolutional neural network is trained on the AffectNet dataset, a large and varied dataset for facial emotion recognition, and a web application is developed and deployed with our proposed framework as a proof of concept. The CNN is embedded into our application and is capable of instant, real-time facial emotion recognition. When tested on the AffectNet test set, this model achieves high accuracy for the classification of eight different emotions. Using our framework, the validity of this model and others can be properly tested by evaluating a model's efficacy not only on its accuracy on a sample test dataset but also in in-the-wild experiments. Additionally, our application can save and store any image captured or uploaded to it for emotion recognition, allowing for the curation of higher-quality and more diverse facial emotion recognition datasets.

Submitted 29 January, 2022; originally announced January 2022.

arXiv:2201.08565 [pdf]

Human Activity Recognition models using Limited Consumer Device Sensors and Machine Learning

Authors: Rushit Dave, Naeem Seliya, Mounika Vanamala, Wei Tee

Abstract: Human activity recognition has grown in popularity with its increasing applications in daily life and medical environments. Efficient and reliable human activity recognition brings benefits such as accessible use and better allocation of resources, especially in the medical industry. Activity recognition and classification can be achieved using many sophisticated data recording setups, but there is also a need to observe how performance varies among models that are strictly limited to sensor data from easily accessible devices: smartphones and smartwatches. This paper presents the findings of different models that are trained only on such sensors. The models are trained using either the k-Nearest Neighbor, Support Vector Machine, or Random Forest classifier algorithm. Performance is evaluated by comparing the models across different combinations of mobile sensors and examining how these combinations affect recognition performance. Results show promise for models trained strictly on limited sensor data collected from only smartphones and smartwatches, coupled with traditional machine learning concepts and algorithms.

Submitted 21 January, 2022; originally announced January 2022.

arXiv:2201.08564 [pdf]

Hold On and Swipe: A Touch-Movement Based Continuous Authentication Schema based on Machine Learning

Authors: Rushit Dave, Naeem Seliya, Laura Pryor, Mounika Vanamala, Evelyn Sowells, Jacob Mallet

Abstract: In recent years, the amount of secure information being stored on mobile devices has grown exponentially. However, current security schemas for mobile devices, such as physiological biometrics and passwords, are not secure enough to protect this information. Behavioral biometrics have been heavily researched as a possible solution to this security deficiency in mobile devices. This study aims to contribute to this research by evaluating the performance of a multimodal behavioral biometric user authentication scheme using touch dynamics and phone movement. This study uses a fusion of two popular publicly available datasets: the Hand Movement Orientation and Grasp dataset and the BioIdent dataset. We evaluate our model's performance using three common machine learning algorithms, Random Forest, Support Vector Machine, and K-Nearest Neighbor, reaching accuracy rates as high as 82%, with each algorithm performing respectably across all reported success metrics.

Submitted 21 January, 2022; originally announced January 2022.

arXiv:2201.01300 [pdf, other]

doi 10.3847/1538-4365/acbf47

The CAMELS project: public data release

Authors: Francisco Villaescusa-Navarro, Shy Genel, Daniel Anglés-Alcázar, Lucia A. Perez, Pablo Villanueva-Domingo, Digvijay Wadekar, Helen Shao, Faizan G. Mohammad, Sultan Hassan, Emily Moser, Erwin T. Lau, Luis Fernando Machado Poletti Valle, Andrina Nicola, Leander Thiele, Yongseok Jo, Oliver H. E. Philcox, Benjamin D. Oppenheimer, Megan Tillman, ChangHoon Hahn, Neerav Kaushal, Alice Pisani, Matthew Gebhardt, Ana Maria Delgado, Joyce Caliendo, Christina Kreisch, et al. (22 additional authors not shown)

Abstract: The Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project was developed to combine cosmology with astrophysics through thousands of cosmological hydrodynamic simulations and machine learning. CAMELS contains 4,233 cosmological simulations (2,049 N-body and 2,184 state-of-the-art hydrodynamic simulations) that sample a vast volume in parameter space. In this paper we present the CAMELS public data release, describing the characteristics of the CAMELS simulations and a variety of data products generated from them, including halo, subhalo, galaxy, and void catalogues, power spectra, bispectra, Lyman-α spectra, probability distribution functions, halo radial profiles, and X-ray photon lists. We also release over one thousand catalogues that contain billions of galaxies from CAMELS-SAM: a large collection of N-body simulations that have been combined with the Santa Cruz Semi-Analytic Model. We release all the data, comprising more than 350 terabytes and containing 143,922 snapshots, millions of halos, galaxies, and summary statistics. We provide further technical details on how to access, download, read, and process the data at https://camels.readthedocs.io.

Submitted 4 January, 2022; originally announced January 2022.

Comments: 18 pages, 3 figures. More than 350 TB of data from thousands of simulations publicly available at https://www.camel-simulations.org

arXiv:2112.01250 [pdf]

doi 10.12691/jcsa-9-1-3

The Benefits of Edge Computing in Healthcare, Smart Cities, and IoT

Authors: Rushit Dave, Naeem Seliya, Nyle Siddiqui

Abstract: Recent advancements in technology now allow for the generation of massive quantities of data. There is a growing need to transmit this data faster and more securely so that it cannot be accessed by malicious individuals. Edge computing has emerged in previous research as a method capable of improving data transmission times and security before the data ends up in the cloud. Edge computing achieves impressive transmission speeds based on fifth-generation (5G) communication, which transmits data with low latency and high bandwidth. While edge computing can extract important features from the raw data so that large amounts of data need not be transmitted at excessive bandwidth, cloud computing is used for the computational processes required to develop algorithms and model the data. Edge computing also improves the quality of the user experience by saving time and integrating quality of life (QoL) features. QoL features are important in the healthcare sector, where they help provide real-time feedback on data produced by healthcare devices back to patients for faster recovery. Edge computing also offers better energy efficiency, which can reduce electricity costs and in turn help people reduce their living expenses. This paper takes a detailed look at edge computing applications in Internet of Things (IoT) devices and smart city infrastructure, and at its benefits to healthcare.

Submitted 22 November, 2021; originally announced December 2021.

Journal ref: Journal of Computer Sciences and Applications. 2021, 9(1), 23-34

arXiv:2111.08683 [pdf, other]

doi 10.3847/1538-4357/ac7aa3

Inferring halo masses with Graph Neural Networks

Authors: Pablo Villanueva-Domingo, Francisco Villaescusa-Navarro, Daniel Anglés-Alcázar, Shy Genel, Federico Marinacci, David N. Spergel, Lars Hernquist, Mark Vogelsberger, Romeel Dave, Desika Narayanan

Abstract: Understanding the halo-galaxy connection is fundamental to improving our knowledge of the nature and properties of dark matter. In this work we build a model that infers the mass of a halo given the positions, velocities, stellar masses, and radii of the galaxies it hosts. In order to capture information from correlations among galaxy properties and their phase space, we use Graph Neural Networks (GNNs), which are designed to work with irregular and sparse data. We train our models on galaxies from more than 2,000 state-of-the-art simulations from the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project. Our model, which accounts for cosmological and astrophysical uncertainties, is able to constrain the masses of the halos with ~0.2 dex accuracy. Furthermore, a GNN trained on one suite of simulations preserves part of its accuracy when tested on simulations run with a different code that utilizes a distinct subgrid physics model, showing the robustness of our method. The PyTorch Geometric implementation of the GNN is publicly available on GitHub at https://github.com/PabloVD/HaloGraphNet

Submitted 8 February, 2023; v1 submitted 16 November, 2021; originally announced November 2021.

Comments: 20 pages, 8 figures, code publicly available at https://github.com/PabloVD/HaloGraphNet

Journal ref: ApJ 935 30 (2022)

arXiv:2110.15732 [pdf]

Named Entity Recognition in Unstructured Medical Text Documents

Authors: Cole Pearson, Naeem Seliya, Rushit Dave

Abstract: Physicians provide expert opinion to legal courts on the medical state of patients, including determining whether a patient is likely to have permanent or non-permanent injuries or ailments. An independent medical examination (IME) report summarizes a physician's medical opinion about a patient's health status based on the physician's expertise. IME reports contain private and sensitive information (Personally Identifiable Information, or PII) that must be removed or randomly encoded before further research can be conducted. In our study, the independent medical examiner is an orthopedic surgeon from a private practice in the United States. The goal of this research is to perform named entity recognition (NER) to identify and subsequently remove or encode PII in IME reports prepared by the physician. We apply the NER toolkits of OpenNLP and spaCy, two freely available natural language processing platforms, and compare their precision, recall, and f-measure at identifying five categories of PII across trials of randomly selected IME reports, using each model's common default parameters. We find that both platforms achieve high performance (f-measure > 0.9) at de-identification and that a spaCy model trained with a 70-30 train-test data split performs best.

Submitted 14 October, 2021; originally announced October 2021.

arXiv:2110.11080 [pdf]

Continuous Authentication Using Mouse Movements, Machine Learning, and Minecraft

Authors: Nyle Siddiqui, Rushit Dave, Naeem Seliya

Abstract: Mouse dynamics has grown in popularity as a novel, irreproducible behavioral biometric. Datasets that contain general, unrestricted mouse movements from users are scarce in the current literature. The Balabit mouse dynamics dataset, produced in 2016, was made for a data science competition and, despite some of its shortcomings, is considered to be the first publicly available mouse dynamics dataset. Collecting mouse movements in a dull, administrative manner as Balabit does may unintentionally homogenize data and is also not representative of real-world application scenarios. This paper presents a novel mouse dynamics dataset collected while 10 users played the video game Minecraft on a desktop computer. A binary Random Forest (RF) classifier is created for each user to detect differences between that user's movements and an impostor's movements. Two evaluation scenarios are proposed to evaluate the performance of these classifiers: one scenario outperformed previous works in all evaluation metrics, reaching average accuracy rates of 92%, while the other scenario successfully reduced instances of false authentication of impostors.

Submitted 14 October, 2021; originally announced October 2021.
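The per-user scheme described above (one binary classifier per user, genuine versus imposter) can be sketched in two steps: relabel the pooled data one-vs-rest for a target user, then score predictions by accuracy and by the false acceptance rate (imposters accepted as genuine). This is an illustrative sketch under stated assumptions: the user names and two-feature vectors are hypothetical, the Random Forest itself is omitted, and any classifier that outputs 0/1 labels could supply `y_pred`.

```python
from typing import Dict, List, Sequence, Tuple

Features = List[float]  # hypothetical per-movement feature vector


def one_vs_rest(samples: Dict[str, List[Features]], target: str) -> Tuple[List[Features], List[int]]:
    """Label the target user's samples 1 (genuine) and all other users' samples 0 (imposter)."""
    X: List[Features] = []
    y: List[int] = []
    for user, vectors in samples.items():
        for v in vectors:
            X.append(v)
            y.append(1 if user == target else 0)
    return X, y


def auth_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, false acceptance rate (imposters accepted), false rejection rate (genuine rejected)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "far": fp / (fp + tn) if fp + tn else 0.0,
        "frr": fn / (fn + tp) if fn + tp else 0.0,
    }


# Hypothetical toy data: two features per mouse movement.
samples = {
    "alice": [[0.1, 0.2], [0.2, 0.1]],
    "bob": [[0.9, 0.8]],
    "carol": [[0.7, 0.6]],
}
X, y = one_vs_rest(samples, "alice")
print(y)  # [1, 1, 0, 0]
y_pred = [1, 1, 0, 1]  # stand-in for a trained classifier's output
print(auth_metrics(y, y_pred))  # {'accuracy': 0.75, 'far': 0.5, 'frr': 0.0}
```

Reporting the false acceptance rate separately from accuracy matters here because a classifier can score well on accuracy while still admitting a meaningful share of imposters.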

arXiv:2110.07832 [pdf]

A Modern Analysis of Aging Machine Learning Based IoT Cybersecurity Methods

Authors: Sam Strecker, Rushit Dave, Nyle Siddiqui, Naeem Seliya

Abstract: Modern scientific advancements often contribute to the introduction and refinement of never-before-seen technologies. Maintaining and monitoring these technologies can be a demanding task for humans, and as a result, our society has become reliant on machine learning to assist with it. With new technology come new methods, and thus new ways to circumvent existing cybersecurity measures. This study examines the effectiveness of three distinct Internet of Things cybersecurity algorithms currently used in industry for malware and intrusion detection: Random Forest (RF), Support-Vector Machine (SVM), and K-Nearest Neighbor (KNN). Each algorithm was trained and tested on the Aposemat IoT-23 dataset, which was published in January 2020 with the earliest captures from 2018 and the latest from 2019. The RF, SVM, and KNN reached peak accuracies of 92.96%, 86.23%, and 91.48%, respectively, in intrusion detection and 92.27%, 83.52%, and 89.80% in malware detection. We found that all three algorithms can be used effectively in the current landscape of IoT cybersecurity in 2021.

Submitted 14 October, 2021; originally announced October 2021.

arXiv:2110.07826 [pdf]

Machine Learning Algorithms In User Authentication Schemes

Authors: Laura Pryor, Dr. Rushit Dave, Dr. Naeem Seliya, Dr. Evelyn R Sowells Boone

Abstract: In the past two decades, the number of mobile products being created by companies has grown exponentially. However, although these devices are constantly being upgraded with the newest features, the security measures used to protect them have stayed relatively the same over that period. This gap between the growth of devices and the growth of their security puts more and more devices at risk of being easily infiltrated by nefarious users. Building on previous work in the field, this study looks at the different machine learning algorithms used in user authentication schemes involving touch dynamics and device movement, and aims to give a comprehensive overview of how these algorithms are currently used in such schemes. The benefits, limitations, and suggestions for future work are discussed throughout the paper.

Submitted 14 October, 2021; originally announced October 2021.

arXiv:2109.10915 [pdf, other]

doi 10.3847/1538-4365/ac5ab0

The CAMELS Multifield Dataset: Learning the Universe's Fundamental Parameters with Artificial Intelligence

Authors: Francisco Villaescusa-Navarro, Shy Genel, Daniel Angles-Alcazar, Leander Thiele, Romeel Dave, Desika Narayanan, Andrina Nicola, Yin Li, Pablo Villanueva-Domingo, Benjamin Wandelt, David N. Spergel, Rachel S. Somerville, Jose Manuel Zorrilla Matilla, Faizan G. Mohammad, Sultan Hassan, Helen Shao, Digvijay Wadekar, Michael Eickenberg, Kaze W. K. Wong, Gabriella Contardo, Yongseok Jo, Emily Moser, Erwin T. Lau, Luis Fernando Machado Poletti Valle, Lucia A. Perez, et al. (3 additional authors not shown)

Abstract: We present the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) Multifield Dataset, CMD, a collection of hundreds of thousands of 2D maps and 3D grids containing many different properties of cosmic gas, dark matter, and stars from 2,000 distinct simulated universes at several cosmic times. The 2D maps and 3D grids represent cosmic regions that span $\sim$100 million light years and have been generated from thousands of state-of-the-art hydrodynamic and gravity-only N-body simulations from the CAMELS project. Designed to train machine learning models, CMD is the largest dataset of its kind containing more than 70 Terabytes of data. In this paper we describe CMD in detail and outline a few of its applications. We focus our attention on one such task, parameter inference, formulating the problems we face as a challenge to the community. We release all data and provide further technical details at https://camels-multifield-dataset.readthedocs.io.

Submitted 22 September, 2021; originally announced September 2021.

Comments: 17 pages, 1 figure. Third paper of a series of four. Hundreds of thousands of labeled 2D maps and 3D grids from thousands of simulated universes publicly available at https://camels-multifield-dataset.readthedocs.io

arXiv:2109.10360 [pdf, other]

Robust marginalization of baryonic effects for cosmological inference at the field level

Authors: Francisco Villaescusa-Navarro, Shy Genel, Daniel Angles-Alcazar, David N. Spergel, Yin Li, Benjamin Wandelt, Leander Thiele, Andrina Nicola, Jose Manuel Zorrilla Matilla, Helen Shao, Sultan Hassan, Desika Narayanan, Romeel Dave, Mark Vogelsberger

Abstract: We train neural networks to perform likelihood-free inference from $(25\,h^{-1}{\rm Mpc})^2$ 2D maps containing the total mass surface density from thousands of hydrodynamic simulations of the CAMELS project. We show that the networks can extract information beyond one-point functions and power spectra from all resolved scales ($\gtrsim 100\,h^{-1}{\rm kpc}$) while performing a robust marginalization over baryonic physics at the field level: the model can infer the value of $\Omega_{\rm m}$ ($\pm 4\%$) and $\sigma_8$ ($\pm 2.5\%$) from simulations completely different from the ones used to train it.

Submitted 21 September, 2021; originally announced September 2021.

Comments: 7 pages, 4 figures. Second paper of a series of four. The 2D maps, codes, and network weights used in this paper are publicly available at https://camels-multifield-dataset.readthedocs.io

arXiv:2109.09747 [pdf, other]

Multifield Cosmology with Artificial Intelligence

Authors: Francisco Villaescusa-Navarro, Daniel Anglés-Alcázar, Shy Genel, David N. Spergel, Yin Li, Benjamin Wandelt, Andrina Nicola, Leander Thiele, Sultan Hassan, Jose Manuel Zorrilla Matilla, Desika Narayanan, Romeel Dave, Mark Vogelsberger

Abstract: Astrophysical processes such as feedback from supernovae and active galactic nuclei modify the properties and spatial distribution of dark matter, gas, and galaxies in a poorly understood way. This uncertainty is one of the main theoretical obstacles to extracting information from cosmological surveys. We use 2,000 state-of-the-art hydrodynamic simulations from the CAMELS project spanning a wide variety of cosmological and astrophysical models and generate hundreds of thousands of 2-dimensional maps for 13 different fields: from dark matter to gas and stellar properties. We use these maps to train convolutional neural networks to extract the maximum amount of cosmological information while marginalizing over astrophysical effects at the field level. Although our maps only cover a small area of $(25~h^{-1}{\rm Mpc})^2$, and the different fields are contaminated by astrophysical effects in very different ways, our networks can infer the values of $\Omega_{\rm m}$ and $\sigma_8$ with few-percent-level precision for most of the fields. We find that the marginalization performed by the network retains a wealth of cosmological information compared to a model trained on maps from gravity-only N-body simulations that are not contaminated by astrophysical effects.
Finally, we train our networks on multifields (2D maps that contain several fields as different colors or channels) and find that not only can they infer the value of all parameters with higher accuracy than networks trained on individual fields, but they can also constrain the value of $\Omega_{\rm m}$ with higher accuracy than the maps from the N-body simulations.

Submitted 20 September, 2021; originally announced September 2021.

Comments: 11 pages, 7 figures. First paper of a series of four. All 2D maps, codes, and network weights are publicly available at https://camels-multifield-dataset.readthedocs.io

arXiv:2109.02695 [pdf]

IoT Security and Authentication schemes Based on Machine Learning: Review

Authors: Rushit Dave

Abstract: With the latest developments in technology, more and more people depend on their personal devices to store their sensitive information. Concurrently, the environments in which these devices are connected have become more dynamic and complex. This raises the question of whether the current authentication methods used in these devices are reliable enough to keep users' data safe. This paper examines the different user authentication schemes proposed to improve the security of various devices. The article is split into two avenues, discussing authentication schemes that use either behavioral biometrics or physical layer authentication. This survey discusses both the benefits and the challenges that arise with the accuracy, usability, and overall security of machine learning techniques in these authentication systems. The article aims to advance further research in this field by presenting the various current authentication models, their schematics, and their results.

Submitted 6 September, 2021; originally announced September 2021.

arXiv:2109.02692 [pdf]

Machine Learning: Challenges, Limitations, and Compatibility for Audio Restoration Processes

Authors: Owen Casey, Rushit Dave, Naeem Seliya, Evelyn R Sowells Boone

Abstract: In this paper, machine learning networks are explored for their use in restoring degraded and compressed speech audio. The project's intent is to build a new trained model from voice data that learns the features of compression-artifact distortion introduced by data loss from lossy compression and by resolution loss, using an existing algorithm presented in SEGAN: Speech Enhancement Generative Adversarial Network. The resulting generator from the model was then to be used to restore degraded speech audio. This paper details the compatibility and operational issues presented by working with deprecated code, which prevented the trained model from being successfully developed. This paper further serves as an examination of the challenges, limitations, and compatibility in the current state of machine learning.

Submitted 6 September, 2021; originally announced September 2021.

Comments: 6 pages, 2 figures

arXiv:1803.07288

Face Recognition Techniques: A Survey

Authors: Raunak Dave, Ankit Vyas, Nikita P Desai

Abstract: Research has recently expanded to extracting auxiliary information from various biometric modalities such as fingerprints, face, iris, palm, and voice. This information includes major features such as gender, age, beard, mustache, scars, height, hair, skin color, glasses, weight, facial marks, and tattoos, all of which contribute strongly to identifying a person. The major challenges in face recognition are determining a person's age and gender. This paper surveys various face recognition techniques for finding age and gender and discusses the existing techniques based on their performance. This paper also provides future directions for further research.

Submitted 30 January, 2021; v1 submitted 20 March, 2018; originally announced March 2018.

Comments: Work in progress

arXiv:1503.04194 [pdf, other]

ADS: The Next Generation Search Platform

Authors: Alberto Accomazzi, Michael J. Kurtz, Edwin A. Henneken, Roman Chyla, James Luker, Carolyn S. Grant, Donna M. Thompson, Alexandra Holachek, Rahul Dave, Stephen S. Murray

Abstract: Four years after the last LISA meeting, the NASA Astrophysics Data System (ADS) finds itself in the middle of major changes to the infrastructure and contents of its database. In this paper we highlight a number of features of great importance to librarians and discuss the additional functionality that we are currently developing. Starting in 2011, the ADS began to systematically collect, parse and index full-text documents for all the major publications in Physics and Astronomy as well as many smaller Astronomy journals and arXiv e-prints, for a total of over 3.5 million papers. Our citation coverage has doubled since 2010 and now consists of over 70 million citations. We are normalizing the affiliation information in our records and, in collaboration with the CfA library and NASA, we have started collecting and linking funding sources with papers in our system. At the same time, we are undergoing major technology changes in the ADS platform which affect all aspects of the system and its operations. We have rolled out and are now enhancing a new high-performance search engine capable of performing full-text as well as metadata searches using an intuitive query language which supports fielded, unfielded and functional searches. We are currently able to index acknowledgments, affiliations, citations, and funding sources, and to the extent that these metadata are available to us they are now searchable under our new platform.
The ADS private library system is being enhanced to support reading groups, collaborative editing of lists of papers, tagging, and a variety of privacy settings when managing one's paper collection. While this effort is still ongoing, some of its benefits are already available through the ADS Labs user interface and API at http://adslabs.org/adsabs/

Submitted 13 March, 2015; originally announced March 2015.

Comments: Submitted to Library and Information Services in Astronomy VII, Naples, Italy

arXiv:1207.6448 [pdf]

Query Optimization Over Web Services Using A Mixed Approach

Authors: Debajyoti Mukhopadhyay, Dhaval Chandarana, Rutvi Dave, Sharyu Page, Shikha Gupta

Abstract: A Web Service Management System (WSMS) can be regarded as a consistent and secure way of managing web services. Web services have become a quintessential part of the web, managing and sharing the resources of the businesses they are associated with. In this paper, we focus on the query optimization aspect of handling a "natural language" query submitted to the WSMS. Map-select-composite operations are used to select specific web services. The main outcome of our research is an algorithm that uses both a cost-based and a heuristic-based approach for query optimization. The query plan is formed through cost-based evaluation using a greedy algorithm, and the heuristic-based approach further optimizes the evaluation plan. This scheme not only guarantees an optimal solution, which has minimum deviation from the ideal solution, but also saves the time that would otherwise be spent generating various query plans using many mathematical models and then evaluating each one.

Submitted 27 July, 2012; originally announced July 2012.

Comments: 10 pages, 1 figure

Journal ref: Advances in Computing & Inf. Technology, AISC 178, pp.381-389, 2012
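The abstract above does not give the algorithm's details, so the following is not the authors' method. It is a minimal sketch of one generic greedy, cost-based ordering for a chain of web-service calls, assuming each hypothetical service has a known per-call cost and an independent selectivity (the fraction of inputs it passes on). At each step the greedy rule picks the service with the lowest cost / (1 - selectivity), that is, the cheapest way to discard inputs early.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Service:
    name: str           # hypothetical service identifier
    cost: float         # assumed cost per call (e.g. latency in ms)
    selectivity: float  # assumed fraction of inputs passed on, in [0, 1]


def plan_cost(plan: List[Service]) -> float:
    """Expected cost per input tuple when services are called in sequence."""
    total, passing = 0.0, 1.0
    for s in plan:
        total += passing * s.cost  # only surviving tuples reach this service
        passing *= s.selectivity
    return total


def rank(s: Service) -> float:
    """Greedy key: cost per unit of filtering; non-filtering services go last."""
    if s.selectivity >= 1.0:
        return float("inf")
    return s.cost / (1.0 - s.selectivity)


def greedy_plan(services: List[Service]) -> List[Service]:
    """Repeatedly append the remaining service with the lowest rank."""
    remaining = list(services)
    plan: List[Service] = []
    while remaining:
        best = min(remaining, key=rank)
        plan.append(best)
        remaining.remove(best)
    return plan


# Hypothetical services, listed in an arbitrary initial order.
services = [Service("geo", 10.0, 0.9), Service("price", 2.0, 0.5), Service("stock", 5.0, 0.2)]
plan = greedy_plan(services)
print([s.name for s in plan], plan_cost(plan))  # ['price', 'stock', 'geo'] 5.5
```

For independent services with no ordering constraints, this greedy choice is equivalent to sorting by the same key and is a well-known way to minimize expected cost; with precedence constraints or correlated selectivities, as a real WSMS may have, it is only a heuristic.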

arXiv:1103.5958 [pdf, other]

Semantic Interlinking of Resources in the Virtual Observatory Era

Authors: Alberto Accomazzi, Rahul Dave

Abstract: In the coming era of data-intensive science, it will be increasingly important to be able to seamlessly move between scientific results, the data analyzed in them, and the processes used to produce them. As observations, derived data products, publications, and object metadata are curated by different projects and archived in different locations, establishing the proper linkages between these resources and describing their relationships becomes an essential activity in their curation and preservation. In this paper we describe initial efforts to create a semantic knowledge base allowing easier integration and linking of the body of heterogeneous astronomical resources which we call the Virtual Observatory (VO). The ultimate goal of this effort is the creation of a semantic layer over existing resources, allowing applications to cross boundaries between archives. The proposed approach follows the current best practices in Semantic Computing and the architecture of the web, allowing the use of off-the-shelf technologies and providing a path for VO resources to become part of the global web of linked data.

Submitted 30 March, 2011; originally announced March 2011.

Comments: 10 pages, 3 figures, to appear in: ASPC 442 (2011), Proceedings of Astronomical Data Analysis Software and Systems XX

Showing 1–45 of 45 results for author: Dave, R