Search | arXiv e-print repository

Modularity in Transformers: Investigating Neuron Separability & Specialization

Authors: Nicholas Pochinkov, Thomas Jones, Mohammed Rashidur Rahman

Abstract: Transformer models are increasingly prevalent in various applications, yet our understanding of their internal workings remains limited. This paper investigates the modularity and task specialization of neurons within transformer architectures, focusing on both vision (ViT) and language (Mistral 7B) models. Using a combination of selective pruning and MoEfication clustering techniques, we analyze the overlap and specialization of neurons across different tasks and data subsets. Our findings reveal evidence of task-specific neuron clusters, with varying degrees of overlap between related tasks. We observe that neuron importance patterns persist to some extent even in randomly initialized models, suggesting an inherent structure that training refines. Additionally, we find that neuron clusters identified through MoEfication correspond more strongly to task-specific neurons in earlier and later layers of the models. This work contributes to a more nuanced understanding of transformer internals and offers insights into potential avenues for improving model interpretability and efficiency.

Submitted 30 August, 2024; originally announced August 2024.

Comments: 11 pages, 6 figures

MSC Class: 68T07 (Primary) 68Q32; 68T05 (Secondary)

ACM Class: I.2.4; I.2.6; I.2.7

arXiv:2408.16080 [pdf, other]

Striking the Right Balance: Systematic Assessment of Evaluation Method Distribution Across Contribution Types

Authors: Feng Lin, Arran Zeyu Wang, Md Dilshadur Rahman, Danielle Albers Szafir, Ghulam Jilani Quadri

Abstract: In the rapidly evolving field of information visualization, rigorous evaluation is essential for validating new techniques, understanding user interactions, and demonstrating the effectiveness and usability of visualizations. Faithful evaluations provide valuable insights into how users interact with and perceive the system, enabling designers to identify potential weaknesses and make informed decisions about design choices and improvements. However, an emerging trend of multiple evaluations within a single study raises critical questions about the sustainability, feasibility, and methodological rigor of such an approach. New researchers and students, influenced by this trend, may believe that multiple evaluations are necessary for a study, regardless of the contribution type. However, the number of evaluations in a study should depend on its contributions and merits, not on the trend of including multiple evaluations to strengthen a paper. So, how many evaluations are enough? This is a situational question and cannot be formulaically determined. Our objective is to summarize current trends and patterns to assess the distribution of evaluation methods over different paper contribution types. In this paper, we identify this trend through a non-exhaustive literature survey of evaluation patterns in 214 papers in the two most recent years' VIS issues in IEEE TVCG from 2023 and 2024. We then discuss various evaluation strategy patterns in the information visualization field to guide practical choices and how this paper will open avenues for further discussion.

Submitted 28 August, 2024; originally announced August 2024.

arXiv:2408.15426 [pdf]

Electron FLASH platform for pre-clinical research: LINAC modification, simplification of pulse control and dosimetry

Authors: Banghao Zhou, Lixiang Guo, Weiguo Lu, Mahbubur Rahman, Rongxiao Zhang, Varghese Anto Chirayath, Yang Kyun Park, Strahinja Stojadinovic, Marvin Garza, Ken Kang-Hsin Wang

Abstract: Background: FLASH radiotherapy is a treatment regime that delivers therapeutic dose to tumors at an ultra-high dose rate while maintaining adequate normal tissue sparing. However, a comprehensive understanding of the underlying mechanisms, potential late toxicities, and optimal fractionation schemes is important for successful clinical translation. This has necessitated extensive pre-clinical investigations, leading several research institutions to initiate dedicated FLASH research programs. Purpose: This work describes a workflow for establishing an easily accessible electron FLASH (eFLASH) platform. The platform incorporates simplified pulse control, optimized dose rate delivery, and a validated Monte Carlo (MC) dose engine for accurate in vivo dosimetry dedicated to FLASH pre-clinical studies. Methods: Adjustment of the automatic frequency control (AFC) module allowed us to optimize the LINAC pulse form to achieve a uniform dose rate. An MC model for the 6 MeV FLASH beam was commissioned to ensure accurate dose calculation necessary for reproducible in vivo studies. Results: Optimizing the AFC module enabled the generation of a uniform pulse form, ensuring consistent dose per pulse and a uniform dose rate throughout FLASH irradiation. The MC model closely agreed with film measurements. MC dose calculations indicated that 6 MeV FLASH is adequate to achieve a uniform dose distribution for mouse whole brain irradiation but may not be optimal for the spinal cord study. Conclusions: We present a novel workflow for establishing a LINAC-based eFLASH research platform, incorporating techniques for optimized dose rate delivery, a simplified pulse control system, and a validated MC engine. This work provides researchers with valuable new approaches to facilitate the development of robust and accessible LINAC-based systems for FLASH studies.

Submitted 27 August, 2024; originally announced August 2024.

Comments: 29 pages, 6 figures

arXiv:2408.14080 [pdf, other]

SONICS: Synthetic Or Not -- Identifying Counterfeit Songs

Authors: Md Awsafur Rahman, Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Bishmoy Paul, Shaikh Anowarul Fattah

Abstract: The recent surge in AI-generated songs presents exciting possibilities and challenges. While these tools democratize music creation, they also necessitate the ability to distinguish between human-composed and AI-generated songs for safeguarding artistic integrity and content curation. Existing research and datasets in fake song detection only focus on singing voice deepfake detection (SVDD), where the vocals are AI-generated but the instrumental music is sourced from real songs. However, this approach is inadequate for contemporary end-to-end AI-generated songs where all components (vocals, lyrics, music, and style) could be AI-generated. Additionally, existing datasets lack lyrics-music diversity, long-duration songs, and open fake songs. To address these gaps, we introduce SONICS, a novel dataset for end-to-end Synthetic Song Detection (SSD), comprising over 97k songs with over 49k synthetic songs from popular platforms like Suno and Udio. Furthermore, we highlight the importance of modeling long-range temporal dependencies in songs for effective authenticity detection, an aspect overlooked in existing methods. To capture these patterns, we propose a novel model, SpecTTTra, that is up to 3 times faster and 6 times more memory efficient compared to popular CNN and Transformer-based models while maintaining competitive performance. Finally, we offer both AI-based and Human evaluation benchmarks, addressing another deficiency in current research.

Submitted 27 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

arXiv:2408.13936 [pdf, other]

OpenNav: Efficient Open Vocabulary 3D Object Detection for Smart Wheelchair Navigation

Authors: Muhammad Rameez ur Rahman, Piero Simonetto, Anna Polato, Francesco Pasti, Luca Tonin, Sebastiano Vascon

Abstract: Open vocabulary 3D object detection (OV3D) allows precise and extensible object recognition crucial for adapting to diverse environments encountered in assistive robotics. This paper presents OpenNav, a zero-shot 3D object detection pipeline based on RGB-D images for smart wheelchairs. Our pipeline integrates an open-vocabulary 2D object detector with a mask generator for semantic segmentation, followed by depth isolation and point cloud construction to create 3D bounding boxes. The smart wheelchair exploits these 3D bounding boxes to identify potential targets and navigate safely. We demonstrate OpenNav's performance through experiments on the Replica dataset and we report preliminary results with a real wheelchair. OpenNav improves state-of-the-art significantly on the Replica dataset at mAP25 (+9pts) and mAP50 (+5pts) with marginal improvement at mAP. The code is publicly available at this link: https://github.com/EasyWalk-PRIN/OpenNav.

Submitted 25 August, 2024; originally announced August 2024.

Comments: ECCVW
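The step from a 2D mask plus depth to a 3D bounding box, as described in the OpenNav abstract, can be illustrated with a minimal sketch. It is not the authors' code: the pinhole camera model, the axis-aligned box, and the names `Intrinsics`, `backproject`, and `box_from_mask` are all assumptions made for illustration, and the abstract does not say how depth is filtered or how the box is oriented.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Hypothetical pinhole camera intrinsics (focal lengths and principal point, in pixels)."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(u, v, depth, k):
    """Map pixel (u, v) with metric depth to a 3D point in camera coordinates."""
    x = (u - k.cx) * depth / k.fx
    y = (v - k.cy) * depth / k.fy
    return (x, y, depth)


def box_from_mask(mask, depth_map, k):
    """Return (min_corner, max_corner) of an axis-aligned box around the masked points.

    Pixels with non-positive depth are treated as invalid and skipped (an assumption).
    Returns None when the mask contains no valid pixel.
    """
    points = [
        backproject(u, v, depth_map[v][u], k)
        for v, row in enumerate(mask)
        for u, inside in enumerate(row)
        if inside and depth_map[v][u] > 0
    ]
    if not points:
        return None
    mins = tuple(min(p[i] for p in points) for i in range(3))
    maxs = tuple(max(p[i] for p in points) for i in range(3))
    return mins, maxs
```

A real pipeline would likely also reject depth outliers before fitting the box, since a few stray background pixels inside the mask can inflate an axis-aligned box considerably.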

arXiv:2408.11260 [pdf]

Bi3+ Doped Nanocrystalline Ni-Co-Zn Spinel Ferrites: Tuning of Physical, Electrical, Dielectric and Magnetic Properties for Advanced Spintronics Applications

Authors: Md. Mahfuzur Rahman, Nazmul Hasan, Sumaiya Tabassum, M. Harun-Or-Rashid, Md. Harunur Rashid, Md. Arifuzzaman

Abstract: This study reports the synthesis and characterization of nanocrystalline Ni0.5Co0.2Zn0.3BixFe2-xO4 (x = 0.0, 0.025, 0.050, 0.075, 0.100) ferrites synthesized via the sol-gel auto-combustion method. Low coercivity values (23.68 to 87.71 Oe) are observed, classifying the investigated materials as soft ferromagnetic. The increased magnetic anisotropy (K) through Bi3+ doping indicates tunable stability in magnetic orientations, making them suitable for multifunctional applications.

Submitted 20 August, 2024; originally announced August 2024.

arXiv:2408.10329 [pdf, other]

Hysteretic response to different modes of ramping an external field in sparse and dense Ising spin glasses

Authors: Mahajabin Rahman, Stefan Boettcher

Abstract: We consider the hysteretic behavior of Ising spin glasses at $T=0$ for various modes of driving. Previous studies mostly focused on an infinitely slow speed $\dot{H}$ by which the external field $H$ was ramped to trigger avalanches of spin flips by starting with destabilizing a single spin, while few have focused on the effect of different driving methods. First, we show that this conventional protocol imposes a system size dependence. Then, we numerically analyze the response of Ising spin glasses at rates $\dot{H}$ that are fixed as well, to elucidate the differences in the response. Specifically, we compare three different modes of ramping ($\dot{H}=c/N$, $\dot{H}=c/\sqrt{N}$, and $\dot{H}=c$ for constant $c$) for two types of spin glass systems of size $N$, representing dense networks by the Sherrington-Kirkpatrick model and sparse networks by the lattice spin glass in $d=3$ dimensions known as the Edwards-Anderson model. Depending on the mode of ramping, we find that the response of each system, in the form of spin-flip avalanches and other observables, can vary considerably. In particular, in the $N$-independent mode applied to the lattice spin glass, which is closest to experimental reality, we observe a percolation transition with a broad avalanche distribution between phases of localized and system-spanning responses. We explore implications for combinatorial optimization problems pertaining to sparse systems.

Submitted 19 August, 2024; originally announced August 2024.

Comments: 8 pages, 7 figures, RevTex, for related work, see https://physics.emory.edu/faculty/boettcher/
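The three ramping modes named in the abstract above ($\dot{H}=c/N$, $\dot{H}=c/\sqrt{N}$, $\dot{H}=c$) can be made concrete with a small zero-temperature sketch on a dense (Sherrington-Kirkpatrick-like) system. This is an illustration under stated assumptions, not the authors' simulation: the Gaussian couplings scaled by $1/\sqrt{N}$, the rule of flipping the first unstable spin found, and every function name here are hypothetical choices.

```python
import math
import random


def ramp_increment(mode, c, n):
    """Field increment per ramp step for the three modes in the abstract."""
    if mode == "c/N":
        return c / n
    if mode == "c/sqrt(N)":
        return c / math.sqrt(n)
    if mode == "c":
        return c
    raise ValueError(f"unknown ramp mode: {mode}")


def make_couplings(n, rng):
    """Symmetric Gaussian couplings with zero diagonal, scaled by 1/sqrt(n) (assumed)."""
    j = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            j[a][b] = j[b][a] = rng.gauss(0.0, 1.0) / math.sqrt(n)
    return j


def relax(spins, j, h):
    """Flip unstable spins one at a time at T=0; return the number of flips.

    A spin is unstable when it points against its local field. Each flip
    lowers the energy, so the loop terminates for symmetric couplings.
    """
    n = len(spins)
    flips = 0
    while True:
        unstable = [
            i for i in range(n)
            if spins[i] * (sum(j[i][k] * spins[k] for k in range(n)) + h) < 0
        ]
        if not unstable:
            return flips
        spins[unstable[0]] *= -1
        flips += 1


def ramp_up(n, mode, c, h_max, seed=0):
    """Ramp the field from -h_max to +h_max; return final spins and avalanche sizes."""
    rng = random.Random(seed)
    j = make_couplings(n, rng)
    spins = [-1] * n  # saturated state, stable at large negative field
    h = -h_max
    dh = ramp_increment(mode, c, n)
    avalanches = []
    while h < h_max:
        h += dh
        flips = relax(spins, j, h)
        if flips:
            avalanches.append(flips)
    return spins, avalanches
```

With a fixed increment, several spins can become unstable within the same step, which is one way the response can differ from the quasi-static protocol that destabilizes a single spin at a time.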

arXiv:2408.09513 [pdf, other]

ECG-Free Assessment of Cardiac Valve Events Using Seismocardiography

Authors: Mohammad Muntasir Rahman, Aysha Mann, Amirtaha Taebi

Abstract: Seismocardiogram (SCG) signals can play a crucial role in remote cardiac monitoring, capturing important events such as aortic valve opening (AO) and mitral valve closure (MC). However, existing SCG methods for detecting AO and MC typically rely on electrocardiogram (ECG) data. In this study, we propose an innovative approach to identify AO and MC events in SCG signals without the need for ECG information. Our method utilized a template bank, which consists of signal templates extracted from SCG waveforms of 5 healthy subjects. These templates represent characteristic features of a heart cycle. When analyzing new, unseen SCG signals from another group of 6 healthy subjects, we employ these templates to accurately detect cardiac cycles and subsequently pinpoint AO and MC events. Our results demonstrate the effectiveness of the proposed template bank approach in achieving ECG-independent AO and MC detection, laying the groundwork for more convenient remote cardiovascular assessment.

Submitted 18 August, 2024; originally announced August 2024.
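One plausible way to use a template bank for cycle detection is sliding normalized cross-correlation, sketched below. The abstract does not state which matching method the authors use, so the correlation measure, the 0.9 threshold, the equal-length templates, and the function names are all assumptions for illustration.

```python
import math


def normalized_xcorr(signal, template):
    """Pearson correlation between the template and every same-length window of the signal."""
    m = len(template)
    t_mean = sum(template) / m
    t_dev = [t - t_mean for t in template]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    scores = []
    for start in range(len(signal) - m + 1):
        window = signal[start:start + m]
        w_mean = sum(window) / m
        w_dev = [w - w_mean for w in window]
        w_norm = math.sqrt(sum(d * d for d in w_dev))
        if t_norm == 0 or w_norm == 0:
            scores.append(0.0)  # flat window or flat template: no meaningful match
        else:
            dot = sum(a * b for a, b in zip(w_dev, t_dev))
            scores.append(dot / (w_norm * t_norm))
    return scores


def detect_cycles(signal, template_bank, threshold=0.9):
    """Return start indices of detected cycles.

    Assumes all templates share one length. For each offset the best score
    over the bank is kept; a detection is the best-scoring offset within one
    template length of the first threshold crossing.
    """
    m = len(template_bank[0])
    per_template = [normalized_xcorr(signal, t) for t in template_bank]
    best = [max(column) for column in zip(*per_template)]
    onsets = []
    i = 0
    while i < len(best):
        if best[i] >= threshold:
            j = max(range(i, min(i + m, len(best))), key=best.__getitem__)
            onsets.append(j)
            i = j + m  # skip past the matched cycle
        else:
            i += 1
    return onsets
```

Once cycle starts are known, events such as AO and MC could be located at fixed offsets annotated on each template, though that annotation step is not described in the abstract.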

arXiv:2408.09512 [pdf, other]

Contactless seismocardiography via Gunnar-Farneback optical flow

Authors: Mohammad Muntasir Rahman, Amirtaha Taebi

Abstract: Seismocardiography (SCG) has gained significant attention due to its potential applications in monitoring cardiac health and diagnosing cardiovascular conditions. Conventional SCG methods rely on accelerometers attached to the chest, which can be uncomfortable or inconvenient. In recent years, researchers have explored non-contact methods to capture SCG signals, and one promising approach involves analyzing video recordings of the chest. In this study, we investigate a vision-based method based on the Gunnar-Farneback optical flow to extract SCG signals from the chest skin movements recorded by a smartphone camera. We compared the SCG signals extracted from the chest videos of four healthy subjects with those obtained from accelerometers and our previous method based on sticker tracking. Our results demonstrated that the vision-based SCG signals extracted by the proposed method closely resembled those from accelerometers and stickers, although these signals were captured from slightly different locations. The mean squared error between the vision-based SCG signals and accelerometer-based signals was found to be within a reasonable range, especially between signals in the head-to-foot direction ($0.2 < \mathrm{MSE} < 1.5$). Additionally, heart rates derived from the vision-based SCG exhibited good agreement with the gold-standard ECG measurements, with a mean difference of 0.8 bpm. These results indicate the potential of this non-invasive method in health monitoring and diagnostics.

Submitted 18 August, 2024; originally announced August 2024.

arXiv:2408.09424 [pdf, other]

OVOSE: Open-Vocabulary Semantic Segmentation in Event-Based Cameras

Authors: Muhammad Rameez Ur Rahman, Jhony H. Giraldo, Indro Spinelli, Stéphane Lathuilière, Fabio Galasso

Abstract: Event cameras, known for low-latency operation and superior performance in challenging lighting conditions, are suitable for sensitive computer vision tasks such as semantic segmentation in autonomous driving. However, challenges arise due to limited event-based data and the absence of large-scale segmentation benchmarks. Current works are confined to closed-set semantic segmentation, limiting their adaptability to other applications. In this paper, we introduce OVOSE, the first Open-Vocabulary Semantic Segmentation algorithm for Event cameras. OVOSE leverages synthetic event data and knowledge distillation from a pre-trained image-based foundation model to an event-based counterpart, effectively preserving spatial context and transferring open-vocabulary semantic segmentation capabilities. We evaluate the performance of OVOSE on two driving semantic segmentation datasets, DDD17 and DSEC-Semantic, comparing it with existing conventional image open-vocabulary models adapted for event-based data. Similarly, we compare OVOSE with state-of-the-art methods designed for closed-set settings in unsupervised domain adaptation for event-based semantic segmentation. OVOSE demonstrates superior performance, showcasing its potential for real-world applications. The code is available at https://github.com/ram95d/OVOSE.

Submitted 18 August, 2024; originally announced August 2024.

Comments: conference

arXiv:2408.09005 [pdf]

Comparative Performance Analysis of Transformer-Based Pre-Trained Models for Detecting Keratoconus Disease

Authors: Nayeem Ahmed, Md Maruf Rahman, Md Fatin Ishrak, Md Imran Kabir Joy, Md Sanowar Hossain Sabuj, Md. Sadekur Rahman

Abstract: This study compares eight pre-trained CNNs for diagnosing keratoconus, a degenerative eye disease. A carefully selected dataset of keratoconus, normal, and suspicious cases was used. The models tested include DenseNet121, EfficientNetB0, InceptionResNetV2, InceptionV3, MobileNetV2, ResNet50, VGG16, and VGG19. To maximize model training, bad sample removal, resizing, rescaling, and augmentation were used. The models were trained with similar parameters, activation function, classification function, and optimizer to compare performance. To determine class separation effectiveness, each model was evaluated on accuracy, precision, recall, and F1-score. MobileNetV2 was the most accurate model in identifying keratoconus and normal cases, with few misclassifications. InceptionV3 and DenseNet121 both performed well in keratoconus detection, but they had trouble with suspicious cases. In contrast, EfficientNetB0, ResNet50, and VGG19 had more difficulty distinguishing suspicious cases from normal ones, indicating the need for model refinement and development. A detailed comparison of state-of-the-art CNN architectures for automated keratoconus identification reveals each model's benefits and weaknesses. This study shows that advanced deep learning models can enhance keratoconus diagnosis and treatment planning. Future research should explore hybrid models and integrate clinical parameters to improve diagnostic accuracy and robustness in real-world clinical applications, paving the way for more effective AI-driven ophthalmology tools.

Submitted 16 August, 2024; originally announced August 2024.

Comments: 14 pages, 3 tables, 27 figures

ACM Class: I.4.m

arXiv:2408.08327 [pdf, other]

Nanoscale Surfactant Transport: Bridging Molecular and Continuum Models

Authors: Muhammad Rizwanur Rahman, James P. Ewen, Li Shen, D. M. Heyes, Daniele Dini, E. R. Smith

Abstract: Surfactant transport is central to a diverse range of natural phenomena, and for many practical applications in physics and engineering. Surprisingly, this process remains relatively poorly understood at the molecular scale. This study investigates the mechanism behind the transport of surfactant monolayers on flat and curved liquid-vapor interfaces using nonequilibrium molecular dynamics simulations, which are compared with the continuum transport model. This approach not only provides fresh molecular level insight into surfactant dynamics, but also confirms the nanoscale mechanism of the lateral migration of surfactant molecules along a thin film that continuously deforms as surfactants spread. By connecting the continuum model, where the long wave approximations prevail, to the molecular details, where such approximations break down, we establish that the transport equation preserves substantial accuracy in capturing the underlying physics. Moreover, the relative importance of the different mechanisms of the transport process is identified. Consequently, we derive a novel, exact molecular equation for surfactant transport along a deforming surface. Finally, our findings demonstrate that the spreading of surfactants at the molecular scale adheres to expected scaling laws and aligns well with experimental observations.

Submitted 10 August, 2024; originally announced August 2024.

Comments: Submitted for journal publication

arXiv:2408.08261 [pdf, other]

mhGPT: A Lightweight Generative Pre-Trained Transformer for Mental Health Text Analysis

Authors: Dae-young Kim, Rebecca Hwa, Muhammad Mahbubur Rahman

Abstract: This paper introduces mhGPT, a lightweight generative pre-trained transformer trained on mental health-related social media and PubMed articles. Fine-tuned for specific mental health tasks, mhGPT was evaluated under limited hardware constraints and compared with state-of-the-art models like MentaLLaMA and Gemma. Despite having only 1.98 billion parameters and using just 5% of the dataset, mhGPT outperformed larger models and matched the performance of models trained on significantly more data. The key contributions include integrating diverse mental health data, creating a custom tokenizer, and optimizing a smaller architecture for low-resource settings. This research could advance AI-driven mental health care, especially in areas with limited computing power.

Submitted 15 August, 2024; originally announced August 2024.

arXiv:2408.05449 [pdf]

Unidirectional imaging with partially coherent light

Authors: Guangdong Ma, Che-Yung Shen, Jingxi Li, Luzhe Huang, Cagatay Isil, Fazil Onuralp Ardic, Xilin Yang, Yuhang Li, Yuntian Wang, Md Sadman Sakib Rahman, Aydogan Ozcan

Abstract: Unidirectional imagers form images of input objects only in one direction, e.g., from field-of-view (FOV) A to FOV B, while blocking the image formation in the reverse direction, from FOV B to FOV A. Here, we report unidirectional imaging under spatially partially coherent light and demonstrate high-quality imaging only in the forward direction (A->B) with high power efficiency while distorting the image formation in the backward direction (B->A) along with low power efficiency. Our reciprocal design features a set of spatially engineered linear diffractive layers that are statistically optimized for partially coherent illumination with a given phase correlation length. Our analyses reveal that when illuminated by a partially coherent beam with a correlation length of ~1.5 w or larger, where w is the wavelength of light, diffractive unidirectional imagers achieve robust performance, exhibiting asymmetric imaging performance between the forward and backward directions - as desired. A partially coherent unidirectional imager designed with a smaller correlation length of less than 1.5 w still supports unidirectional image transmission, but with a reduced figure of merit. These partially coherent diffractive unidirectional imagers are compact (axially spanning less than 75 w), polarization-independent, and compatible with various types of illumination sources, making them well-suited for applications in asymmetric visual information processing and communication.

Submitted 10 August, 2024; originally announced August 2024.

Comments: 25 Pages, 8 Figures

arXiv:2408.02825 [pdf, other]

On the Variability of AI-based Software Systems Due to Environment Configurations

Authors: Musfiqur Rahman, SayedHassan Khatoonabadi, Ahmad Abdellatif, Haya Samaana, Emad Shihab

Abstract: [Context] Nowadays, many software systems include Artificial Intelligence (AI) components and changes in the development environment have been known to induce variability in an AI-based system. [Objective] However, how an environment configuration impacts the variability of these systems is yet to be explored. Understanding and quantifying the degree of variability due to such configurations can help practitioners decide the best environment configuration for the most stable AI products. [Method] To achieve this goal, we performed experiments with eight different combinations of three key environment variables (operating system, Python version, and CPU architecture) on 30 open-source AI-based systems using the Travis CI platform. We evaluate variability using three metrics: the output of an AI component like an ML model (performance), the time required to build and run a system (processing time), and the cost associated with building and running a system (expense). [Results] Our results indicate that variability exists in all three metrics; however, it is observed more frequently with respect to processing time and expense than performance. For example, between Linux and MacOS, variabilities are observed in 23%, 96.67%, and 100% of the studied projects in performance, processing time, and expense, respectively. [Conclusion] Our findings underscore the importance of identifying the optimal combination of configuration settings to mitigate performance drops and reduce retraining time and cost before deploying an AI-based system.

Submitted 5 August, 2024; originally announced August 2024.

Comments: Submitted to the Information and Software Technology journal for review

arXiv:2408.00984 [pdf, other]

GraphAge: Unleashing the power of Graph Neural Network to Decode Epigenetic Aging

Authors: Saleh Sakib Ahmed, Nahian Shabab, Md. Abul Hassan Samee, M. Sohel Rahman

Abstract: DNA methylation is a crucial epigenetic marker used in various clocks to predict epigenetic age. However, many existing clocks fail to account for crucial information about CpG sites and their interrelationships, such as co-methylation patterns. We present a novel approach to represent methylation data as a graph, using methylation values and relevant information about CpG sites as nodes, and relationships like co-methylation, same gene, and same chromosome as edges. We then use a Graph Neural Network (GNN) to predict age. Thus our model, GraphAge, leverages both structural and positional information for prediction as well as better interpretation. Although we had to train in a constrained compute setting, GraphAge still showed competitive performance with a Mean Absolute Error (MAE) of 3.207 and a Mean Squared Error (MSE) of 25.277, slightly outperforming the current state of the art. Perhaps more importantly, we utilized GNN explainer for interpretation purposes and were able to unearth interesting insights (e.g., key CpG sites, pathways, and their relationships through Methylation Regulated Networks in the context of aging), which were not possible to 'decode' without leveraging the unique capability of GraphAge to 'encode' various structural relationships. GraphAge has the potential to consume and utilize all relevant information (if available) about an individual that relates to the complex process of aging. So, in that sense, it is one of its kind and can be seen as the first benchmark for a multimodal model that can incorporate all this information in order to close the gap in our understanding of the true nature of aging.

Submitted 1 August, 2024; originally announced August 2024.

arXiv:2408.00118 [pdf, other]

Gemma 2: Improving Open Language Models at a Practical Size

Authors: Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, Johan Ferret, Peter Liu, Pouya Tafti, Abe Friesen, Michelle Casbon, Sabela Ramos, Ravin Kumar, Charline Le Lan, Sammy Jerome, Anton Tsitsulin, Nino Vieillard, Piotr Stanczyk, Sertan Girgin, Nikola Momchev, Matt Hoffman, et al. (172 additional authors not shown)

Abstract: In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.

Submitted 2 August, 2024; v1 submitted 31 July, 2024; originally announced August 2024.

arXiv:2407.18847 [pdf, other]

Enhancing material property prediction with ensemble deep graph convolutional networks

Authors: Chowdhury Mohammad Abid Rahman, Ghadendra Bhandari, Nasser M Nasrabadi, Aldo H. Romero, Prashnna K. Gyawali

Abstract: Machine learning (ML) models have emerged as powerful tools for accelerating materials discovery and design by enabling accurate predictions of properties from compositional and structural data. These capabilities are vital for developing advanced technologies across fields such as energy, electronics, and biomedicine, potentially reducing the time and resources needed for new material exploration and promoting rapid innovation cycles. Recent efforts have focused on employing advanced ML algorithms, including deep learning-based graph neural networks, for property prediction. Additionally, ensemble models have proven to enhance the generalizability and robustness of ML and DL. However, the use of such ensemble strategies in deep graph networks for material property prediction remains underexplored. Our research provides an in-depth evaluation of ensemble strategies in deep learning-based graph neural networks, specifically targeting material property prediction tasks. By testing the Crystal Graph Convolutional Neural Network (CGCNN) and its multitask version, MT-CGCNN, we demonstrated that ensemble techniques, especially prediction averaging, substantially improve precision beyond traditional metrics for key properties like formation energy per atom ($ΔE^{f}$), band gap ($E_{g}$) and density ($ρ$) in 33,990 stable inorganic materials. These findings support the broader application of ensemble methods to enhance predictive accuracy in the field.

Submitted 26 July, 2024; originally announced July 2024.

Comments: 9 pages, 6 figures, 2 tables

arXiv:2407.14671 [pdf]

DefTesPY: Cyber defense model with enhanced data modeling and analysis for Tesla company via Python Language

Authors: Naresh Kshetri, Irin Sultana, Mir Mehedi Rahman, Darshana Shah

Abstract: Several types of cyber-attacks on automobiles and business firms keep on rising as we are preparing to counter cybercrimes with several new technologies and defense models. Cyber defense (also, counter intelligence) is a computer network defense mechanism that involves response to activities, critical infrastructure protection, and information assurance for corporations, government bodies, and other conceivable networks. Cyber defense focuses on preventing, detecting, and responding to assaults or threats in a timely manner so that no infrastructure or information is compromised. With the increasing volume and complexity of cyber threats, most companies need cyber defense to protect sensitive information and assets. We can control attacker actions by utilizing firewalls at different levels and an intrusion detection system (IDS) with an intrusion prevention system (IPS), which can be installed independently or in combination with other protection approaches. Tesla is an American clean energy and automotive company in Austin, Texas, USA. The recent data breach at Tesla affected over 75,000 individuals, as the company pinpoints two former employees as the offenders, revealing more than 23,000 internal files from 2015 to 2022. In this work, we will emphasize data modeling and data analysis using a cyber defense model and Python with a survey of the Tesla company. We have proposed a defense model, DefTesPY, with enhanced data modeling and data analysis based on the cyber-attacks and cybercrimes encountered by the Tesla company to date.

Submitted 19 July, 2024; originally announced July 2024.

Comments: 11 pages, 4 figures

arXiv:2407.13742 [pdf, other]

CellularLint: A Systematic Approach to Identify Inconsistent Behavior in Cellular Network Specifications

Authors: Mirza Masfiqur Rahman, Imtiaz Karim, Elisa Bertino

Abstract: In recent years, there has been a growing focus on scrutinizing the security of cellular networks, often attributing security vulnerabilities to issues in the underlying protocol design descriptions. These protocol design specifications, typically extensive documents that are thousands of pages long, can harbor inaccuracies, underspecifications, implicit assumptions, and internal inconsistencies. In light of the evolving landscape, we introduce CellularLint--a semi-automatic framework for inconsistency detection within the standards of 4G and 5G, capitalizing on a suite of natural language processing techniques. Our proposed method uses a revamped few-shot learning mechanism on domain-adapted large language models. Pre-trained on a vast corpus of cellular network protocols, this method enables CellularLint to simultaneously detect inconsistencies at various levels of semantics and practical use cases. In doing so, CellularLint significantly advances the automated analysis of protocol specifications in a scalable fashion. In our investigation, we focused on the Non-Access Stratum (NAS) and the security specifications of 4G and 5G networks, ultimately uncovering 157 inconsistencies with 82.67% accuracy. After verification of these inconsistencies on open-source implementations and 17 commercial devices, we confirm that they indeed have a substantial impact on design decisions, potentially leading to concerns related to privacy, integrity, availability, and interoperability.

Submitted 18 July, 2024; originally announced July 2024.

Comments: Accepted at USENIX Security 24

arXiv:2407.13699 [pdf, other]

A Comprehensive Review of Recommender Systems: Transitioning from Theory to Practice

Authors: Shaina Raza, Mizanur Rahman, Safiullah Kamawal, Armin Toroghi, Ananya Raval, Farshad Navah, Amirmohammad Kazemeini

Abstract: Recommender Systems (RS) play an integral role in enhancing user experiences by providing personalized item suggestions. This survey reviews the progress in RS inclusively from 2017 to 2024, effectively connecting theoretical advances with practical applications. We explore the development from traditional RS techniques like content-based and collaborative filtering to advanced methods involving deep learning, graph-based models, reinforcement learning, and large language models. We also discuss specialized systems such as context-aware, review-based, and fairness-aware RS. The primary goal of this survey is to bridge theory with practice. It addresses challenges across various sectors, including e-commerce, healthcare, and finance, emphasizing the need for scalable, real-time, and trustworthy solutions. Through this survey, we promote stronger partnerships between academic research and industry practices. The insights offered by this survey aim to guide industry professionals in optimizing RS deployment and to inspire future research directions, especially in addressing emerging technological and societal trends.

Submitted 18 July, 2024; originally announced July 2024.

Comments: We update this literature review quarterly

arXiv:2407.09187 [pdf]

Enhancing Depressive Post Detection in Bangla: A Comparative Study of TF-IDF, BERT and FastText Embeddings

Authors: Saad Ahmed Sazan, Mahdi H. Miraz, A B M Muntasir Rahman

Abstract: Due to massive adoption of social media, detection of users' depression through social media analytics bears significant importance, particularly for underrepresented languages, such as Bangla. This study introduces a well-grounded approach to identify depressive social media posts in Bangla, by employing advanced natural language processing techniques. The dataset used in this work, annotated by domain experts, includes both depressive and non-depressive posts, ensuring high-quality data for model training and evaluation. To address the prevalent issue of class imbalance, we utilised random oversampling for the minority class, thereby enhancing the model's ability to accurately detect depressive posts. We explored various numerical representation techniques, including Term Frequency-Inverse Document Frequency (TF-IDF), Bidirectional Encoder Representations from Transformers (BERT) embedding and FastText embedding, by integrating them with a deep learning-based Convolutional Neural Network-Bidirectional Long Short-Term Memory (CNN-BiLSTM) model. The results obtained through extensive experimentation indicate that the BERT approach performed better than the others, achieving an F1-score of 84%. This indicates that BERT, in combination with the CNN-BiLSTM architecture, effectively recognises the nuances of Bangla texts relevant to depressive contents. Comparative analysis with the existing state-of-the-art methods demonstrates that our approach with BERT embedding performs better than others in terms of evaluation metrics and the reliability of dataset annotations. Our research contributes significantly to the development of reliable tools for detecting depressive posts in the Bangla language. By highlighting the efficacy of different embedding techniques and deep learning models, this study paves the way for improved mental health monitoring through social media platforms.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.07452 [pdf]

Missile detection and destruction robot using detection algorithm

Authors: Md Kamrul Siam, Shafayet Ahmed, Md Habibur Rahman, Amir Hossain Mollah

Abstract: This research is based on the present missile detection technologies in the world and the analysis of these technologies to find a cost-effective solution to implement the system in Bangladesh. The paper will give an idea of the missile detection technologies using the electro-optical sensor and the pulse doppler radar. The system is made to detect the target missile. Automatic detection and destruction are performed with the help of ultrasonic sonar, a metal detector sensor, and a smoke detector sensor. The system is mainly based on an ultrasonic sonar sensor. It has a transducer, a transmitter, and a receiver. The transducer is connected with the controller. When it detects an object by following the algorithm, it finds its distance and angle. It can also assure whether the system can destroy the object or not by using another algorithm's simulation.

Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: 67 pages

arXiv:2407.05461 [pdf, other]

CAV-AD: A Robust Framework for Detection of Anomalous Data and Malicious Sensors in CAV Networks

Authors: Md Sazedur Rahman, Mohamed Elmahallawy, Sanjay Madria, Samuel Frimpong

Abstract: The adoption of connected and automated vehicles (CAVs) has sparked considerable interest across diverse industries, including public transportation, underground mining, and agriculture sectors. However, CAVs' reliance on sensor readings makes them vulnerable to significant threats. Manipulating these readings can compromise CAV network security, posing serious risks for malicious activities. Although several anomaly detection (AD) approaches for CAV networks are proposed, they often fail to: i) detect multiple anomalies in specific sensor(s) with high accuracy or F1 score, and ii) identify the specific sensor being attacked. In response, this paper proposes a novel framework tailored to CAV networks, called CAV-AD, for distinguishing abnormal readings amidst multiple anomaly data while identifying malicious sensors. Specifically, CAV-AD comprises two main components: i) A novel CNN model architecture called optimized omni-scale CNN (O-OS-CNN), which optimally selects the time scale by generating all possible kernel sizes for input time series data; ii) An amplification block to increase the values of anomaly readings, enhancing sensitivity for detecting anomalies. Not only that, but CAV-AD integrates the proposed O-OS-CNN with a Kalman filter to instantly identify the malicious sensors. We extensively train CAV-AD using real-world datasets containing both instant and constant attacks, evaluating its performance in detecting intrusions from multiple anomalies, which presents a more challenging scenario. Our results demonstrate that CAV-AD outperforms state-of-the-art methods, achieving an average accuracy of 98% and an average F1 score of 89%, while accurately identifying the malicious sensors.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.04831 [pdf, other]

Code Hallucination

Authors: Mirza Masfiqur Rahman, Ashish Kundu

Abstract: Generative models such as large language models are extensively used as code copilots and for whole program generation. However, the programs they generate often have questionable correctness, authenticity and reliability in terms of integration as they might not follow the user requirements, provide incorrect and/or nonsensical outputs, or even contain semantic/syntactic errors, overall known as LLM hallucination. In this work, we present several types of code hallucination. We have generated such hallucinated code manually using large language models. We also present a technique, HallTrigger, to demonstrate efficient ways of generating arbitrary code hallucination. Our method leverages 3 different dynamic attributes of LLMs to craft prompts that can successfully trigger hallucinations from models without the need to access model architecture or parameters. Results from popular blackbox models suggest that HallTrigger is indeed effective and that pervasive LLM hallucinations have a sheer impact on software development.

Submitted 7 August, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.04069 [pdf, other]

A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations

Authors: Md Tahmid Rahman Laskar, Sawsan Alqahtani, M Saiful Bari, Mizanur Rahman, Mohammad Abdullah Matin Khan, Haidar Khan, Israt Jahan, Amran Bhuiyan, Chee Wei Tan, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty, Jimmy Huang

Abstract: Large Language Models (LLMs) have recently gained significant attention due to their remarkable capabilities in performing diverse tasks across various domains. However, a thorough evaluation of these models is crucial before deploying them in real-world applications to ensure they produce reliable performance. Despite the well-established importance of evaluating LLMs in the community, the complexity of the evaluation process has led to varied evaluation setups, causing inconsistencies in findings and interpretations. To address this, we systematically review the primary challenges and limitations causing these inconsistencies and unreliable evaluations in various steps of LLM evaluation. Based on our critical review, we present our perspectives and recommendations to ensure LLM evaluations are reproducible, reliable, and robust.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.03486 [pdf, other]

Celeb-FBI: A Benchmark Dataset on Human Full Body Images and Age, Gender, Height and Weight Estimation using Deep Learning Approach

Authors: Pronay Debnath, Usafa Akther Rifa, Busra Kamal Rafa, Ali Haider Talukder Akib, Md. Aminur Rahman

Abstract: The scarcity of comprehensive datasets in surveillance, identification, image retrieval systems, and healthcare poses a significant challenge for researchers in exploring new methodologies and advancing knowledge in these respective fields. Furthermore, the need for full-body image datasets with detailed attributes like height, weight, age, and gender is particularly significant in areas such as fashion industry analytics, ergonomic design assessment, virtual reality avatar creation, and sports performance analysis. To address this gap, we have created the 'Celeb-FBI' dataset which contains 7,211 full-body images of individuals accompanied by detailed information on their height, age, weight, and gender. Following the dataset creation, we proceed with the preprocessing stages, including image cleaning, scaling, and the application of Synthetic Minority Oversampling Technique (SMOTE). Subsequently, utilizing this prepared dataset, we employed three deep learning approaches: Convolutional Neural Network (CNN), 50-layer ResNet, and 16-layer VGG, which are used for estimating height, weight, age, and gender from human full-body images. From the results obtained, ResNet-50 performed best for the system with an accuracy rate of 79.18% for age, 95.43% for gender, 85.60% for height and 81.91% for weight.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted for publication in 3rd International Conference on Advanced Communication and Intelligent Systems

arXiv:2407.01627 [pdf]

Potential Renovation of Information Search Process with the Power of Large Language Model for Healthcare

Authors: Forhan Bin Emdad, Mohammad Ishtiaque Rahman

Abstract: This paper explores the development of the Six Stages of Information Search Model and its enhancement through the application of the Large Language Model (LLM) powered Information Search Processes (ISP) in healthcare. The Six Stages Model, a foundational framework in information science, outlines the sequential phases individuals undergo during information seeking: initiation, selection, exploration, formulation, collection, and presentation. Integrating LLM technology into this model significantly optimizes each stage, particularly in healthcare. LLMs enhance query interpretation, streamline information retrieval from complex medical databases, and provide contextually relevant responses, thereby improving the efficiency and accuracy of medical information searches. This fusion not only aids healthcare professionals in accessing critical data swiftly but also empowers patients with reliable and personalized health information, fostering a more informed and effective healthcare environment.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2406.17143 [pdf]

doi 10.1016/j.nima.2024.169574

A Hardware/Firmware-Based Switching Gate Multiplexing Method for Pulse Mode Radiation Detectors

Authors: Md Faisal Rahman, John Mattingly

Abstract: We present a hardware/firmware-based switching gate multiplexing method for pulse mode radiation detectors that can combine many detector signals into two readout channels. One readout channel passes the signal of the multiplexed detector that "fired" first, and the other channel provides a variable-width logic pulse, i.e., a pulse width modulation (PWM) signal, that identifies the active detector. The multiplexed output pulse is produced by passing the first active detector's signal to a fan-in circuit by gating on the corresponding channel for a fixed duration while blocking all other detector signals. It does this using individual analog switches for all the detector signals. Each switch is controlled by a fixed width logic pulse that is triggered by the arrival of the first active detector pulse. Both the fixed width logic pulse and the PWM signal are generated using a field-programmable gate array (FPGA). To demonstrate the proposed multiplexing method, a prototype four-channel multiplexer was developed for use with four NaI(Tl) detectors. The performance of the multiplexer was evaluated in terms of its ability to retain energy resolution, timing resolution, and original pulse shape. The proposed multiplexing method showed very little degradation in energy resolution and timing resolution or alteration of pulse shape. The switching gate feature of the proposed method enables the multiplexer output to have very low noise contribution from the inactive channels. This multiplexing technique also has the unique capability of isolating and recovering the first active detector's output pulse in cases where there is overlap between pulses from different detectors in a single digitized record. These features make the proposed hardware/firmware-based switching gate multiplexing method very promising for application to large radiation detector networks.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 18 pages, 27 figures

arXiv:2406.10708 [pdf, other]

MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception

Authors: M. Mahbubur Rahman, Ryoma Yataka, Sorachi Kato, Pu Perry Wang, Peizhao Li, Adriano Cardace, Petros Boufounos

Abstract: Compared with an extensive list of automotive radar datasets that support autonomous driving, indoor radar datasets are scarce at a smaller scale in the format of low-resolution radar point clouds and usually under an open-space single-room setting. In this paper, we scale up indoor radar data collection using multi-view high-resolution radar heatmap in a multi-day, multi-room, and multi-subject setting, with an emphasis on the diversity of environment and subjects. Referred to as the millimeter-wave multi-view radar (MMVR) dataset, it consists of $345$K multi-view radar frames collected from $25$ human subjects over $6$ different rooms, $446$K annotated bounding boxes/segmentation instances, and $7.59$ million annotated keypoints to support three major perception tasks of object detection, pose estimation, and instance segmentation, respectively. For each task, we report performance benchmarks under two protocols: a single subject in an open space and multiple subjects in several cluttered rooms with two data splits: random split and cross-environment split over $395$ 1-min data segments. We anticipate that MMVR facilitates indoor radar perception development for indoor vehicle (robot/humanoid) navigation, building energy management, and elderly care for better efficiency, user experience, and safety. The MMVR dataset is available at https://doi.org/10.5281/zenodo.12611978.

Submitted 17 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: 26 pages, 25 figures, 10 tables; See https://doi.org/10.5281/zenodo.12611978 to access the MMVR dataset

arXiv:2406.10688 [pdf]

doi 10.1021/acsphotonics.4c01099

Integration of Programmable Diffraction with Digital Neural Networks

Authors: Md Sadman Sakib Rahman, Aydogan Ozcan

Abstract: Optical imaging and sensing systems based on diffractive elements have seen massive advances over the last several decades. Earlier generations of diffractive optical processors were, in general, designed to deliver information to an independent system that was separately optimized, primarily driven by human vision or perception. With the recent advances in deep learning and digital neural networks, there have been efforts to establish diffractive processors that are jointly optimized with digital neural networks serving as their back-end. These jointly optimized hybrid (optical+digital) processors establish a new "diffractive language" between input electromagnetic waves that carry analog information and neural networks that process the digitized information at the back-end, providing the best of both worlds. Such hybrid designs can process spatially and temporally coherent, partially coherent, or incoherent input waves, providing universal coverage for any spatially varying set of point spread functions that can be optimized for a given task, executed in collaboration with digital neural networks. In this article, we highlight the utility of this exciting collaboration between engineered and programmed diffraction and digital neural networks for a diverse range of applications. We survey some of the major innovations enabled by the push-pull relationship between analog wave processing and digital neural networks, also covering the significant benefits that could be reaped through the synergy between these two complementary paradigms.

Submitted 15 June, 2024; originally announced June 2024.

Comments: 30 Pages, 6 Figures

Journal ref: ACS Photonics (2024)

arXiv:2406.09749 [pdf, other]

Substrate$-$bias driven Sputter deposited $β-$phase dominated Tungsten film for Spintronic applications

Authors: Abhay Singh Rajawat, Naim Ahmad, Risvana Nasril, Tasneem Sheikh, Mohammad Muhiuddin, A kumar, Mohammad R Rahman, Waseem Akhtar

Abstract: $β$-Tungsten ($β$-W), an A15 cubic phase of tungsten, exhibits a giant spin Hall angle compared to its bcc phase, $α$-Tungsten ($α$-W), making high-quality $β$-W film desirable for spin-based applications. We report on the substrate-bias-driven, on-demand growth of $β$-W film on SiO$_2$-coated silicon (SiO$_2$/Si) using DC sputtering. GIXRD plots and SEM images are used to show a systematic change in the structure and grain size of the deposited films with the application of substrate bias. It is observed that zero-bias films are amorphous in nature and change phase from $α$ to $β$ or mixed ($α$ + $β$) depending upon the sign and magnitude of the substrate bias. We performed a one-dimensional power spectral density analysis of the AFM images, which revealed that the pure $β$-W film grown at a positive bias of +50V has the minimum roughness compared to films grown at different substrate biases. We further confirm the metallic surface homogeneity using room-temperature STM. Our results show that the substrate bias, which controls the energy of the deposited atoms, is a crucial parameter for the on-demand growth of $β$-W, an important material for spintronic applications.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 5 pages, 4 figures

arXiv:2406.09155 [pdf, other]

DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation

Authors: A B M Ashikur Rahman, Saeed Anwar, Muhammad Usman, Ajmal Mian

Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities, revolutionizing the integration of AI in daily life applications. However, they are prone to hallucinations, generating claims that contradict established facts, deviating from prompts, and producing inconsistent responses when the same prompt is presented multiple times. Addressing these issues is challenging due to the lack of comprehensive and easily assessable benchmark datasets. Most existing datasets are small and rely on multiple-choice questions, which are inadequate for evaluating the generative prowess of LLMs. To measure hallucination in LLMs, this paper introduces a comprehensive benchmark dataset comprising over 75,000 prompts across eight domains. These prompts are designed to elicit definitive, concise, and informative answers. The dataset is divided into two segments: one publicly available for testing and assessing LLM performance and a hidden segment for benchmarking various LLMs. In our experiments, we tested six LLMs (GPT-3.5, LLama 2, LLama 3, Gemini, Mixtral, and Zephyr), revealing that overall factual hallucination ranges from 59% to 82% on the public dataset and 57% to 76% in the hidden benchmark. Prompt misalignment hallucination ranges from 6% to 95% in the public dataset and 17% to 94% in the hidden counterpart. Average consistency ranges from 21% to 61% and 22% to 63%, respectively. Domain-wise analysis shows that LLM performance significantly deteriorates when asked for specific numeric information while performing moderately with person, location, and date queries. Our dataset demonstrates its efficacy and serves as a comprehensive benchmark for LLM performance evaluation. Our dataset and LLM responses are available at https://github.com/ashikiut/DefAn.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08534 [pdf, ps, other]

Optimizing Container Loading and Unloading through Dual-Cycling and Dockyard Rehandle Reduction Using a Hybrid Genetic Algorithm

Authors: Md. Mahfuzur Rahman, Md Abrar Jahin, Md. Saiful Islam, M. F. Mridha

Abstract: This paper addresses the optimization of container unloading and loading operations at ports, integrating quay-crane dual-cycling with dockyard rehandle minimization. We present a unified model encompassing both operations: unloading and loading ship containers by quay crane, and reducing dockyard rehandles while loading the ship. We recognize that optimizing one aspect in isolation can lead to suboptimal outcomes due to interdependencies. Specifically, optimizing unloading sequences for minimal operation time may inadvertently increase dockyard rehandles during loading and vice versa. To address this NP-hard problem, we propose a hybrid genetic algorithm (GA), QCDC-DR-GA, comprising one-dimensional and two-dimensional GA components. Our model, QCDC-DR-GA, consistently outperforms four state-of-the-art methods in maximizing dual cycles and minimizing dockyard rehandles. Compared to those methods, it reduced total operation time by 15-20% for large vessels. Statistical validation through a two-tailed paired t-test confirms the superiority of QCDC-DR-GA at a 5% significance level. The approach effectively combines QCDC optimization with dockyard rehandle minimization, optimizing the total unloading-loading time. Results underscore the inefficiency of separately optimizing QCDC and dockyard rehandles. Fragmented approaches, such as QCDC Scheduling Optimized by bi-level GA and GA-ILSRS (Scenario 2), show limited improvement compared to QCDC-DR-GA. As in GA-ILSRS (Scenario 1), neglecting dual-cycle optimization leads to performance inferior to that of QCDC-DR-GA. This emphasizes the necessity of simultaneously considering both aspects for optimal resource utilization and overall operational efficiency.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07710 [pdf, other]

Vehicle Speed Detection System Utilizing YOLOv8: Enhancing Road Safety and Traffic Management for Metropolitan Areas

Authors: SM Shaqib, Alaya Parvin Alo, Shahriar Sultan Ramit, Afraz Ul Haque Rupak, Sadman Sadik Khan, Md. Sadekur Rahman

Abstract: In order to ensure traffic safety through a reduction in fatalities and accidents, vehicle speed detection is essential. Relentless driving practices are discouraged by the enforcement of speed restrictions, which are made possible by accurate monitoring of vehicle speeds. Road accidents remain one of the leading causes of death in Bangladesh. The Bangladesh Passenger Welfare Association stated in 2023 that 7,902 individuals lost their lives in traffic accidents during the course of the year. Efficient vehicle speed detection is essential to maintaining traffic safety. Reliable speed detection can also help gather important traffic data, which makes it easier to optimize traffic flow and provide safer road infrastructure. The YOLOv8 model can recognize and track cars in videos with greater speed and accuracy when trained under close supervision. By providing insights into the application of supervised learning in object identification for vehicle speed estimation and concentrating on the particular traffic conditions and safety concerns in Bangladesh, this work represents a noteworthy contribution to the area. The MAE was 3.5 and the RMSE was 4.22 between the speed predicted by our model and the actual speed (the ground truth measured by the speedometer). Promising increased efficiency and wider applicability in a variety of traffic conditions, the suggested solution offers a financially viable substitute for conventional approaches.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.05912 [pdf]

BD-SAT: High-resolution Land Use Land Cover Dataset & Benchmark Results for Developing Division: Dhaka, BD

Authors: Ovi Paul, Abu Bakar Siddik Nayem, Anis Sarker, Amin Ahsan Ali, M Ashraful Amin, AKM Mahbubur Rahman

Abstract: Land Use Land Cover (LULC) analysis on satellite images using deep learning-based methods is significantly helpful in understanding the geography, socio-economic conditions, poverty levels, and urban sprawl in developing countries. Recent works involve segmentation with LULC classes such as farmland, built-up areas, forests, meadows, water bodies, etc. Training deep learning methods on satellite images requires large sets of images annotated with LULC classes. However, annotated data for developing countries are scarce due to a lack of funding, absence of dedicated residential/industrial/economic zones, a large population, and diverse building materials. BD-SAT provides a high-resolution dataset that includes pixel-by-pixel LULC annotations for Dhaka metropolitan city and surrounding rural/urban areas. Using a strict and standardized procedure, the ground truth is created using Bing satellite imagery with a ground spatial distance of 2.22 meters per pixel. A three-stage, well-defined annotation process has been followed with support from GIS experts to ensure the reliability of the annotations. We performed several experiments to establish benchmark results. The results show that the annotated BD-SAT is sufficient to train large deep learning models with adequate accuracy for five major LULC classes: forest, farmland, built-up areas, water bodies, and meadows.

Submitted 9 June, 2024; originally announced June 2024.

Comments: 26 pages, 15 figures and 12 tables

arXiv:2406.05151 [pdf]

CredSec: A Blockchain-based Secure Credential Management System for University Adoption

Authors: Md. Ahsan Habib, Md. Mostafijur Rahman, Nieb Hasan Neom

Abstract: University education plays a critical role in shaping the intellectual and professional development of individuals and contributes significantly to the advancement of knowledge and society. Generally, the university authority has direct control over producing student results and stores the credentials on its local dedicated server. As a result, the credentials can be altered, and they are highly exposed to various threats and security attacks. To resolve these issues, we propose a blockchain-based secure credential management system (BCMS) for efficiently storing, managing, and recovering credentials without involving the university authority. The proposed BCMS incorporates a modified two-factor encryption (m2FE) technique, a combination of the RSA cryptosystem and DNA encoding, to ensure credential privacy, and an enhanced authentication scheme for teachers and students. In addition, to reduce the size of the cipher credential and its conversion time, we use a character-to-integer (C2I) table instead of the ASCII table. Finally, the experimental results and analysis of the BCMS illustrate its effectiveness over state-of-the-art works.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 10 pages, 7 figures, 3 tables

arXiv:2406.04220 [pdf, other]

BEADs: Bias Evaluation Across Domains

Authors: Shaina Raza, Mizanur Rahman, Michael R. Zhang

Abstract: Recent improvements in large language models (LLMs) have significantly enhanced natural language processing (NLP) applications. However, these models can also inherit and perpetuate biases from their training data. Addressing this issue is crucial, yet many existing datasets do not offer evaluation across diverse NLP tasks. To tackle this, we introduce the Bias Evaluations Across Domains (BEADs) dataset, designed to support a wide range of NLP tasks, including text classification, bias entity recognition, bias quantification, and benign language generation. BEADs uses AI-driven annotation combined with experts' verification to provide reliable labels. This method overcomes the limitations of existing datasets that typically depend on crowd-sourcing, expert-only annotations with limited bias evaluations, or unverified AI labeling. Our empirical analysis shows that BEADs is effective in detecting and reducing biases across different language models, with smaller models fine-tuned on BEADs often outperforming LLMs in bias classification tasks. However, these models may still exhibit biases towards certain demographics. Fine-tuning LLMs with our benign language data also reduces biases while preserving the models' knowledge. Our findings highlight the importance of comprehensive bias evaluation and the potential of targeted fine-tuning for reducing the bias of LLMs. We are making BEADs publicly available at https://huggingface.co/datasets/shainar/BEAD Warning: This paper contains examples that may be considered offensive.

Submitted 7 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: under review

arXiv:2406.02446 [pdf]

On a Modified Directional Distortional Hardening Model for Metal Plasticity

Authors: Md Mahmudur Rahman, Md Mahmudul Hasan Pathik, Nazrul Islam

Abstract: This study proposes a modification of the yield condition that overcomes the mathematical constraints of the Directional Distortional Hardening models developed by Feigenbaum and Dafalias. This modified model surpasses the mathematical inconsistency of the complete model and the limitation of the r-model of Feigenbaum and Dafalias. In the complete model, the inconsistency of distortional term in the yield surface and the plastic part of free energy appears in the absence of kinematic hardening. In addition, the simplified r-model fails to capture the flattening of the yield surface in the opposite direction of loading due to the absence of a fourth-order internal variable in the distortional term. Hence, this study incorporates a decoupled distortional hardening term in the equation of yield surface that makes it possible to simultaneously capture the flattening of the yield surface and allows isotropic hardening with distortion, even in the absence of kinematic hardening. The mathematical formulation of the proposed modified model is also developed, which sets the stage for future experimental investigations and validation efforts.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 16 pages

arXiv:2406.00367 [pdf, other]

RoBERTa-BiLSTM: A Context-Aware Hybrid Model for Sentiment Analysis

Authors: Md. Mostafizer Rahman, Ariful Islam Shiplu, Yutaka Watanobe, Md. Ashad Alam

Abstract: Effectively analyzing the comments to uncover latent intentions holds immense value in making strategic decisions across various domains. However, several challenges hinder the process of sentiment analysis including the lexical diversity exhibited in comments, the presence of long dependencies within the text, encountering unknown symbols and words, and dealing with imbalanced datasets. Moreover, existing sentiment analysis work has mostly leveraged sequential models to encode long-dependent texts, which requires longer execution time because the text is processed sequentially. In contrast, the Transformer requires less execution time due to its parallel processing nature. In this work, we introduce a novel hybrid deep learning model, RoBERTa-BiLSTM, which combines the Robustly Optimized BERT Pretraining Approach (RoBERTa) with Bidirectional Long Short-Term Memory (BiLSTM) networks. RoBERTa is utilized to generate meaningful word embedding vectors, while BiLSTM effectively captures the contextual semantics of long-dependent texts. The RoBERTa-BiLSTM hybrid model leverages the strengths of both sequential and Transformer models to enhance performance in sentiment analysis. We conducted experiments using datasets from IMDb, Twitter US Airline, and Sentiment140 to evaluate the proposed model against existing state-of-the-art methods. Our experimental findings demonstrate that the RoBERTa-BiLSTM model surpasses baseline models (e.g., BERT, RoBERTa-base, RoBERTa-GRU, and RoBERTa-LSTM), achieving accuracies of 80.74%, 92.36%, and 82.25% on the Twitter US Airline, IMDb, and Sentiment140 datasets, respectively. Additionally, the model achieves F1-scores of 80.73%, 92.35%, and 82.25% on the same datasets, respectively.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.16740 [pdf, other]

PP-SAM: Perturbed Prompts for Robust Adaptation of Segment Anything Model for Polyp Segmentation

Authors: Md Mostafijur Rahman, Mustafa Munir, Debesh Jha, Ulas Bagci, Radu Marculescu

Abstract: The Segment Anything Model (SAM), originally designed for general-purpose segmentation tasks, has been used recently for polyp segmentation. Nonetheless, fine-tuning SAM with data from new imaging centers or clinics poses significant challenges. This is because this necessitates the creation of an expensive and time-intensive annotated dataset, along with the potential for variability in user prompts during inference. To address these issues, we propose a robust fine-tuning technique, PP-SAM, that allows SAM to adapt to the polyp segmentation task with limited images. To this end, we utilize variable perturbed bounding box prompts (BBP) to enrich the learning context and enhance the model's robustness to BBP perturbations during inference. Rigorous experiments on polyp segmentation benchmarks reveal that our variable BBP perturbation significantly improves model resilience. Notably, on Kvasir, 1-shot fine-tuning boosts the DICE score by 20% and 37% with 50 and 100-pixel BBP perturbations during inference, respectively. Moreover, our experiments show that 1-shot, 5-shot, and 10-shot PP-SAM with 50-pixel perturbations during inference outperform a recent state-of-the-art (SOTA) polyp segmentation method by 26%, 7%, and 5% DICE scores, respectively. Our results motivate the broader applicability of our PP-SAM for other medical imaging tasks with limited samples. Our implementation is available at https://github.com/SLDGroup/PP-SAM.

Submitted 26 May, 2024; originally announced May 2024.

Comments: 7 pages, 9 figures, Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops

arXiv:2405.15026 [pdf, other]

Enhancing Student Feedback Using Predictive Models in Visual Literacy Courses

Authors: Alon Friedman, Kevin Hawley, Paul Rosen, Md Dilshadur Rahman

Abstract: Peer review is a popular feedback mechanism in higher education that actively engages students and provides researchers with a means to assess student engagement. However, there is little empirical support for the durability of peer review, particularly when using data predictive modeling to analyze student comments. This study uses Naïve Bayes modeling to analyze peer review data obtained from an undergraduate visual literacy course over five years. We expand on the research of Friedman and Rosen and Beasley et al. by focusing on the Naïve Bayes model of students' remarks. Our findings highlight the utility of Naïve Bayes modeling, particularly in the analysis of student comments based on parts of speech, where nouns emerged as the prominent category. Additionally, when examining students' comments using the visual peer review rubric, the lie factor emerged as the predominant factor. Comparing the Naïve Bayes model to Beasley's approach, we found both help instructors map directions taken in the class, but the Naïve Bayes model provides a more specific outline for forecasting with a more detailed framework for identifying core topics within the course, enhancing the forecasting of educational directions. Through the application of the Holdout Method and k-fold cross-validation with continuity correction, we have validated the model's predictive accuracy, underscoring its effectiveness in offering deep insights into peer review mechanisms. Our study findings suggest that using predictive modeling to assess student comments can provide a new way to better serve the students' classroom comments on their visual peer work. This can benefit courses by inspiring changes to course content, reinforcement of course content, modification of projects, or modifications to the rubric itself.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 8 pages, 6 figures, IEEE EDUCON 2024 conference

arXiv:2405.12550 [pdf, other]

Blockchain-based AI Methods for Managing Industrial IoT: Recent Developments, Integration Challenges and Opportunities

Authors: Anichur Rahman, Dipanjali Kundu, Tanoy Debnath, Muaz Rahman, Airin Afroj Aishi, Jahidul Islam

Abstract: Currently, Blockchain (BC), Artificial Intelligence (AI), and the smart Industrial Internet of Things (IIoT) are not only leading, promising technologies in the world; they also help society raise its standard of living and make life easier for users. These technologies have been applied in various domains for different purposes and have successfully assisted in developing the desired systems, such as smart cities, homes, manufacturers, education, and industries. Moreover, these technologies need to consider various issues (security, privacy, confidentiality, scalability, and application challenges) in diverse fields. In this context, with the increasing demand for solutions to these issues, the authors present a comprehensive survey on AI approaches with BC in the smart IIoT. Firstly, we focus on state-of-the-art overviews regarding AI, BC, and smart IoT applications. Then, we describe the benefits of integrating these technologies and discuss the established methods, tools, and strategies. Most importantly, we highlight the various issues (security, stability, scalability, and confidentiality) and outline strategies and methods for addressing them. Furthermore, the individual and collaborative benefits of applications are discussed. Lastly, we examine at length the open research challenges and potential future guidelines for BC-based AI approaches in the intelligent IIoT system.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.11188 [pdf, other]

Wind Power Prediction across Different Locations using Deep Domain Adaptive Learning

Authors: Md Saiful Islam Sajol, Md Shazid Islam, A S M Jahid Hasan, Md Saydur Rahman, Jubair Yusuf

Abstract: Accurate prediction of wind power is essential for the grid integration of this intermittent renewable source and aiding grid planners in forecasting available wind capacity. Spatial differences lead to discrepancies in climatological data distributions between two geographically dispersed regions, consequently making the prediction task more difficult. Thus, a prediction model that learns from the data of a particular climatic region can suffer from being less robust. A deep neural network (DNN) based domain adaptive approach is proposed to counter this drawback. Effective weather features from a large set of weather parameters are selected using a random forest approach. A pre-trained model from the source domain is utilized to perform the prediction task, assuming no source data is available during target domain prediction. The weights of only the last few layers of the DNN model are updated throughout the task, keeping the rest of the network unchanged, making the model faster compared to the traditional approaches. The proposed approach demonstrates higher accuracy ranging from 6.14% to even 28.44% compared to the traditional non-adaptive method.

Submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.09458 [pdf, other]

Non-contact Lung Disease Classification via OFDM-based Passive 6G ISAC Sensing

Authors: Hasan Mujtaba Buttar, Muhammad Mahboob Ur Rahman, Muhammad Wasim Nawaz, Adnan Noor Mian, Adnan Zahid, Qammer H. Abbasi

Abstract: This paper is the first to present a novel, non-contact method that utilizes orthogonal frequency division multiplexing (OFDM) signals (of frequency 5.23 GHz, emitted by a software defined radio) to radio-expose pulmonary patients in order to differentiate between five prevalent respiratory diseases, i.e., Asthma, Chronic obstructive pulmonary disease (COPD), Interstitial lung disease (ILD), Pneumonia (PN), and Tuberculosis (TB). The fact that each pulmonary disease leads to a distinct breathing pattern, and thus modulates the OFDM signal in a different way, motivates us to acquire the OFDM-Breathe dataset, the first of its kind. It consists of 13,920 seconds of raw RF data (at 64 distinct OFDM frequencies) that we have acquired from a total of 116 subjects in a hospital setting (25 healthy control subjects, and 91 pulmonary patients). Among the 91 patients, 25 have Asthma, 25 have COPD, 25 have TB, 5 have ILD, and 11 have PN. We implement a number of machine and deep learning models in order to do lung disease classification using the OFDM-Breathe dataset. The vanilla convolutional neural network outperforms all the models with an accuracy of 97%, and stands out in terms of precision, recall, and F1-score. The ablation study reveals that it is sufficient to radio-observe the human chest on seven different microwave frequencies only, in order to make a reliable diagnosis (with 96% accuracy) of the underlying lung disease. This corresponds to a sensing overhead that is merely 10.93% of the allocated bandwidth. This points to the feasibility of future 6G integrated sensing and communication (ISAC) systems in which 89.07% of the bandwidth remains available for information exchange amidst on-demand health sensing. Through 6G ISAC, this work provides a tool for mass screening for respiratory diseases (e.g., COVID-19) at public places.

Submitted 15 May, 2024; originally announced May 2024.

Comments: submitted to a journal, 12 pages, 5 figures, 5 tables

arXiv:2405.09016 [pdf]

IoT-enabled Stability Chamber for the Pharmaceutical Industry

Authors: Nitol Saha, Md Masruk Aulia, Dibakar Das, Md. Mostafizur Rahman

Abstract: A stability chamber is a critical piece of equipment for any pharmaceutical facility to retain the manufactured product for testing the stability and quality of the products over a certain period of time by keeping the products in different sets of environmental conditions. In this paper, we proposed an IoT-enabled stability chamber for the pharmaceutical industry. We developed four stability chambers by using the existing utilities of a manufacturing facility. The state-of-the-art automatic PID controlling system of Siemens S7-1200 PLC was used to control each chamber. PC-based Siemens WinCC Runtime Advanced visualization platform was used to visualize the data of the chamber which is FDA 21 CFR Part 11 Compliant. Additionally, an Internet of Things-based (IoT-based) application was also developed to monitor the sensor's data remotely using any client application.

Submitted 21 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.06880 [pdf, other]

EMCAD: Efficient Multi-scale Convolutional Attention Decoding for Medical Image Segmentation

Authors: Md Mostafijur Rahman, Mustafa Munir, Radu Marculescu

Abstract: An efficient and effective decoding mechanism is crucial in medical image segmentation, especially in scenarios with limited computational resources. However, these decoding mechanisms usually come with high computational costs. To address this concern, we introduce EMCAD, a new efficient multi-scale convolutional attention decoder, designed to optimize both performance and computational efficiency. EMCAD leverages a unique multi-scale depth-wise convolution block, significantly enhancing feature maps through multi-scale convolutions. EMCAD also employs channel, spatial, and grouped (large-kernel) gated attention mechanisms, which are highly effective at capturing intricate spatial relationships while focusing on salient regions. By employing group and depth-wise convolution, EMCAD is very efficient and scales well (e.g., only 1.91M parameters and 0.381G FLOPs are needed when using a standard encoder). Our rigorous evaluations across 12 datasets that belong to six medical image segmentation tasks reveal that EMCAD achieves state-of-the-art (SOTA) performance with 79.4% and 80.3% reduction in #Params and #FLOPs, respectively. Moreover, EMCAD's adaptability to different encoders and versatility across segmentation tasks further establish EMCAD as a promising tool, advancing the field towards more efficient and accurate medical image analysis. Our implementation is available at https://github.com/SLDGroup/EMCAD.

Submitted 10 May, 2024; originally announced May 2024.

Comments: 14 pages, 5 figures, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

arXiv:2405.06849

GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs

Authors: Mustafa Munir, William Avery, Md Mostafijur Rahman, Radu Marculescu

Abstract: Vision graph neural networks (ViG) offer a new avenue for exploration in computer vision. A major bottleneck in ViGs is the inefficient k-nearest neighbor (KNN) operation used for graph construction. To solve this issue, we propose a new method for designing ViGs, Dynamic Axial Graph Construction (DAGC), which is more efficient than KNN as it limits the number of considered graph connections made within an image. Additionally, we propose a novel CNN-GNN architecture, GreedyViG, which uses DAGC. Extensive experiments show that GreedyViG beats existing ViG, CNN, and ViT architectures in terms of accuracy, GMACs, and parameters on image classification, object detection, instance segmentation, and semantic segmentation tasks. Our smallest model, GreedyViG-S, achieves 81.1% top-1 accuracy on ImageNet-1K, 2.9% higher than Vision GNN and 2.2% higher than Vision HyperGraph Neural Network (ViHGNN), with fewer GMACs and a similar number of parameters. Our largest model, GreedyViG-B, obtains 83.9% top-1 accuracy, 0.2% higher than Vision GNN, with a 66.6% decrease in parameters and a 69% decrease in GMACs. GreedyViG-B also obtains the same accuracy as ViHGNN with a 67.3% decrease in parameters and a 71.3% decrease in GMACs. Our work shows that hybrid CNN-GNN architectures not only provide a new avenue for designing efficient models, but can also exceed the performance of current state-of-the-art models.

Submitted 10 May, 2024; originally announced May 2024.

Comments: Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

arXiv:2405.06242

Impedance vs. Power Side-channel Vulnerabilities: A Comparative Study

Authors: Md Sadik Awal, Buddhipriya Gayanath, Md Tauhidur Rahman

Abstract: In recent times, impedance side-channel analysis has emerged as a potent strategy for adversaries seeking to extract sensitive information from computing systems. It leverages variations in the intrinsic impedance of a chip's internal structure across different logic states. In this study, we conduct a comparative analysis between the newly explored impedance side channel and the well-established power side channel. Through experimental evaluation, we investigate the efficacy of these two side channels in extracting the cryptographic key from the Advanced Encryption Standard (AES) and analyze their performance. Our results indicate that impedance analysis demonstrates a higher potential for cryptographic key extraction compared to power side-channel analysis. Moreover, we identify scenarios where power side-channel analysis does not yield satisfactory results, whereas impedance analysis proves to be more robust and effective. This work not only underscores the significance of impedance side-channel analysis in enhancing cryptographic security but also emphasizes the necessity for a deeper understanding of its mechanisms and implications.

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.06166

MDNet: Multi-Decoder Network for Abdominal CT Organs Segmentation

Authors: Debesh Jha, Nikhil Kumar Tomar, Koushik Biswas, Gorkem Durak, Matthew Antalek, Zheyuan Zhang, Bin Wang, Md Mostafijur Rahman, Hongyi Pan, Alpay Medetalibeyoglu, Yury Velichko, Daniela Ladner, Amir Borhani, Ulas Bagci

Abstract: Accurate segmentation of organs from abdominal CT scans is essential for clinical applications such as diagnosis, treatment planning, and patient monitoring. To handle challenges of heterogeneity in organ shapes, sizes, and complex anatomical relationships, we propose the Multi-Decoder Network (MDNet), an encoder-decoder network that uses the pre-trained MiT-B2 as the encoder and multiple different decoder networks. Each decoder network is connected to a different part of the encoder via a multi-scale feature enhancement dilated block. With each decoder, we increase the depth of the network iteratively and refine segmentation masks, enriching feature maps by integrating previous decoders' feature maps. To refine the feature map further, we also pass the predicted masks from the previous decoder to the current decoder to provide spatial attention across foreground and background regions. MDNet effectively refines the segmentation mask, achieving a high Dice similarity coefficient (DSC) of 0.9013 and 0.9169 on the Liver Tumor Segmentation (LiTS) and MSD Spleen datasets, respectively. Additionally, it reduces the Hausdorff distance (HD) to 3.79 for the LiTS dataset and 2.26 for the spleen segmentation dataset, underscoring the precision of MDNet in capturing complex contours. Moreover, MDNet is more interpretable and robust compared to the other baseline models.

Submitted 9 May, 2024; originally announced May 2024.

Showing 1–50 of 1,157 results for author: Rahman, M