Cognitively Inspired Energy-Based World Models

Authors: Alexi Gladstone, Ganesh Nanduru, Md Mofijul Islam, Aman Chadha, Jundong Li, Tariq Iqbal

Abstract: One of the predominant methods for training world models is autoregressive prediction in the output space of the next element of a sequence. In Natural Language Processing (NLP), this takes the form of Large Language Models (LLMs) predicting the next token; in Computer Vision (CV), this takes the form of autoregressive models predicting the next frame/token/pixel. However, this approach differs from human cognition in several respects. First, human predictions about the future actively influence internal cognitive processes. Second, humans naturally evaluate the plausibility of predictions regarding future states. Based on this capability, and third, by assessing when predictions are sufficient, humans allocate a dynamic amount of time to make a prediction. This adaptive process is analogous to System 2 thinking in psychology. All these capabilities are fundamental to the success of humans at high-level reasoning and planning. Therefore, to address the limitations of traditional autoregressive models lacking these human-like capabilities, we introduce Energy-Based World Models (EBWM). EBWM involves training an Energy-Based Model (EBM) to predict the compatibility of a given context and a predicted future state. In doing so, EBWM enables models to achieve all three facets of human cognition described. Moreover, we developed a variant of the traditional autoregressive transformer tailored for Energy-Based models, termed the Energy-Based Transformer (EBT). Our results demonstrate that EBWM scales better with data and GPU Hours than traditional autoregressive transformers in CV, and that EBWM offers promising early scaling in NLP. Consequently, this approach offers an exciting path toward training future models capable of System 2 thinking and intelligently searching across state spaces.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 23 pages, 6 figures

arXiv:2406.02450 [pdf, other]

A Generalized Apprenticeship Learning Framework for Modeling Heterogeneous Student Pedagogical Strategies

Authors: Md Mirajul Islam, Xi Yang, John Hostetter, Adittya Soukarjya Saha, Min Chi

Abstract: A key challenge in e-learning environments like Intelligent Tutoring Systems (ITSs) is to induce effective pedagogical policies efficiently. While Deep Reinforcement Learning (DRL) often suffers from sample inefficiency and reward function design difficulty, Apprenticeship Learning (AL) algorithms can overcome them. However, most AL algorithms cannot handle heterogeneity as they assume all demonstrations are generated with a homogeneous policy driven by a single reward function. Still, the AL algorithms that do consider heterogeneity often cannot generalize to large continuous state spaces and only work with discrete states. In this paper, we propose an expectation-maximization(EM)-EDM, a general AL framework to induce effective pedagogical policies from given optimal or near-optimal demonstrations, which are assumed to be driven by heterogeneous reward functions. We compare the effectiveness of the policies induced by our proposed EM-EDM against four AL-based baselines and two policies induced by DRL on two different but related tasks that involve pedagogical action prediction. Our overall results showed that, for both tasks, EM-EDM outperforms the four AL baselines across all performance metrics and the two DRL baselines. This suggests that EM-EDM can effectively model complex student pedagogical decision-making processes through the ability to manage a large, continuous state space and adapt to handle diverse and heterogeneous reward functions with very few given demonstrations.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2405.06667 [pdf, other]

Sentiment Polarity Analysis of Bangla Food Reviews Using Machine and Deep Learning Algorithms

Authors: Al Amin, Anik Sarkar, Md Mahamodul Islam, Asif Ahammad Miazee, Md Robiul Islam, Md Mahmudul Hoque

Abstract: The Internet has become an essential tool for people in the modern world. Humans, like all living organisms, have essential requirements for survival. These include access to atmospheric oxygen, potable water, protective shelter, and sustenance. The constant flux of the world is making our existence less complicated. A significant portion of the population utilizes online food ordering services to have meals delivered to their residences. Although there are numerous methods for ordering food, customers sometimes experience disappointment with the food they receive. Our endeavor was to establish a model that could determine if food is of good or poor quality. We compiled an extensive dataset of over 1484 online reviews from prominent food ordering platforms, including Food Panda and HungryNaki. Leveraging the collected data, a rigorous assessment of various deep learning and machine learning techniques was performed to determine the most accurate approach for predicting food quality. Out of all the algorithms evaluated, logistic regression emerged as the most accurate, achieving an impressive 90.91% accuracy. The review offers valuable insights that will guide the user in deciding whether or not to order the food.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.17960 [pdf, other]

PhishGuard: A Convolutional Neural Network Based Model for Detecting Phishing URLs with Explainability Analysis

Authors: Md Robiul Islam, Md Mahamodul Islam, Mst. Suraiya Afrin, Anika Antara, Nujhat Tabassum, Al Amin

Abstract: Cybersecurity is one of the global issues because of the extensive dependence on cyber systems of individuals, industries, and organizations. Among the cyber attacks, phishing is increasing tremendously and affecting the global economy. Therefore, this phenomenon highlights the vital need for enhancing user awareness and robust support at both individual and organizational levels. Phishing URL identification is the best way to address the problem. Various machine learning and deep learning methods have been proposed to automate the detection of phishing URLs. However, these approaches often need more convincing accuracy and rely on datasets consisting of limited samples. Furthermore, the decisions of these black-box intelligent models to flag suspicious URLs need proper explanation to understand the features affecting the output. To address the issues, we propose a 1D Convolutional Neural Network (CNN) and train the model with extensive features and a substantial amount of data. The proposed model outperforms existing works by attaining an accuracy of 99.85%. Additionally, our explainability analysis highlights certain features that significantly contribute to identifying the phishing URL.

Submitted 27 April, 2024; originally announced April 2024.

Comments: 6 pages

arXiv:2404.03606 [pdf, other]

Analyzing Musical Characteristics of National Anthems in Relation to Global Indices

Authors: S M Rakib Hasan, Aakar Dhakal, Ms. Ayesha Siddiqua, Mohammad Mominur Rahman, Md Maidul Islam, Mohammed Arfat Raihan Chowdhury, S M Masfequier Rahman Swapno, SM Nuruzzaman Nobel

Abstract: Music plays a huge part in shaping people's psychology and behavioral patterns. This paper investigates the connection between national anthems and different global indices with computational music analysis and statistical correlation analysis. We analyze national anthem musical data to determine whether certain musical characteristics are associated with peace, happiness, suicide rate, crime rate, etc. To achieve this, we collect national anthems from 169 countries and use computational music analysis techniques to extract pitch, tempo, beat, and other pertinent audio features. We then compare these musical characteristics with data on different global indices to ascertain whether a significant correlation exists. Our findings indicate that there may be a correlation between the musical characteristics of national anthems and the indices we investigated. The implications of our findings for music psychology and policymakers interested in promoting social well-being are discussed. This paper emphasizes the potential of musical data analysis in social research and offers a novel perspective on the relationship between music and social indices. The source code and data are made open-access for reproducibility and future research endeavors. They can be accessed at http://bit.ly/na_code.

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2403.18949 [pdf]

An IoT Based Water-Logging Detection System: A Case Study of Dhaka

Authors: Md Manirul Islam, Md. Sadad Mahamud, Umme Salsabil, A. A. M. Mazharul Amin, Samiul Haque Suman

Abstract: With its large population, Dhaka, the capital city of Bangladesh, faces many rapidly growing problems. Water-logging is one of the major issues among them. Heavy rainfall, lack of awareness, and poor maintenance cause a poor sewerage system in the city. As a result, water overflows onto the roads and sometimes gets mixed with the drinking water. To overcome this problem, this paper realizes the potential of using the Internet of Things to combat water-logging in drainage pipes, which are used to move waste as well as rainwater away from the city. The proposed system will continuously monitor real-time water level, water flow, and gas level inside the drainage pipe. Moreover, all the monitoring data will be stored in the central database for graphical representation and further analysis. In addition, if any emergency arises in the drainage system, an alert will be sent directly to the nearest maintenance office.

Submitted 25 February, 2024; originally announced March 2024.

Comments: Global Conference on Technology and Information Management

arXiv:2403.05353 [pdf, other]

doi 10.1109/ICCIT60459.2023.10441274

Hybridized Convolutional Neural Networks and Long Short-Term Memory for Improved Alzheimer's Disease Diagnosis from MRI Scans

Authors: Maleka Khatun, Md Manowarul Islam, Habibur Rahman Rifat, Md. Shamim Bin Shahid, Md. Alamin Talukder, Md Ashraf Uddin

Abstract: Brain-related diseases are more sensitive than other diseases due to several factors, including the complexity of surgical procedures, high costs, and other challenges. Alzheimer's disease is a common brain disorder that causes memory loss and the shrinking of brain cells. Early detection is critical for providing proper treatment to patients. However, identifying Alzheimer's at an early stage using manual scanning of CT or MRI scans is challenging. Therefore, researchers have delved into the exploration of computer-aided systems, employing Machine Learning and Deep Learning methodologies, which entail the training of datasets to detect Alzheimer's disease. This study aims to present a hybrid model that combines a CNN model's feature extraction capabilities with an LSTM model's detection capabilities. This study applies transfer learning with VGG16 in the hybrid model to extract features from MRI images. The LSTM detects features between the convolution layer and the fully connected layer. The output layer of the fully connected layer uses the softmax function. The training of the hybrid model involved utilizing the ADNI dataset. The trial findings revealed that the model achieved a level of accuracy of 98.8%, a sensitivity rate of 100%, and a specificity rate of 76%. The proposed hybrid model outperforms its contemporary CNN counterparts, showcasing a superior performance.

Submitted 8 March, 2024; originally announced March 2024.

Comments: Accepted In The 26th International Conference on Computer and Information Technology (ICCIT) On 13-15 December 2023

arXiv:2403.04786 [pdf, other]

Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models

Authors: Arijit Ghosh Chowdhury, Md Mofijul Islam, Vaibhav Kumar, Faysal Hossain Shezan, Vaibhav Kumar, Vinija Jain, Aman Chadha

Abstract: Large Language Models (LLMs) have become a cornerstone in the field of Natural Language Processing (NLP), offering transformative capabilities in understanding and generating human-like text. However, with their rising prominence, the security and vulnerability aspects of these models have garnered significant attention. This paper presents a comprehensive survey of the various forms of attacks targeting LLMs, discussing the nature and mechanisms of these attacks, their potential impacts, and current defense strategies. We delve into topics such as adversarial attacks that aim to manipulate model outputs, data poisoning that affects model training, and privacy concerns related to training data exploitation. The paper also explores the effectiveness of different attack methodologies, the resilience of LLMs against these attacks, and the implications for model integrity and user trust. By examining the latest research, we provide insights into the current landscape of LLM vulnerabilities and defense mechanisms. Our objective is to offer a nuanced understanding of LLM attacks, foster awareness within the AI community, and inspire robust solutions to mitigate these risks in future developments.

Submitted 23 March, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

arXiv:2402.17807 [pdf, other]

Exploring Gene Regulatory Interaction Networks and predicting therapeutic molecules for Hypopharyngeal Cancer and EGFR-mutated lung adenocarcinoma

Authors: Abanti Bhattacharjya, Md Manowarul Islam, Md Ashraf Uddin, Md. Alamin Talukder, AKM Azad, Sunil Aryal, Bikash Kumar Paul, Wahia Tasnim, Muhammad Ali Abdulllah Almoyad, Mohammad Ali Moni

Abstract: With the advent of Information technology, the Bioinformatics research field is becoming increasingly attractive to researchers and academicians. The recent development of various Bioinformatics toolkits has facilitated the rapid processing and analysis of vast quantities of biological data for human perception. Most studies focus on locating two connected diseases and making some observations to construct diverse gene regulatory interaction networks, a forerunner to general drug design for curing illness. For instance, Hypopharyngeal cancer is a disease that is associated with EGFR-mutated lung adenocarcinoma. In this study, we select EGFR-mutated lung adenocarcinoma and Hypopharyngeal cancer by finding the Lung metastases in hypopharyngeal cancer. To conduct this study, we collect Microarray datasets from GEO (Gene Expression Omnibus), an online database controlled by NCBI. Differentially expressed genes, common genes, and hub genes between the selected two diseases are detected for the succeeding move. Our research findings have suggested common therapeutic molecules for the selected diseases based on 10 hub genes with the highest interactions according to the degree topology method and the maximum clique centrality (MCC). Our suggested therapeutic molecules will be fruitful for patients with those two diseases simultaneously.

Submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted In The FEBS OPEN BIO (Q2, SCOPUS, SCIE, IF: 2.6, CS: 4.7), Wiley Journal, On FEB 25, 2024

arXiv:2402.13277 [pdf, other]

MLSTL-WSN: Machine Learning-based Intrusion Detection using SMOTETomek in WSNs

Authors: Md. Alamin Talukder, Selina Sharmin, Md Ashraf Uddin, Md Manowarul Islam, Sunil Aryal

Abstract: Wireless Sensor Networks (WSNs) play a pivotal role as infrastructures, encompassing both stationary and mobile sensors. These sensors self-organize and establish multi-hop connections for communication, collectively sensing, gathering, processing, and transmitting data about their surroundings. Despite their significance, WSNs face rapid and detrimental attacks that can disrupt functionality. Existing intrusion detection methods for WSNs encounter challenges such as low detection rates, computational overhead, and false alarms. These issues stem from sensor node resource constraints, data redundancy, and high correlation within the network. To address these challenges, we propose an innovative intrusion detection approach that integrates Machine Learning (ML) techniques with the Synthetic Minority Oversampling Technique Tomek Link (SMOTE-TomekLink) algorithm. This blend synthesizes minority instances and eliminates Tomek links, resulting in a balanced dataset that significantly enhances detection accuracy in WSNs. Additionally, we incorporate feature scaling through standardization to render input features consistent and scalable, facilitating more precise training and detection. To counteract imbalanced WSN datasets, we employ the SMOTE-Tomek resampling technique, mitigating overfitting and underfitting issues. Our comprehensive evaluation, using the WSN Dataset (WSN-DS) containing 374,661 records, identifies the optimal model for intrusion detection in WSNs. The standout outcome of our research is the remarkable performance of our model. In binary classification, it achieves an accuracy rate of 99.78%, and in multiclass classification, it attains an exceptional accuracy rate of 99.92%. These findings underscore the efficiency and superiority of our proposal in the context of WSN intrusion detection, showcasing its effectiveness in detecting and mitigating intrusions in WSNs.

Submitted 22 February, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

Comments: International Journal of Information Security, Springer Journal - Q1, Scopus, ISI, SCIE, IF: 3.2 - Accepted on Jan 17, 2024

arXiv:2402.13250 [pdf, other]

Video ReCap: Recursive Captioning of Hour-Long Videos

Authors: Md Mohaiminul Islam, Ngan Ho, Xitong Yang, Tushar Nagarajan, Lorenzo Torresani, Gedas Bertasius

Abstract: Most video captioning models are designed to process short video clips of a few seconds and output text describing low-level visual concepts (e.g., objects, scenes, atomic actions). However, most real-world videos last for minutes or hours and have a complex hierarchical structure spanning different temporal granularities. We propose Video ReCap, a recursive video captioning model that can process video inputs of dramatically different lengths (from 1 second to 2 hours) and output video captions at multiple hierarchy levels. The recursive video-language architecture exploits the synergy between different video hierarchies and can process hour-long videos efficiently. We utilize a curriculum learning training scheme to learn the hierarchical structure of videos, starting from clip-level captions describing atomic actions, then focusing on segment-level descriptions, and concluding with generating summaries for hour-long videos. Furthermore, we introduce the Ego4D-HCap dataset by augmenting Ego4D with 8,267 manually collected long-range video summaries. Our recursive model can flexibly generate captions at different hierarchy levels while also being useful for other complex video understanding tasks, such as VideoQA on EgoSchema. Data, code, and models are available at: https://sites.google.com/view/vidrecap

Submitted 16 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: Accepted by CVPR 2024

arXiv:2402.05158 [pdf, other]

Enhancement of Bengali OCR by Specialized Models and Advanced Techniques for Diverse Document Types

Authors: AKM Shahariar Azad Rabby, Hasmot Ali, Md. Majedul Islam, Sheikh Abujar, Fuad Rahman

Abstract: This research paper presents a unique Bengali OCR system with some capabilities. The system excels in reconstructing document layouts while preserving structure, alignment, and images. It incorporates advanced image and signature detection for accurate extraction. Specialized models for word segmentation cater to diverse document types, including computer-composed, letterpress, typewriter, and handwritten documents. The system handles static and dynamic handwritten inputs, recognizing various writing styles. Furthermore, it has the ability to recognize compound characters in Bengali. Extensive data collection efforts provide a diverse corpus, while advanced technical components optimize character and word recognition. Additional contributions include image, logo, signature and table recognition, perspective correction, layout reconstruction, and a queuing module for efficient and scalable processing. The system demonstrates outstanding performance in efficient and accurate text extraction and analysis.

Submitted 7 February, 2024; originally announced February 2024.

Comments: 8 pages, 7 figures, 4 tables. Link to the paper: https://openaccess.thecvf.com/content/WACV2024W/WVLL/html/Rabby_Enhancement_of_Bengali_OCR_by_Specialized_Models_and_Advanced_Techniques_WACVW_2024_paper.html

Journal ref: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops, 2024, pp. 1102-1109

arXiv:2402.04507 [pdf]

A Review on Digital Pixel Sensors

Authors: Md Rahatul Islam Udoy, Shamiul Alam, Md Mazharul Islam, Akhilesh Jaiswal, Ahmedullah Aziz

Abstract: Digital pixel sensor (DPS) has evolved as a pivotal component in modern imaging systems and has the potential to revolutionize various fields such as medical imaging, astronomy, surveillance, IoT devices, etc. Compared to analog pixel sensors, the DPS offers high speed and good image quality. However, the introduced intrinsic complexity within each pixel, primarily attributed to the accommodation of the ADC circuit, engenders a substantial increase in the pixel pitch. Unfortunately, such a pronounced escalation in pixel pitch drastically undermines the feasibility of achieving high-density integration, which is an obstacle that significantly narrows down the field of potential applications. Nonetheless, designing compact conversion circuits along with strategic integration of 3D architectural paradigms can be a potential remedy to the prevailing situation. This review article presents a comprehensive overview of the vast area of DPS technology. The operating principles, advantages, and challenges of different types of DPS circuits have been analyzed. We categorize the schemes into several categories based on ADC operation. A comparative study based on different performance metrics has also been showcased for a well-rounded understanding.

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.02036 [pdf, other]

Generating In-Distribution Proxy Graphs for Explaining Graph Neural Networks

Authors: Zhuomin Chen, Jiaxing Zhang, Jingchao Ni, Xiaoting Li, Yuchen Bian, Md Mezbahul Islam, Ananda Mohan Mondal, Hua Wei, Dongsheng Luo

Abstract: Graph Neural Networks (GNNs) have become a building block in graph data processing, with wide applications in critical domains. The growing needs to deploy GNNs in high-stakes applications necessitate explainability for users in the decision-making processes. A popular paradigm for the explainability of GNNs is to identify explainable subgraphs by comparing their labels with the ones of original graphs. This task is challenging due to the substantial distributional shift from the original graphs in the training set to the set of explainable subgraphs, which prevents accurate prediction of labels with the subgraphs. To address it, in this paper, we propose a novel method that generates proxy graphs for explainable subgraphs that are in the distribution of training data. We introduce a parametric method that employs graph generators to produce proxy graphs. A new training objective based on information theory is designed to ensure that proxy graphs not only adhere to the distribution of training data but also preserve explanatory factors. Such generated proxy graphs can be reliably used to approximate the predictions of the labels of explainable subgraphs. Empirical evaluations across various datasets demonstrate our method achieves more accurate explanations for GNNs.

Submitted 29 May, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

Comments: Accepted to International Conference on Machine Learning (ICML 2024)

arXiv:2401.12262 [pdf, other]

Machine learning-based network intrusion detection for big and imbalanced data using oversampling, stacking feature embedding and feature extraction

Authors: Md. Alamin Talukder, Md. Manowarul Islam, Md Ashraf Uddin, Khondokar Fida Hasan, Selina Sharmin, Salem A. Alyami, Mohammad Ali Moni

Abstract: Cybersecurity has emerged as a critical global concern. Intrusion Detection Systems (IDS) play a critical role in protecting interconnected networks by detecting malicious actors and activities. Machine Learning (ML)-based behavior analysis within the IDS has considerable potential for detecting dynamic cyber threats, identifying abnormalities, and identifying malicious conduct within the network.… ▽ More Cybersecurity has emerged as a critical global concern. Intrusion Detection Systems (IDS) play a critical role in protecting interconnected networks by detecting malicious actors and activities. Machine Learning (ML)-based behavior analysis within the IDS has considerable potential for detecting dynamic cyber threats, identifying abnormalities, and identifying malicious conduct within the network. However, as the number of data grows, dimension reduction becomes an increasingly difficult task when training ML models. Addressing this, our paper introduces a novel ML-based network intrusion detection model that uses Random Oversampling (RO) to address data imbalance and Stacking Feature Embedding based on clustering results, as well as Principal Component Analysis (PCA) for dimension reduction and is specifically designed for large and imbalanced datasets. This model's performance is carefully evaluated using three cutting-edge benchmark datasets: UNSW-NB15, CIC-IDS-2017, and CIC-IDS-2018. On the UNSW-NB15 dataset, our trials show that the RF and ET models achieve accuracy rates of 99.59% and 99.95%, respectively. Furthermore, using the CIC-IDS2017 dataset, DT, RF, and ET models reach 99.99% accuracy, while DT and RF models obtain 99.94% accuracy on CIC-IDS2018. These performance results continuously outperform the state-of-art, indicating significant progress in the field of network intrusion detection. 
This achievement demonstrates the efficacy of the suggested methodology, which can be used practically to accurately monitor and identify network traffic intrusions, thereby blocking possible threats.

Submitted 22 January, 2024; originally announced January 2024.

Comments: Accepted in Journal of Big Data (Q1, IF: 8.1, SCIE) on Jan 19, 2024

arXiv:2401.04746 [pdf]

Skin Cancer Segmentation and Classification Using Vision Transformer for Automatic Analysis in Dermatoscopy-based Non-invasive Digital System

Authors: Galib Muhammad Shahriar Himel, Md. Masudul Islam, Kh Abdullah Al-Aff, Shams Ibne Karim, Md. Kabir Uddin Sikder

Abstract: Skin cancer is a global health concern, necessitating early and accurate diagnosis for improved patient outcomes. This study introduces a groundbreaking approach to skin cancer classification, employing the Vision Transformer, a state-of-the-art deep learning architecture renowned for its success in diverse image analysis tasks. Utilizing the HAM10000 dataset of 10,015 meticulously annotated skin lesion images, the model undergoes preprocessing for enhanced robustness. The Vision Transformer, adapted to the skin cancer classification task, leverages the self-attention mechanism to capture intricate spatial dependencies, achieving superior performance over traditional deep learning architectures. Segment Anything Model aids in precise segmentation of cancerous areas, attaining high IOU and Dice Coefficient. Extensive experiments highlight the model's supremacy, particularly the Google-based ViT patch-32 variant, which achieves 96.15% accuracy and showcases potential as an effective tool for dermatologists in skin cancer diagnosis, contributing to advancements in dermatological practices.

Submitted 9 January, 2024; originally announced January 2024.

arXiv:2401.04666 [pdf]

Benchmark Analysis of Various Pre-trained Deep Learning Models on ASSIRA Cats and Dogs Dataset

Authors: Galib Muhammad Shahriar Himel, Md. Masudul Islam

Abstract: As the most basic application and implementation of deep learning, image classification has grown in popularity. Various datasets are provided by renowned data science communities for benchmarking machine learning algorithms and pre-trained models. The ASSIRA Cats & Dogs dataset is one of them and is being used in this research for its overall acceptance and benchmark standards. A comparison of various pre-trained models is demonstrated by using different types of optimizers and loss functions. Hyper-parameters are changed to gain the best result from a model. By applying this approach, we have got higher accuracy without major changes in the training model. To run the experiment, we used three different computer architectures: a laptop equipped with NVIDIA GeForce GTX 1070, a laptop equipped with NVIDIA GeForce RTX 3080Ti, and a desktop equipped with NVIDIA GeForce RTX 3090. The acquired results demonstrate supremacy in terms of accuracy over the previously done experiments on this dataset. From this experiment, the highest accuracy which is 99.65% is gained using the NASNet Large.

Submitted 9 January, 2024; originally announced January 2024.

arXiv:2401.04057 [pdf]

doi 10.5281/zenodo.10469839

Unveiling Bias in Fairness Evaluations of Large Language Models: A Critical Literature Review of Music and Movie Recommendation Systems

Authors: Chandan Kumar Sah, Dr. Lian Xiaoli, Muhammad Mirajul Islam

Abstract: The rise of generative artificial intelligence, particularly Large Language Models (LLMs), has intensified the imperative to scrutinize fairness alongside accuracy. Recent studies have begun to investigate fairness evaluations for LLMs within domains such as recommendations. Given that personalization is an intrinsic aspect of recommendation systems, its incorporation into fairness assessments is paramount. Yet, the degree to which current fairness evaluation frameworks account for personalization remains unclear. Our comprehensive literature review aims to fill this gap by examining how existing frameworks handle fairness evaluations of LLMs, with a focus on the integration of personalization factors. Despite an exhaustive collection and analysis of relevant works, we discovered that most evaluations overlook personalization, a critical facet of recommendation systems, thereby inadvertently perpetuating unfair practices. Our findings shed light on this oversight and underscore the urgent need for more nuanced fairness evaluations that acknowledge personalization. Such improvements are vital for fostering equitable development within the AI community.

Submitted 8 January, 2024; originally announced January 2024.

Comments: 10 pages

arXiv:2312.17235 [pdf, other]

A Simple LLM Framework for Long-Range Video Question-Answering

Authors: Ce Zhang, Taixi Lu, Md Mohaiminul Islam, Ziyang Wang, Shoubin Yu, Mohit Bansal, Gedas Bertasius

Abstract: We present LLoVi, a language-based framework for long-range video question-answering (LVQA). Unlike prior long-range video understanding methods, which are often costly and require specialized long-range video modeling design (e.g., memory queues, state-space layers, etc.), our approach uses a frame/clip-level visual captioner (e.g., BLIP2, LaViLa, LLaVA) coupled with a Large Language Model (GPT-3.5, GPT-4) leading to a simple yet surprisingly effective LVQA framework. Specifically, we decompose short and long-range modeling aspects of LVQA into two stages. First, we use a short-term visual captioner to generate textual descriptions of short video clips (0.5-8s in length) densely sampled from a long input video. Afterward, an LLM aggregates the densely extracted short-term captions to perform long-range temporal reasoning needed to understand the whole video and answer a question. To analyze what makes our simple framework so effective, we thoroughly evaluate various components of our system. Our empirical analysis reveals that the choice of the visual captioner and LLM is critical for good LVQA performance. Furthermore, we show that a specialized prompt that asks the LLM first to summarize the noisy short-term visual captions and then answer a given input question leads to a significant LVQA performance boost. On EgoSchema, which is best known as a very long-form video question-answering benchmark, our method achieves 50.3% accuracy, outperforming the previous best-performing approach by 18.1% (absolute gain).
In addition, our approach outperforms the previous state-of-the-art by 4.1% and 3.1% on NeXT-QA and IntentQA. We also extend LLoVi to grounded LVQA and show that it outperforms all prior methods on the NeXT-GQA dataset. We will release our code at https://github.com/CeeZh/LLoVi.

Submitted 26 February, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

arXiv:2312.06729 [pdf, other]

RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos

Authors: Tanveer Hannan, Md Mohaiminul Islam, Thomas Seidl, Gedas Bertasius

Abstract: Locating specific moments within long videos (20-120 minutes) presents a significant challenge, akin to finding a needle in a haystack. Adapting existing short video (5-30 seconds) grounding methods to this problem yields poor performance. Since most real life videos, such as those on YouTube and AR/VR, are lengthy, addressing this issue is crucial. Existing methods typically operate in two stages: clip retrieval and grounding. However, this disjoint process limits the retrieval module's fine-grained event understanding, crucial for specific moment detection. We propose RGNet which deeply integrates clip retrieval and grounding into a single network capable of processing long videos into multiple granular levels, e.g., clips and frames. Its core component is a novel transformer encoder, RG-Encoder, that unifies the two stages through shared features and mutual optimization. The encoder incorporates a sparse attention mechanism and an attention loss to model both granularity jointly. Moreover, we introduce a contrastive clip sampling technique to mimic the long video paradigm closely during training. RGNet surpasses prior methods, showcasing state-of-the-art performance on long video temporal grounding (LVTG) datasets MAD and Ego4D.

Submitted 13 July, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: The code is released at https://github.com/Tanveer81/RGNet

arXiv:2311.18259 [pdf, other]

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Authors: Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, et al. (76 additional authors not shown)

Abstract: We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from 1 to 42 minutes each and 1,286 hours of video combined. The multimodal nature of the dataset is unprecedented: the video is accompanied by multichannel audio, eye gaze, 3D point clouds, camera poses, IMU, and multiple paired language descriptions -- including a novel "expert commentary" done by coaches and teachers and tailored to the skilled-activity domain. To push the frontier of first-person video understanding of skilled human activity, we also present a suite of benchmark tasks and their annotations, including fine-grained activity understanding, proficiency estimation, cross-view translation, and 3D hand/body pose. All resources are open sourced to fuel new research in the community. Project page: http://ego-exo4d-data.org/

Submitted 29 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: updated baseline results and dataset statistics to match the released v2 data; added table to appendix comparing stats of Ego-Exo4D alongside other datasets

arXiv:2311.12823 [pdf, other]

doi 10.1109/ICSECS58457.2023.10256323

EWasteNet: A Two-Stream Data Efficient Image Transformer Approach for E-Waste Classification

Authors: Niful Islam, Md. Mehedi Hasan Jony, Emam Hasan, Sunny Sutradhar, Atikur Rahman, Md. Motaharul Islam

Abstract: Improper disposal of e-waste poses global environmental and health risks, raising serious concerns. The accurate classification of e-waste images is critical for efficient management and recycling. In this paper, we have presented a comprehensive dataset comprised of eight different classes of images of electronic devices named the E-Waste Vision Dataset. We have also presented EWasteNet, a novel two-stream approach for precise e-waste image classification based on a data-efficient image transformer (DeiT). The first stream of EWasteNet passes through a sobel operator that detects the edges while the second stream is directed through an Atrous Spatial Pyramid Pooling and attention block where multi-scale contextual information is captured. We train both of the streams simultaneously and their features are merged at the decision level. The DeiT is used as the backbone of both streams. Extensive analysis of the e-waste dataset indicates the usefulness of our method, providing 96% accuracy in e-waste classification. The proposed approach demonstrates significant usefulness in addressing the global concern of e-waste management. It facilitates efficient waste management and recycling by accurately classifying e-waste images, reducing health and safety hazards associated with improper disposal.

Submitted 28 September, 2023; originally announced November 2023.

Comments: 6 pages

Journal ref: 2023 IEEE 8th International Conference On Software Engineering and Computer Systems (ICSECS), Penang, Malaysia, 2023, pp. 435-440

arXiv:2310.19830 [pdf]

GalliformeSpectra: A Hen Breed Dataset

Authors: Galib Muhammad Shahriar Himel, Md Masudul Islam

Abstract: This article presents a comprehensive dataset featuring ten distinct hen breeds, sourced from various regions, capturing the unique characteristics and traits of each breed. The dataset encompasses Bielefeld, Blackorpington, Brahma, Buckeye, Fayoumi, Leghorn, Newhampshire, Plymouthrock, Sussex, and Turken breeds, offering a diverse representation of poultry commonly bred worldwide. A total of 1010 original JPG images were meticulously collected, showcasing the physical attributes, feather patterns, and distinctive features of each hen breed. These images were subsequently standardized, resized, and converted to PNG format for consistency within the dataset. The compilation, although unevenly distributed across the breeds, provides a rich resource, serving as a foundation for research and applications in poultry science, genetics, and agricultural studies. This dataset holds significant potential to contribute to various fields by enabling the exploration and analysis of unique characteristics and genetic traits across different hen breeds, thereby supporting advancements in poultry breeding, farming, and genetic research.

Submitted 28 October, 2023; originally announced October 2023.

arXiv:2309.13046 [pdf]

Privacy Preserving Machine Learning for Behavioral Authentication Systems

Authors: Md Morshedul Islam, Md Abdur Rafiq

Abstract: A behavioral authentication (BA) system uses the behavioral characteristics of users to verify their identity claims. A BA verification algorithm can be constructed by training a neural network (NN) classifier on users' profiles. The trained NN model classifies the presented verification data, and if the classification matches the claimed identity, the verification algorithm accepts the claim. This classification-based approach removes the need to maintain a profile database. However, similar to other NN architectures, the NN classifier of the BA system is vulnerable to privacy attacks. To protect the privacy of training and test data used in an NN different techniques are widely used. In this paper, our focus is on a non-crypto-based approach, and we used random projection (RP) to ensure data privacy in an NN model. RP is a distance-preserving transformation based on a random matrix. Before sharing the profiles with the verifier, users will transform their profiles by RP and keep their matrices secret. To reduce the computation load in RP, we use sparse random projection, which is very effective for low-compute devices. Along with correctness and security properties, our system can ensure the changeability property of the BA system. We also introduce an ML-based privacy attack, and our proposed system is robust against this and other privacy and security attacks. We implemented our approach on three existing behavioral BA systems and achieved a below 2.0% FRR and a below 1.0% FAR rate.
Moreover, the machine learning-based privacy attacker can only recover below 3.0% to 12.0% of features from a portion of the projected profiles. However, these recovered features are not sufficient to know details about the users' behavioral pattern or to be used in a subsequent attack. Our approach is general and can be used in other NN-based BA systems as well as in traditional biometric systems.

Submitted 31 August, 2023; originally announced September 2023.

arXiv:2308.15756 [pdf]

Reimagining Sense Amplifiers: Harnessing Phase Transition Materials for Current and Voltage Sensing

Authors: Md Mazharul Islam, Shamiul Alam, Mohammad Adnan Jahangir, Garrett S. Rose, Suman Datta, Vijaykrishnan Narayanan, Sumeet Kumar Gupta, Ahmedullah Aziz

Abstract: Energy-efficient sense amplifier (SA) circuits are essential for reliable detection of stored memory states in emerging memory systems. In this work, we present four novel sense amplifier (SA) topologies based on phase transition material (PTM) tailored for non-volatile memory applications. We utilize the abrupt switching and volatile hysteretic characteristics of PTMs which enables efficient and fast sensing operation in our proposed SA topologies. We provide comprehensive details of their functionality and assess how process variations impact their performance metrics. Our proposed sense amplifier topologies manifest notable performance enhancement. We achieve a ~67% reduction in sensing delay and a ~80% decrease in sensing power for current sensing. For voltage sensing, we achieve a ~75% reduction in sensing delay and a ~33% decrease in sensing power. Moreover, the proposed SA topologies exhibit improved variation robustness compared to conventional SAs. We also scrutinize the dependence of transistor mirroring window and PTM transition voltages on several device parameters to determine the optimum operating conditions and stance of tunability for each of the proposed SA topologies.

Submitted 30 August, 2023; originally announced August 2023.

arXiv:2308.15754 [pdf]

A Deep Dive into the Design Space of a Dynamically Reconfigurable Cryogenic Spiking Neuron

Authors: Md Mazharul Islam, Shamiul Alam, Catherine D Schuman, Md Shafayat Hossain, Ahmedullah Aziz

Abstract: Spiking neural network offers the most bio-realistic approach to mimic the parallelism and compactness of the human brain. A spiking neuron is the central component of an SNN which generates information-encoded spikes. We present a comprehensive design space analysis of the superconducting memristor (SM)-based electrically reconfigurable cryogenic neuron. A superconducting nanowire (SNW) connected in parallel with an SM function as a dual-frequency oscillator and two of these oscillators can be coupled to design a dynamically tunable spiking neuron. The same neuron topology was previously proposed where a fixed resistance was used in parallel with the SNW. Replacing the fixed resistance with the SM provides an additional tuning knob with four distinct combinations of SM resistances, which improves the reconfigurability by up to ~70%. Utilizing an external bias current (Ibias), the spike frequency can be modulated up to ~3.5 times. Two distinct spike amplitudes (~1V and ~1.8 V) are also achieved. Here, we perform a systematic sensitivity analysis and show that the reconfigurability can be further tuned by choosing a higher input current strength. By performing a 500-point Monte Carlo variation analysis, we find that the spike amplitude is more variation robust than spike frequency and the variation robustness can be further improved by choosing a higher Ibias. Our study provides valuable insights for further exploration of materials and circuit level modification of the neuron that will be useful for system-level incorporation of the neuron circuit.

Submitted 30 August, 2023; originally announced August 2023.

arXiv:2307.13143 [pdf, ps, other]

doi 10.1109/TSE.2011.26

Evaluation and Measurement of Software Process Improvement -- A Systematic Literature Review

Authors: Michael Unterkalmsteiner, Tony Gorschek, A. K. M. Moinul Islam, Chow Kian Cheng, Rahadian Bayu Permadi, Robert Feldt

Abstract: BACKGROUND: Software Process Improvement (SPI) is a systematic approach to increase the efficiency and effectiveness of a software development organization and to enhance software products. OBJECTIVE: This paper aims to identify and characterize evaluation strategies and measurements used to assess the impact of different SPI initiatives. METHOD: The systematic literature review includes 148 papers published between 1991 and 2008. The selected papers were classified according to SPI initiative, applied evaluation strategies, and measurement perspectives. Potential confounding factors interfering with the evaluation of the improvement effort were assessed. RESULTS: Seven distinct evaluation strategies were identified, wherein the most common one, "Pre-Post Comparison" was applied in 49 percent of the inspected papers. Quality was the most measured attribute (62 percent), followed by Cost (41 percent), and Schedule (18 percent). Looking at measurement perspectives, "Project" represents the majority with 66 percent. CONCLUSION: The evaluation validity of SPI initiatives is challenged by the scarce consideration of potential confounding factors, particularly given that "Pre-Post Comparison" was identified as the most common evaluation strategy, and the inaccurate descriptions of the evaluation context. Measurements to assess the short and mid-term impact of SPI initiatives prevail, whereas long-term measurements in terms of customer satisfaction and return on investment tend to be less used.

Submitted 24 July, 2023; originally announced July 2023.

Journal ref: IEEE Trans. Software Eng. 38(2): 398-424 (2012)

arXiv:2307.13089 [pdf, ps, other]

doi 10.1002/smr.1637

A conceptual framework for SPI evaluation

Authors: Michael Unterkalmsteiner, Tony Gorschek, A. K. M. Moinul Islam, Chow Kian Cheng, Rahadian Bayu Permadi, Robert Feldt

Abstract: Software Process Improvement (SPI) encompasses the analysis and modification of the processes within software development, aimed at improving key areas that contribute to the organizations' goals. The task of evaluating whether the selected improvement path meets these goals is challenging. On the basis of the results of a systematic literature review on SPI measurement and evaluation practices, we developed a framework (SPI Measurement and Evaluation Framework (SPI-MEF)) that supports the planning and implementation of SPI evaluations. SPI-MEF guides the practitioner in scoping the evaluation, determining measures, and performing the assessment. SPI-MEF does not assume a specific approach to process improvement and can be integrated in existing measurement programs, refocusing the assessment on evaluating the improvement initiative's outcome. Sixteen industry and academic experts evaluated the framework's usability and capability to support practitioners, providing additional insights that were integrated in the application guidelines of the framework.

Submitted 24 July, 2023; originally announced July 2023.

Journal ref: J. Softw. Evol. Process. 26(2): 251-279 (2014)

arXiv:2306.06124 [pdf, other]

Unsupervised clustering of disturbances in power systems via deep convolutional autoencoders

Authors: Md Maidul Islam, Md Omar Faruque, Joshua Butterfield, Gaurav Singh, Thomas A. Cooke

Abstract: Power quality (PQ) events are recorded by PQ meters whenever anomalous events are detected on the power grid. Using neural networks with machine learning can aid in accurately classifying the recorded waveforms and help power system engineers diagnose and rectify the root causes of problems. However, many of the waveforms captured during a disturbance in the power system need to be labeled for supervised learning, leaving a large number of data recordings for engineers to process manually or go unseen. This paper presents an autoencoder and K-means clustering-based unsupervised technique that can be used to cluster PQ events into categories like sag, interruption, transients, normal, and harmonic distortion to enable filtering of anomalous waveforms from recurring or normal waveforms. The method is demonstrated using three-phase, field-obtained voltage waveforms recorded in a distribution grid. First, a convolutional autoencoder compresses the input signals into a set of lower feature dimensions which, after further processing, is passed to the K-means algorithm to identify data clusters. Using a small, labeled dataset, numerical labels are then assigned to events based on a cosine similarity analysis. Finally, the study analyzes the clusters using the t-distributed stochastic neighbor embedding (t-SNE) visualization tool, demonstrating that the technique can help investigate a large number of captured events in a quick manner.

Submitted 8 June, 2023; originally announced June 2023.

arXiv:2305.12844 [pdf, other]

An Optimized Ensemble Deep Learning Model For Brain Tumor Classification

Authors: Md. Alamin Talukder, Md. Manowarul Islam, Md Ashraf Uddin

Abstract: Brain tumors present a grave risk to human life, demanding precise and timely diagnosis for effective treatment. Inaccurate identification of brain tumors can significantly diminish life expectancy, underscoring the critical need for precise diagnostic methods. Manual identification of brain tumors within vast Magnetic Resonance Imaging (MRI) image datasets is arduous and time-consuming. Thus, the development of a reliable deep learning (DL) model is essential to enhance diagnostic accuracy and ultimately save lives. This study introduces an innovative optimization-based deep ensemble approach employing transfer learning (TL) to efficiently classify brain tumors. Our methodology includes meticulous preprocessing, reconstruction of TL architectures, fine-tuning, and ensemble DL models utilizing weighted optimization techniques such as Genetic Algorithm-based Weight Optimization (GAWO) and Grid Search-based Weight Optimization (GSWO). Experimentation is conducted on the Figshare Contrast-Enhanced MRI (CE-MRI) brain tumor dataset, comprising 3064 images. Our approach achieves notable accuracy scores, with Xception, ResNet50V2, ResNet152V2, InceptionResNetV2, GAWO, and GSWO attaining 99.42%, 98.37%, 98.22%, 98.26%, 99.71%, and 99.76% accuracy, respectively. Notably, GSWO demonstrates superior accuracy, averaging 99.76% accuracy across five folds on the Figshare CE-MRI brain tumor dataset. The comparative analysis highlights the significant performance enhancement of our proposed model over existing counterparts.
In conclusion, our optimized deep ensemble model exhibits exceptional accuracy in swiftly classifying brain tumors. Furthermore, it has the potential to assist neurologists and clinicians in making accurate and immediate diagnostic decisions. △ Less

Submitted 6 May, 2024; v1 submitted 22 May, 2023; originally announced May 2023.
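
The abstract above mentions combining several fine-tuned models through grid-search-based weight optimization (GSWO) but gives no implementation details. The following is only a minimal sketch of one common way to do this, assuming each model outputs per-class probabilities on a held-out validation set and that ensemble weights are chosen from a coarse grid to maximize validation accuracy. The function names (`weighted_vote`, `grid_search_weights`) and the `steps` parameter are hypothetical and are not taken from the paper.

```python
from itertools import product


def weighted_vote(prob_sets, weights):
    """Weighted average of per-model class probabilities; returns predicted labels."""
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    preds = []
    for i in range(n_samples):
        # Score of class c is the weighted sum of each model's probability for c.
        scores = [
            sum(w * probs[i][c] for w, probs in zip(weights, prob_sets))
            for c in range(n_classes)
        ]
        preds.append(max(range(n_classes), key=scores.__getitem__))
    return preds


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def grid_search_weights(prob_sets, labels, steps=5):
    """Try every weight combination on a uniform grid in [0, 1] (assumed scheme).

    Weights are normalized to sum to 1; the all-zero combination is skipped.
    Returns the first best (weights, accuracy) pair found.
    """
    grid = [k / (steps - 1) for k in range(steps)]
    best_w, best_acc = None, -1.0
    for w in product(grid, repeat=len(prob_sets)):
        total = sum(w)
        if total == 0:
            continue
        norm = tuple(x / total for x in w)
        acc = accuracy(weighted_vote(prob_sets, norm), labels)
        if acc > best_acc:
            best_w, best_acc = norm, acc
    return best_w, best_acc
```

The search cost grows exponentially with the number of models, so this exhaustive form is only practical for a handful of ensemble members and a coarse grid.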

arXiv:2304.06015 [pdf]

An Improved Heart Disease Prediction Using Stacked Ensemble Method

Authors: Md. Maidul Islam, Tanzina Nasrin Tania, Sharmin Akter, Kazi Hassan Shakib

Abstract: Heart disorder has just overtaken cancer as the world's biggest cause of mortality. Several cardiac failures, heart disease mortality, and diagnostic costs can all be reduced with early identification and treatment. Medical data is collected in large quantities by the healthcare industry, but it is not well mined. The discovery of previously unknown patterns and connections in this information can help with an improved decision when it comes to forecasting heart disorder risk. In the proposed study, we constructed an ML-based diagnostic system for heart illness forecasting, using a heart disorder dataset. We used data preprocessing techniques like outlier detection and removal, checking and removing missing entries, feature normalization, cross-validation, nine classification algorithms like RF, MLP, KNN, ETC, XGB, SVC, ADB, DT, and GBM, and eight classifier measuring performance metrics like ramification accuracy, precision, F1 score, specificity, ROC, sensitivity, log-loss, and Matthews' correlation coefficient, as well as eight classification performance evaluations. Our method can easily differentiate between people who have cardiac disease and those are normal. Receiver optimistic curves and also the region under the curves were determined by every classifier. Most of the classifiers, pretreatment strategies, validation methods, and performance assessment metrics for classification models have been discussed in this study. The performance of the proposed scheme has been confirmed, utilizing all of its capabilities. In this work, the impact of clinical decision support systems was evaluated using a stacked ensemble approach that included these nine algorithms

Submitted 12 April, 2023; originally announced April 2023.

Comments: 14 pages, 5 figures and submitted to Springer Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

arXiv:2212.14427 [pdf, other]

Efficient Movie Scene Detection using State-Space Transformers

Authors: Md Mohaiminul Islam, Mahmudul Hasan, Kishan Shamsundar Athrey, Tony Braskich, Gedas Bertasius

Abstract: The ability to distinguish between different movie scenes is critical for understanding the storyline of a movie. However, accurately detecting movie scenes is often challenging as it requires the ability to reason over very long movie segments. This is in contrast to most existing video recognition models, which are typically designed for short-range video analysis. This work proposes a State-Space Transformer model that can efficiently capture dependencies in long movie videos for accurate movie scene detection. Our model, dubbed TranS4mer, is built using a novel S4A building block, which combines the strengths of structured state-space sequence (S4) and self-attention (A) layers. Given a sequence of frames divided into movie shots (uninterrupted periods where the camera position does not change), the S4A block first applies self-attention to capture short-range intra-shot dependencies. Afterward, the state-space operation in the S4A block is used to aggregate long-range inter-shot cues. The final TranS4mer model, which can be trained end-to-end, is obtained by stacking the S4A blocks one after the other multiple times. Our proposed TranS4mer outperforms all prior methods in three movie scene detection datasets, including MovieNet, BBC, and OVSD, while also being $2\times$ faster and requiring $3\times$ less GPU memory than standard Transformer models. We will release our code and models.

Submitted 21 June, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

Comments: Accepted by CVPR 2023. Code: https://github.com/md-mohaiminul/TranS4mer

arXiv:2212.13835 [pdf, other]

Representation Learning in Deep RL via Discrete Information Bottleneck

Authors: Riashat Islam, Hongyu Zang, Manan Tomar, Aniket Didolkar, Md Mofijul Islam, Samin Yeasar Arnob, Tariq Iqbal, Xin Li, Anirudh Goyal, Nicolas Heess, Alex Lamb

Abstract: Several self-supervised representation learning methods have been proposed for reinforcement learning (RL) with rich observations. For real-world applications of RL, recovering underlying latent states is crucial, particularly when sensory inputs contain irrelevant and exogenous information. In this work, we study how information bottlenecks can be used to construct latent states efficiently in the presence of task-irrelevant information. We propose architectures that utilize variational and discrete information bottlenecks, coined as RepDIB, to learn structured factorized representations. Exploiting the expressiveness bought by factorized representations, we introduce a simple, yet effective, bottleneck that can be integrated with any existing self-supervised objective for RL. We demonstrate this across several online and offline RL benchmarks, along with a real robot arm task, where we find that compressed representations with RepDIB can lead to strong performance improvements, as the learned bottlenecks help predict only the relevant state while ignoring irrelevant information.

Submitted 30 May, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

Comments: AISTATS 2023

arXiv:2212.04546 [pdf, other]

doi 10.1016/j.jisa.2022.103405

A Dependable Hybrid Machine Learning Model for Network Intrusion Detection

Authors: Md. Alamin Talukder, Khondokar Fida Hasan, Md. Manowarul Islam, Md Ashraf Uddin, Arnisha Akhter, Mohammad Abu Yousuf, Fares Alharbi, Mohammad Ali Moni

Abstract: Network intrusion detection systems (NIDSs) play an important role in computer network security. There are several detection mechanisms where anomaly-based automated detection outperforms others significantly. Amid the sophistication and growing number of attacks, dealing with large amounts of data is a recognized issue in the development of anomaly-based NIDS. However, do current models meet the needs of today's networks in terms of required accuracy and dependability? In this research, we propose a new hybrid model that combines machine learning and deep learning to increase detection rates while securing dependability. Our proposed method ensures efficient pre-processing by combining SMOTE for data balancing and XGBoost for feature selection. We compared our developed method to various machine learning and deep learning algorithms to find a more efficient algorithm to implement in the pipeline. Furthermore, we chose the most effective model for network intrusion based on a set of benchmarked performance analysis criteria. Our method produces excellent results when tested on two datasets, KDDCUP'99 and CIC-MalMem-2022, with an accuracy of 99.99% and 100% for KDDCUP'99 and CIC-MalMem-2022, respectively, and no overfitting or Type-1 and Type-2 issues.

Submitted 27 January, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

Comments: Accepted in the Journal of Information Security and Applications (Scopus, Web of Science (SCIE) Journal, Quartile: Q1, Site Score: 7.6, Impact Factor: 4.96) on 7 December 2022

Journal ref: Journal of Information Security and Applications, Volume 72, Pages 103405, Year 2023, ISSN 2214-2126

arXiv:2211.14235 [pdf, other]

doi 10.1007/s00521-023-08493-1

DoubleU-NetPlus: A Novel Attention and Context Guided Dual U-Net with Multi-Scale Residual Feature Fusion Network for Semantic Segmentation of Medical Images

Authors: Md. Rayhan Ahmed, Adnan Ferdous Ashrafi, Raihan Uddin Ahmed, Swakkhar Shatabda, A. K. M. Muzahidul Islam, Salekul Islam

Abstract: Accurate segmentation of the region of interest in medical images can provide an essential pathway for devising effective treatment plans for life-threatening diseases. It is still challenging for U-Net, and its state-of-the-art variants, such as CE-Net and DoubleU-Net, to effectively model the higher-level output feature maps of the convolutional units of the network mostly due to the presence of various scales of the region of interest, intricacy of context environments, ambiguous boundaries, and multiformity of textures in medical images. In this paper, we exploit multi-contextual features and several attention strategies to increase networks' ability to model discriminative feature representation for more accurate medical image segmentation, and we present a novel dual U-Net-based architecture named DoubleU-NetPlus. The DoubleU-NetPlus incorporates several architectural modifications. In particular, we integrate EfficientNetB7 as the feature encoder module, a newly designed multi-kernel residual convolution module, and an adaptive feature re-calibrating attention-based atrous spatial pyramid pooling module to progressively and precisely accumulate discriminative multi-scale high-level contextual feature maps and emphasize the salient regions. In addition, we introduce a novel triple attention gate module and a hybrid triple attention module to encourage selective modeling of relevant medical image features. Moreover, to mitigate the gradient vanishing issue and incorporate high-resolution features with deeper spatial details, the standard convolution operation is replaced with the attention-guided residual convolution operations, ...

Submitted 25 November, 2022; originally announced November 2022.

Comments: 25 pages, 9 figures, 4 tables, Submitted to Springer

MSC Class: 92C55 (Primary) ACM Class: I.4.6

Journal ref: Neural Computing and Applications, Volume 35, Pages 14379 - 14401 (2023)

arXiv:2207.11814 [pdf, other]

Object State Change Classification in Egocentric Videos using the Divided Space-Time Attention Mechanism

Authors: Md Mohaiminul Islam, Gedas Bertasius

Abstract: This report describes our submission called "TarHeels" for the Ego4D: Object State Change Classification Challenge. We use a transformer-based video recognition model and leverage the Divided Space-Time Attention mechanism for classifying object state change in egocentric videos. Our submission achieves the second-best performance in the challenge. Furthermore, we perform an ablation study to show that identifying object state change in egocentric videos requires temporal modeling ability. Lastly, we present several positive and negative examples to visualize our model's predictions. The code is publicly available at: https://github.com/md-mohaiminul/ObjectStateChange

Submitted 4 January, 2023; v1 submitted 24 July, 2022; originally announced July 2022.

Comments: 2nd place winner, Ego4D challenge, CVPR 2022

arXiv:2206.01088 [pdf, other]

Machine Learning-based Lung and Colon Cancer Detection using Deep Feature Extraction and Ensemble Learning

Authors: Md. Alamin Talukder, Md. Manowarul Islam, Md Ashraf Uddin, Arnisha Akhter, Khondokar Fida Hasan, Mohammad Ali Moni

Abstract: Cancer is a fatal disease caused by a combination of genetic diseases and a variety of biochemical abnormalities. Lung and colon cancer have emerged as two of the leading causes of death and disability in humans. The histopathological detection of such malignancies is usually the most important component in determining the best course of action. Early detection of the ailment on either front considerably decreases the likelihood of mortality. Machine learning and deep learning techniques can be utilized to speed up such cancer detection, allowing researchers to study a large number of patients in a much shorter amount of time and at a lower cost. In this research work, we introduced a hybrid ensemble feature extraction model to efficiently identify lung and colon cancer. It integrates deep feature extraction and ensemble learning with high-performance filtering for cancer image datasets. The model is evaluated on histopathological (LC25000) lung and colon datasets. According to the study findings, our hybrid model can detect lung, colon, and (lung and colon) cancer with accuracy rates of 99.05%, 100%, and 99.30%, respectively. The study's findings show that our proposed strategy outperforms existing models significantly. Thus, these models could be applicable in clinics to support the doctor in the diagnosis of cancers.

Submitted 3 June, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

Comments: Accepted for publication in the Special Issue of Expert Systems with Applications (IF:6.954, Cite:12.70) How to Cite: Md. Alamin Talukder, Md. Manowarul Islam, Md Ashraf Uddin, Arnisha Akhter, Khondokar Fida Hasan, Mohammad Ali Moni. "Machine Learning-based Lung and Colon Cancer Detection using Deep Feature Extraction and Ensemble Learning", Expert Systems with Applications. 2022 Jun 1

arXiv:2205.14729 [pdf]

CMOS-Compatible Ising Machines built using Bistable Latches Coupled through Ferroelectric Transistor Arrays

Authors: Antik Mallick, Zijian Zhao, Mohammad Khairul Bashar, Shamiul Alam, Md Mazharul Islam, Yi Xiao, Yixin Xu, Ahmedullah Aziz, Vijaykrishnan Narayanan, Kai Ni, Nikhil Shukla

Abstract: Realizing compact and scalable Ising machines that are compatible with CMOS-process technology is crucial to the effectiveness and practicality of using such hardware platforms for accelerating computationally intractable problems. Besides the need for realizing compact Ising spins, the implementation of the coupling network, which describes the spin interaction, is also a potential bottleneck in the scalability of such platforms. Therefore, in this work, we propose an Ising machine platform that exploits the novel behavior of compact bi-stable CMOS-latches (cross-coupled inverters) as classical Ising spins interacting through highly scalable and CMOS-process compatible ferroelectric-HfO2-based Ferroelectric FETs (FeFETs) which act as coupling elements. We experimentally demonstrate the prototype building blocks of this system, and evaluate the behavior of the scaled system using simulations. We project that the proposed architecture can compute Ising solutions with an efficiency of ~1.04 x 10^8 solutions/W/second. Our work not only provides a pathway to realizing CMOS-compatible designs but also to overcoming their scaling challenges.

Submitted 29 May, 2022; originally announced May 2022.

Comments: 29 pages, 10 figures

arXiv:2204.10815 [pdf]

A Vocabulary-Free Multilingual Neural Tokenizer for End-to-End Task Learning

Authors: Md Mofijul Islam, Gustavo Aguilar, Pragaash Ponnusamy, Clint Solomon Mathialagan, Chengyuan Ma, Chenlei Guo

Abstract: Subword tokenization is a commonly used input pre-processing step in most recent NLP models. However, it limits the models' ability to leverage end-to-end task learning. Its frequency-based vocabulary creation compromises tokenization in low-resource languages, leading models to produce suboptimal representations. Additionally, the dependency on a fixed vocabulary limits the subword models' adaptability across languages and domains. In this work, we propose a vocabulary-free neural tokenizer by distilling segmentation information from heuristic-based subword tokenization. We pre-train our character-based tokenizer by processing unique words from multilingual corpus, thereby extensively increasing word diversity across languages. Unlike the predefined and fixed vocabularies in subword methods, our tokenizer allows end-to-end task learning, resulting in optimal task-specific tokenization. The experimental results show that replacing the subword tokenizer with our neural tokenizer consistently improves performance on multilingual (NLI) and code-switching (sentiment analysis) tasks, with larger gains in low-resource languages. Additionally, our neural tokenizer exhibits a robust performance on downstream tasks when adversarial noise is present (typos and misspelling), further increasing the initial improvements over statistical subword tokenizers.

Submitted 22 April, 2022; originally announced April 2022.

Journal ref: ACL 2022 Workshop on Representation Learning for NLP

arXiv:2204.07503 [pdf]

doi 10.1063/5.0133515

Cryogenic Neuromorphic Hardware

Authors: Md Mazharul Islam, Shamiul Alam, Md Shafayat Hossain, Kaushik Roy, Ahmedullah Aziz

Abstract: The revolution in artificial intelligence (AI) brings up an enormous storage and data processing requirement. Large power consumption and hardware overhead have become the main challenges for building next-generation AI hardware. To mitigate this, Neuromorphic computing has drawn immense attention due to its excellent capability for data processing with very low power consumption. While relentless research has been underway for years to minimize the power consumption in neuromorphic hardware, we are still a long way off from reaching the energy efficiency of the human brain. Furthermore, design complexity and process variation hinder the large-scale implementation of current neuromorphic platforms. Recently, the concept of implementing neuromorphic computing systems in cryogenic temperature has garnered intense interest thanks to their excellent speed and power metric. Several cryogenic devices can be engineered to work as neuromorphic primitives with ultra-low demand for power. Here we comprehensively review the cryogenic neuromorphic hardware. We classify the existing cryogenic neuromorphic hardware into several hierarchical categories and sketch a comparative analysis based on key performance metrics. Our analysis concisely describes the operation of the associated circuit topology and outlines the advantages and challenges encountered by the state-of-the-art technology platforms. Finally, we provide insights to circumvent these challenges for the future progression of research.

Submitted 24 August, 2022; v1 submitted 25 March, 2022; originally announced April 2022.

Journal ref: Journal of Applied Physics 133, 070701 (2023)

arXiv:2204.05921 [pdf]

doi 10.1109/JIOT.2022.3228795

Internet of Things Device Capabilities, Architectures, Protocols, and Smart Applications in Healthcare Domain: A Review

Authors: Md. Milon Islam, Sheikh Nooruddin, Fakhri Karray, Ghulam Muhammad

Abstract: Nowadays, the Internet has spread to practically every country around the world and is having unprecedented effects on people's lives. The Internet of Things (IoT) is getting more popular and has a high level of interest in both practitioners and academicians in the age of wireless communication due to its diverse applications. The IoT is a technology that enables everyday things to become savvier, everyday computation towards becoming intellectual, and everyday communication to become a little more insightful. In this paper, the most common and popular IoT device capabilities, architectures, and protocols are demonstrated in brief to provide a clear overview of the IoT technology to the researchers in this area. The common IoT device capabilities including hardware (Raspberry Pi, Arduino, and ESP8266) and software (operating systems, and built-in tools) platforms are described in detail. The widely used architectures that have been recently evolved and used are the three-layer architecture, SOA-based architecture, and middleware-based architecture. The popular protocols for IoT are demonstrated which include CoAP, MQTT, XMPP, AMQP, DDS, LoWPAN, BLE, and Zigbee that are frequently utilized to develop smart IoT applications. Additionally, this research provides an in-depth overview of the potential healthcare applications based on IoT technologies in the context of addressing various healthcare concerns. Finally, this paper summarizes state-of-the-art knowledge, highlights open issues and shortcomings, and provides recommendations for further studies which would be quite beneficial to anyone with a desire to work in this field and make breakthroughs to get expertise in this area.

Submitted 3 January, 2023; v1 submitted 12 April, 2022; originally announced April 2022.

Comments: 27 pages, 7 figures, 4 tables

Journal ref: IEEE Internet of Things Journal, 2022

arXiv:2204.01692 [pdf, other]

Long Movie Clip Classification with State-Space Video Models

Authors: Md Mohaiminul Islam, Gedas Bertasius

Abstract: Most modern video recognition models are designed to operate on short video clips (e.g., 5-10s in length). Thus, it is challenging to apply such models to long movie understanding tasks, which typically require sophisticated long-range temporal reasoning. The recently introduced video transformers partially address this issue by using long-range temporal self-attention. However, due to the quadratic cost of self-attention, such models are often costly and impractical to use. Instead, we propose ViS4mer, an efficient long-range video model that combines the strengths of self-attention and the recently introduced structured state-space sequence (S4) layer. Our model uses a standard Transformer encoder for short-range spatiotemporal feature extraction, and a multi-scale temporal S4 decoder for subsequent long-range temporal reasoning. By progressively reducing the spatiotemporal feature resolution and channel dimension at each decoder layer, ViS4mer learns complex long-range spatiotemporal dependencies in a video. Furthermore, ViS4mer is $2.63\times$ faster and requires $8\times$ less GPU memory than the corresponding pure self-attention-based model. Additionally, ViS4mer achieves state-of-the-art results in $6$ out of $9$ long-form movie video classification tasks on the Long Video Understanding (LVU) benchmark. Furthermore, we show that our approach successfully generalizes to other domains, achieving competitive results on the Breakfast and the COIN procedural activity datasets. The code is publicly available at: https://github.com/md-mohaiminul/ViS4mer.

Submitted 4 January, 2023; v1 submitted 4 April, 2022; originally announced April 2022.

Comments: Accepted by ECCV 2022

arXiv:2202.03274 [pdf]

doi 10.1016/j.compbiomed.2022.106060

Human Activity Recognition Using Tools of Convolutional Neural Networks: A State of the Art Review, Data Sets, Challenges and Future Prospects

Authors: Md. Milon Islam, Sheikh Nooruddin, Fakhri Karray, Ghulam Muhammad

Abstract: Human Activity Recognition (HAR) plays a significant role in the everyday life of people because of its ability to learn extensive high-level information about human activity from wearable or stationary devices. A substantial amount of research has been conducted on HAR and numerous approaches based on deep learning and machine learning have been exploited by the research community to classify human activities. The main goal of this review is to summarize recent works based on a wide range of deep neural networks architecture, namely convolutional neural networks (CNNs) for human activity recognition. The reviewed systems are clustered into four categories depending on the use of input devices like multimodal sensing devices, smartphones, radar, and vision devices. This review describes the performances, strengths, weaknesses, and the used hyperparameters of CNN architectures for each reviewed system with an overview of available public data sources. In addition, a discussion with the current challenges to CNN-based HAR systems is presented. Finally, this review is concluded with some potential future directions that would be of great assistance for the researchers who would like to contribute to this field.

Submitted 2 February, 2022; originally announced February 2022.

Comments: 32 pages, 4 figures, 4 Tables

Journal ref: Comput. Biol. Med.C149:106060,2022

arXiv:2112.05666 [pdf]

An Ensemble 1D-CNN-LSTM-GRU Model with Data Augmentation for Speech Emotion Recognition

Authors: Md. Rayhan Ahmed, Salekul Islam, Ph. D, A. K. M. Muzahidul Islam, Ph. D, Swakkhar Shatabda, Ph. D

Abstract: In this paper, we propose an ensemble of deep neural networks along with data augmentation (DA) learned using effective speech-based features to recognize emotions from speech. Our ensemble model is built on three deep neural network-based models. These neural networks are built using the basic local feature acquiring blocks (LFAB) which are consecutive layers of dilated 1D Convolutional Neural networks followed by the max pooling and batch normalization layers. To acquire the long-term dependencies in speech signals further two variants are proposed by adding Gated Recurrent Unit (GRU) and Long Short Term Memory (LSTM) layers respectively. All three network models have consecutive fully connected layers before the final softmax layer for classification. The ensemble model uses a weighted average to provide the final classification. We have utilized five standard benchmark datasets: TESS, EMO-DB, RAVDESS, SAVEE, and CREMA-D for evaluation. We have performed DA by injecting Additive White Gaussian Noise, pitch shifting, and stretching the signal level to generalize the models, and thus increasing the accuracy of the models and reducing the overfitting as well. We handcrafted five categories of features: Mel-frequency cepstral coefficients, Log Mel-Scaled Spectrogram, Zero-Crossing Rate, Chromagram, and statistical Root Mean Square Energy value from each audio sample. These features are used as the input to the LFAB blocks that further extract the hidden local features which are then fed to either fully connected layers or to LSTM or GRU based on the model type to acquire the additional long-term contextual representations. LFAB followed by GRU or LSTM results in better performance compared to the baseline model. The ensemble model achieves the state-of-the-art weighted average accuracy in all the datasets.

Submitted 22 November, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

Comments: This paper is currently under revision process at expert systems with applications journal
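
Of the three augmentations named in the abstract above, Additive White Gaussian Noise is the simplest to illustrate. The sketch below is not the authors' code: it assumes the noise level is set through a target signal-to-noise ratio in decibels, which the abstract does not specify, and the function name `add_awgn` and its parameters are hypothetical. Pitch shifting and time stretching are omitted.

```python
import math
import random


def add_awgn(signal, snr_db, seed=None):
    """Return a copy of `signal` with zero-mean Gaussian noise added.

    The noise variance is chosen so that the ratio of mean signal power to
    noise power equals `snr_db` decibels (an assumed parameterization).
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


if __name__ == "__main__":
    # One second of a 440 Hz tone sampled at 16 kHz (illustrative input only).
    tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noisy = add_awgn(tone, snr_db=20, seed=0)
    print(len(noisy))
```

A lower `snr_db` produces a noisier copy; in an augmentation pipeline the value would typically be sampled per example rather than fixed.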

arXiv:2112.00124 [pdf]

doi 10.1063/5.0092169

CryoCiM: Cryogenic Compute-in-Memory based on the Quantum Anomalous Hall Effect

Authors: Shamiul Alam, Md Mazharul Islam, Md Shafayat Hossain, Akhilesh Jaiswal, Ahmedullah Aziz

Abstract: The scaling of the already-matured CMOS technology is steadily approaching its physical limit, motivating the quest for a suitable alternative. Cryogenic operation offers a promising pathway towards continued improvement in computing speed and energy efficiency without aggressive scaling. However, the memory wall bottleneck of the traditional von Neumann architecture persists even at cryogenic temperature. That is where a compute-in-memory (CiM) architecture, which embeds computing within the memory unit, comes into play. Computations within the memory unit help reduce the expensive data transfer between the memory and the computing units. Therefore, CiM provides extreme energy efficiency that can enable lower cooling cost at cryogenic temperature. In this work, we demonstrate CryoCiM, a cryogenic compute-in-memory framework utilizing a non-volatile memory system based on the quantum anomalous Hall effect (QAHE). Our design can perform memory read/write and universal binary logic operations (NAND, NOR, and XOR). We design a novel peripheral circuit assembly that can perform the read/write and single-cycle in-memory logic operations. The utilization of a QAHE-based memory system promises robustness against process variations through the usage of topologically protected resistive states for data storage. CryoCiM is the first step towards utilizing exclusively cryogenic phenomena to serve the dual purpose of storage and computation with ultra-low power (nano-watts) operations.

Submitted 21 March, 2022; v1 submitted 30 November, 2021; originally announced December 2021.

Comments: 13 pages, 6 figures

Journal ref: Appl. Phys. Lett. 120, 144102 (2022)

arXiv:2108.12375 [pdf, other]

A Pedestrian Detection and Tracking Framework for Autonomous Cars: Efficient Fusion of Camera and LiDAR Data

Authors: Muhammad Mobaidul Islam, Abdullah Al Redwan Newaz, Ali Karimoddini

Abstract: This paper presents a novel method for pedestrian detection and tracking by fusing camera and LiDAR sensor data. To deal with the challenges associated with the autonomous driving scenarios, an integrated tracking and detection framework is proposed. The detection phase is performed by converting LiDAR streams to computationally tractable depth images, and then, a deep neural network is developed to identify pedestrian candidates both in RGB and depth images. To provide accurate information, the detection phase is further enhanced by fusing multi-modal sensor information using the Kalman filter. The tracking phase is a combination of the Kalman filter prediction and an optical flow algorithm to track multiple pedestrians in a scene. We evaluate our framework on a real public driving dataset. Experimental results demonstrate that the proposed method achieves significant performance improvement over a baseline method that solely uses image-based pedestrian detection.

Submitted 27 August, 2021; originally announced August 2021.

arXiv:2108.11604 [pdf]

Identification of the Resting Position Based on EGG, ECG, Respiration Rate and SpO2 Using Stacked Ensemble Learning

Authors: Md. Mohsin Sarker Raihan, Muhammad Muinul Islam, Fariha Fairoz, Abdullah Bin Shams

Abstract: Rest is essential for high-level physiological and psychological performance. It is also necessary for the muscles to repair, rebuild, and strengthen. There is a significant correlation between the quality of rest and the resting posture. Therefore, identification of the resting position is of paramount importance to maintain a healthy life. Resting postures can be classified into four basic categories: lying on the back (supine), facing the left or right side, and the free-fall position. The latter position is already unequivocally considered an unhealthy posture by researchers and hence can be eliminated. In this paper, we analyzed the other three states of resting position based on the data collected from the physiological parameters: Electrogastrogram (EGG), Electrocardiogram (ECG), Respiration Rate, Heart Rate, and Oxygen Saturation (SpO2). Based on these parameters, the resting position is classified using a hybrid stacked ensemble machine learning model designed using the Decision Tree, Random Forest, and XGBoost algorithms. Our study demonstrates a 100% accurate prediction of the resting position using the hybrid model. The proposed method of identifying the resting position based on physiological parameters has the potential to be integrated into wearable devices. This is a low-cost, highly accurate, and autonomous technique to monitor the body posture while maintaining user privacy by eliminating the use of the RGB camera conventionally used to conduct polysomnography (sleep monitoring) or resting position studies.

Submitted 26 August, 2021; originally announced August 2021.

Comments: Accepted for publication in Lecture Notes on Data Engineering and Communication Technologies, Springer, BIM, 2021

arXiv:2107.03924 [pdf]

Smart Healthcare in the Age of AI: Recent Advances, Challenges, and Future Prospects

Authors: Mahmoud Nasr, MD. Milon Islam, Shady Shehata, Fakhri Karray, Yuri Quintana

Abstract: The significant increase in the number of individuals with chronic ailments (including the elderly and disabled) has dictated an urgent need for an innovative model for healthcare systems. The evolved model will be more personalized and less reliant on traditional brick-and-mortar healthcare institutions such as hospitals, nursing homes, and long-term healthcare centers. The smart healthcare system is a topic of recently growing interest and has become increasingly required due to major developments in modern technologies, especially in artificial intelligence (AI) and machine learning (ML). This paper aims to discuss the current state-of-the-art smart healthcare systems, highlighting major areas like wearable and smartphone devices for health monitoring, machine learning for disease diagnosis, and assistive frameworks, including social robots developed for the ambient assisted living environment. Additionally, the paper demonstrates software integration architectures that are very significant for creating smart healthcare systems, seamlessly integrating the benefits of data analytics and other tools of AI. The discussion of the developed systems focuses on several facets: the contribution of each developed framework, the detailed working procedure, the performance as outcomes, and the comparative merits and limitations. The current research challenges and potential future directions are addressed to highlight the drawbacks of existing systems and the possible methods to introduce novel frameworks, respectively. This review aims at providing comprehensive insights into the recent developments of smart healthcare systems to equip experts to contribute to the field.

Submitted 24 June, 2021; originally announced July 2021.

arXiv:2105.00314 [pdf, ps, other]

Technical Report: Insider-Resistant Context-Based Pairing for Multimodality Sleep Apnea Test

Authors: Yao Zheng, Shekh Md Mahmudul Islam, Yanjun Pan, Marionne Millan, Samson Aggelopoulos, Brian Lu, Alvin Yang, Thomas Yang, Stephanie Aelmore, Willy Chang, Alana Power, Ming Li, Olga Borić-Lubecke, Victor Lubecke, Wenhai Sun

Abstract: The increasingly sophisticated at-home screening systems for obstructive sleep apnea (OSA), integrated with both contactless and contact-based sensing modalities, bring convenience and reliability to remote chronic disease management. However, the device pairing processes between system components are vulnerable to wireless exploitation from a non-compliant user wishing to manipulate the test results. This work presents SIENNA, an insider-resistant context-based pairing protocol. SIENNA leverages JADE-ICA to uniquely identify a user's respiration pattern within a multi-person environment and fuzzy commitment for automatic device pairing, while using a friendly jamming technique to prevent an insider with knowledge of respiration patterns from acquiring the pairing key. Our analysis and test results show that SIENNA can achieve reliable (> 90% success rate) device pairing in a noisy environment and is robust against an attacker with full knowledge of the context information.

Submitted 24 May, 2021; v1 submitted 1 May, 2021; originally announced May 2021.

arXiv:2009.05802 [pdf, other]

doi 10.4018/IJGBL.2021040104

Learning Daily Calorie Intake Standard using a Mobile Game

Authors: Anik Das, Sumaiya Amin, Muhammad Ashad Kabir, Md. Sabir Hossain, Mohammad Mainul Islam

Abstract: Mobile games can contribute to more successful learning. In this paper, we have developed and evaluated a novel educational game, named FoodCalorie, for learning the food calorie intake standards. Our game aims to teach the calorie values of various traditional Bangladeshi foods and the calorie intake standard that varies with age and gender. Our study confirms the finding of existing studies that game-based learning can enhance the learning experience.

Submitted 19 November, 2020; v1 submitted 12 September, 2020; originally announced September 2020.

Journal ref: International Journal of Game-Based Learning, 2021

Showing 1–50 of 68 results for author: Islam, M M