Search | arXiv e-print repository

Depth-Weighted Detection of Behaviours of Risk in People with Dementia using Cameras

Authors: Pratik K. Mishra, Irene Ballester, Andrea Iaboni, Bing Ye, Kristine Newman, Alex Mihailidis, Shehroz S. Khan

Abstract: The behavioural and psychological symptoms of dementia, such as agitation and aggression, present a significant health and safety risk in residential care settings. Many care facilities have video cameras in place for digital monitoring of public spaces, which can be leveraged to develop an automated behaviours of risk detection system that can alert the staff to enable timely intervention and pre… ▽ More The behavioural and psychological symptoms of dementia, such as agitation and aggression, present a significant health and safety risk in residential care settings. Many care facilities have video cameras in place for digital monitoring of public spaces, which can be leveraged to develop an automated behaviours of risk detection system that can alert the staff to enable timely intervention and prevent the situation from escalating. However, one of the challenges in our previous study was the presence of false alarms due to obstruction of view by activities happening close to the camera. To address this issue, we proposed a novel depth-weighted loss function to train a customized convolutional autoencoder to enforce equivalent importance to the events happening both near and far from the cameras; thus, helping to reduce false alarms and making the method more suitable for real-world deployment. The proposed method was trained using data from nine participants with dementia across three cameras situated in a specialized dementia unit and achieved an area under the curve of receiver operating characteristic of $0.852$, $0.81$ and $0.768$ for the three cameras. Ablation analysis was conducted for the individual components of the proposed method and the performance of the proposed method was investigated for participant-specific and sex-specific behaviours of risk detection. The proposed method performed reasonably well in detecting behaviours of risk in people with dementia motivating further research toward the development of a behaviours of risk detection system suitable for deployment in video surveillance systems in care facilities. △ Less

Submitted 28 August, 2024; originally announced August 2024.

arXiv:2406.07710 [pdf, other]

Vehicle Speed Detection System Utilizing YOLOv8: Enhancing Road Safety and Traffic Management for Metropolitan Areas

Authors: SM Shaqib, Alaya Parvin Alo, Shahriar Sultan Ramit, Afraz Ul Haque Rupak, Sadman Sadik Khan, Md. Sadekur Rahman

Abstract: In order to ensure traffic safety through a reduction in fatalities and accidents, vehicle speed detection is essential. Relentless driving practices are discouraged by the enforcement of speed restrictions, which are made possible by accurate monitoring of vehicle speeds. Road accidents remain one of the leading causes of death in Bangladesh. The Bangladesh Passenger Welfare Association stated in… ▽ More In order to ensure traffic safety through a reduction in fatalities and accidents, vehicle speed detection is essential. Relentless driving practices are discouraged by the enforcement of speed restrictions, which are made possible by accurate monitoring of vehicle speeds. Road accidents remain one of the leading causes of death in Bangladesh. The Bangladesh Passenger Welfare Association stated in 2023 that 7,902 individuals lost their lives in traffic accidents during the course of the year. Efficient vehicle speed detection is essential to maintaining traffic safety. Reliable speed detection can also help gather important traffic data, which makes it easier to optimize traffic flow and provide safer road infrastructure. The YOLOv8 model can recognize and track cars in videos with greater speed and accuracy when trained under close supervision. By providing insights into the application of supervised learning in object identification for vehicle speed estimation and concentrating on the particular traffic conditions and safety concerns in Bangladesh, this work represents a noteworthy contribution to the area. The MAE was 3.5 and RMSE was 4.22 between the predicted speed of our model and the actual speed or the ground truth measured by the speedometer Promising increased efficiency and wider applicability in a variety of traffic conditions, the suggested solution offers a financially viable substitute for conventional approaches. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2403.17175 [pdf, ps, other]

Engagement Measurement Based on Facial Landmarks and Spatial-Temporal Graph Convolutional Networks

Authors: Ali Abedi, Shehroz S. Khan

Abstract: Engagement in virtual learning is crucial for a variety of factors including learner satisfaction, performance, and compliance with learning programs, but measuring it is a challenging task. There is therefore considerable interest in utilizing artificial intelligence and affective computing to measure engagement in natural settings as well as on a large scale. This paper introduces a novel, priva… ▽ More Engagement in virtual learning is crucial for a variety of factors including learner satisfaction, performance, and compliance with learning programs, but measuring it is a challenging task. There is therefore considerable interest in utilizing artificial intelligence and affective computing to measure engagement in natural settings as well as on a large scale. This paper introduces a novel, privacy-preserving method for engagement measurement from videos. It uses facial landmarks, which carry no personally identifiable information, extracted from videos via the MediaPipe deep learning solution. The extracted facial landmarks are fed to a Spatial-Temporal Graph Convolutional Network (ST-GCN) to output the engagement level of the learner in the video. To integrate the ordinal nature of the engagement variable into the training process, ST-GCNs undergo training in a novel ordinal learning framework based on transfer learning. Experimental results on two video student engagement measurement datasets show the superiority of the proposed method compared to previous methods with improved state-of-the-art on the EngageNet dataset with a %3.1 improvement in four-class engagement level classification accuracy and on the Online Student Engagement dataset with a %1.5 improvement in binary engagement classification accuracy. The relatively lightweight ST-GCN and its integration with the real-time MediaPipe deep learning solution make the proposed approach capable of being deployed on virtual learning platforms and measuring engagement in real time. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.02772 [pdf, other]

doi 10.1007/s11517-024-03177-x

Rehabilitation Exercise Quality Assessment through Supervised Contrastive Learning with Hard and Soft Negatives

Authors: Mark Karlov, Ali Abedi, Shehroz S. Khan

Abstract: Exercise-based rehabilitation programs have proven to be effective in enhancing the quality of life and reducing mortality and rehospitalization rates. AI-driven virtual rehabilitation, which allows patients to independently complete exercises at home, utilizes AI algorithms to analyze exercise data, providing feedback to patients and updating clinicians on their progress. These programs commonly… ▽ More Exercise-based rehabilitation programs have proven to be effective in enhancing the quality of life and reducing mortality and rehospitalization rates. AI-driven virtual rehabilitation, which allows patients to independently complete exercises at home, utilizes AI algorithms to analyze exercise data, providing feedback to patients and updating clinicians on their progress. These programs commonly prescribe a variety of exercise types, leading to a distinct challenge in rehabilitation exercise assessment datasets: while abundant in overall training samples, these datasets often have a limited number of samples for each individual exercise type. This disparity hampers the ability of existing approaches to train generalizable models with such a small sample size per exercise type. Addressing this issue, this paper introduces a novel supervised contrastive learning framework with hard and soft negative samples that effectively utilizes the entire dataset to train a single model applicable to all exercise types. This model, with a Spatial-Temporal Graph Convolutional Network (ST-GCN) architecture, demonstrated enhanced generalizability across exercises and a decrease in overall complexity. Through extensive experiments on three publicly available rehabilitation exercise assessment datasets, UI-PRMD, IRDS, and KIMORE, our method has proven to surpass existing methods, setting a new benchmark in rehabilitation exercise quality assessment. △ Less

Submitted 9 August, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: 23 pages, 4 figures, 5 tables

Journal ref: Medical & Biological Engineering & Computing Journal, 2024

arXiv:2402.18102 [pdf, other]

Passive Snapshot Coded Aperture Dual-Pixel RGB-D Imaging

Authors: Bhargav Ghanekar, Salman Siddique Khan, Pranav Sharma, Shreyas Singh, Vivek Boominathan, Kaushik Mitra, Ashok Veeraraghavan

Abstract: Passive, compact, single-shot 3D sensing is useful in many application areas such as microscopy, medical imaging, surgical navigation, and autonomous driving where form factor, time, and power constraints can exist. Obtaining RGB-D scene information over a short imaging distance, in an ultra-compact form factor, and in a passive, snapshot manner is challenging. Dual-pixel (DP) sensors are a potent… ▽ More Passive, compact, single-shot 3D sensing is useful in many application areas such as microscopy, medical imaging, surgical navigation, and autonomous driving where form factor, time, and power constraints can exist. Obtaining RGB-D scene information over a short imaging distance, in an ultra-compact form factor, and in a passive, snapshot manner is challenging. Dual-pixel (DP) sensors are a potential solution to achieve the same. DP sensors collect light rays from two different halves of the lens in two interleaved pixel arrays, thus capturing two slightly different views of the scene, like a stereo camera system. However, imaging with a DP sensor implies that the defocus blur size is directly proportional to the disparity seen between the views. This creates a trade-off between disparity estimation vs. deblurring accuracy. To improve this trade-off effect, we propose CADS (Coded Aperture Dual-Pixel Sensing), in which we use a coded aperture in the imaging lens along with a DP sensor. In our approach, we jointly learn an optimal coded pattern and the reconstruction algorithm in an end-to-end optimization setting. Our resulting CADS imaging system demonstrates improvement of >1.5dB PSNR in all-in-focus (AIF) estimates and 5-6% in depth estimation quality over naive DP sensing for a wide range of aperture settings. Furthermore, we build the proposed CADS prototypes for DSLR photography settings and in an endoscope and a dermoscope form factor. Our novel coded dual-pixel sensing approach demonstrates accurate RGB-D reconstruction results in simulations and real-world experiments in a passive, snapshot, and compact manner. △ Less

Submitted 30 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

arXiv:2311.16312 [pdf]

doi 10.1109/icdmw58026.2022.00041

Domain-Specific Deep Learning Feature Extractor for Diabetic Foot Ulcer Detection

Authors: Reza Basiri, Milos R. Popovic, Shehroz S. Khan

Abstract: Diabetic Foot Ulcer (DFU) is a condition requiring constant monitoring and evaluations for treatment. DFU patient population is on the rise and will soon outpace the available health resources. Autonomous monitoring and evaluation of DFU wounds is a much-needed area in health care. In this paper, we evaluate and identify the most accurate feature extractor that is the core basis for developing a d… ▽ More Diabetic Foot Ulcer (DFU) is a condition requiring constant monitoring and evaluations for treatment. DFU patient population is on the rise and will soon outpace the available health resources. Autonomous monitoring and evaluation of DFU wounds is a much-needed area in health care. In this paper, we evaluate and identify the most accurate feature extractor that is the core basis for developing a deep-learning wound detection network. For the evaluation, we used mAP and F1-score on the publicly available DFU2020 dataset. A combination of UNet and EfficientNetb3 feature extractor resulted in the best evaluation among the 14 networks compared. UNet and Efficientnetb3 can be used as the classifier in the development of a comprehensive DFU domain-specific autonomous wound detection pipeline. △ Less

Submitted 27 November, 2023; originally announced November 2023.

Comments: 5 pages, 2 figures, 3 tables, 2022 IEEE International Conference on Data Mining Workshops

Journal ref: 2022 IEEE International Conference on Data Mining Workshops. pp. 1-5

arXiv:2311.02863 [pdf, other]

Temporal Shift -- Multi-Objective Loss Function for Improved Anomaly Fall Detection

Authors: Stefan Denkovski, Shehroz S. Khan, Alex Mihailidis

Abstract: Falls are a major cause of injuries and deaths among older adults worldwide. Accurate fall detection can help reduce potential injuries and additional health complications. Different types of video modalities can be used in a home setting to detect falls, including RGB, Infrared, and Thermal cameras. Anomaly detection frameworks using autoencoders and their variants can be used for fall detection… ▽ More Falls are a major cause of injuries and deaths among older adults worldwide. Accurate fall detection can help reduce potential injuries and additional health complications. Different types of video modalities can be used in a home setting to detect falls, including RGB, Infrared, and Thermal cameras. Anomaly detection frameworks using autoencoders and their variants can be used for fall detection due to the data imbalance that arises from the rarity and diversity of falls. However, the use of reconstruction error in autoencoders can limit the application of networks' structures that propagate information. In this paper, we propose a new multi-objective loss function called Temporal Shift, which aims to predict both future and reconstructed frames within a window of sequential frames. The proposed loss function is evaluated on a semi-naturalistic fall detection dataset containing multiple camera modalities. The autoencoders were trained on normal activities of daily living (ADL) performed by older adults and tested on ADLs and falls performed by young adults. Temporal shift shows significant improvement to a baseline 3D Convolutional autoencoder, an attention U-Net CAE, and a multi-modal neural network. The greatest improvement was observed in an attention U-Net model improving by 0.20 AUC ROC for a single camera when compared to reconstruction alone. With significant improvement across different models, this approach has the potential to be widely adopted and improve anomaly detection capabilities in other settings besides fall detection. △ Less

Submitted 5 November, 2023; originally announced November 2023.

arXiv:2310.20140 [pdf]

Synthesizing Diabetic Foot Ulcer Images with Diffusion Model

Authors: Reza Basiri, Karim Manji, Francois Harton, Alisha Poonja, Milos R. Popovic, Shehroz S. Khan

Abstract: Diabetic Foot Ulcer (DFU) is a serious skin wound requiring specialized care. However, real DFU datasets are limited, hindering clinical training and research activities. In recent years, generative adversarial networks and diffusion models have emerged as powerful tools for generating synthetic images with remarkable realism and diversity in many applications. This paper explores the potential of… ▽ More Diabetic Foot Ulcer (DFU) is a serious skin wound requiring specialized care. However, real DFU datasets are limited, hindering clinical training and research activities. In recent years, generative adversarial networks and diffusion models have emerged as powerful tools for generating synthetic images with remarkable realism and diversity in many applications. This paper explores the potential of diffusion models for synthesizing DFU images and evaluates their authenticity through expert clinician assessments. Additionally, evaluation metrics such as Frechet Inception Distance (FID) and Kernel Inception Distance (KID) are examined to assess the quality of the synthetic DFU images. A dataset of 2,000 DFU images is used for training the diffusion model, and the synthetic images are generated by applying diffusion processes. The results indicate that the diffusion model successfully synthesizes visually indistinguishable DFU images. 70% of the time, clinicians marked synthetic DFU images as real DFUs. However, clinicians demonstrate higher unanimous confidence in rating real images than synthetic ones. The study also reveals that FID and KID metrics do not significantly align with clinicians' assessments, suggesting alternative evaluation approaches are needed. The findings highlight the potential of diffusion models for generating synthetic DFU images and their impact on medical training programs and research in wound detection and classification. △ Less

Submitted 30 October, 2023; originally announced October 2023.

Comments: 8 pages, 3 figures, 6th Workshop on AI for Aging, Rehabilitation and Intelligent Assisted Living at European Conference on Machine Learning, Italy, 2023

arXiv:2310.19925 [pdf, ps, other]

OpenRAND: A Performance Portable, Reproducible Random Number Generation Library for Parallel Computations

Authors: Shihab Shahriar Khan, Bryce Palmer, Christopher Edelmaierd, Hasan Metin Aktulga

Abstract: We introduce OpenRAND, a C++17 library aimed at facilitating reproducible scientific research through the generation of statistically robust and yet replicable random numbers. OpenRAND accommodates single and multi-threaded applications on CPUs and GPUs and offers a simplified, user-friendly API that complies with the C++ standard's random number engine interface. It is portable: it functions seam… ▽ More We introduce OpenRAND, a C++17 library aimed at facilitating reproducible scientific research through the generation of statistically robust and yet replicable random numbers. OpenRAND accommodates single and multi-threaded applications on CPUs and GPUs and offers a simplified, user-friendly API that complies with the C++ standard's random number engine interface. It is portable: it functions seamlessly as a lightweight, header-only library, making it adaptable to a wide spectrum of software and hardware platforms. It is statistically robust: a suite of built-in tests ensures no pattern exists within single or multiple streams. Despite the simplicity and portability, it is remarkably performant-matching and sometimes even outperforming native libraries by a significant margin. Our tests, including a Brownian walk simulation, affirm its reproducibility and highlight its computational efficiency, outperforming CUDA's cuRAND by up to 1.8 times. △ Less

Submitted 30 October, 2023; originally announced October 2023.

Comments: Paper currently under review in Softwarex journal

arXiv:2306.09546 [pdf, other]

Cross-Modal Video to Body-joints Augmentation for Rehabilitation Exercise Quality Assessment

Authors: Ali Abedi, Mobin Malmirian, Shehroz S. Khan

Abstract: Exercise-based rehabilitation programs have been shown to enhance quality of life and reduce mortality and rehospitalizations. AI-driven virtual rehabilitation programs enable patients to complete exercises independently at home while AI algorithms can analyze exercise data to provide feedback to patients and report their progress to clinicians. This paper introduces a novel approach to assessing… ▽ More Exercise-based rehabilitation programs have been shown to enhance quality of life and reduce mortality and rehospitalizations. AI-driven virtual rehabilitation programs enable patients to complete exercises independently at home while AI algorithms can analyze exercise data to provide feedback to patients and report their progress to clinicians. This paper introduces a novel approach to assessing the quality of rehabilitation exercises using RGB video. Sequences of skeletal body joints are extracted from consecutive RGB video frames and analyzed by many-to-one sequential neural networks to evaluate exercise quality. Existing datasets for exercise rehabilitation lack adequate samples for training deep sequential neural networks to generalize effectively. A cross-modal data augmentation approach is proposed to resolve this problem. Visual augmentation techniques are applied to video data, and body joints extracted from the resulting augmented videos are used for training sequential neural networks. Extensive experiments conducted on the KInematic assessment of MOvement and clinical scores for remote monitoring of physical REhabilitation (KIMORE) dataset, demonstrate the superiority of the proposed method over previous baseline approaches. The ablation study highlights a significant enhancement in exercise quality assessment following cross-modal augmentation. △ Less

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2304.14922 [pdf, other]

Supervised and Unsupervised Deep Learning Approaches for EEG Seizure Prediction

Authors: Zakary Georgis-Yap, Milos R. Popovic, Shehroz S. Khan

Abstract: Epilepsy affects more than 50 million people worldwide, making it one of the world's most prevalent neurological diseases. The main symptom of epilepsy is seizures, which occur abruptly and can cause serious injury or death. The ability to predict the occurrence of an epileptic seizure could alleviate many risks and stresses people with epilepsy face. We formulate the problem of detecting preictal… ▽ More Epilepsy affects more than 50 million people worldwide, making it one of the world's most prevalent neurological diseases. The main symptom of epilepsy is seizures, which occur abruptly and can cause serious injury or death. The ability to predict the occurrence of an epileptic seizure could alleviate many risks and stresses people with epilepsy face. We formulate the problem of detecting preictal (or pre-seizure) with reference to normal EEG as a precursor to incoming seizure. To this end, we developed several supervised deep learning approaches to identify preictal EEG from normal EEG. We further develop novel unsupervised deep learning approaches to train the models on only normal EEG, and detecting pre-seizure EEG as an anomalous event. These deep learning models were trained and evaluated on two large EEG seizure datasets in a person-specific manner. We found that both supervised and unsupervised approaches are feasible; however, their performance varies depending on the patient, approach and architecture. This new line of research has the potential to develop therapeutic interventions and save human lives. △ Less

Submitted 3 February, 2024; v1 submitted 24 April, 2023; originally announced April 2023.

Comments: 16 figures, 9 tables

Journal ref: Journal of Health Informatics Research, 2024

arXiv:2304.09735 [pdf, other]

Rehabilitation Exercise Repetition Segmentation and Counting using Skeletal Body Joints

Authors: Ali Abedi, Paritosh Bisht, Riddhi Chatterjee, Rachit Agrawal, Vyom Sharma, Dinesh Babu Jayagopi, Shehroz S. Khan

Abstract: Physical exercise is an essential component of rehabilitation programs that improve quality of life and reduce mortality and re-hospitalization rates. In AI-driven virtual rehabilitation programs, patients complete their exercises independently at home, while AI algorithms analyze the exercise data to provide feedback to patients and report their progress to clinicians. To analyze exercise data, t… ▽ More Physical exercise is an essential component of rehabilitation programs that improve quality of life and reduce mortality and re-hospitalization rates. In AI-driven virtual rehabilitation programs, patients complete their exercises independently at home, while AI algorithms analyze the exercise data to provide feedback to patients and report their progress to clinicians. To analyze exercise data, the first step is to segment it into consecutive repetitions. There has been a significant amount of research performed on segmenting and counting the repetitive activities of healthy individuals using raw video data, which raises concerns regarding privacy and is computationally intensive. Previous research on patients' rehabilitation exercise segmentation relied on data collected by multiple wearable sensors, which are difficult to use at home by rehabilitation patients. Compared to healthy individuals, segmenting and counting exercise repetitions in patients is more challenging because of the irregular repetition duration and the variation between repetitions. This paper presents a novel approach for segmenting and counting the repetitions of rehabilitation exercises performed by patients, based on their skeletal body joints. Skeletal body joints can be acquired through depth cameras or computer vision techniques applied to RGB videos of patients. Various sequential neural networks are designed to analyze the sequences of skeletal body joints and perform repetition segmentation and counting. Extensive experiments on three publicly available rehabilitation exercise datasets, KIMORE, UI-PRMD, and IntelliRehabDS, demonstrate the superiority of the proposed method compared to previous methods. The proposed method enables accurate exercise analysis while preserving privacy, facilitating the effective delivery of virtual rehabilitation programs. △ Less

Submitted 19 April, 2023; originally announced April 2023.

Comments: 8 pages, 1 figure, 2 tables

arXiv:2302.11027 [pdf]

Analysis of Real-Time Hostile Activitiy Detection from Spatiotemporal Features Using Time Distributed Deep CNNs, RNNs and Attention-Based Mechanisms

Authors: Labib Ahmed Siddique, Rabita Junhai, Tanzim Reza, Salman Sayeed Khan, Tanvir Rahman

Abstract: Real-time video surveillance, through CCTV camera systems has become essential for ensuring public safety which is a priority today. Although CCTV cameras help a lot in increasing security, these systems require constant human interaction and monitoring. To eradicate this issue, intelligent surveillance systems can be built using deep learning video classification techniques that can help us autom… ▽ More Real-time video surveillance, through CCTV camera systems has become essential for ensuring public safety which is a priority today. Although CCTV cameras help a lot in increasing security, these systems require constant human interaction and monitoring. To eradicate this issue, intelligent surveillance systems can be built using deep learning video classification techniques that can help us automate surveillance systems to detect violence as it happens. In this research, we explore deep learning video classification techniques to detect violence as they are happening. Traditional image classification techniques fall short when it comes to classifying videos as they attempt to classify each frame separately for which the predictions start to flicker. Therefore, many researchers are coming up with video classification techniques that consider spatiotemporal features while classifying. However, deploying these deep learning models with methods such as skeleton points obtained through pose estimation and optical flow obtained through depth sensors, are not always practical in an IoT environment. Although these techniques ensure a higher accuracy score, they are computationally heavier. Keeping these constraints in mind, we experimented with various video classification and action recognition techniques such as ConvLSTM, LRCN (with both custom CNN layers and VGG-16 as feature extractor) CNNTransformer and C3D. We achieved a test accuracy of 80% on ConvLSTM, 83.33% on CNN-BiLSTM, 70% on VGG16-BiLstm ,76.76% on CNN-Transformer and 80% on C3D. △ Less

Submitted 21 February, 2023; originally announced February 2023.

arXiv:2302.03224 [pdf, other]

Undersampling and Cumulative Class Re-decision Methods to Improve Detection of Agitation in People with Dementia

Authors: Zhidong Meng, Andrea Iaboni, Bing Ye, Kristine Newman, Alex Mihailidis, Zhihong Deng, Shehroz S. Khan

Abstract: Agitation is one of the most prevalent symptoms in people with dementia (PwD) that can place themselves and the caregiver's safety at risk. Developing objective agitation detection approaches is important to support health and safety of PwD living in a residential setting. In a previous study, we collected multimodal wearable sensor data from 17 participants for 600 days and developed machine lear… ▽ More Agitation is one of the most prevalent symptoms in people with dementia (PwD) that can place themselves and the caregiver's safety at risk. Developing objective agitation detection approaches is important to support health and safety of PwD living in a residential setting. In a previous study, we collected multimodal wearable sensor data from 17 participants for 600 days and developed machine learning models for detecting agitation in one-minute windows. However, there are significant limitations in the dataset, such as imbalance problem and potential imprecise labelsas the occurrence of agitation is much rarer in comparison to the normal behaviours. In this paper, we first implemented different undersampling methods to eliminate the imbalance problem, and came to the conclusion that only 20% of normal behaviour data were adequate to train a competitive agitation detection model. Then, we designed a weighted undersampling method to evaluate the manual labeling mechanism given the ambiguous time interval assumption. After that, the postprocessing method of cumulative class re-decision (CCR) was proposed based on the historical sequential information and continuity characteristic of agitation, improving the decision-making performance for the potential application of agitation detection system. The results showed that a combination of undersampling and CCR improved F1-score and other metrics to varying degrees with less training time and data. △ Less

Submitted 15 August, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

Comments: 19 pages, 6 figures

arXiv:2301.06730 [pdf, other]

Bag of States: A Non-sequential Approach to Video-based Engagement Measurement

Authors: Ali Abedi, Chinchu Thomas, Dinesh Babu Jayagopi, Shehroz S. Khan

Abstract: Automatic measurement of student engagement provides helpful information for instructors to meet learning program objectives and individualize program delivery. Students' behavioral and emotional states need to be analyzed at fine-grained time scales in order to measure their level of engagement. Many existing approaches have developed sequential and spatiotemporal models, such as recurrent neural… ▽ More Automatic measurement of student engagement provides helpful information for instructors to meet learning program objectives and individualize program delivery. Students' behavioral and emotional states need to be analyzed at fine-grained time scales in order to measure their level of engagement. Many existing approaches have developed sequential and spatiotemporal models, such as recurrent neural networks, temporal convolutional networks, and three-dimensional convolutional neural networks, for measuring student engagement from videos. These models are trained to incorporate the order of behavioral and emotional states of students into video analysis and output their level of engagement. In this paper, backed by educational psychology, we question the necessity of modeling the order of behavioral and emotional states of students in measuring their engagement. We develop bag-of-words-based models in which only the occurrence of behavioral and emotional states of students is modeled and analyzed and not the order in which they occur. Behavioral and affective features are extracted from videos and analyzed by the proposed models to determine the level of engagement in an ordinal-output classification setting. Compared to the existing sequential and spatiotemporal approaches for engagement measurement, the proposed non-sequential approach improves the state-of-the-art results. According to experimental results, our method significantly improved engagement level classification accuracy on the IIITB Online SE dataset by 26% compared to sequential models and achieved engagement level classification accuracy as high as 66.58% on the DAiSEE student engagement dataset. △ Less

Submitted 17 January, 2023; originally announced January 2023.

arXiv:2301.00114 [pdf, other]

Skeletal Video Anomaly Detection using Deep Learning: Survey, Challenges and Future Directions

Authors: Pratik K. Mishra, Alex Mihailidis, Shehroz S. Khan

Abstract: The existing methods for video anomaly detection mostly utilize videos containing identifiable facial and appearance-based features. The use of videos with identifiable faces raises privacy concerns, especially when used in a hospital or community-based setting. Appearance-based features can also be sensitive to pixel-based noise, straining the anomaly detection methods to model the changes in the… ▽ More The existing methods for video anomaly detection mostly utilize videos containing identifiable facial and appearance-based features. The use of videos with identifiable faces raises privacy concerns, especially when used in a hospital or community-based setting. Appearance-based features can also be sensitive to pixel-based noise, straining the anomaly detection methods to model the changes in the background and making it difficult to focus on the actions of humans in the foreground. Structural information in the form of skeletons describing the human motion in the videos is privacy-protecting and can overcome some of the problems posed by appearance-based features. In this paper, we present a survey of privacy-protecting deep learning anomaly detection methods using skeletons extracted from videos. We present a novel taxonomy of algorithms based on the various learning approaches. We conclude that skeleton-based approaches for anomaly detection can be a plausible privacy-protecting alternative for video anomaly detection. Lastly, we identify major open research questions and provide guidelines to address them. △ Less

Submitted 17 January, 2024; v1 submitted 30 December, 2022; originally announced January 2023.

Comments: This work has been accepted by IEEE Transactions on Emerging Topics in Computational Intelligence

arXiv:2212.10682 [pdf, other]

Privacy-Protecting Behaviours of Risk Detection in People with Dementia using Videos

Authors: Pratik K. Mishra, Andrea Iaboni, Bing Ye, Kristine Newman, Alex Mihailidis, Shehroz S. Khan

Abstract: People living with dementia often exhibit behavioural and psychological symptoms of dementia that can put their and others' safety at risk. Existing video surveillance systems in long-term care facilities can be used to monitor such behaviours of risk to alert the staff to prevent potential injuries or death in some cases. However, these behaviours of risk events are heterogeneous and infrequent i… ▽ More People living with dementia often exhibit behavioural and psychological symptoms of dementia that can put their and others' safety at risk. Existing video surveillance systems in long-term care facilities can be used to monitor such behaviours of risk to alert the staff to prevent potential injuries or death in some cases. However, these behaviours of risk events are heterogeneous and infrequent in comparison to normal events. Moreover, analyzing raw videos can also raise privacy concerns. In this paper, we present two novel privacy-protecting video-based anomaly detection approaches to detect behaviours of risks in people with dementia. We either extracted body pose information as skeletons or used semantic segmentation masks to replace multiple humans in the scene with their semantic boundaries. Our work differs from most existing approaches for video anomaly detection that focus on appearance-based features, which can put the privacy of a person at risk and is also susceptible to pixel-based noise, including illumination and viewing direction. We used anonymized videos of normal activities to train customized spatio-temporal convolutional autoencoders and identify behaviours of risk as anomalies. We showed our results on a real-world study conducted in a dementia care unit with patients with dementia, containing approximately 21 hours of normal activities data for training and 9 hours of data containing normal and behaviours of risk events for testing. We compared our approaches with the original RGB videos and obtained a similar area under the receiver operating characteristic curve performance of 0.807 for the skeleton-based approach and 0.823 for the segmentation mask-based approach. △ Less

Submitted 17 January, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

arXiv:2211.13114 [pdf, other]

Step Counting with Attention-based LSTM

Authors: Shehroz S. Khan, Ali Abedi

Abstract: Physical activity is recognized as an essential component of overall health. One measure of physical activity, the step count, is well known as a predictor of long-term morbidity and mortality. Step Counting (SC) is the automated counting of the number of steps an individual takes over a specified period of time and space. Due to the ubiquity of smartphones and smartwatches, most current SC approa… ▽ More Physical activity is recognized as an essential component of overall health. One measure of physical activity, the step count, is well known as a predictor of long-term morbidity and mortality. Step Counting (SC) is the automated counting of the number of steps an individual takes over a specified period of time and space. Due to the ubiquity of smartphones and smartwatches, most current SC approaches rely on the built-in accelerometer sensors on these devices. The sensor signals are analyzed as multivariate time series, and the number of steps is calculated through a variety of approaches, such as time-domain, frequency-domain, machine-learning, and deep-learning approaches. Most of the existing approaches rely on dividing the input signal into windows, detecting steps in each window, and summing the detected steps. However, these approaches require the determination of multiple parameters, including the window size. Furthermore, most of the existing deep-learning SC approaches require ground-truth labels for every single step, which can be arduous and time-consuming to annotate. To circumvent these requirements, we present a novel SC approach utilizing many-to-one attention-based LSTM. With the proposed LSTM network, SC is solved as a regression problem, taking the entire sensor signal as input and the step count as the output. The analysis shows that the attention-based LSTM automatically learned the pattern of steps even in the absence of ground-truth labels. The experimental results on three publicly available SC datasets demonstrate that the proposed method successfully counts the number of steps with low values of mean absolute error and high values of SC accuracy. △ Less

Submitted 17 November, 2022; originally announced November 2022.

Report number: EFI-94-11

arXiv:2211.06870 [pdf, other]

Detecting Disengagement in Virtual Learning as an Anomaly using Temporal Convolutional Network Autoencoder

Authors: Ali Abedi, Shehroz S. Khan

Abstract: Student engagement is an important factor in meeting the goals of virtual learning programs. Automatic measurement of student engagement provides helpful information for instructors to meet learning program objectives and individualize program delivery. Many existing approaches solve video-based engagement measurement using the traditional frameworks of binary classification (classifying video sni… ▽ More Student engagement is an important factor in meeting the goals of virtual learning programs. Automatic measurement of student engagement provides helpful information for instructors to meet learning program objectives and individualize program delivery. Many existing approaches solve video-based engagement measurement using the traditional frameworks of binary classification (classifying video snippets into engaged or disengaged classes), multi-class classification (classifying video snippets into multiple classes corresponding to different levels of engagement), or regression (estimating a continuous value corresponding to the level of engagement). However, we observe that while the engagement behaviour is mostly well-defined (e.g., focused, not distracted), disengagement can be expressed in various ways. In addition, in some cases, the data for disengaged classes may not be sufficient to train generalizable binary or multi-class classifiers. To handle this situation, in this paper, for the first time, we formulate detecting disengagement in virtual learning as an anomaly detection problem. We design various autoencoders, including temporal convolutional network autoencoder, long-short-term memory autoencoder, and feedforward autoencoder using different behavioral and affect features for video-based student disengagement detection. The result of our experiments on two publicly available student engagement datasets, DAiSEE and EmotiW, shows the superiority of the proposed approach for disengagement detection as an anomaly compared to binary classifiers for classifying videos into engaged versus disengaged classes (with an average improvement of 9% on the area under the curve of the receiver operating characteristic curve and 22% on the area under the curve of the precision-recall curve). △ Less

Submitted 4 February, 2023; v1 submitted 13 November, 2022; originally announced November 2022.

arXiv:2211.03615 [pdf, other]

MAISON -- Multimodal AI-based Sensor platform for Older Individuals

Authors: Ali Abedi, Faranak Dayyani, Charlene Chu, Shehroz S. Khan

Abstract: There is a global aging population requiring the need for the right tools that can enable older adults' greater independence and the ability to age at home, as well as assist healthcare workers. It is feasible to achieve this objective by building predictive models that assist healthcare workers in monitoring and analyzing older adults' behavioral, functional, and psychological data. To develop su… ▽ More There is a global aging population requiring the need for the right tools that can enable older adults' greater independence and the ability to age at home, as well as assist healthcare workers. It is feasible to achieve this objective by building predictive models that assist healthcare workers in monitoring and analyzing older adults' behavioral, functional, and psychological data. To develop such models, a large amount of multimodal sensor data is typically required. In this paper, we propose MAISON, a scalable cloud-based platform of commercially available smart devices capable of collecting desired multimodal sensor data from older adults and patients living in their own homes. The MAISON platform is novel due to its ability to collect a greater variety of data modalities than the existing platforms, as well as its new features that result in seamless data collection and ease of use for older adults who may not be digitally literate. We demonstrated the feasibility of the MAISON platform with two older adults discharged home from a large rehabilitation center. The results indicate that the MAISON platform was able to collect and store sensor data in a cloud without functional glitches or performance degradation. This paper will also discuss the challenges faced during the development of the platform and data collection in the homes of older adults. MAISON is a novel platform designed to collect multimodal data and facilitate the development of predictive models for detecting key health indicators, including social isolation, depression, and functional decline, and is feasible to use with older adults in the community. △ Less

Submitted 7 November, 2022; originally announced November 2022.

arXiv:2208.06827 [pdf]

BDSL 49: A Comprehensive Dataset of Bangla Sign Language

Authors: Ayman Hasib, Saqib Sizan Khan, Jannatul Ferdous Eva, Mst. Nipa Khatun, Ashraful Haque, Nishat Shahrin, Rashik Rahman, Hasan Murad, Md. Rajibul Islam, Molla Rashied Hussein

Abstract: Language is a method by which individuals express their thoughts. Each language has its own set of alphabetic and numeric characters. People can communicate with one another through either oral or written communication. However, each language has a sign language counterpart. Individuals who are deaf and/or mute communicate through sign language. The Bangla language also has a sign language, which… ▽ More Language is a method by which individuals express their thoughts. Each language has its own set of alphabetic and numeric characters. People can communicate with one another through either oral or written communication. However, each language has a sign language counterpart. Individuals who are deaf and/or mute communicate through sign language. The Bangla language also has a sign language, which is called BDSL. The dataset is about Bangla hand sign images. The collection contains 49 individual Bangla alphabet images in sign language. BDSL49 is a dataset that consists of 29,490 images with 49 labels. Images of 14 different adult individuals, each with a distinct background and appearance, have been recorded during data collection. Several strategies have been used to eliminate noise from datasets during preparation. This dataset is available to researchers for free. They can develop automated systems using machine learning, computer vision, and deep learning techniques. In addition, two models were used in this dataset. The first is for detection, while the second is for recognition. △ Less

Submitted 14 August, 2022; originally announced August 2022.

Comments: 16 pages; 6 figures; Submitted to Data in Brief, a multidisciplinary, open-access and peer-reviewed journal for reviewing

arXiv:2208.04548 [pdf, other]

Inconsistencies in the Definition and Annotation of Student Engagement in Virtual Learning Datasets: A Critical Review

Authors: Shehroz S. Khan, Ali Abedi, Tracey Colella

Abstract: Background: Student engagement (SE) in virtual learning can have a major impact on meeting learning objectives and program dropout risks. Developing Artificial Intelligence (AI) models for automatic SE measurement requires annotated datasets. However, existing SE datasets suffer from inconsistent definitions and annotation protocols mostly unaligned with the definition of SE in educational psychol… ▽ More Background: Student engagement (SE) in virtual learning can have a major impact on meeting learning objectives and program dropout risks. Developing Artificial Intelligence (AI) models for automatic SE measurement requires annotated datasets. However, existing SE datasets suffer from inconsistent definitions and annotation protocols mostly unaligned with the definition of SE in educational psychology. This issue could be misleading in developing generalizable AI models and make it hard to compare the performance of these models developed on different datasets. The objective of this critical review was to explore the existing SE datasets and highlight inconsistencies in terms of differing engagement definitions and annotation protocols. Methods: Several academic databases were searched for publications introducing new SE datasets. The datasets containing students' single- or multi-modal data in online or offline computer-based virtual learning sessions were included. The definition and annotation of SE in the existing datasets were analyzed based on our defined seven dimensions of engagement annotation: sources, data modalities, timing, temporal resolution, level of abstraction, combination, and quantification. Results: Thirty SE measurement datasets met the inclusion criteria. The reviewed SE datasets used very diverse and inconsistent definitions and annotation protocols. Unexpectedly, very few of the reviewed datasets used existing psychometrically validated scales in their definition of SE. Discussion: The inconsistent definition and annotation of SE are problematic for research on developing comparable AI models for automatic SE measurement. Some of the existing SE definitions and protocols in settings other than virtual learning that have the potential to be used in virtual learning are introduced. △ Less

Submitted 16 January, 2023; v1 submitted 9 August, 2022; originally announced August 2022.

arXiv:2208.04283 [pdf, ps, other]

LWGNet: Learned Wirtinger Gradients for Fourier Ptychographic Phase Retrieval

Authors: Atreyee Saha, Salman S Khan, Sagar Sehrawat, Sanjana S Prabhu, Shanti Bhattacharya, Kaushik Mitra

Abstract: Fourier Ptychographic Microscopy (FPM) is an imaging procedure that overcomes the traditional limit on Space-Bandwidth Product (SBP) of conventional microscopes through computational means. It utilizes multiple images captured using a low numerical aperture (NA) objective and enables high-resolution phase imaging through frequency domain stitching. Existing FPM reconstruction methods can be broadl… ▽ More Fourier Ptychographic Microscopy (FPM) is an imaging procedure that overcomes the traditional limit on Space-Bandwidth Product (SBP) of conventional microscopes through computational means. It utilizes multiple images captured using a low numerical aperture (NA) objective and enables high-resolution phase imaging through frequency domain stitching. Existing FPM reconstruction methods can be broadly categorized into two approaches: iterative optimization based methods, which are based on the physics of the forward imaging model, and data-driven methods which commonly employ a feed-forward deep learning framework. We propose a hybrid model-driven residual network that combines the knowledge of the forward imaging system with a deep data-driven network. Our proposed architecture, LWGNet, unrolls traditional Wirtinger flow optimization algorithm into a novel neural network design that enhances the gradient images through complex convolutional blocks. Unlike other conventional unrolling techniques, LWGNet uses fewer stages while performing at par or even better than existing traditional and deep learning techniques, particularly, for low-cost and low dynamic range CMOS sensors. This improvement in performance for low-bit depth and low-cost sensors has the potential to bring down the cost of FPM imaging setup significantly. Finally, we show consistently improved performance on our collected real data. △ Less

Submitted 16 August, 2022; v1 submitted 8 August, 2022; originally announced August 2022.

arXiv:2206.12740 [pdf, other]

Multi Visual Modality Fall Detection Dataset

Authors: Stefan Denkovski, Shehroz S. Khan, Brandon Malamis, Sae Young Moon, Bing Ye, Alex Mihailidis

Abstract: Falls are one of the leading cause of injury-related deaths among the elderly worldwide. Effective detection of falls can reduce the risk of complications and injuries. Fall detection can be performed using wearable devices or ambient sensors; these methods may struggle with user compliance issues or false alarms. Video cameras provide a passive alternative; however, regular RGB cameras are impact… ▽ More Falls are one of the leading cause of injury-related deaths among the elderly worldwide. Effective detection of falls can reduce the risk of complications and injuries. Fall detection can be performed using wearable devices or ambient sensors; these methods may struggle with user compliance issues or false alarms. Video cameras provide a passive alternative; however, regular RGB cameras are impacted by changing lighting conditions and privacy concerns. From a machine learning perspective, developing an effective fall detection system is challenging because of the rarity and variability of falls. Many existing fall detection datasets lack important real-world considerations, such as varied lighting, continuous activities of daily living (ADLs), and camera placement. The lack of these considerations makes it difficult to develop predictive models that can operate effectively in the real world. To address these limitations, we introduce a novel multi-modality dataset (MUVIM) that contains four visual modalities: infra-red, depth, RGB and thermal cameras. These modalities offer benefits such as obfuscated facial features and improved performance in low-light conditions. We formulated fall detection as an anomaly detection problem, in which a customized spatio-temporal convolutional autoencoder was trained only on ADLs so that a fall would increase the reconstruction error. Our results showed that infra-red cameras provided the highest level of performance (AUC ROC=0.94), followed by thermal (AUC ROC=0.87), depth (AUC ROC=0.86) and RGB (AUC ROC=0.83). This research provides a unique opportunity to analyze the utility of camera modalities in detecting falls in a home setting while balancing performance, passiveness, and privacy. △ Less

Submitted 25 June, 2022; originally announced June 2022.

arXiv:2109.04021 [pdf, other]

Supervised Contrastive Learning for Detecting Anomalous Driving Behaviours from Multimodal Videos

Authors: Shehroz S. Khan, Ziting Shen, Haoying Sun, Ax Patel, Ali Abedi

Abstract: Distracted driving is one of the major reasons for vehicle accidents. Therefore, detecting distracted driving behaviors is of paramount importance to reduce the millions of deaths and injuries occurring worldwide. Distracted or anomalous driving behaviors are deviations from 'normal' driving that need to be identified correctly to alert the driver. However, these driving behaviors do not comprise… ▽ More Distracted driving is one of the major reasons for vehicle accidents. Therefore, detecting distracted driving behaviors is of paramount importance to reduce the millions of deaths and injuries occurring worldwide. Distracted or anomalous driving behaviors are deviations from 'normal' driving that need to be identified correctly to alert the driver. However, these driving behaviors do not comprise one specific type of driving style and their distribution can be different during the training and test phases of a classifier. We formulate this problem as a supervised contrastive learning approach to learn a visual representation to detect normal, and seen and unseen anomalous driving behaviors. We made a change to the standard contrastive loss function to adjust the similarity of negative pairs to aid the optimization. Normally, in a (self) supervised contrastive framework, the projection head layers are omitted during the test phase as the encoding layers are considered to contain general visual representative information. However, we assert that for a video-based supervised contrastive learning task, including a projection head can be beneficial. We showed our results on a driver anomaly detection dataset that contains 783 minutes of video recordings of normal and anomalous driving behaviors of 31 drivers from the various top and front cameras (both depth and infrared). Out of 9 video modalities combinations, our proposed contrastive approach improved the ROC AUC on 6 in comparison to the baseline models (from 4.23% to 8.91% for different modalities). We performed statistical tests that showed evidence that our proposed method performs better than the baseline contrastive learning setup. Finally, the results showed that the fusion of depth and infrared modalities from the top and front views achieved the best AUC ROC of 0.9738 and AUC PR of 0.9772. △ Less

Submitted 29 April, 2022; v1 submitted 8 September, 2021; originally announced September 2021.

Comments: 8 pages, 2 figures, 5 tables

arXiv:2104.10122 [pdf, other]

Improving state-of-the-art in Detecting Student Engagement with Resnet and TCN Hybrid Network

Authors: Ali Abedi, Shehroz S. Khan

Abstract: Automatic detection of students' engagement in online learning settings is a key element to improve the quality of learning and to deliver personalized learning materials to them. Varying levels of engagement exhibited by students in an online classroom is an affective behavior that takes place over space and time. Therefore, we formulate detecting levels of students' engagement from videos as a s… ▽ More Automatic detection of students' engagement in online learning settings is a key element to improve the quality of learning and to deliver personalized learning materials to them. Varying levels of engagement exhibited by students in an online classroom is an affective behavior that takes place over space and time. Therefore, we formulate detecting levels of students' engagement from videos as a spatio-temporal classification problem. In this paper, we present a novel end-to-end Residual Network (ResNet) and Temporal Convolutional Network (TCN) hybrid neural network architecture for students' engagement level detection in videos. The 2D ResNet extracts spatial features from consecutive video frames, and the TCN analyzes the temporal changes in video frames to detect the level of engagement. The spatial and temporal arms of the hybrid network are jointly trained on raw video frames of a large publicly available students' engagement detection dataset, DAiSEE. We compared our method with several competing students' engagement detection methods on this dataset. The ResNet+TCN architecture outperforms all other studied methods, improves the state-of-the-art engagement level detection accuracy, and sets a new baseline for future research. △ Less

Submitted 16 October, 2021; v1 submitted 20 April, 2021; originally announced April 2021.

Comments: 7 pages, 3 figures, 1 table

arXiv:2104.09305 [pdf, other]

Tracking agitation in people living with dementia in a care environment

Authors: Shehroz S. Khan, Thaejaesh Sooriyakumaran, Katherine Rich, Sofija Spasojevic, Bing Ye, Kristine Newman, Andrea Iaboni, Alex Mihailidis

Abstract: Agitation is a symptom that communicates distress in people living with dementia (PwD), and that can place them and others at risk. In a long term care (LTC) environment, care staff track and document these symptoms as a way to detect when there has been a change in resident status to assess risk, and to monitor for response to interventions. However, this documentation can be time-consuming, and… ▽ More Agitation is a symptom that communicates distress in people living with dementia (PwD), and that can place them and others at risk. In a long term care (LTC) environment, care staff track and document these symptoms as a way to detect when there has been a change in resident status to assess risk, and to monitor for response to interventions. However, this documentation can be time-consuming, and due to staffing constraints, episodes of agitation may go unobserved. This brings into question the reliability of these assessments, and presents an opportunity for technology to help track and monitor behavioural symptoms in dementia. In this paper, we present the outcomes of a 2 year real-world study performed in a dementia unit, where a multi-modal wearable device was worn by $20$ PwD. In line with a commonly used clinical documentation tool, this large multi-modal time-series data was analyzed to track the presence of episodes of agitation in 8-hour nursing shifts. The development of a baseline classification model (AUC=0.717) on this dataset and subsequent improvement (AUC= 0.779) lays the groundwork for automating the process of annotating agitation events in nursing charts. △ Less

Submitted 25 April, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

Comments: 12 pages, 9 figures, 2 Tables

arXiv:2011.03180 [pdf, other]

FedSL: Federated Split Learning on Distributed Sequential Data in Recurrent Neural Networks

Authors: Ali Abedi, Shehroz S. Khan

Abstract: Federated Learning (FL) and Split Learning (SL) are privacy-preserving Machine-Learning (ML) techniques that enable training ML models over data distributed among clients without requiring direct access to their raw data. Existing FL and SL approaches work on horizontally or vertically partitioned data and cannot handle sequentially partitioned data where segments of multiple-segment sequential da… ▽ More Federated Learning (FL) and Split Learning (SL) are privacy-preserving Machine-Learning (ML) techniques that enable training ML models over data distributed among clients without requiring direct access to their raw data. Existing FL and SL approaches work on horizontally or vertically partitioned data and cannot handle sequentially partitioned data where segments of multiple-segment sequential data are distributed across clients. In this paper, we propose a novel federated split learning framework, FedSL, to train models on distributed sequential data. The most common ML models to train on sequential data are Recurrent Neural Networks (RNNs). Since the proposed framework is privacy-preserving, segments of multiple-segment sequential data cannot be shared between clients or between clients and server. To circumvent this limitation, we propose a novel SL approach tailored for RNNs. A RNN is split into sub-networks, and each sub-network is trained on one client containing single segments of multiple-segment training sequences. During local training, the sub-networks on different clients communicate with each other to capture latent dependencies between consecutive segments of multiple-segment sequential data on different clients, but without sharing raw data or complete model parameters. After training local sub-networks with local sequential data segments, all clients send their sub-networks to a federated server where sub-networks are aggregated to generate a global model. The experimental results on simulated and real-world datasets demonstrate that the proposed method successfully trains models on distributed sequential data, while preserving privacy, and outperforms previous FL and centralized learning approaches in terms of achieving higher accuracy in fewer communication rounds. △ Less

Submitted 6 November, 2022; v1 submitted 5 November, 2020; originally announced November 2020.

arXiv:2010.15440 [pdf, other]

doi 10.1109/TPAMI.2020.3033882

FlatNet: Towards Photorealistic Scene Reconstruction from Lensless Measurements

Authors: Salman S. Khan, Varun Sundar, Vivek Boominathan, Ashok Veeraraghavan, Kaushik Mitra

Abstract: Lensless imaging has emerged as a potential solution towards realizing ultra-miniature cameras by eschewing the bulky lens in a traditional camera. Without a focusing lens, the lensless cameras rely on computational algorithms to recover the scenes from multiplexed measurements. However, the current iterative-optimization-based reconstruction algorithms produce noisier and perceptually poorer imag… ▽ More Lensless imaging has emerged as a potential solution towards realizing ultra-miniature cameras by eschewing the bulky lens in a traditional camera. Without a focusing lens, the lensless cameras rely on computational algorithms to recover the scenes from multiplexed measurements. However, the current iterative-optimization-based reconstruction algorithms produce noisier and perceptually poorer images. In this work, we propose a non-iterative deep learning based reconstruction approach that results in orders of magnitude improvement in image quality for lensless reconstructions. Our approach, called $\textit{FlatNet}$, lays down a framework for reconstructing high-quality photorealistic images from mask-based lensless cameras, where the camera's forward model formulation is known. FlatNet consists of two stages: (1) an inversion stage that maps the measurement into a space of intermediate reconstruction by learning parameters within the forward model formulation, and (2) a perceptual enhancement stage that improves the perceptual quality of this intermediate reconstruction. These stages are trained together in an end-to-end manner. We show high-quality reconstructions by performing extensive experiments on real and challenging scenes using two different types of lensless prototypes: one which uses a separable forward model and another, which uses a more general non-separable cropped-convolution model. Our end-to-end approach is fast, produces photorealistic reconstructions, and is easy to adopt for other mask-based lensless cameras. △ Less

Submitted 29 October, 2020; originally announced October 2020.

Comments: Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2020. Supplementary material attached. For project website, see https://siddiquesalman.github.io/flatnet/

arXiv:2010.02814 [pdf, other]

Anomaly Detection Approach to Identify Early Cases in a Pandemic using Chest X-rays

Authors: Shehroz S. Khan, Faraz Khoshbakhtian, Ahmed Bilal Ashraf

Abstract: The current COVID-19 pandemic is now getting contained, albeit at the cost of morethan2.3million human lives. A critical phase in any pandemic is the early detection of cases to develop preventive treatments and strategies. In the case of COVID-19,several studies have indicated that chest radiography images of the infected patients show characteristic abnormalities. However, at the onset of a give… ▽ More The current COVID-19 pandemic is now getting contained, albeit at the cost of morethan2.3million human lives. A critical phase in any pandemic is the early detection of cases to develop preventive treatments and strategies. In the case of COVID-19,several studies have indicated that chest radiography images of the infected patients show characteristic abnormalities. However, at the onset of a given pandemic, such asCOVID-19, there may not be sufficient data for the affected cases to train models for their robust detection. Hence, supervised classification is ill-posed for this problem because the time spent in collecting large amounts of data from infected persons could lead to the loss of human lives and delays in preventive interventions. Therefore, we formulate the problem of identifying early cases in a pandemic as an anomaly detection problem, in which the data for healthy patients is abundantly available, whereas no training data is present for the class of interest (COVID-19 in our case). To solve this problem, we present several unsupervised deep learning approaches, including convolutional and adversarially trained autoencoder. We tested two settings on a publicly available dataset (COVIDx)by training the model on chest X-rays from (i) only healthy adults, and (ii) healthy and other non-COVID-19 pneumonia, and detected COVID-19 as an anomaly. Afterperforming3-fold cross validation, we obtain a ROC-AUC of0.765. These results are very encouraging and pave the way towards research for ensuring emergency preparedness in future pandemics, especially the ones that could be detected from chest X-rays △ Less

Submitted 13 April, 2021; v1 submitted 6 October, 2020; originally announced October 2020.

Comments: 9 pages, 3 tables, 3 figures

arXiv:2004.08352 [pdf, other]

doi 10.1109/ICPR48806.2021.9412632

Motion and Region Aware Adversarial Learning for Fall Detection with Thermal Imaging

Authors: Vineet Mehta, Abhinav Dhall, Sujata Pal, Shehroz S. Khan

Abstract: Automatic fall detection is a vital technology for ensuring the health and safety of people. Home-based camera systems for fall detection often put people's privacy at risk. Thermal cameras can partially or fully obfuscate facial features, thus preserving the privacy of a person. Another challenge is the less occurrence of falls in comparison to the normal activities of daily living. As fall occur… ▽ More Automatic fall detection is a vital technology for ensuring the health and safety of people. Home-based camera systems for fall detection often put people's privacy at risk. Thermal cameras can partially or fully obfuscate facial features, thus preserving the privacy of a person. Another challenge is the less occurrence of falls in comparison to the normal activities of daily living. As fall occurs rarely, it is non-trivial to learn algorithms due to class imbalance. To handle these problems, we formulate fall detection as an anomaly detection within an adversarial framework using thermal imaging. We present a novel adversarial network that comprises of two-channel 3D convolutional autoencoders which reconstructs the thermal data and the optical flow input sequences respectively. We introduce a technique to track the region of interest, a region-based difference constraint, and a joint discriminator to compute the reconstruction error. A larger reconstruction error indicates the occurrence of a fall. The experiments on a publicly available thermal fall dataset show the superior results obtained compared to the standard baseline. △ Less

Submitted 24 October, 2020; v1 submitted 17 April, 2020; originally announced April 2020.

Comments: 8 pages,7 figures

arXiv:1905.07817 [pdf, other]

doi 10.1007/s10044-020-00901-9

Spatio-Temporal Adversarial Learning for Detecting Unseen Falls

Authors: Shehroz S. Khan, Jacob Nogas, Alex Mihailidis

Abstract: Fall detection is an important problem from both the health and machine learning perspective. A fall can lead to severe injuries, long term impairments or even death in some cases. In terms of machine learning, it presents a severely class imbalance problem with very few or no training data for falls owing to the fact that falls occur rarely. In this paper, we take an alternate philosophy to detec… ▽ More Fall detection is an important problem from both the health and machine learning perspective. A fall can lead to severe injuries, long term impairments or even death in some cases. In terms of machine learning, it presents a severely class imbalance problem with very few or no training data for falls owing to the fact that falls occur rarely. In this paper, we take an alternate philosophy to detect falls in the absence of their training data, by training the classifier on only the normal activities (that are available in abundance) and identifying a fall as an anomaly. To realize such a classifier, we use an adversarial learning framework, which comprises of a spatio-temporal autoencoder for reconstructing input video frames and a spatio-temporal convolution network to discriminate them against original video frames. 3D convolutions are used to learn spatial and temporal features from the input video frames. The adversarial learning of the spatio-temporal autoencoder will enable reconstructing the normal activities of daily living efficiently; thus, rendering detecting unseen falls plausible within this framework. We tested the performance of the proposed framework on camera sensing modalities that may preserve an individual's privacy (fully or partially), such as thermal and depth camera. Our results on three publicly available datasets show that the proposed spatio-temporal adversarial framework performed better than other baseline frame based (or spatial) adversarial learning methods. △ Less

Submitted 2 March, 2020; v1 submitted 19 May, 2019; originally announced May 2019.

Comments: 17 pages, 10 figures, 4 tables, 39 references

Journal ref: Pattern Analysis and Applications, 2020

arXiv:1902.00127 [pdf, other]

doi 10.13140/RG.2.2.21979.62244

initKmix -- A Novel Initial Partition Generation Algorithm for Clustering Mixed Data using k-means-based Clustering

Authors: Amir Ahmad, Shehroz S. Khan

Abstract: Mixed datasets consist of both numeric and categorical attributes. Various k-means-based clustering algorithms have been developed for these datasets. Generally, these algorithms use random partition as a starting point, which tends to produce different clustering results for different runs. In this paper, we propose, initKmix, a novel algorithm for finding an initial partition for k-means-based c… ▽ More Mixed datasets consist of both numeric and categorical attributes. Various k-means-based clustering algorithms have been developed for these datasets. Generally, these algorithms use random partition as a starting point, which tends to produce different clustering results for different runs. In this paper, we propose, initKmix, a novel algorithm for finding an initial partition for k-means-based clustering algorithms for mixed datasets. In the initKmix algorithm, a k-means-based clustering algorithm is run many times, and in each run, one of the attributes is used to create initial clusters for that run. The clustering results of various runs are combined to produce the initial partition. This initial partition is then used as a seed to a k-means-based clustering algorithm to cluster mixed data. Experiments with various categorical and mixed datasets showed that initKmix produced accurate and consistent results, and outperformed the random initial partition method and other state-of-the-art initialization methods. Experiments also showed that k-means-based clustering for mixed datasets with initKmix performed similar to or better than many state-of-the-art clustering algorithms for categorical and mixed datasets. △ Less

Submitted 22 July, 2020; v1 submitted 31 January, 2019; originally announced February 2019.

Comments: 41 page, 3 Figures, 19 Tables, 5 Algorithms

arXiv:1811.04364 [pdf, other]

doi 10.1109/ACCESS.2019.2903568

Survey of state-of-the-art mixed data clustering algorithms

Authors: Amir Ahmad, Shehroz S. Khan

Abstract: Mixed data comprises both numeric and categorical features, and mixed datasets occur frequently in many domains, such as health, finance, and marketing. Clustering is often applied to mixed datasets to find structures and to group similar objects for further analysis. However, clustering mixed data is challenging because it is difficult to directly apply mathematical operations, such as summation… ▽ More Mixed data comprises both numeric and categorical features, and mixed datasets occur frequently in many domains, such as health, finance, and marketing. Clustering is often applied to mixed datasets to find structures and to group similar objects for further analysis. However, clustering mixed data is challenging because it is difficult to directly apply mathematical operations, such as summation or averaging, to the feature values of these datasets. In this paper, we present a taxonomy for the study of mixed data clustering algorithms by identifying five major research themes. We then present a state-of-the-art review of the research works within each research theme. We analyze the strengths and weaknesses of these methods with pointers for future research directions. Lastly, we present an in-depth analysis of the overall challenges in this field, highlight open research questions and discuss guidelines to make progress in the field. △ Less

Submitted 18 March, 2019; v1 submitted 11 November, 2018; originally announced November 2018.

Comments: 20 Pages, 2 columns, 6 Tables, 209 References

Journal ref: IEEE Access, 2019

arXiv:1809.00977 [pdf, other]

DeepFall -- Non-invasive Fall Detection with Deep Spatio-Temporal Convolutional Autoencoders

Authors: Jacob Nogas, Shehroz S. Khan, Alex Mihailidis

Abstract: Human falls rarely occur; however, detecting falls is very important from the health and safety perspective. Due to the rarity of falls, it is difficult to employ supervised classification techniques to detect them. Moreover, in these highly skewed situations it is also difficult to extract domain specific features to identify falls. In this paper, we present a novel framework, \textit{DeepFall},… ▽ More Human falls rarely occur; however, detecting falls is very important from the health and safety perspective. Due to the rarity of falls, it is difficult to employ supervised classification techniques to detect them. Moreover, in these highly skewed situations it is also difficult to extract domain specific features to identify falls. In this paper, we present a novel framework, \textit{DeepFall}, which formulates the fall detection problem as an anomaly detection problem. The \textit{DeepFall} framework presents the novel use of deep spatio-temporal convolutional autoencoders to learn spatial and temporal features from normal activities using non-invasive sensing modalities. We also present a new anomaly scoring method that combines the reconstruction score of frames across a temporal window to detect unseen falls. We tested the \textit{DeepFall} framework on three publicly available datasets collected through non-invasive sensing modalities, thermal camera and depth cameras and show superior results in comparison to traditional autoencoder methods to identify unseen falls. △ Less

Submitted 27 April, 2020; v1 submitted 30 August, 2018; originally announced September 2018.

arXiv:1802.00154 [pdf, other]

Bootstrapping and Multiple Imputation Ensemble Approaches for Missing Data

Authors: Shehroz S. Khan, Amir Ahmad, Alex Mihailidis

Abstract: Presence of missing values in a dataset can adversely affect the performance of a classifier. Single and Multiple Imputation are normally performed to fill in the missing values. In this paper, we present several variants of combining single and multiple imputation with bootstrapping to create ensembles that can model uncertainty and diversity in the data, and that are robust to high missingness i… ▽ More Presence of missing values in a dataset can adversely affect the performance of a classifier. Single and Multiple Imputation are normally performed to fill in the missing values. In this paper, we present several variants of combining single and multiple imputation with bootstrapping to create ensembles that can model uncertainty and diversity in the data, and that are robust to high missingness in the data. We present three ensemble strategies: bootstrapping on incomplete data followed by (i) single imputation and (ii) multiple imputation, and (iii) multiple imputation ensemble without bootstrapping. We perform an extensive evaluation of the performance of the these ensemble strategies on 8 datasets by varying the missingness ratio. Our results show that bootstrapping followed by multiple imputation using expectation maximization is the most robust method even at high missingness ratio (up to 30%). For small missingness ratio (up to 10%) most of the ensemble methods perform quivalently but better than single imputation. Kappa-error plots suggest that accurate classifiers with reasonable diversity is the reason for this behaviour. A consistent observation in all the datasets suggests that for small missingness (up to 10%), bootstrapping on incomplete data without any imputation produces equivalent results to other ensemble methods. △ Less

Submitted 15 October, 2019; v1 submitted 1 February, 2018; originally announced February 2018.

Comments: 16 Pages, 15 Tables, 4 Figures

Journal ref: Journal of Intelligent & Fuzzy Systems, 2019

arXiv:1705.04924 [pdf]

Gland Segmentation in Histopathology Images Using Random Forest Guided Boundary Construction

Authors: Rohith AP, Salman S. Khan, Kumar Anubhav, Angshuman Paul

Abstract: Grading of cancer is important to know the extent of its spread. Prior to grading, segmentation of glandular structures is important. Manual segmentation is a time consuming process and is subject to observer bias. Hence, an automated process is required to segment the gland structures. These glands show a large variation in shape size and texture. This makes the task challenging as the glands can… ▽ More Grading of cancer is important to know the extent of its spread. Prior to grading, segmentation of glandular structures is important. Manual segmentation is a time consuming process and is subject to observer bias. Hence, an automated process is required to segment the gland structures. These glands show a large variation in shape size and texture. This makes the task challenging as the glands cannot be segmented using mere morphological operations and conventional segmentation mechanisms. In this project we propose a method which detects the boundary epithelial cells of glands and then a novel approach is used to construct the complete gland boundary. The region enclosed within the boundary can then be obtained to get the segmented gland regions. △ Less

Submitted 15 August, 2017; v1 submitted 14 May, 2017; originally announced May 2017.

Comments: 7 pages, 8 figures

arXiv:1610.03761 [pdf, other]

doi 10.1016/j.eswa.2017.06.011

Detecting Unseen Falls from Wearable Devices using Channel-wise Ensemble of Autoencoders

Authors: Shehroz S. Khan, Babak Taati

Abstract: A fall is an abnormal activity that occurs rarely, so it is hard to collect real data for falls. It is, therefore, difficult to use supervised learning methods to automatically detect falls. Another challenge in using machine learning methods to automatically detect falls is the choice of engineered features. In this paper, we propose to use an ensemble of autoencoders to extract features from dif… ▽ More A fall is an abnormal activity that occurs rarely, so it is hard to collect real data for falls. It is, therefore, difficult to use supervised learning methods to automatically detect falls. Another challenge in using machine learning methods to automatically detect falls is the choice of engineered features. In this paper, we propose to use an ensemble of autoencoders to extract features from different channels of wearable sensor data trained only on normal activities. We show that the traditional approach of choosing a threshold as the maximum of the reconstruction error on the training normal data is not the right way to identify unseen falls. We propose two methods for automatic tightening of reconstruction error from only the normal activities for better identification of unseen falls. We present our results on two activity recognition datasets and show the efficacy of our proposed method against traditional autoencoder models and two standard one-class classification methods. △ Less

Submitted 22 March, 2017; v1 submitted 12 October, 2016; originally announced October 2016.

Comments: 25 pages, 6 figures, 4 Tables

Journal ref: Expert Systems with Applications, Volume 87, 30 November 2017, Pages 280-290

arXiv:1605.09351 [pdf, other]

doi 10.1016/j.medengphy.2016.10.014

Review of Fall Detection Techniques: A Data Availability Perspective

Authors: Shehroz S. Khan, Jesse Hoey

Abstract: A fall is an abnormal activity that occurs rarely; however, missing to identify falls can have serious health and safety implications on an individual. Due to the rarity of occurrence of falls, there may be insufficient or no training data available for them. Therefore, standard supervised machine learning methods may not be directly applied to handle this problem. In this paper, we present a taxo… ▽ More A fall is an abnormal activity that occurs rarely; however, missing to identify falls can have serious health and safety implications on an individual. Due to the rarity of occurrence of falls, there may be insufficient or no training data available for them. Therefore, standard supervised machine learning methods may not be directly applied to handle this problem. In this paper, we present a taxonomy for the study of fall detection from the perspective of availability of fall data. The proposed taxonomy is independent of the type of sensors used and specific feature extraction/selection methods. The taxonomy identifies different categories of classification methods for the study of fall detection based on the availability of their data during training the classifiers. Then, we present a comprehensive literature review within those categories and identify the approach of treating a fall as an abnormal activity to be a plausible research direction. We conclude our paper by discussing several open research problems in the field and pointers for future research. △ Less

Submitted 16 September, 2016; v1 submitted 30 May, 2016; originally announced May 2016.

Comments: 30 pages, 1 figure, 3 Tables

Journal ref: Medical Engineering and Physics, Volume 39, 2017

arXiv:1604.01686 [pdf, other]

Relationship between Variants of One-Class Nearest Neighbours and Creating their Accurate Ensembles

Authors: Shehroz S. Khan, Amir Ahmad

Abstract: In one-class classification problems, only the data for the target class is available, whereas the data for the non-target class may be completely absent. In this paper, we study one-class nearest neighbour (OCNN) classifiers and their different variants. We present a theoretical analysis to show the relationships among different variants of OCNN that may use different neighbours or thresholds to… ▽ More In one-class classification problems, only the data for the target class is available, whereas the data for the non-target class may be completely absent. In this paper, we study one-class nearest neighbour (OCNN) classifiers and their different variants. We present a theoretical analysis to show the relationships among different variants of OCNN that may use different neighbours or thresholds to identify unseen examples of the non-target class. We also present a method based on inter-quartile range for optimising parameters used in OCNN in the absence of non-target data during training. Then, we propose two ensemble approaches based on random subspace and random projection methods to create accurate OCNN ensembles. We tested the proposed methods on 15 benchmark and real world domain-specific datasets and show that random-projection ensembles of OCNN perform best. △ Less

Submitted 28 December, 2017; v1 submitted 6 April, 2016; originally announced April 2016.

Comments: 14 pages, 9 figures, 8 Tables

arXiv:1504.02141 [pdf, other]

doi 10.1016/j.asoc.2017.01.034

Detecting Falls with X-Factor Hidden Markov Models

Authors: Shehroz S. Khan, Michelle E. Karg, Dana Kulic, Jesse Hoey

Abstract: Identification of falls while performing normal activities of daily living (ADL) is important to ensure personal safety and well-being. However, falling is a short term activity that occurs infrequently. This poses a challenge to traditional classification algorithms, because there may be very little training data for falls (or none at all). This paper proposes an approach for the identification o… ▽ More Identification of falls while performing normal activities of daily living (ADL) is important to ensure personal safety and well-being. However, falling is a short term activity that occurs infrequently. This poses a challenge to traditional classification algorithms, because there may be very little training data for falls (or none at all). This paper proposes an approach for the identification of falls using a wearable device in the absence of training data for falls but with plentiful data for normal ADL. We propose three `X-Factor' Hidden Markov Model (XHMMs) approaches. The XHMMs model unseen falls using "inflated" output covariances (observation models). To estimate the inflated covariances, we propose a novel cross validation method to remove "outliers" from the normal ADL that serve as proxies for the unseen falls and allow learning the XHMMs using only normal activities. We tested the proposed XHMM approaches on two activity recognition datasets and show high detection rates for falls in the absence of fall-specific training data. We show that the traditional method of choosing a threshold based on maximum of negative of log-likelihood to identify unseen falls is ill-posed for this problem. We also show that supervised classification methods perform poorly when very limited fall data are available during the training phase. △ Less

Submitted 20 January, 2017; v1 submitted 8 April, 2015; originally announced April 2015.

Comments: 27 pages, 4 figures, 3 tables, Applied Soft Computing, 2017

Journal ref: Applied Soft Computing Volume 55, June 2017, Pages 168-177

arXiv:1410.5718 [pdf]

A Computer Virus Propagation Model Using Delay Differential Equations With Probabilistic Contagion And Immunity

Authors: M. S. S. Khan

Abstract: The SIR model is used extensively in the field of epidemiology, in particular, for the analysis of communal diseases. One problem with SIR and other existing models is that they are tailored to random or Erdos type networks since they do not consider the varying probabilities of infection or immunity per node. In this paper, we present the application and the simulation results of the pSEIRS model… ▽ More The SIR model is used extensively in the field of epidemiology, in particular, for the analysis of communal diseases. One problem with SIR and other existing models is that they are tailored to random or Erdos type networks since they do not consider the varying probabilities of infection or immunity per node. In this paper, we present the application and the simulation results of the pSEIRS model that takes into account the probabilities, and is thus suitable for more realistic scale free networks. In the pSEIRS model, the death rate and the excess death rate are constant for infective nodes. Latent and immune periods are assumed to be constant and the infection rate is assumed to be proportional to I (t) N(t), where N (t) is the size of the total population and I(t) is the size of the infected population. A node recovers from an infection temporarily with a probability p and dies from the infection with probability (1-p). △ Less

Submitted 21 October, 2014; originally announced October 2014.

Comments: International Journal of Computer Networks & Communications (IJCNC) Vol.6, No.5, September 2014

arXiv:1312.0049 [pdf, ps, other]

doi 10.1017/S026988891300043X

One-Class Classification: Taxonomy of Study and Review of Techniques

Authors: Shehroz S. Khan, Michael G. Madden

Abstract: One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers by defining class boundary just with the knowledge of positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty dete… ▽ More One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers by defining class boundary just with the knowledge of positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, algorithms used and the application domains applied. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research. △ Less

Submitted 29 November, 2013; originally announced December 2013.

Comments: 24 pages + 11 pages of references, 8 figures

Journal ref: The Knowledge Engineering Review, pp 1-30, 2014

Showing 1–43 of 43 results for author: Khan, S S