-
Scaling Graph Convolutions for Mobile Vision
Authors:
William Avery,
Mustafa Munir,
Radu Marculescu
Abstract:
To compete with existing mobile architectures, MobileViG introduces Sparse Vision Graph Attention (SVGA), a fast token-mixing operator based on the principles of GNNs. However, MobileViG scales poorly with model size, falling at most 1% behind models with similar latency. This paper introduces Mobile Graph Convolution (MGC), a new vision graph neural network (ViG) module that solves this scaling problem. Our proposed mobile vision architecture, MobileViGv2, uses MGC to demonstrate the effectiveness of our approach. MGC improves on SVGA by increasing graph sparsity and introducing conditional positional encodings to the graph operation. Our smallest model, MobileViGv2-Ti, achieves a 77.7% top-1 accuracy on ImageNet-1K, 2% higher than MobileViG-Ti, with 0.9 ms inference latency on the iPhone 13 Mini NPU. Our largest model, MobileViGv2-B, achieves an 83.4% top-1 accuracy, 0.8% higher than MobileViG-B, with 2.7 ms inference latency. Besides image classification, we show that MobileViGv2 generalizes well to other tasks. For object detection and instance segmentation on MS COCO 2017, MobileViGv2-M outperforms MobileViG-M by 1.2 $AP^{box}$ and 0.7 $AP^{mask}$, and MobileViGv2-B outperforms MobileViG-B by 1.0 $AP^{box}$ and 0.7 $AP^{mask}$. For semantic segmentation on ADE20K, MobileViGv2-M achieves 42.9% $mIoU$ and MobileViGv2-B achieves 44.3% $mIoU$. Our code can be found at https://github.com/SLDGroup/MobileViGv2.
Submitted 9 June, 2024;
originally announced June 2024.
-
Ada-VE: Training-Free Consistent Video Editing Using Adaptive Motion Prior
Authors:
Tanvir Mahmud,
Mustafa Munir,
Radu Marculescu,
Diana Marculescu
Abstract:
Video-to-video synthesis models face significant challenges, such as ensuring consistent character generation across frames, maintaining smooth temporal transitions, and preserving quality during fast motion. The introduction of joint fully cross-frame self-attention mechanisms has improved character consistency, but this comes at the cost of increased computational complexity. This full cross-frame self-attention mechanism also incorporates redundant details and limits the number of frames that can be jointly edited due to its computational cost. Moreover, the lack of frames in cross-frame attention adversely affects temporal consistency and visual quality. To address these limitations, we propose a new adaptive motion-guided cross-frame attention mechanism that drastically reduces complexity while preserving semantic details and temporal consistency. Specifically, we selectively incorporate the moving regions of successive frames in cross-frame attention and sparsely include stationary regions based on optical flow sampling. This technique allows for an increased number of jointly edited frames without additional computational overhead. For longer-duration video editing, existing methods primarily focus on frame interpolation or flow-warping from jointly edited keyframes, which often results in blurry frames or reduced temporal consistency. To improve this, we introduce KV-caching of jointly edited frames and reuse the same KV across all intermediate frames, significantly enhancing both intermediate frame quality and temporal consistency. Overall, our motion-sampling method enables the use of around three times more keyframes than existing joint editing methods while maintaining superior prediction quality. Ada-VE achieves up to 4x speed-up when using fully-extended self-attention across 40 frames for joint editing, without compromising visual quality or temporal consistency.
Submitted 7 June, 2024;
originally announced June 2024.
-
PP-SAM: Perturbed Prompts for Robust Adaptation of Segment Anything Model for Polyp Segmentation
Authors:
Md Mostafijur Rahman,
Mustafa Munir,
Debesh Jha,
Ulas Bagci,
Radu Marculescu
Abstract:
The Segment Anything Model (SAM), originally designed for general-purpose segmentation tasks, has been used recently for polyp segmentation. Nonetheless, fine-tuning SAM with data from new imaging centers or clinics poses significant challenges. This is because it necessitates the creation of an expensive and time-intensive annotated dataset, along with the potential for variability in user prompts during inference. To address these issues, we propose a robust fine-tuning technique, PP-SAM, that allows SAM to adapt to the polyp segmentation task with limited images. To this end, we utilize variable perturbed bounding box prompts (BBP) to enrich the learning context and enhance the model's robustness to BBP perturbations during inference. Rigorous experiments on polyp segmentation benchmarks reveal that our variable BBP perturbation significantly improves model resilience. Notably, on Kvasir, 1-shot fine-tuning boosts the DICE score by 20% and 37% with 50 and 100-pixel BBP perturbations during inference, respectively. Moreover, our experiments show that 1-shot, 5-shot, and 10-shot PP-SAM with 50-pixel perturbations during inference outperform a recent state-of-the-art (SOTA) polyp segmentation method by 26%, 7%, and 5% DICE scores, respectively. Our results motivate the broader applicability of our PP-SAM for other medical imaging tasks with limited samples. Our implementation is available at https://github.com/SLDGroup/PP-SAM.
Submitted 26 May, 2024;
originally announced May 2024.
-
Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment
Authors:
Muhammad Sohail Danish,
Muhammad Haris Khan,
Muhammad Akhtar Munir,
M. Saquib Sarfraz,
Mohsen Ali
Abstract:
In this work, we tackle the problem of domain generalization for object detection, specifically focusing on the scenario where only a single source domain is available. We propose an effective approach that involves two key steps: diversifying the source domain and aligning detections based on class prediction confidence and localization. Firstly, we demonstrate that by carefully selecting a set of augmentations, a base detector can outperform existing methods for single domain generalization by a good margin. This highlights the importance of domain diversification in improving the performance of object detectors. Secondly, we introduce a method to align detections from multiple views, considering both classification and localization outputs. This alignment procedure leads to better generalized and well-calibrated object detector models, which are crucial for accurate decision-making in safety-critical applications. Our approach is detector-agnostic and can be seamlessly applied to both single-stage and two-stage detectors. To validate the effectiveness of our proposed methods, we conduct extensive experiments and ablations on challenging domain-shift scenarios. The results consistently demonstrate the superiority of our approach compared to existing methods. Our code and models are available at: https://github.com/msohaildanish/DivAlign
Submitted 23 May, 2024;
originally announced May 2024.
-
EMCAD: Efficient Multi-scale Convolutional Attention Decoding for Medical Image Segmentation
Authors:
Md Mostafijur Rahman,
Mustafa Munir,
Radu Marculescu
Abstract:
An efficient and effective decoding mechanism is crucial in medical image segmentation, especially in scenarios with limited computational resources. However, these decoding mechanisms usually come with high computational costs. To address this concern, we introduce EMCAD, a new efficient multi-scale convolutional attention decoder, designed to optimize both performance and computational efficiency. EMCAD leverages a unique multi-scale depth-wise convolution block, significantly enhancing feature maps through multi-scale convolutions. EMCAD also employs channel, spatial, and grouped (large-kernel) gated attention mechanisms, which are highly effective at capturing intricate spatial relationships while focusing on salient regions. By employing group and depth-wise convolution, EMCAD is very efficient and scales well (e.g., only 1.91M parameters and 0.381G FLOPs are needed when using a standard encoder). Our rigorous evaluations across 12 datasets that belong to six medical image segmentation tasks reveal that EMCAD achieves state-of-the-art (SOTA) performance with 79.4% and 80.3% reduction in #Params and #FLOPs, respectively. Moreover, EMCAD's adaptability to different encoders and versatility across segmentation tasks further establish EMCAD as a promising tool, advancing the field towards more efficient and accurate medical image analysis. Our implementation is available at https://github.com/SLDGroup/EMCAD.
Submitted 10 May, 2024;
originally announced May 2024.
-
GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs
Authors:
Mustafa Munir,
William Avery,
Md Mostafijur Rahman,
Radu Marculescu
Abstract:
Vision graph neural networks (ViG) offer a new avenue for exploration in computer vision. A major bottleneck in ViGs is the inefficient k-nearest neighbor (KNN) operation used for graph construction. To solve this issue, we propose a new method for designing ViGs, Dynamic Axial Graph Construction (DAGC), which is more efficient than KNN as it limits the number of considered graph connections made within an image. Additionally, we propose a novel CNN-GNN architecture, GreedyViG, which uses DAGC. Extensive experiments show that GreedyViG beats existing ViG, CNN, and ViT architectures in terms of accuracy, GMACs, and parameters on image classification, object detection, instance segmentation, and semantic segmentation tasks. Our smallest model, GreedyViG-S, achieves 81.1% top-1 accuracy on ImageNet-1K, 2.9% higher than Vision GNN and 2.2% higher than Vision HyperGraph Neural Network (ViHGNN), with fewer GMACs and a similar number of parameters. Our largest model, GreedyViG-B, obtains 83.9% top-1 accuracy, 0.2% higher than Vision GNN, with a 66.6% decrease in parameters and a 69% decrease in GMACs. GreedyViG-B also obtains the same accuracy as ViHGNN with a 67.3% decrease in parameters and a 71.3% decrease in GMACs. Our work shows that hybrid CNN-GNN architectures not only provide a new avenue for designing efficient models, but that they can also exceed the performance of current state-of-the-art models.
Submitted 10 May, 2024;
originally announced May 2024.
-
A Zero Trust Framework for Realization and Defense Against Generative AI Attacks in Power Grid
Authors:
Md. Shirajum Munir,
Sravanthi Proddatoori,
Manjushree Muralidhara,
Walid Saad,
Zhu Han,
Sachin Shetty
Abstract:
Understanding the potential of generative AI (GenAI)-based attacks on the power grid is a fundamental challenge that must be addressed in order to protect the power grid by realizing and validating risk in new attack vectors. In this paper, a novel zero trust framework for a power grid supply chain (PGSC) is proposed. This framework facilitates early detection of potential GenAI-driven attack vectors (e.g., replay and protocol-type attacks), assessment of tail risk-based stability measures, and mitigation of such threats. First, a new zero trust system model of PGSC is designed and formulated as a zero-trust problem that seeks to guarantee a stable PGSC by realizing and defending against GenAI-driven cyber attacks. Second, a domain-specific generative adversarial network (GAN)-based attack generation mechanism is developed to create a new vulnerability cyberspace for further understanding that threat. Third, tail-based risk realization metrics are developed and implemented for quantifying the extreme risk of a potential attack while leveraging a trust measurement approach for continuous validation. Fourth, an ensemble learning-based bootstrap aggregation scheme is devised to detect the attacks that are generating synthetic identities with convincing user and distributed energy resources device profiles. Experimental results show the efficacy of the proposed zero trust framework that achieves an accuracy of 95.7% on attack vector generation, a risk measure of 9.61% for a 95% stable PGSC, and a 99% confidence in defense against GenAI-driven attack.
Submitted 10 March, 2024;
originally announced March 2024.
-
Domain Adaptive Object Detection via Balancing Between Self-Training and Adversarial Learning
Authors:
Muhammad Akhtar Munir,
Muhammad Haris Khan,
M. Saquib Sarfraz,
Mohsen Ali
Abstract:
Deep learning based object detectors struggle to generalize to a new target domain bearing significant variations in object and background. Most current methods align domains by using image or instance-level adversarial feature alignment. This often suffers due to unwanted background and lacks class-specific alignment. A straightforward approach to promote class-level alignment is to use high confidence predictions on the unlabeled domain as pseudo-labels. These predictions are often noisy since the model is poorly calibrated under domain shift. In this paper, we propose to leverage the model's predictive uncertainty to strike the right balance between adversarial feature alignment and class-level alignment. We develop a technique to quantify predictive uncertainty on class assignments and bounding-box predictions. Model predictions with low uncertainty are used to generate pseudo-labels for self-training, whereas the ones with higher uncertainty are used to generate tiles for adversarial feature alignment. This synergy between tiling around uncertain object regions and generating pseudo-labels from highly certain object regions allows capturing both image and instance-level context during the model adaptation. We report a thorough ablation study to reveal the impact of different components in our approach. Results on five diverse and challenging adaptation scenarios show that our approach outperforms existing state-of-the-art methods with noticeable margins.
Submitted 8 November, 2023;
originally announced November 2023.
-
Cal-DETR: Calibrated Detection Transformer
Authors:
Muhammad Akhtar Munir,
Salman Khan,
Muhammad Haris Khan,
Mohsen Ali,
Fahad Shahbaz Khan
Abstract:
Albeit revealing impressive predictive performance for several computer vision tasks, deep neural networks (DNNs) are prone to making overconfident predictions. This limits the adoption and wider utilization of DNNs in many safety-critical applications. There have been recent efforts toward calibrating DNNs; however, almost all of them focus on the classification task. Surprisingly, very little attention has been devoted to calibrating modern DNN-based object detectors, especially detection transformers, which have recently demonstrated promising detection performance and are influential in many decision-making systems. In this work, we address the problem by proposing a mechanism for calibrated detection transformers (Cal-DETR), particularly for Deformable-DETR, UP-DETR and DINO. We pursue the train-time calibration route and make the following contributions. First, we propose a simple yet effective approach for quantifying uncertainty in transformer-based object detectors. Second, we develop an uncertainty-guided logit modulation mechanism that leverages the uncertainty to modulate the class logits. Third, we develop a logit mixing approach that acts as a regularizer with detection-specific losses and is also complementary to the uncertainty-guided logit modulation technique to further improve the calibration performance. Lastly, we conduct extensive experiments across three in-domain and four out-domain scenarios. Results corroborate the effectiveness of Cal-DETR against the competing train-time methods in calibrating both in-domain and out-domain detections while maintaining or even improving the detection performance. Our codebase and pre-trained models can be accessed at https://github.com/akhtarvision/cal-detr.
Submitted 6 November, 2023;
originally announced November 2023.
-
Generative AI-driven Semantic Communication Framework for NextG Wireless Network
Authors:
Avi Deb Raha,
Md. Shirajum Munir,
Apurba Adhikary,
Yu Qiao,
Choong Seon Hong
Abstract:
This work designs a novel semantic communication (SemCom) framework for the next-generation wireless network to tackle the unnecessary transmission of vast amounts of data, which causes high bandwidth consumption, increased latency, and poor quality of service (QoS). In particular, these challenges hinder applications like intelligent transportation systems (ITS), metaverse, mixed reality, and the Internet of Everything, where real-time and efficient data transmission is paramount. Therefore, to reduce communication overhead and maintain the QoS of emerging applications such as metaverse, ITS, and digital twin creation, this work proposes a novel semantic communication framework. First, an intelligent semantic transmitter is designed to capture the meaningful information (e.g., the road-side image in ITS) by designing a domain-specific Mobile Segment Anything Model (MSAM)-based mechanism to reduce the potential communication traffic while QoS remains intact. Second, the concept of generative AI is introduced for building the SemCom to reconstruct and denoise the received semantic data frame at the receiver end. In particular, the Generative Adversarial Network (GAN) mechanism is designed to maintain a superior quality reconstruction under different signal-to-noise (SNR) channel conditions. Finally, we have tested and evaluated the proposed SemCom framework with the real-world 6G scenario of ITS; in particular, the base station equipped with an RGB camera and a mmWave phased array. Experimental results demonstrate the efficacy of the proposed SemCom framework by achieving high-quality reconstruction across various SNR channel conditions, resulting in 93.45% data reduction in communication.
Submitted 13 October, 2023;
originally announced October 2023.
-
Detection and Localization of Firearm Carriers in Complex Scenes for Improved Safety Measures
Authors:
Arif Mahmood,
Abdul Basit,
M. Akhtar Munir,
Mohsen Ali
Abstract:
Detecting firearms and accurately localizing individuals carrying them in images or videos is of paramount importance in security, surveillance, and content customization. However, this task presents significant challenges in complex environments due to clutter and the diverse shapes of firearms. To address this problem, we propose a novel approach that leverages human-firearm interaction information, which provides valuable clues for localizing firearm carriers. Our approach incorporates an attention mechanism that effectively distinguishes humans and firearms from the background by focusing on relevant areas. Additionally, we introduce a saliency-driven locality-preserving constraint to learn essential features while preserving foreground information in the input image. By combining these components, our approach achieves exceptional results on a newly proposed dataset. To handle inputs of varying sizes, we pass paired human-firearm instances with attention masks as channels through a deep network for feature computation, utilizing an adaptive average pooling layer. We extensively evaluate our approach against existing methods in human-object interaction detection and achieve significant results (AP=77.8%) compared to the baseline approach (AP=63.1%). This demonstrates the effectiveness of leveraging attention mechanisms and saliency-driven locality preservation for accurate human-firearm interaction detection. Our findings contribute to advancing the fields of security and surveillance, enabling more efficient firearm localization and identification in diverse scenarios.
Submitted 17 September, 2023;
originally announced September 2023.
-
MobileViG: Graph-Based Sparse Attention for Mobile Vision Applications
Authors:
Mustafa Munir,
William Avery,
Radu Marculescu
Abstract:
Traditionally, convolutional neural networks (CNN) and vision transformers (ViT) have dominated computer vision. However, recently proposed vision graph neural networks (ViG) provide a new avenue for exploration. Unfortunately, for mobile applications, ViGs are computationally expensive due to the overhead of representing images as graph structures. In this work, we propose a new graph-based sparse attention mechanism, Sparse Vision Graph Attention (SVGA), that is designed for ViGs running on mobile devices. Additionally, we propose the first hybrid CNN-GNN architecture for vision tasks on mobile devices, MobileViG, which uses SVGA. Extensive experiments show that MobileViG beats existing ViG models and existing mobile CNN and ViT architectures in terms of accuracy and/or speed on image classification, object detection, and instance segmentation tasks. Our fastest model, MobileViG-Ti, achieves 75.7% top-1 accuracy on ImageNet-1K with 0.78 ms inference latency on iPhone 13 Mini NPU (compiled with CoreML), which is faster than MobileNetV2x1.4 (1.02 ms, 74.7% top-1) and MobileNetV2x1.0 (0.81 ms, 71.8% top-1). Our largest model, MobileViG-B obtains 82.6% top-1 accuracy with only 2.30 ms latency, which is faster and more accurate than the similarly sized EfficientFormer-L3 model (2.77 ms, 82.4%). Our work proves that well designed hybrid CNN-GNN architectures can be a new avenue of exploration for designing models that are extremely fast and accurate on mobile devices. Our code is publicly available at https://github.com/SLDGroup/MobileViG.
Submitted 1 July, 2023;
originally announced July 2023.
-
Trustworthy Artificial Intelligence Framework for Proactive Detection and Risk Explanation of Cyber Attacks in Smart Grid
Authors:
Md. Shirajum Munir,
Sachin Shetty,
Danda B. Rawat
Abstract:
The rapid growth of distributed energy resources (DERs), such as renewable energy sources, generators, consumers, and prosumers in the smart grid infrastructure, poses significant cybersecurity and trust challenges to the grid controller. Consequently, it is crucial to identify adversarial tactics and measure the strength of the attacker's DER. To enable a trustworthy smart grid controller, this work investigates a trustworthy artificial intelligence (AI) mechanism for proactive identification and explanation of the cyber risk caused by the control/status message of DERs. To this end, we propose and develop a trustworthy AI framework that facilitates the deployment of any AI algorithm for detecting potential cyber threats and analyzing root causes based on Shapley value interpretation, while dynamically quantifying the risk of an attack based on Ward's minimum variance formula. The experiment with a state-of-the-art dataset establishes the proposed framework as a trustworthy AI by fulfilling the capabilities of reliability, fairness, explainability, transparency, reproducibility, and accountability.
Submitted 11 June, 2023;
originally announced June 2023.
-
MP-FedCL: Multiprototype Federated Contrastive Learning for Edge Intelligence
Authors:
Yu Qiao,
Md. Shirajum Munir,
Apurba Adhikary,
Huy Q. Le,
Avi Deb Raha,
Chaoning Zhang,
Choong Seon Hong
Abstract:
Federated learning-assisted edge intelligence enables privacy protection in modern intelligent services. However, non-independent and identically distributed (non-IID) data among edge clients can impair the local model performance. The existing single prototype-based strategy represents a class by using the mean of the feature space. However, feature spaces are usually not clustered, and a single prototype may not represent a class well. Motivated by this, this paper proposes a multi-prototype federated contrastive learning approach (MP-FedCL) which demonstrates the effectiveness of using a multi-prototype strategy over a single-prototype strategy under non-IID settings, including both label and feature skewness. Specifically, a multi-prototype computation strategy based on k-means is first proposed to capture different embedding representations for each class space, using multiple prototypes ($k$ centroids) to represent a class in the embedding space. In each global round, the computed multiple prototypes and their respective model parameters are sent to the edge server for aggregation into a global prototype pool, which is then sent back to all clients to guide their local training. Finally, local training for each client minimizes their own supervised learning tasks and learns from shared prototypes in the global prototype pool through supervised contrastive learning, which encourages them to learn knowledge related to their own class from others and reduces the absorption of unrelated knowledge in each global iteration. Experimental results on MNIST, Digit-5, Office-10, and DomainNet show that our method outperforms multiple baselines, with an average test accuracy improvement of about 4.6% and 10.4% under feature and label non-IID distributions, respectively.
Submitted 11 October, 2023; v1 submitted 1 April, 2023;
originally announced April 2023.
-
Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection
Authors:
Muhammad Akhtar Munir,
Muhammad Haris Khan,
Salman Khan,
Fahad Shahbaz Khan
Abstract:
Deep neural networks (DNNs) have enabled astounding progress in several vision-based problems. Despite showing high predictive accuracy, recently, several works have revealed that they tend to provide overconfident predictions and thus are poorly calibrated. The majority of the works addressing the miscalibration of DNNs fall under the scope of classification and consider only in-domain predictions. However, there is little to no progress in studying the calibration of DNN-based object detection models, which are central to many vision-based safety-critical applications. In this paper, inspired by the train-time calibration methods, we propose a novel auxiliary loss formulation that explicitly aims to align the class confidence of bounding boxes with the accurateness of predictions (i.e. precision). Since the original formulation of our loss depends on the counts of true positives and false positives in a minibatch, we develop a differentiable proxy of our loss that can be used during training with other application-specific loss functions. We perform extensive experiments on challenging in-domain and out-domain scenarios with six benchmark datasets including MS-COCO, Cityscapes, Sim10k, and BDD100k. Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in and out-domain scenarios. Our source code and pre-trained models are available at https://github.com/akhtarvision/bpc_calibration
Submitted 25 March, 2023;
originally announced March 2023.
-
Neuro-symbolic Explainable Artificial Intelligence Twin for Zero-touch IoE in Wireless Network
Authors:
Md. Shirajum Munir,
Ki Tae Kim,
Apurba Adhikary,
Walid Saad,
Sachin Shetty,
Seong-Bae Park,
Choong Seon Hong
Abstract:
Explainable artificial intelligence (XAI) twin systems will be a fundamental enabler of zero-touch network and service management (ZSM) for sixth-generation (6G) wireless networks. A reliable XAI twin system for ZSM requires two composites: an extreme analytical ability for discretizing the physical behavior of the Internet of Everything (IoE) and rigorous methods for characterizing the reasoning of such behavior. In this paper, a novel neuro-symbolic explainable artificial intelligence twin framework is proposed to enable trustworthy ZSM for a wireless IoE. The physical space of the XAI twin executes a neural-network-driven multivariate regression to capture the time-dependent wireless IoE environment while determining unconscious decisions of IoE service aggregation. Subsequently, the virtual space of the XAI twin constructs a directed acyclic graph (DAG)-based Bayesian network that can infer a symbolic reasoning score over unconscious decisions through a first-order probabilistic language model. Furthermore, a Bayesian multi-arm bandits-based learning problem is proposed for reducing the gap between the expected explained score and the current obtained score of the proposed neuro-symbolic XAI twin. To address the challenges of extensible, modular, and stateless management functions in ZSM, the proposed neuro-symbolic XAI twin framework consists of two learning systems: 1) an implicit learner that acts as an unconscious learner in physical space, and 2) an explicit learner that can exploit symbolic reasoning based on implicit learner decisions and prior evidence. Experimental results show that the proposed neuro-symbolic XAI twin can achieve around 96.26% accuracy while guaranteeing from 18% to 44% more trust score in terms of reasoning and closed-loop automation.
Submitted 12 October, 2022;
originally announced October 2022.
-
Towards Improving Calibration in Object Detection Under Domain Shift
Authors:
Muhammad Akhtar Munir,
Muhammad Haris Khan,
M. Saquib Sarfraz,
Mohsen Ali
Abstract:
With deep neural network based solutions being more readily incorporated in real-world applications, it has become a pressing requirement that predictions by such models, especially in safety-critical environments, be highly accurate and well-calibrated. Although some techniques addressing DNN calibration have been proposed, they are limited to visual classification applications and in-domain predictions. Unfortunately, very little to no attention has been paid to the calibration of DNN-based visual object detectors, which occupy a similar space and importance in many decision-making systems as their visual classification counterparts. In this work, we study the calibration of DNN-based object detection models, particularly under domain shift. To this end, we first propose a new, plug-and-play, train-time calibration loss for object detection (coined TCD). It can be used with various application-specific loss functions as an auxiliary loss function to improve detection calibration. Second, we devise a new implicit technique for improving calibration in self-training based domain adaptive detectors, featuring a new uncertainty quantification mechanism for object detection. We demonstrate that TCD is capable of enhancing calibration with notable margins (1) across different DNN-based object detection paradigms, both in in-domain and out-of-domain predictions, and (2) in different domain-adaptive detectors across challenging adaptation scenarios. Finally, we empirically show that our implicit calibration technique can be used in tandem with TCD during adaptation to further boost calibration in diverse domain shift scenarios.
Submitted 29 October, 2022; v1 submitted 15 September, 2022;
originally announced September 2022.
-
Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey
Authors:
Julian Wörmann,
Daniel Bogdoll,
Christian Brunner,
Etienne Bührle,
Han Chen,
Evaristus Fuh Chuo,
Kostadin Cvejoski,
Ludger van Elst,
Philip Gottschall,
Stefan Griesche,
Christian Hellert,
Christian Hesels,
Sebastian Houben,
Tim Joseph,
Niklas Keil,
Johann Kelsch,
Mert Keser,
Hendrik Königshof,
Erwin Kraft,
Leonie Kreuser,
Kevin Krone,
Tobias Latka,
Denny Mattern,
Stefan Matthes,
Franz Motzkus
, et al. (27 additional authors not shown)
Abstract:
The availability of representative datasets is an essential prerequisite for many successful artificial intelligence and machine learning models. However, in real-life applications these models often encounter scenarios that are inadequately represented in the data used for training. There are various reasons for the absence of sufficient data, ranging from time and cost constraints to ethical considerations. As a consequence, the reliable usage of these models, especially in safety-critical applications, is still a tremendous challenge. Leveraging additional, already existing sources of knowledge is key to overcoming the limitations of purely data-driven approaches. Knowledge augmented machine learning approaches offer the possibility of compensating for deficiencies, errors, or ambiguities in the data, thus increasing the generalization capability of the applied models. Moreover, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-driven models with existing knowledge. The identified approaches are structured according to the categories knowledge integration, extraction, and conformity. In particular, we address the application of the presented methods in the field of autonomous driving.
Submitted 20 November, 2023; v1 submitted 10 May, 2022;
originally announced May 2022.
-
F2DNet: Fast Focal Detection Network for Pedestrian Detection
Authors:
Abdul Hannan Khan,
Mohsin Munir,
Ludger van Elst,
Andreas Dengel
Abstract:
Two-stage detectors are state-of-the-art in object detection as well as pedestrian detection. However, current two-stage detectors are inefficient, as they perform bounding box regression in multiple steps, i.e., in region proposal networks and bounding box heads. Also, anchor-based region proposal networks are computationally expensive to train. We propose F2DNet, a novel two-stage detection architecture which eliminates the redundancy of current two-stage detectors by replacing the region proposal network with our focal detection network and the bounding box head with our fast suppression head. We benchmark F2DNet on top pedestrian detection datasets, thoroughly compare it against the existing state-of-the-art detectors, and conduct cross-dataset evaluation to test the generalizability of our model to unseen data. Our F2DNet achieves 8.7%, 2.2%, and 6.1% MR-2 on the City Persons, Caltech Pedestrian, and Euro City Person datasets, respectively, when trained on a single dataset, and reaches 20.4% and 26.2% MR-2 in the heavy occlusion setting of the Caltech Pedestrian and City Persons datasets when using progressive fine-tuning. Furthermore, F2DNet has significantly lower inference time compared to the current state-of-the-art. Code and trained models will be available at https://github.com/AbdulHannanKhan/F2DNet.
Submitted 23 September, 2022; v1 submitted 4 March, 2022;
originally announced March 2022.
-
An Explainable Artificial Intelligence Framework for Quality-Aware IoE Service Delivery
Authors:
Md. Shirajum Munir,
Seong-Bae Park,
Choong Seon Hong
Abstract:
One of the core visions of sixth-generation (6G) wireless networks is to incorporate artificial intelligence (AI) for autonomous control of the Internet of Everything (IoE). In particular, the quality of IoE service delivery must be maintained by analyzing contextual metrics of IoE such as people, data, process, and things. However, challenges arise when the AI model lacks interpretation and intuition for the network service provider. Therefore, this paper provides an explainable artificial intelligence (XAI) framework for quality-aware IoE service delivery that enables both intelligence and interpretation. First, a problem of quality-aware IoE service delivery is formulated by taking into account network dynamics and contextual metrics of IoE, where the objective is to maximize the channel quality index (CQI) of each IoE service user. Second, a regression problem is devised to solve the formulated problem, where explainable coefficients of the contextual matrices are estimated by Shapley value interpretation. Third, the XAI-enabled quality-aware IoE service delivery algorithm is implemented by employing ensemble-based regression models to ensure the interpretation of contextual relationships among the matrices when reconfiguring network parameters. Finally, the experimental results show that the uplink improvement rate reaches 42.43% and 16.32% for AdaBoost and Extra Trees, respectively, while the downlink improvement rate reaches up to 28.57% and 14.29%. However, the AdaBoost-based approach cannot maintain the CQI of IoE service users. Therefore, the proposed Extra Trees-based regression model shows a significant performance gain over other baselines in mitigating the trade-off between accuracy and interpretability.
Submitted 26 January, 2022;
originally announced January 2022.
-
Evaluating Privacy-Preserving Machine Learning in Critical Infrastructures: A Case Study on Time-Series Classification
Authors:
Dominique Mercier,
Adriano Lucieri,
Mohsin Munir,
Andreas Dengel,
Sheraz Ahmed
Abstract:
With the advent of machine learning in applications of critical infrastructure such as healthcare and energy, privacy is a growing concern in the minds of stakeholders. It is pivotal to ensure that neither the model nor the data can be used to extract sensitive information used by attackers against individuals or to harm whole societies through the exploitation of critical infrastructure. The applicability of machine learning in these domains is mostly limited due to a lack of trust regarding the transparency and the privacy constraints. Various safety-critical use cases (mostly relying on time-series data) are currently underrepresented in privacy-related considerations. By evaluating several privacy-preserving methods regarding their applicability on time-series data, we validated the inefficacy of encryption for deep learning, the strong dataset dependence of differential privacy, and the broad applicability of federated methods.
Submitted 29 November, 2021;
originally announced November 2021.
-
FedPrune: Towards Inclusive Federated Learning
Authors:
Muhammad Tahir Munir,
Muhammad Mustansar Saeed,
Mahad Ali,
Zafar Ayyub Qazi,
Ihsan Ayyub Qazi
Abstract:
Federated learning (FL) is a distributed learning technique that trains a shared model over distributed data in a privacy-preserving manner. Unfortunately, FL's performance degrades when there is (i) variability in client characteristics in terms of computational and memory resources (system heterogeneity) and (ii) non-IID data distribution across clients (statistical heterogeneity). For example, slow clients get dropped in FL schemes such as Federated Averaging (FedAvg), which not only limits overall learning but also biases results towards fast clients. We propose FedPrune, a system that tackles this challenge by pruning the global model for slow clients based on their device characteristics. By doing so, slow clients can train a small model quickly and participate in FL, which increases test accuracy as well as fairness. By using insights from the Central Limit Theorem, FedPrune incorporates a new aggregation technique that achieves robust performance over non-IID data. Experimental evaluation shows that FedPrune provides robust convergence and better fairness compared to Federated Averaging.
Submitted 27 October, 2021;
originally announced October 2021.
-
Synergizing between Self-Training and Adversarial Learning for Domain Adaptive Object Detection
Authors:
Muhammad Akhtar Munir,
Muhammad Haris Khan,
M. Saquib Sarfraz,
Mohsen Ali
Abstract:
We study adapting trained object detectors to unseen domains manifesting significant variations of object appearance, viewpoints and backgrounds. Most current methods align domains by either using image or instance-level feature alignment in an adversarial fashion. This often suffers due to the presence of unwanted background and as such lacks class-specific alignment. A common remedy to promote class-level alignment is to use high confidence predictions on the unlabelled domain as pseudo labels. These high confidence predictions are often fallacious since the model is poorly calibrated under domain shift. In this paper, we propose to leverage model predictive uncertainty to strike the right balance between adversarial feature alignment and class-level alignment. Specifically, we measure predictive uncertainty on class assignments and the bounding box predictions. Model predictions with low uncertainty are used to generate pseudo-labels for self-supervision, whereas the ones with higher uncertainty are used to generate tiles for an adversarial feature alignment stage. This synergy between tiling around the uncertain object regions and generating pseudo-labels from highly certain object regions allows us to capture both the image and instance level context during the model adaptation stage. We perform extensive experiments covering various domain shift scenarios. Our approach improves upon existing state-of-the-art methods with visible margins.
Submitted 1 October, 2021;
originally announced October 2021.
-
Risk Adversarial Learning System for Connected and Autonomous Vehicle Charging
Authors:
Md. Shirajum Munir,
Ki Tae Kim,
Kyi Thar,
Dusit Niyato,
Choong Seon Hong
Abstract:
In this paper, the design of a rational decision support system (RDSS) for a connected and autonomous vehicle charging infrastructure (CAV-CI) is studied. In the considered CAV-CI, the distribution system operator (DSO) deploys electric vehicle supply equipment (EVSE) to provide an EV charging facility for human-driven connected vehicles (CVs) and autonomous vehicles (AVs). The charging request by a human-driven EV becomes irrational when it demands more energy and a longer charging period than it actually needs. Therefore, the scheduling policy of each EVSE must adaptively accommodate irrational charging requests to satisfy the charging demand of both CVs and AVs. To tackle this, we formulate an RDSS problem for the DSO, where the objective is to maximize the charging capacity utilization by satisfying the laxity risk of the DSO. Thus, we devise a rational reward maximization problem to adapt to the irrational behavior of CVs in a data-informed manner. We propose a novel risk adversarial multi-agent learning system (RAMALS) for CAV-CI to solve the formulated RDSS problem. In RAMALS, the DSO acts as a centralized risk adversarial agent (RAA) that informs each EVSE of the laxity risk. Subsequently, each EVSE plays the role of a self-learner agent to adaptively schedule its own EV sessions by coping with advice from the RAA. Experimental results show that the proposed RAMALS affords around a 46.6% improvement in charging rate, about a 28.6% improvement in the EVSE's active charging time, and at least 33.3% more energy utilization, as compared to a currently deployed ACN EVSE system and other baselines.
Submitted 2 February, 2022; v1 submitted 1 August, 2021;
originally announced August 2021.
-
XAI Handbook: Towards a Unified Framework for Explainable AI
Authors:
Sebastian Palacio,
Adriano Lucieri,
Mohsin Munir,
Jörn Hees,
Sheraz Ahmed,
Andreas Dengel
Abstract:
The field of explainable AI (XAI) has quickly become a thriving and prolific community. However, a silent, recurrent and acknowledged issue in this area is the lack of consensus regarding its terminology. In particular, each new contribution seems to rely on its own (and often intuitive) version of terms like "explanation" and "interpretation". Such disarray encumbers the consolidation of advances in the field towards the fulfillment of scientific and regulatory demands, e.g., when comparing methods or establishing their compliance with respect to biases and fairness constraints. We propose a theoretical framework that not only provides concrete definitions for these terms, but also outlines all steps necessary to produce explanations and interpretations. The framework also allows for existing contributions to be re-contextualized such that their scope can be measured, thus making them comparable to other methods. We show that this framework is compliant with desiderata on explanations, on interpretability and on evaluation metrics. We present a use-case showing how the framework can be used to compare LIME, SHAP and MDNet, establishing their advantages and shortcomings. Finally, we discuss relevant trends in XAI as well as recommendations for future work, all from the standpoint of our framework.
Submitted 14 May, 2021;
originally announced May 2021.
-
Drive Safe: Cognitive-Behavioral Mining for Intelligent Transportation Cyber-Physical System
Authors:
Md. Shirajum Munir,
Sarder Fakhrul Abedin,
Ki Tae Kim,
Do Hyeon Kim,
Md. Golam Rabiul Alam,
Choong Seon Hong
Abstract:
This paper presents a cognitive behavioral-based driver mood repairment platform in intelligent transportation cyber-physical systems (IT-CPS) for road safety. In particular, we propose a driving safety platform for distracted drivers, namely \emph{drive safe}, in IT-CPS. The proposed platform recognizes the distracting activities of the drivers as well as their emotions for mood repair. Further, we develop a prototype of the proposed drive safe platform to establish a proof-of-concept (PoC) for road safety in IT-CPS. In the developed driving safety platform, we employ five AI and statistical-based models to infer a vehicle driver's cognitive-behavioral mining to ensure safe driving. Specifically, a capsule network (CN), maximum likelihood (ML), a convolutional neural network (CNN), the Apriori algorithm, and a Bayesian network (BN) are deployed for driver activity recognition, environmental feature extraction, mood recognition, sequential pattern mining, and content recommendation for affective mood repairment of the driver, respectively. Besides, we develop a communication module to interact with the systems in IT-CPS asynchronously. Thus, the developed drive safe PoC can guide vehicle drivers when they are distracted from driving due to cognitive-behavioral factors. Finally, we have performed a qualitative evaluation to measure the usability and effectiveness of the developed drive safe platform. We observe that the p-value is 0.0041 (i.e., < 0.05) in the ANOVA test. Moreover, the confidence interval analysis also shows significant gains in prevalence value, which is around 0.93 for a 95% confidence level. The aforementioned statistical results indicate high reliability in terms of the driver's safety and mental state.
Submitted 23 August, 2020;
originally announced August 2020.
-
Controlling the Outbreak of COVID-19: A Noncooperative Game Perspective
Authors:
Anupam Kumar Bairagi,
Mehedi Masud,
Do Hyeon Kim,
Md. Shirajum Munir,
Abdullah Al Nahid,
Sarder Fakhrul Abedin,
Kazi Masudul Alam,
Sujit Biswas,
Sultan S Alshamrani,
Zhu Han,
Choong Seon Hong
Abstract:
COVID-19 is a global epidemic. So far, there is no remedy for this epidemic. However, isolation and social distancing seem to be effective preventive measures to control this pandemic. Therefore, in this paper, an optimization problem is formulated that accommodates both isolation and social distancing features of the individuals. To promote social distancing, we solve the formulated problem by applying a noncooperative game that can provide an incentive for maintaining social distancing to prevent the spread of COVID-19. Furthermore, the sustainability of the lockdown policy is interpreted with the help of our proposed game-theoretic incentive model for maintaining social distancing, where there exists a Nash equilibrium. Finally, we perform an extensive numerical analysis that shows the effectiveness of the proposed approach in terms of achieving the desired social distancing to prevent the outbreak of COVID-19 in a noncooperative environment. Numerical results show that the individual incentive increases by more than 85% as the percentage of home isolation increases from 25% to 100% for all considered scenarios. The numerical results also demonstrate that, at a given percentage of home isolation, the individual incentive decreases with an increasing number of individuals.
Submitted 26 November, 2020; v1 submitted 27 July, 2020;
originally announced July 2020.
-
Localizing Firearm Carriers by Identifying Human-Object Pairs
Authors:
Abdul Basit,
Muhammad Akhtar Munir,
Mohsen Ali,
Arif Mahmood
Abstract:
Visual identification of gunmen in a crowd is a challenging problem that requires resolving the association of a person with an object (firearm). We present a novel approach to address this problem by defining human-object interaction (and non-interaction) bounding boxes. In a given image, humans and firearms are separately detected. Each detected human is paired with each detected firearm, allowing us to create a paired bounding box that contains both the object and the human. A network is trained to classify each paired bounding box as a human carrying the identified firearm or not. Extensive experiments were performed to evaluate the effectiveness of the algorithm, including exploiting the full pose of the human, hand key-points, and their association with the firearm. The knowledge of spatially localized features, obtained by using multi-size proposals with adaptive average pooling, is key to the success of our method. We have also extended a previous firearm detection dataset by adding more images and tagging the human-firearm pairs in the extended dataset (including bounding boxes for firearms and gunmen). The experimental results ($AP_{hold} = 78.5$) demonstrate the effectiveness of the proposed method.
Submitted 20 May, 2020; v1 submitted 19 May, 2020;
originally announced May 2020.
-
Data Freshness and Energy-Efficient UAV Navigation Optimization: A Deep Reinforcement Learning Approach
Authors:
Sarder Fakhrul Abedin,
Md. Shirajum Munir,
Nguyen H. Tran,
Zhu Han,
Choong Seon Hong
Abstract:
In this paper, we design a navigation policy for multiple unmanned aerial vehicles (UAVs) where mobile base stations (BSs) are deployed to improve the data freshness and connectivity to the Internet of Things (IoT) devices. First, we formulate an energy-efficient trajectory optimization problem in which the objective is to maximize the energy efficiency by optimizing the UAV-BS trajectory policy. We also incorporate different contextual information such as energy and age of information (AoI) constraints to ensure the data freshness at the ground BS. Second, we propose an agile deep reinforcement learning with experience replay model to solve the formulated problem concerning the contextual constraints for the UAV-BS navigation. Moreover, the proposed approach is well-suited for solving the problem, since the state space of the problem is extremely large and finding the best trajectory policy with useful contextual features is too complex for the UAV-BSs. By applying the proposed trained model, an effective real-time trajectory policy for the UAV-BSs captures the observable network states over time. Finally, the simulation results illustrate that the proposed approach is 3.6% and 3.13% more energy efficient than the greedy and baseline deep Q Network (DQN) approaches, respectively.
Submitted 21 February, 2020;
originally announced March 2020.
-
Coexistence Mechanism between eMBB and uRLLC in 5G Wireless Networks
Authors:
Anupam Kumar Bairagi,
Md. Shirajum Munir,
Madyan Alsenwi,
Nguyen H. Tran,
Sultan S Alshamrani,
Mehedi Masud,
Zhu Han,
Choong Seon Hong
Abstract:
uRLLC and eMBB are two influential services of the emerging 5G cellular network. Latency and reliability are major concerns for uRLLC applications, whereas eMBB services demand maximum data rates. Owing to the trade-off among latency, reliability, and spectral efficiency, sharing radio resources between eMBB and uRLLC services leads to a challenging scheduling dilemma. In this paper, we study the co-scheduling problem of eMBB and uRLLC traffic based upon the puncturing technique. Precisely, we formulate an optimization problem aiming to maximize the MEAR of eMBB UEs while fulfilling the provisions of the uRLLC traffic. We decompose the original problem into two sub-problems, namely the scheduling problems of eMBB UEs and uRLLC UEs, while keeping the prevailing objective unchanged. Radio resources are scheduled among the eMBB UEs on a time slot basis, whereas they are handled for uRLLC UEs on a mini-slot basis. Moreover, for resolving the scheduling issue of eMBB UEs, we use a PSUM-based algorithm, whereas the optimal TM is adopted for solving the same problem for uRLLC UEs. Furthermore, a heuristic algorithm is also provided to solve the first sub-problem with lower complexity. Finally, the significance of the proposed approach over other baseline approaches is established through numerical analysis in terms of the MEAR and fairness scores of the eMBB UEs.
Submitted 10 March, 2020;
originally announced March 2020.
-
Risk-Aware Energy Scheduling for Edge Computing with Microgrid: A Multi-Agent Deep Reinforcement Learning Approach
Authors:
Md. Shirajum Munir,
Sarder Fakhrul Abedin,
Nguyen H. Tran,
Zhu Han,
Eui-Nam Huh,
Choong Seon Hong
Abstract:
In recent years, multi-access edge computing (MEC) has become a key enabler for handling the massive expansion of Internet of Things (IoT) applications and services. However, the energy consumption of a MEC network depends on volatile tasks, which induces risk in energy demand estimations. As an energy supplier, a microgrid can facilitate seamless energy supply. However, the risk associated with energy supply is also increased due to unpredictable energy generation from renewable and non-renewable sources. In particular, the risk of energy shortfall involves uncertainties in both energy consumption and generation. In this paper, we study a risk-aware energy scheduling problem for a microgrid-powered MEC network. First, we formulate an optimization problem considering the conditional value-at-risk (CVaR) measurement for both energy consumption and generation, where the objective is to minimize the expected residual of scheduled energy for the MEC networks, and we show that this problem is NP-hard. Second, we analyze our formulated problem using a multi-agent stochastic game that ensures the joint policy Nash equilibrium, and show the convergence of the proposed model. Third, we derive the solution by applying a multi-agent deep reinforcement learning (MADRL)-based asynchronous advantage actor-critic (A3C) algorithm with shared neural networks. This method mitigates the curse of dimensionality of the state space and chooses the best policy among the agents for the proposed problem. Finally, the experimental results establish a significant performance gain from considering CVaR for high-accuracy energy scheduling in the proposed model over both the single and random agent models.
Submitted 5 January, 2021; v1 submitted 20 February, 2020;
originally announced March 2020.
-
Multi-Agent Meta-Reinforcement Learning for Self-Powered and Sustainable Edge Computing Systems
Authors:
Md. Shirajum Munir,
Nguyen H. Tran,
Walid Saad,
Choong Seon Hong
Abstract:
The stringent requirements of mobile edge computing (MEC) applications and functions call for high-capacity and dense deployment of MEC hosts in upcoming wireless networks. However, operating such high-capacity MEC hosts can significantly increase energy consumption. Thus, a base station (BS) unit can act as a self-powered BS. In this paper, an effective energy dispatch mechanism for self-powered wireless networks with edge computing capabilities is studied. First, a two-stage linear stochastic programming problem is formulated with the goal of minimizing the total energy consumption cost of the system while fulfilling the energy demand. Second, a semi-distributed data-driven solution is proposed by developing a novel multi-agent meta-reinforcement learning (MAMRL) framework to solve the formulated problem. In particular, each BS plays the role of a local agent that explores a Markovian behavior for both energy consumption and generation while each BS transfers time-varying features to a meta-agent. Subsequently, the meta-agent optimizes (i.e., exploits) the energy dispatch decision by accepting only the observations from each local agent with its own state information. Meanwhile, each BS agent estimates its own energy dispatch policy by applying the learned parameters from the meta-agent. Finally, the proposed MAMRL framework is benchmarked by analyzing deterministic, asymmetric, and stochastic environments in terms of non-renewable energy usage, energy cost, and accuracy. Experimental results show that the proposed MAMRL model can reduce non-renewable energy usage by up to 11% and the energy cost by 22.4% (with 95.8% prediction accuracy), compared to other baseline methods.
Submitted 9 February, 2021; v1 submitted 19 February, 2020;
originally announced February 2020.
-
TSXplain: Demystification of DNN Decisions for Time-Series using Natural Language and Statistical Features
Authors:
Mohsin Munir,
Shoaib Ahmed Siddiqui,
Ferdinand Küsters,
Dominique Mercier,
Andreas Dengel,
Sheraz Ahmed
Abstract:
Neural networks (NN) are considered black boxes due to the lack of explainability and transparency of their decisions. This significantly hampers their deployment in environments where explainability is essential along with the accuracy of the system. Recently, significant efforts have been made towards the interpretability of these deep networks with the aim of opening up the black box. However, most of these approaches are specifically developed for visual modalities. In addition, the interpretations provided by these systems require expert knowledge and understanding for intelligibility. This indicates a vital gap between the explainability provided by the systems and the novice user. To bridge this gap, we present a novel framework, the Time-Series eXplanation (TSXplain) system, which produces a natural language based explanation of the decision taken by a NN. It uses the extracted statistical features to describe the decision of a NN, merging the deep learning world with that of statistics. The two-level explanation provides an ample description of the decision made by the network to aid an expert and a novice user alike. Our survey and reliability assessment test confirm that the generated explanations are meaningful and correct. We believe that generating natural language based descriptions of the network's decisions is a big step towards opening up the black box.
Submitted 15 May, 2019;
originally announced May 2019.
-
Leveraging Orientation for Weakly Supervised Object Detection with Application to Firearm Localization
Authors:
Javed Iqbal,
Muhammad Akhtar Munir,
Arif Mahmood,
Afsheen Rafaqat Ali,
Mohsen Ali
Abstract:
Automatic detection of firearms is important for enhancing the security and safety of people; however, it is a challenging task owing to the wide variations in shape, size, and appearance of firearms. Also, most generic object detectors process axis-aligned rectangular areas, though a thin and long rifle may actually cover only a small percentage of that area and the rest may contain irrelevant details suppressing the required object signatures. To handle these challenges, we propose a weakly supervised Orientation Aware Object Detection (OAOD) algorithm which learns to detect oriented object bounding boxes (OBB) while using Axis-Aligned Bounding Boxes (AABB) for training. The proposed OAOD is different from the existing oriented object detectors, which strictly require OBB during training, which may not always be present. The goal of training on AABB and detecting OBB is achieved by employing a multistage scheme, with Stage-1 predicting the AABB and Stage-2 predicting the OBB. In between the two stages, the oriented proposal generation module along with the object-aligned RoI pooling is designed to extract features based on the predicted orientation and to make these features orientation invariant. A diverse and challenging dataset consisting of eleven thousand images is also proposed for firearm detection, which is manually annotated for firearm classification and localization. The proposed ITU Firearm dataset (ITUF) contains a wide range of guns and rifles. The OAOD algorithm is evaluated on the ITUF dataset and compared with current state-of-the-art object detectors, including fully supervised oriented object detectors. OAOD has outperformed both types of object detectors by a significant margin. The experimental results (mAP: 88.3 on AABB & mAP: 77.5 on OBB) demonstrate the effectiveness of the proposed algorithm for firearm detection.
Submitted 29 January, 2021; v1 submitted 22 April, 2019;
originally announced April 2019.
-
TSViz: Demystification of Deep Learning Models for Time-Series Analysis
Authors:
Shoaib Ahmed Siddiqui,
Dominik Mercier,
Mohsin Munir,
Andreas Dengel,
Sheraz Ahmed
Abstract:
This paper presents a novel framework for demystification of convolutional deep learning models for time-series analysis. This is a step towards making informed/explainable decisions in the domain of time-series, powered by deep learning. There have been numerous efforts to increase the interpretability of image-centric deep neural network models, where the learned features are more intuitive to visualize. Visualization in the time-series domain is much more complicated, as there is no direct interpretation of the filters and inputs as compared to the image modality. In addition, little or no attention has been devoted to the development of such tools in the domain of time-series in the past. TSViz provides possibilities to explore and analyze a network from different dimensions at different levels of abstraction, which includes identification of parts of the input that were responsible for a prediction (including per-filter saliency), importance of different filters present in the network for a particular prediction, notion of diversity present in the network through filter clustering, understanding of the main sources of variation learnt by the network through inverse optimization, and analysis of the network's robustness against adversarial noise. As a sanity check for the computed influence values, we demonstrate results regarding pruning of neural networks based on the computed influence information. These representations allow the network features to be understood so that the acceptability of deep networks for time-series data can be enhanced. This is extremely important in domains like finance, industry 4.0, self-driving cars, health-care, counter-terrorism etc., where the reasons for reaching a particular prediction are as important as the prediction itself. We assess the proposed framework for interpretability with a set of desirable properties essential for any method.
Submitted 5 May, 2020; v1 submitted 8 February, 2018;
originally announced February 2018.
-
Secure Debit Card Device Model
Authors:
Ms. Rumaisah Munir
Abstract:
The project envisages the implementation of an e-payment system utilizing a FIPS-201 Smart Card. The system combines hardware and software modules. The hardware module takes data insertions (e.g. currency notes), processes the data and then creates a connection with the smart card using serial/USB ports to perform further mathematical manipulations. The hardware interacts with back-end servers for authentication and identification of users and for data storage pertaining to a particular user. The software module manages the database, handles identities, and provides authentication and secure communication between the various system components. It will also provide a component to the end users. This component can be in the form of software for a computer or executable binaries for PoS devices. The idea is to receive data in the embedded system from the data reader and the smart card. After manipulations, the updated data is written to the smart card memory and also updated in the back-end servers maintaining the database. The information to be sent to a server is sent through a PoS device which has multiple transfer mediums, both wired and wireless. The user device also acts as an updater; therefore, whenever the smart card is inserted by the user, it is automatically updated by synchronizing with the back-end database. The project required expertise in embedded systems, networks, Java and C++ (optional).
Submitted 2 February, 2014;
originally announced February 2014.