-
Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach
Authors:
Mir Rayat Imtiaz Hossain,
Mennatullah Siam,
Leonid Sigal,
James J. Little
Abstract:
The emergence of attention-based transformer models has led to their extensive use in various tasks, due to their superior generalization and transfer properties. Recent research has demonstrated that such models, when prompted appropriately, are excellent for few-shot inference. However, such techniques are under-explored for dense prediction tasks like semantic segmentation. In this work, we examine the effectiveness of prompting a transformer-decoder with learned visual prompts for the generalized few-shot segmentation (GFSS) task. Our goal is to achieve strong performance not only on novel categories with limited examples, but also to retain performance on base categories. We propose an approach to learn visual prompts with limited examples. These learned visual prompts are used to prompt a multiscale transformer decoder to facilitate accurate dense predictions. Additionally, we introduce a unidirectional causal attention mechanism between the novel prompts, learned with limited examples, and the base prompts, learned with abundant data. This mechanism enriches the novel prompts without deteriorating the base class performance. Overall, this form of prompting helps us achieve state-of-the-art performance for GFSS on two different benchmark datasets: COCO-$20^i$ and Pascal-$5^i$, without the need for test-time optimization (or transduction). Furthermore, test-time optimization leveraging unlabelled test data can be used to improve the prompts, which we refer to as transductive prompt tuning.
Submitted 17 April, 2024;
originally announced April 2024.
-
Brain-Inspired Reservoir Computing Using Memristors with Tunable Dynamics and Short-Term Plasticity
Authors:
Nicholas X. Armendarez,
Ahmed S. Mohamed,
Anurag Dhungel,
Md Razuan Hossain,
Md Sakib Hasan,
Joseph S. Najem
Abstract:
Recent advancements in reservoir computing research have created a demand for analog devices with dynamics that can facilitate the physical implementation of reservoirs, promising faster information processing while consuming less energy and occupying a smaller area footprint. Studies have demonstrated that dynamic memristors, with nonlinear and short-term memory dynamics, are excellent candidates as information-processing devices or reservoirs for temporal classification and prediction tasks. Previous implementations relied on nominally identical memristors that applied the same nonlinear transformation to the input data, which is not enough to achieve a rich state space. To address this limitation, researchers either diversified the data encoding across multiple memristors or harnessed the stochastic device-to-device variability among the memristors. However, this approach requires additional pre-processing steps and leads to synchronization issues. Instead, it is preferable to encode the data once and pass it through a reservoir layer consisting of memristors with distinct dynamics. Here, we demonstrate that ion-channel-based memristors with voltage-dependent dynamics can be controllably and predictively tuned through voltage or adjustment of the ion channel concentration to exhibit diverse dynamic properties. We show, through experiments and simulations, that reservoir layers constructed with a small number of distinct memristors exhibit significantly higher predictive and classification accuracies with a single data encoding. We found that for a second-order nonlinear dynamical system prediction task, the varied memristor reservoir experimentally achieved a normalized mean square error of 0.0015 using only five distinct memristors. Moreover, in a neural activity classification task, a reservoir of just three distinct memristors experimentally attained an accuracy of 96.5%.
Submitted 24 October, 2023;
originally announced October 2023.
-
Biomembrane-based Memcapacitive Reservoir Computing System for Energy Efficient Temporal Data Processing
Authors:
Md Razuan Hossain,
Ahmed Salah Mohamed,
Nicholas Xavier Armendarez,
Joseph S. Najem,
Md Sakib Hasan
Abstract:
Reservoir computing is a highly efficient machine learning framework for processing temporal data by extracting features from the input signal and mapping them into higher dimensional spaces. Physical reservoir layers have been realized using spintronic oscillators, atomic switch networks, silicon photonic modules, ferroelectric transistors, and volatile memristors. However, these devices are intrinsically energy-dissipative due to their resistive nature, which leads to increased power consumption. Therefore, capacitive memory devices can provide a more energy-efficient approach. Here, we leverage volatile biomembrane-based memcapacitors that closely mimic certain short-term synaptic plasticity functions as reservoirs to solve classification tasks and analyze time-series data in simulation and experimentally. Our system achieves a 99.6% accuracy rate for spoken digit classification and a normalized mean square error of $7.81 \times 10^{-4}$ in a second-order non-linear regression task. Furthermore, to showcase the device's real-time temporal data processing capability, we achieve 100% accuracy for a real-time epilepsy detection problem from an input electroencephalography (EEG) signal. Most importantly, we demonstrate that each memcapacitor consumes an average of 41.5 fJ of energy per spike, regardless of the selected input voltage pulse width, while maintaining an average power of 415 fW for a pulse width of 100 ms. These values are orders of magnitude lower than those achieved by state-of-the-art memristors used as reservoirs. Lastly, we believe the biocompatible, soft nature of our memcapacitor makes it highly suitable for computing and signal-processing applications in biological environments.
Submitted 15 November, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
-
Machine Learning Techniques for Estimating Soil Moisture from Mobile Captured Images
Authors:
Muhammad Riaz Hasib Hossain,
Muhammad Ashad Kabir
Abstract:
Precise Soil Moisture (SM) assessment is essential in agriculture. By understanding the level of SM, we can improve yield and irrigation scheduling, which significantly impacts food production and other needs of the global population. Advancements in smartphone technologies and computer vision have shown that soil properties, including SM, can be assessed non-destructively. This study aims to analyze the existing Machine Learning (ML) techniques for estimating SM from soil images and to understand the accuracy of moisture estimates across different smartphones and sunlight conditions. To this end, 629 images of 38 soil samples were taken from seven areas in Sydney, Australia, and split into four datasets based on the image-capturing device used (iPhone 6s and iPhone 11 Pro) and the lighting conditions (direct and indirect sunlight). Multiple Linear Regression (MLR), Support Vector Regression (SVR), and Convolutional Neural Network (CNN) models were compared. MLR performed with higher accuracy under holdout cross-validation when the images were captured in indirect sunlight, with a Mean Absolute Error (MAE) of 0.35, Root Mean Square Error (RMSE) of 0.15, and $R^2$ of 0.60. Nevertheless, SVR was better, with MAE, RMSE, and $R^2$ values of 0.05, 0.06, and 0.96 for 10-fold cross-validation and 0.22, 0.06, and 0.95 for leave-one-out cross-validation when images were captured in indirect sunlight. These results demonstrate a smartphone camera's potential for predicting SM using ML. In the future, software developers can build mobile applications based on these findings for accurate, easy, and rapid SM estimation.
Submitted 20 March, 2023;
originally announced March 2023.
-
Framework-agnostic Semantically-aware Global Reasoning for Segmentation
Authors:
Mir Rayat Imtiaz Hossain,
Leonid Sigal,
James J. Little
Abstract:
Recent advances in pixel-level tasks (e.g. segmentation) illustrate the benefit of long-range interactions between aggregated region-based representations that can enhance local features. However, such aggregated representations, often in the form of attention, fail to model the underlying semantics of the scene (e.g. individual objects and, by extension, their interactions). In this work, we address the issue by proposing a component that learns to project image features into latent representations and reason between them using a transformer encoder to generate contextualized and scene-consistent representations, which are fused with the original image features. Our design encourages the latent regions to represent semantic concepts by ensuring that the activated regions are spatially disjoint and the union of such regions corresponds to a connected object segment. The proposed semantic global reasoning (SGR) component is end-to-end trainable and can be easily added to a wide variety of backbones (CNN or transformer-based) and segmentation heads (per-pixel or mask classification) to consistently improve the segmentation results on different datasets. In addition, our latent tokens are semantically interpretable and diverse and provide a rich set of features that can be transferred to downstream tasks like object detection and segmentation, with improved performance. Furthermore, we also propose metrics to quantify the semantics of latent tokens at both the class and instance levels.
Submitted 17 April, 2024; v1 submitted 6 December, 2022;
originally announced December 2022.
-
A Privacy-preserving Mobile and Fog Computing Framework to Trace and Prevent COVID-19 Community Transmission
Authors:
Md Whaiduzzaman,
Md. Razon Hossain,
Ahmedur Rahman Shovon,
Shanto Roy,
Aron Laszka,
Rajkumar Buyya,
Alistair Barros
Abstract:
To slow down the spread of COVID-19, governments around the world are trying to identify infected people and to contain the virus by enforcing isolation and quarantine. However, it is difficult to trace people who came into contact with an infected person, which causes widespread community transmission and mass infection. To address this problem, we develop an e-government Privacy Preserving Mobile and Fog computing framework entitled PPMF that can trace infected and suspected cases nationwide. We use personal mobile devices with a contact tracing app and two types of stationary fog nodes, named Automatic Risk Checkers (ARC) and Suspected User Data Uploader Node (SUDUN), to trace community transmission while maintaining user data privacy. Each user's mobile device receives a Unique Encrypted Reference Code (UERC) when registering on the central application. The mobile device and the central application both generate a Rotational Unique Encrypted Reference Code (RUERC), which is broadcast using Bluetooth Low Energy (BLE) technology. The ARCs are placed at the entry points of buildings and can immediately detect if there are positive or suspected cases nearby. If any confirmed case is found, the ARCs broadcast precautionary messages to nearby people without revealing the identity of the infected person. The SUDUNs are placed at health centers that report test results to the central cloud application. The reported data is later used to map between infected and suspected cases. Therefore, using our proposed PPMF framework, governments can let organizations continue their economic activities without complete lockdown.
Submitted 23 June, 2020;
originally announced June 2020.
-
Exploiting temporal information for 3D pose estimation
Authors:
Mir Rayat Imtiaz Hossain,
James J. Little
Abstract:
In this work, we address the problem of 3D human pose estimation from a sequence of 2D human poses. Although the recent success of deep networks has led many state-of-the-art methods for 3D pose estimation to train deep networks end-to-end to predict from images directly, the top-performing approaches have shown the effectiveness of dividing the task of 3D pose estimation into two steps: using a state-of-the-art 2D pose estimator to estimate the 2D pose from images and then mapping them into 3D space. They also showed that a low-dimensional representation like 2D locations of a set of joints can be discriminative enough to estimate 3D pose with high accuracy. However, estimating 3D pose for individual frames leads to temporally incoherent estimates, because independent errors in each frame cause jitter. Therefore, in this work we utilize the temporal information across a sequence of 2D joint locations to estimate a sequence of 3D poses. We designed a sequence-to-sequence network composed of layer-normalized LSTM units with shortcut connections connecting the input to the output on the decoder side and imposed a temporal smoothness constraint during training. We found that the knowledge of temporal consistency improves the best reported result on the Human3.6M dataset by approximately $12.2\%$ and helps our network to recover temporally consistent 3D poses over a sequence of images even when the 2D pose detector fails.
Submitted 12 September, 2018; v1 submitted 23 November, 2017;
originally announced November 2017.
-
PC Guided Automatic Vehicle System
Authors:
M. A. A. Mashud,
M. R. Hossain,
Mustari Zaman,
M. A. Razzaque
Abstract:
The main objective of this paper is to design and develop an automatic vehicle, fully controlled by a computer system. The vehicle designed in the present work can move along a pre-determined path and work automatically without the need for a human operator, and it can also be controlled by a human operator. Such a vehicle is capable of performing a wide variety of difficult tasks in space research, domestic, scientific and industrial fields. For this purpose, an IBM-compatible PC with a Pentium microprocessor has been used as the system controller. Its parallel printer port has been used as the data communication port to interface with the vehicle. A suitable software program has been developed for the system controller to send commands to the vehicle.
Submitted 6 January, 2015;
originally announced January 2015.
-
Comparative Study of Different Guard Time Intervals to Improve the BER Performance of Wimax Systems to Minimize the Effects of ISI and ICI under Adaptive Modulation Techniques over SUI1 and AWGN Communication Channels
Authors:
Md. Zahid Hasan,
Mohammad Reaz Hossain,
Md. Ashraful Islam,
Riaz Hossain
Abstract:
The WiMAX technology, based on the air interface standard 802.16 wireless MAN, is configured in the same way as a traditional cellular network, with base stations using a point-to-multipoint architecture to deliver service over a radius of up to several kilometers. The range and the Non Line of Sight (NLOS) ability of WiMAX make the system very attractive for users, but there will be a slightly higher BER at low SNR. The aim of this paper is a comparative study of the effect of different guard time intervals on BER at different SNR values under digital modulation techniques (QPSK, 16QAM and 64QAM) and over different communication channels, namely AWGN and the Stanford University Interim (SUI 1) fading channel, for a WiMAX system. These effects are compared for a Reed-Solomon (RS) encoder combined with half-rate Convolutional codes in FEC channel coding. The simulation results for the estimated Bit Error Rate (BER) show that the interleaved RS code (255,239,8) with a half-rate Convolutional code and a guard time interval of 0.25 under the QPSK modulation technique over the AWGN channel is highly effective in combating bit errors in the WiMAX communication system. To complete this performance analysis of WiMAX-based systems, a segment of an audio signal is used for analysis. The transmitted audio message is found to be retrieved effectively under noisy conditions.
Submitted 4 December, 2009;
originally announced December 2009.