
Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach

Authors: Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal, James J. Little

Abstract: The emergence of attention-based transformer models has led to their extensive use in various tasks, due to their superior generalization and transfer properties. Recent research has demonstrated that such models, when prompted appropriately, are excellent for few-shot inference. However, such techniques are under-explored for dense prediction tasks like semantic segmentation. In this work, we examine the effectiveness of prompting a transformer-decoder with learned visual prompts for the generalized few-shot segmentation (GFSS) task. Our goal is to achieve strong performance not only on novel categories with limited examples, but also to retain performance on base categories. We propose an approach to learn visual prompts with limited examples. These learned visual prompts are used to prompt a multiscale transformer decoder to facilitate accurate dense predictions. Additionally, we introduce a unidirectional causal attention mechanism between the novel prompts, learned with limited examples, and the base prompts, learned with abundant data. This mechanism enriches the novel prompts without deteriorating the base class performance. Overall, this form of prompting helps us achieve state-of-the-art performance for GFSS on two different benchmark datasets: COCO-$20^i$ and Pascal-$5^i$, without the need for test-time optimization (or transduction). Furthermore, test-time optimization leveraging unlabelled test data can be used to improve the prompts, which we refer to as transductive prompt tuning.
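The one-way information flow described above can be pictured as an attention mask. Under this mask, novel prompts may attend to base prompts, but base prompts never attend to novel ones, so changing a novel prompt cannot alter the base-prompt outputs.

This is a minimal sketch under that reading, not the authors' implementation. The following are all assumptions for illustration:
- the prompt ordering;
- single-head dot-product attention without learned projections;
- the helper names `prompt_attention_mask` and `masked_self_attention`.

```python
import math
from typing import List

Matrix = List[List[float]]


def prompt_attention_mask(num_base: int, num_novel: int) -> List[List[bool]]:
    """mask[i][j] is True when prompt i may attend to prompt j.

    Prompts are ordered [base..., novel...]: base prompts see only base
    prompts, while novel prompts see both base and novel prompts.
    """
    n = num_base + num_novel
    return [[j < num_base or i >= num_base for j in range(n)] for i in range(n)]


def masked_self_attention(x: Matrix, mask: List[List[bool]]) -> Matrix:
    """Single-head dot-product self-attention with queries = keys = values = x."""
    d = len(x[0])
    out = []
    for i, q in enumerate(x):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if mask[i][j] else -math.inf
            for j, k in enumerate(x)
        ]
        top = max(scores)  # finite, because every prompt may attend to itself
        weights = [math.exp(s - top) for s in scores]  # masked entries become 0.0
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, x)) / total for c in range(d)])
    return out


base = [[1.0, 0.0], [0.0, 1.0]]
mask = prompt_attention_mask(num_base=2, num_novel=1)
out_a = masked_self_attention(base + [[0.5, 0.5]], mask)
out_b = masked_self_attention(base + [[-3.0, 2.0]], mask)
print(out_a[:2] == out_b[:2])  # True: base outputs ignore the novel prompt
```

The mask gives the asymmetry directly. Novel prompts gain context from the well-trained base prompts, while the base rows are computed exactly as if no novel prompts existed.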

Submitted 17 April, 2024; originally announced April 2024.

Comments: Accepted at CVPR 2024

arXiv:2310.16331

Brain-Inspired Reservoir Computing Using Memristors with Tunable Dynamics and Short-Term Plasticity

Authors: Nicholas X. Armendarez, Ahmed S. Mohamed, Anurag Dhungel, Md Razuan Hossain, Md Sakib Hasan, Joseph S. Najem

Abstract: Recent advancements in reservoir computing research have created a demand for analog devices with dynamics that can facilitate the physical implementation of reservoirs, promising faster information processing while consuming less energy and occupying a smaller area footprint. Studies have demonstrated that dynamic memristors, with nonlinear and short-term memory dynamics, are excellent candidates as information-processing devices or reservoirs for temporal classification and prediction tasks. Previous implementations relied on nominally identical memristors that applied the same nonlinear transformation to the input data, which is not enough to achieve a rich state space. To address this limitation, researchers either diversified the data encoding across multiple memristors or harnessed the stochastic device-to-device variability among the memristors. However, this approach requires additional pre-processing steps and leads to synchronization issues. Instead, it is preferable to encode the data once and pass it through a reservoir layer consisting of memristors with distinct dynamics. Here, we demonstrate that ion-channel-based memristors with voltage-dependent dynamics can be controllably and predictively tuned through voltage or adjustment of the ion channel concentration to exhibit diverse dynamic properties. We show, through experiments and simulations, that reservoir layers constructed with a small number of distinct memristors exhibit significantly higher predictive and classification accuracies with a single data encoding. We found that for a second-order nonlinear dynamical system prediction task, the varied memristor reservoir experimentally achieved a normalized mean square error of 0.0015 using only five distinct memristors. Moreover, in a neural activity classification task, a reservoir of just three distinct memristors experimentally attained an accuracy of 96.5%.
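A software analogue of the idea is sketched below: one input encoding drives several devices with different dynamics, followed by a trained linear readout. Everything specific here is an assumption for illustration:
- the device model: a leaky first-order state with decay `d` and a tanh nonlinearity of gain `g`;
- the five `(d, g)` pairs;
- the ridge readout;
- NMSE normalized by target variance;
- the benchmark form y(k) = 0.4 y(k-1) + 0.4 y(k-1) y(k-2) + 0.6 u(k)^3 + 0.1, with u(k) drawn uniformly from [0, 0.5].

It does not model the ion-channel memristors or reproduce the reported NMSE.

```python
import math
import random
from typing import List, Sequence, Tuple


def second_order_target(u: Sequence[float]) -> List[float]:
    """Assumed benchmark: y(k) = 0.4 y(k-1) + 0.4 y(k-1) y(k-2) + 0.6 u(k)^3 + 0.1."""
    y = [0.0, 0.0]
    for k in range(2, len(u)):
        y.append(0.4 * y[k - 1] + 0.4 * y[k - 1] * y[k - 2] + 0.6 * u[k] ** 3 + 0.1)
    return y


def reservoir_states(u: Sequence[float], devices: Sequence[Tuple[float, float]]) -> List[List[float]]:
    """Hypothetical volatile devices: each (decay, gain) pair gives distinct dynamics."""
    states = [0.0] * len(devices)
    rows = []
    for uk in u:  # the same single encoding of u drives every device
        states = [d * x + (1.0 - d) * math.tanh(g * uk) for x, (d, g) in zip(states, devices)]
        rows.append([1.0] + states)  # bias term followed by one state per device
    return rows


def ridge_fit(X: List[List[float]], y: Sequence[float], lam: float = 1e-8) -> List[float]:
    """Solve (X^T X + lam*I) w = X^T y by Gaussian elimination with partial pivoting."""
    n = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(A[row][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][c] * w[c] for c in range(i + 1, n))) / A[i][i]
    return w


def nmse(y: Sequence[float], pred: Sequence[float]) -> float:
    """Mean squared error normalized by the variance of the target."""
    mean = sum(y) / len(y)
    var = sum((t - mean) ** 2 for t in y) / len(y)
    return sum((t - p) ** 2 for t, p in zip(y, pred)) / len(y) / var


rng = random.Random(0)
u = [rng.uniform(0.0, 0.5) for _ in range(500)]
y = second_order_target(u)
devices = [(0.1, 1.0), (0.3, 2.0), (0.5, 3.0), (0.7, 4.0), (0.9, 5.0)]  # five distinct devices
X = reservoir_states(u, devices)
w = ridge_fit(X, y)
pred = [sum(wi * xi for wi, xi in zip(w, row)) for row in X]
print(f"training NMSE: {nmse(y, pred):.4f}")
```

Only the linear readout `w` is trained. The distinct decays give the reservoir several memory timescales from one encoding, which is the property the abstract attributes to tuning the devices.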

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2305.12025

doi 10.1002/aisy.202300346

Biomembrane-based Memcapacitive Reservoir Computing System for Energy Efficient Temporal Data Processing

Authors: Md Razuan Hossain, Ahmed Salah Mohamed, Nicholas Xavier Armendarez, Joseph S. Najem, Md Sakib Hasan

Abstract: Reservoir computing is a highly efficient machine learning framework for processing temporal data by extracting features from the input signal and mapping them into higher dimensional spaces. Physical reservoir layers have been realized using spintronic oscillators, atomic switch networks, silicon photonic modules, ferroelectric transistors, and volatile memristors. However, these devices are intrinsically energy-dissipative due to their resistive nature, which leads to increased power consumption. Therefore, capacitive memory devices can provide a more energy-efficient approach. Here, we leverage volatile biomembrane-based memcapacitors that closely mimic certain short-term synaptic plasticity functions as reservoirs to solve classification tasks and analyze time-series data in simulation and experimentally. Our system achieves a 99.6% accuracy rate for spoken digit classification and a normalized mean square error of $7.81 \times 10^{-4}$ in a second-order non-linear regression task. Furthermore, to showcase the device's real-time temporal data processing capability, we achieve 100% accuracy for a real-time epilepsy detection problem from an inputted electroencephalography (EEG) signal. Most importantly, we demonstrate that each memcapacitor consumes an average of 41.5 fJ of energy per spike, regardless of the selected input voltage pulse width, while maintaining an average power of 415 fW for a pulse width of 100 ms. These values are orders of magnitude lower than those achieved by state-of-the-art memristors used as reservoirs. Lastly, we believe the biocompatible, soft nature of our memcapacitor makes it highly suitable for computing and signal-processing applications in biological environments.
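The reported energy and power figures agree if average power is taken as energy per spike divided by pulse width: 41.5 fJ / 100 ms = 415 fW. The snippet below only checks that arithmetic. The relation P = E / t over a single pulse is an assumption about how the average was computed.

```python
def average_power(energy_j: float, pulse_width_s: float) -> float:
    """Average power over one pulse, assuming P = E / t."""
    return energy_j / pulse_width_s


energy_per_spike = 41.5e-15  # 41.5 fJ per spike (reported)
pulse_width = 100e-3         # 100 ms pulse width (reported)
power = average_power(energy_per_spike, pulse_width)
print(f"{power * 1e15:.1f} fW")  # 415.0 fW
```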

Submitted 15 November, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: Supplementary information is attached under the main text

arXiv:2303.11527

Machine Learning Techniques for Estimating Soil Moisture from Mobile Captured Images

Authors: Muhammad Riaz Hasib Hossain, Muhammad Ashad Kabir

Abstract: Precise Soil Moisture (SM) assessment is essential in agriculture. By understanding the level of SM, we can improve irrigation scheduling, which significantly affects crop yield, food production, and other needs of the global population. Advances in smartphone technology and computer vision have demonstrated the potential for non-destructive estimation of soil properties, including SM. This study analyzes existing Machine Learning (ML) techniques for estimating SM from soil images and examines how estimation accuracy varies across smartphones and sunlight conditions. To this end, 629 images of 38 soil samples were taken from seven areas in Sydney, Australia, and split into four datasets based on the image-capturing device (iPhone 6s or iPhone 11 Pro) and the lighting conditions (direct or indirect sunlight). Multiple Linear Regression (MLR), Support Vector Regression (SVR), and a Convolutional Neural Network (CNN) were compared. Under holdout cross-validation, MLR achieved higher accuracy on images captured in indirect sunlight, with a Mean Absolute Error (MAE) of 0.35, a Root Mean Square Error (RMSE) of 0.15, and an R^2 of 0.60. However, SVR performed better on images captured in indirect sunlight, with MAE, RMSE, and R^2 values of 0.05, 0.06, and 0.96 for 10-fold cross-validation and 0.22, 0.06, and 0.95 for leave-one-out cross-validation. This demonstrates a smartphone camera's potential for predicting SM using ML. In the future, software developers could build mobile applications based on these findings for accurate, easy, and rapid SM estimation.
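The three reported metrics and the cross-validation schemes can be made concrete with a small pure-Python sketch. A few details are assumptions:
- the helper names are hypothetical;
- folds are contiguous and unshuffled for brevity, whereas a real study would typically shuffle or stratify;
- setting k equal to the number of samples gives leave-one-out.

```python
import math
from typing import List, Sequence, Tuple


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Tuple[float, float, float]:
    """Return (MAE, RMSE, R^2) for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    sse = sum(e * e for e in errors)
    rmse = math.sqrt(sse / n)
    mean = sum(y_true) / n
    r2 = 1.0 - sse / sum((t - mean) ** 2 for t in y_true)
    return mae, rmse, r2


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous (train, test) folds; k == n is leave-one-out."""
    bounds = [round(i * n / k) for i in range(k + 1)]
    folds = []
    for lo, hi in zip(bounds, bounds[1:]):
        test = list(range(lo, hi))
        train = [j for j in range(n) if j < lo or j >= hi]
        folds.append((train, test))
    return folds


mae, rmse, r2 = regression_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
print(round(mae, 3), round(rmse, 3), round(r2, 3))  # 2.333 2.38 0.915
folds = k_fold_indices(629, 10)  # e.g. 629 images split into 10 folds
```

On any set of predictions, RMSE is never smaller than MAE, because squaring weights large errors more heavily.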

Submitted 20 March, 2023; originally announced March 2023.

Comments: 21 pages, 10 figures

arXiv:2212.03338

doi 10.1109/WACV57701.2024.00104

Framework-agnostic Semantically-aware Global Reasoning for Segmentation

Authors: Mir Rayat Imtiaz Hossain, Leonid Sigal, James J. Little

Abstract: Recent advances in pixel-level tasks (e.g. segmentation) illustrate the benefit of long-range interactions between aggregated region-based representations that can enhance local features. However, such aggregated representations, often in the form of attention, fail to model the underlying semantics of the scene (e.g. individual objects and, by extension, their interactions). In this work, we address the issue by proposing a component that learns to project image features into latent representations and reason between them using a transformer encoder to generate contextualized and scene-consistent representations, which are fused with the original image features. Our design encourages the latent regions to represent semantic concepts by ensuring that the activated regions are spatially disjoint and the union of such regions corresponds to a connected object segment. The proposed semantic global reasoning (SGR) component is end-to-end trainable and can be easily added to a wide variety of backbones (CNN or transformer-based) and segmentation heads (per-pixel or mask classification) to consistently improve the segmentation results on different datasets. In addition, our latent tokens are semantically interpretable and diverse and provide a rich set of features that can be transferred to downstream tasks like object detection and segmentation, with improved performance. Furthermore, we propose metrics to quantify the semantics of latent tokens at both the class and instance level.
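The project, reason, and fuse pipeline described above can be sketched in a few lines:
1. Pixel features are softly assigned to K latent regions.
2. They are pooled into K tokens.
3. The tokens exchange information.
4. The result is projected back onto the pixels and added to the original features.

This is an illustrative sketch, not the SGR component itself. One round of dot-product self-attention without learned projections stands in for the transformer encoder. The random projection `projection` replaces learned weights. The disjointness and connectivity objectives are omitted. The function names are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def softmax(v: List[float]) -> List[float]:
    top = max(v)
    e = [math.exp(a - top) for a in v]
    s = sum(e)
    return [a / s for a in e]


def global_reasoning(x: Matrix, w_assign: Matrix) -> Matrix:
    """x: N pixel features of size C; w_assign: C x K projection onto K latent regions."""
    n, c, k = len(x), len(x[0]), len(w_assign[0])
    # 1. Soft-assign every pixel to the K latent regions (each row sums to 1).
    assign = [softmax([sum(x[i][a] * w_assign[a][j] for a in range(c)) for j in range(k)])
              for i in range(n)]
    # 2. Pool pixel features into K latent tokens (assignment-weighted means).
    tokens = []
    for j in range(k):
        mass = sum(assign[i][j] for i in range(n))
        tokens.append([sum(assign[i][j] * x[i][a] for i in range(n)) / mass for a in range(c)])
    # 3. Let tokens exchange information via one round of dot-product self-attention.
    reasoned = []
    for q in tokens:
        wts = softmax([sum(p * r for p, r in zip(q, t)) / math.sqrt(c) for t in tokens])
        reasoned.append([sum(wt * t[a] for wt, t in zip(wts, tokens)) for a in range(c)])
    # 4. Project tokens back onto pixels and fuse with the original features (residual sum).
    return [[x[i][a] + sum(assign[i][j] * reasoned[j][a] for j in range(k)) for a in range(c)]
            for i in range(n)]


rng = random.Random(0)
features = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]    # 6 pixels, 4 channels
projection = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]  # 3 latent regions
fused = global_reasoning(features, projection)
print(len(fused), len(fused[0]))  # 6 4
```

Because the output keeps the input's shape, a block like this can be dropped between a backbone and a segmentation head. That mirrors the plug-in property the abstract claims.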

Submitted 17 April, 2024; v1 submitted 6 December, 2022; originally announced December 2022.

Comments: Published in WACV 2024

Journal ref: 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2024, pp. 988-998

arXiv:2006.13364

A Privacy-preserving Mobile and Fog Computing Framework to Trace and Prevent COVID-19 Community Transmission

Authors: Md Whaiduzzaman, Md. Razon Hossain, Ahmedur Rahman Shovon, Shanto Roy, Aron Laszka, Rajkumar Buyya, Alistair Barros

Abstract: To slow down the spread of COVID-19, governments around the world are trying to identify infected people and to contain the virus by enforcing isolation and quarantine. However, it is difficult to trace people who came into contact with an infected person, which causes widespread community transmission and mass infection. To address this problem, we develop an e-government Privacy Preserving Mobile and Fog computing framework entitled PPMF that can trace infected and suspected cases nationwide. We use personal mobile devices with a contact tracing app and two types of stationary fog nodes, named Automatic Risk Checkers (ARC) and Suspected User Data Uploader Node (SUDUN), to trace community transmission while maintaining user data privacy. Each user's mobile device receives a Unique Encrypted Reference Code (UERC) when registering on the central application. The mobile device and the central application both generate a Rotational Unique Encrypted Reference Code (RUERC), which is broadcast using Bluetooth Low Energy (BLE) technology. The ARCs are placed at the entry points of buildings, where they can immediately detect whether there are positive or suspected cases nearby. If any confirmed case is found, the ARCs broadcast precautionary messages to nearby people without revealing the identity of the infected person. The SUDUNs are placed at the health centers that report test results to the central cloud application. The reported data is later used to map between infected and suspected cases. Therefore, using our proposed PPMF framework, governments can let organizations continue their economic activities without a complete lockdown.
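The abstract does not say how an RUERC is derived from a UERC. Purely for illustration, one plausible construction is assumed here. Both the phone and the central application compute an HMAC-SHA256 of the current time window, keyed with the shared UERC, using Python's standard `hmac` module. Both sides then agree on each rotating code without the UERC ever being broadcast.

The function name, the 15-minute rotation period and the 16-byte truncation are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

ROTATION_SECONDS = 15 * 60  # hypothetical rotation period


def rotating_code(uerc: bytes, timestamp: Optional[float] = None) -> str:
    """Derive the code for the rotation window that contains `timestamp`.

    The phone and the central application both hold the UERC, so both can
    compute the same rotating code independently; only the code is broadcast.
    """
    if timestamp is None:
        timestamp = time.time()
    window = int(timestamp // ROTATION_SECONDS)
    digest = hmac.new(uerc, window.to_bytes(8, "big"), hashlib.sha256).digest()
    return digest[:16].hex()  # hypothetical truncation to 16 bytes for a short broadcast payload


uerc = b"placeholder-uerc-issued-at-registration"  # hypothetical secret
phone_side = rotating_code(uerc, 1_000_000.0)
server_side = rotating_code(uerc, 1_000_060.0)     # same 15-minute window
next_window = rotating_code(uerc, 1_000_900.0)     # next window, new code
print(phone_side == server_side, phone_side != next_window)  # True True
```

Under this scheme, an observer who records broadcasts cannot link codes from different windows without the UERC. The central application can still match a reported code by recomputing codes for registered users.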

Submitted 23 June, 2020; originally announced June 2020.

Comments: 12 pages, 9 figures, 1 table, 1 algorithm

arXiv:1711.08585

doi 10.1007/978-3-030-01249-6_5

Exploiting temporal information for 3D pose estimation

Authors: Mir Rayat Imtiaz Hossain, James J. Little

Abstract: In this work, we address the problem of 3D human pose estimation from a sequence of 2D human poses. Although the recent success of deep networks has led many state-of-the-art methods for 3D pose estimation to train deep networks end-to-end to predict from images directly, the top-performing approaches have shown the effectiveness of dividing the task of 3D pose estimation into two steps: using a state-of-the-art 2D pose estimator to estimate the 2D pose from images and then mapping them into 3D space. They also showed that a low-dimensional representation like 2D locations of a set of joints can be discriminative enough to estimate 3D pose with high accuracy. However, estimating 3D pose for individual frames leads to temporally incoherent estimates, because independent errors in each frame cause jitter. Therefore, in this work we utilize the temporal information across a sequence of 2D joint locations to estimate a sequence of 3D poses. We designed a sequence-to-sequence network composed of layer-normalized LSTM units with shortcut connections connecting the input to the output on the decoder side and imposed a temporal smoothness constraint during training. We found that the knowledge of temporal consistency improves the best reported result on the Human3.6M dataset by approximately $12.2\%$ and helps our network to recover temporally consistent 3D poses over a sequence of images even when the 2D pose detector fails.
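One common way to express a temporal smoothness constraint is to penalize the change between consecutive predicted 3D poses alongside the per-frame reconstruction error. This form is assumed here; the abstract does not give the exact form.

The weight `lam` and the helper names are hypothetical. The sketch covers only the loss, not the layer-normalized LSTM sequence-to-sequence network.

```python
from typing import List, Sequence

Pose = Sequence[Sequence[float]]  # J joints, each an (x, y, z) triple


def pose_mse(a: Pose, b: Pose) -> float:
    """Mean squared coordinate difference between two poses."""
    diffs = [(p - q) ** 2 for ja, jb in zip(a, b) for p, q in zip(ja, jb)]
    return sum(diffs) / len(diffs)


def sequence_loss(pred: List[Pose], target: List[Pose], lam: float = 0.1) -> float:
    """Per-frame reconstruction error plus lam times the mean frame-to-frame change."""
    recon = sum(pose_mse(p, t) for p, t in zip(pred, target)) / len(pred)
    steps = [pose_mse(pred[k], pred[k - 1]) for k in range(1, len(pred))]
    smooth = sum(steps) / len(steps) if steps else 0.0
    return recon + lam * smooth


target = [[[0.0, 0.0, 0.0]]] * 3                                    # one joint, three frames
jittery = [[[0.0, 0.0, 0.0]], [[1.0, 0.0, 0.0]], [[0.0, 0.0, 0.0]]]
print(sequence_loss(target, target), sequence_loss(jittery, target))  # 0.0, then about 0.144
```

A one-frame spike is penalized twice: once as reconstruction error and once, through `lam`, for each of the two jumps it creates. This is how such a term discourages the jitter described above.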

Submitted 12 September, 2018; v1 submitted 23 November, 2017; originally announced November 2017.

arXiv:1501.01109

doi 10.5121/ijci.2014.3601

PC Guided Automatic Vehicle System

Authors: M. A. A. Mashud, M. R. Hossain, Mustari Zaman, M. A. Razzaque

Abstract: The main objective of this paper is to design and develop an automatic vehicle fully controlled by a computer system. The vehicle designed in the present work can move along a pre-determined path and work automatically without the need for a human operator, and it can also be controlled by a human operator. Such a vehicle is capable of performing a wide variety of difficult tasks in space research, domestic, scientific, and industrial fields. For this purpose, an IBM-compatible PC with a Pentium microprocessor was used as the system controller. Its parallel printer port was used as the data communication port to interface with the vehicle. A suitable software program was developed for the system controller to send commands to the vehicle.
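The PC drives the vehicle through the data lines of its parallel printer port. The encoding side can be sketched as follows: each command becomes one byte, and a pre-determined path becomes a sequence of bytes.

The assignment of data lines D0-D3 to two motor drivers is hypothetical; the paper does not give its bit layout. Writing each byte to the port's data register is omitted because it needs OS-specific port access. That register is conventionally at I/O address 0x378 for LPT1 on IBM-compatible PCs.

```python
from enum import IntFlag
from typing import Dict, List, Sequence


class DataLine(IntFlag):
    """Hypothetical mapping of parallel-port data lines D0-D3 to two motor drivers."""
    LEFT_FORWARD = 0b0001   # D0
    LEFT_REVERSE = 0b0010   # D1
    RIGHT_FORWARD = 0b0100  # D2
    RIGHT_REVERSE = 0b1000  # D3


COMMANDS: Dict[str, DataLine] = {
    "stop": DataLine(0),
    "forward": DataLine.LEFT_FORWARD | DataLine.RIGHT_FORWARD,
    "reverse": DataLine.LEFT_REVERSE | DataLine.RIGHT_REVERSE,
    "left": DataLine.LEFT_REVERSE | DataLine.RIGHT_FORWARD,   # spin left in place
    "right": DataLine.LEFT_FORWARD | DataLine.RIGHT_REVERSE,  # spin right in place
}


def command_byte(command: str) -> int:
    """8-bit value to place on the port's data register for one command."""
    return int(COMMANDS[command]) & 0xFF


def path_to_bytes(path: Sequence[str]) -> List[int]:
    """Translate a pre-determined path (a list of commands) into port bytes."""
    return [command_byte(step) for step in path]


print(path_to_bytes(["forward", "forward", "left", "forward", "stop"]))  # [5, 5, 6, 5, 0]
```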

Submitted 6 January, 2015; originally announced January 2015.

Comments: 10 pages, International Journal on Cybernetics & Informatics (IJCI), 2014

arXiv:0912.0946

Comparative Study of Different Guard Time Intervals to Improve the BER Performance of Wimax Systems to Minimize the Effects of ISI and ICI under Adaptive Modulation Techniques over SUI1 and AWGN Communication Channels

Authors: Md. Zahid Hasan, Mohammad Reaz Hossain, Md. Ashraful Islam, Riaz Hossain

Abstract: WiMAX technology, based on the IEEE 802.16 WirelessMAN air-interface standard, is configured in the same way as a traditional cellular network, with base stations using a point-to-multipoint architecture to deliver service over a radius of up to several kilometers. The range and Non-Line-of-Sight (NLOS) capability of WiMAX make the system very attractive to users, but the Bit Error Rate (BER) is slightly higher at low SNR. The aim of this paper is to compare the effect of different guard time intervals on BER at different SNRs under digital modulation techniques (QPSK, 16QAM, and 64QAM) and over different communication channels, namely AWGN and the Stanford University Interim (SUI-1) fading channel, in a WiMAX system. These effects are also compared when a Reed-Solomon (RS) encoder is combined with a rate-1/2 convolutional encoder for FEC channel coding. The simulated BER results show that an interleaved RS code (255,239,8) combined with a rate-1/2 convolutional code and a guard time interval of 0.25 under QPSK modulation over the AWGN channel is highly effective at combating ISI and ICI in the WiMAX communication system. To complete this performance analysis of WiMAX-based systems, a segment of an audio signal is used. The transmitted audio message is found to be retrieved effectively under noisy conditions.
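The role of the guard interval can be illustrated with a noise-free OFDM sketch. Copying the last G·N samples of each symbol to its front (a cyclic prefix, here G = 0.25 as in the best configuration above) turns the channel's linear convolution into a circular one. A single complex division per subcarrier then undoes any multipath channel whose delay spread fits within the prefix. Without a prefix, the same channel leaves residual interference between subcarriers.

The N = 64 subcarriers, the three-tap channel and the direct O(N²) DFT are assumptions for brevity. No noise, RS/convolutional coding or SUI-1 channel model is included, so this is not the paper's simulation.

```python
import cmath
import random
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Direct O(N^2) DFT; the inverse includes the 1/N scaling."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def add_cyclic_prefix(symbol: List[complex], guard_fraction: float) -> List[complex]:
    """Prepend the last guard_fraction * N samples of the symbol (the guard interval)."""
    cp = int(len(symbol) * guard_fraction)
    return symbol[-cp:] + symbol if cp else list(symbol)


def multipath(signal: List[complex], taps: Sequence[float]) -> List[complex]:
    """Linear convolution with the channel impulse response, truncated to the input length."""
    return [sum(taps[m] * signal[i - m] for m in range(len(taps)) if i - m >= 0)
            for i in range(len(signal))]


def ofdm_roundtrip(data: List[complex], taps: Sequence[float], guard_fraction: float) -> List[complex]:
    n = len(data)
    tx = add_cyclic_prefix(dft(data, inverse=True), guard_fraction)
    rx = multipath(tx, taps)[len(tx) - n:]                 # discard the guard interval
    rx_freq = dft(rx)
    h_freq = dft([complex(t) for t in taps] + [0j] * (n - len(taps)))
    return [y / h for y, h in zip(rx_freq, h_freq)]        # one-tap equalizer per subcarrier


rng = random.Random(1)
qpsk = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(64)]
taps = [1.0, 0.5, 0.25]  # delay spread of 2 samples, shorter than the 16-sample prefix
for g in (0.25, 0.0):
    err = max(abs(a - b) for a, b in zip(ofdm_roundtrip(qpsk, taps, g), qpsk))
    print(f"guard fraction {g}: max symbol error {err:.2e}")
```

A longer guard interval tolerates longer delay spreads but spends more airtime on the copied samples. That trade-off is why the guard fraction is a parameter worth comparing.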

Submitted 4 December, 2009; originally announced December 2009.

Comments: 5 pages IEEE format, International Journal of Computer Science and Information Security, IJCSIS November 2009, ISSN 1947 5500, http://sites.google.com/site/ijcsis/

Report number: ISSN 1947 5500

Journal ref: International Journal of Computer Science and Information Security, IJCSIS, Vol. 6, No. 2, pp. 128-132, November 2009, USA

Showing 1–9 of 9 results for author: Hossain, M R