Showing 1–50 of 359 results for author: Khan, H

  1. arXiv:2408.08855  [pdf, other]

    cs.CV

    DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models

    Authors: Eman Ali, Sathira Silva, Muhammad Haris Khan

    Abstract: Vision-language models (VLMs), e.g., CLIP, have shown remarkable potential in zero-shot image classification. However, adapting these models to new domains remains challenging, especially in unsupervised settings where labelled data is unavailable. Recent research has proposed pseudo-labelling approaches to adapt CLIP in an unsupervised manner using unlabelled target data. Nonetheless, these metho…

    Submitted 16 August, 2024; originally announced August 2024.

  2. arXiv:2408.07445  [pdf, other]

    cs.CV

    Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach

    Authors: Muhammad Saad Saeed, Shah Nawaz, Muhammad Zaigham Zaheer, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf, Hassan Sajjad, Tom De Schepper, Markus Schedl

    Abstract: Multimodal networks have demonstrated remarkable performance improvements over their unimodal counterparts. Existing multimodal networks are designed in a multi-branch fashion that, due to the reliance on fusion strategies, exhibit deteriorated performance if one or more modalities are missing. In this work, we propose a modality invariant multimodal learning method, which is less susceptible to t…

    Submitted 14 August, 2024; originally announced August 2024.

  3. arXiv:2408.06755  [pdf, other]

    cs.CV cs.CL

    Sumotosima: A Framework and Dataset for Classifying and Summarizing Otoscopic Images

    Authors: Eram Anwarul Khan, Anas Anwarul Haq Khan

    Abstract: Otoscopy is a diagnostic procedure to examine the ear canal and eardrum using an otoscope. It identifies conditions like infections, foreign bodies, ear drum perforations and ear abnormalities. We propose a novel resource efficient deep learning and transformer based framework, Sumotosima (Summarizer for otoscopic images), an end-to-end pipeline for classification followed by summarization. Our fr…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Work in Progress

  4. arXiv:2408.00498  [pdf, other]

    cs.CV

    How Effective are Self-Supervised Models for Contact Identification in Videos

    Authors: Malitha Gunawardhana, Limalka Sadith, Liel David, Daniel Harari, Muhammad Haris Khan

    Abstract: The exploration of video content via Self-Supervised Learning (SSL) models has unveiled a dynamic field of study, emphasizing both the complex challenges and unique opportunities inherent in this area. Despite the growing body of research, the ability of SSL models to detect physical contacts in videos remains largely unexplored, particularly the effectiveness of methods such as downstream supervi…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 15 pages, 6 figures

  5. arXiv:2407.17595  [pdf, other]

    hep-ex

    Measurement of the $^8$B Solar Neutrino Flux Using the Full SNO+ Water Phase

    Authors: SNO+ Collaboration: A. Allega, M. R. Anderson, S. Andringa, M. Askins, D. J. Auty, A. Bacon, J. Baker, F. Barão, N. Barros, R. Bayes, E. W. Beier, A. Bialek, S. D. Biller, E. Blucher, E. Caden, E. J. Callaghan, M. Chen, S. Cheng, B. Cleveland, D. Cookman, J. Corning, M. A. Cox, R. Dehghani, et al. (93 additional authors not shown)

    Abstract: The SNO+ detector operated initially as a water Cherenkov detector. The implementation of a sealed covergas system midway through water data taking resulted in a significant reduction in the activity of $^{222}$Rn daughters in the detector and allowed the lowest background to the solar electron scattering signal above 5 MeV achieved to date. This paper reports an updated SNO+ water phase $^8$B sol…

    Submitted 24 July, 2024; originally announced July 2024.

  6. arXiv:2407.15390  [pdf, other]

    cs.CL cs.AI

    ALLaM: Large Language Models for Arabic and English

    Authors: M Saiful Bari, Yazeed Alnumay, Norah A. Alzahrani, Nouf M. Alotaibi, Hisham A. Alyahya, Sultan AlRashed, Faisal A. Mirza, Shaykhah Z. Alsubaie, Hassan A. Alahmed, Ghadah Alabduljabbar, Raghad Alkhathran, Yousef Almushayqih, Raneem Alnajim, Salman Alsubaihi, Maryam Al Mansour, Majed Alrubaian, Ali Alammari, Zaki Alawami, Abdulmohsen Al-Thubaity, Ahmed Abdelali, Jeril Kuriakose, Abdalghani Abujabal, Nora Al-Twairesh, Areeb Alowisheq, Haidar Khan

    Abstract: We present ALLaM: Arabic Large Language Model, a series of large language models to support the ecosystem of Arabic Language Technologies (ALT). ALLaM is carefully trained considering the values of language alignment and knowledge transfer at scale. Our autoregressive decoder-only architecture models demonstrate how second-language acquisition via vocabulary expansion and pretraining on a mixture…

    Submitted 22 July, 2024; originally announced July 2024.

  7. arXiv:2407.13813  [pdf, other]

    eess.IV cs.AI q-bio.QM

    A review of handcrafted and deep radiomics in neurological diseases: transitioning from oncology to clinical neuroimaging

    Authors: Elizaveta Lavrova, Henry C. Woodruff, Hamza Khan, Eric Salmon, Philippe Lambin, Christophe Phillips

    Abstract: Medical imaging technologies have undergone extensive development, enabling non-invasive visualization of clinical information. The traditional review of medical images by clinicians remains subjective, time-consuming, and prone to human error. With the recent availability of medical imaging data, quantification have become important goals in the field. Radiomics, a methodology aimed at extracting…

    Submitted 18 July, 2024; originally announced July 2024.

  8. arXiv:2407.13715  [pdf, other]

    cs.CV cs.LG

    Attention Based Simple Primitives for Open World Compositional Zero-Shot Learning

    Authors: Ans Munir, Faisal Z. Qureshi, Muhammad Haris Khan, Mohsen Ali

    Abstract: Compositional Zero-Shot Learning (CZSL) aims to predict unknown compositions made up of attribute and object pairs. Predicting compositions unseen during training is a challenging task. We are exploring Open World Compositional Zero-Shot Learning (OW-CZSL) in this study, where our test space encompasses all potential combinations of attributes and objects. Our approach involves utilizing the self-…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 10 pages, 6 figures

  9. arXiv:2407.11283  [pdf, other]

    cs.LG

    Novel Approach for Predicting the Air Quality Index of Megacities through Attention-Enhanced Deep Multitask Spatiotemporal Learning

    Authors: Harun Khan, Joseph Tso, Nathan Nguyen, Nivaan Kaushal, Ansh Malhotra, Nayel Rehman

    Abstract: Air pollution remains one of the most formidable environmental threats to human health globally, particularly in urban areas, contributing to nearly 7 million premature deaths annually. Megacities, defined as cities with populations exceeding 10 million, are frequent hotspots of severe pollution, experiencing numerous weeks of dangerously poor air quality due to the concentration of harmful pollut…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 6 pages, 3 figures, 3 tables

  10. arXiv:2407.06436  [pdf]

    cs.HC

    Simplifying Integration of Custom Controllers in Exergames

    Authors: Hassan Ali Khan, Muhammad Asbar Javed, Amnah Khan

    Abstract: Despite the established evidence in favor of exergames for physical rehabilitation, their use is limited in Pakistan. In our user study with game developers (N=62), a majority (67.7%) of the participants believed that exergames' popularity will increase if cheap alternatives to body tracking devices are available. Perhaps custom controllers can be used as an affordable alternate input source in e…

    Submitted 8 July, 2024; originally announced July 2024.

  11. arXiv:2407.04519  [pdf, other]

    cs.CV

    Success or Failure? Analyzing Segmentation Refinement with Few-Shot Segmentation

    Authors: Seonghyeon Moon, Haein Kong, Muhammad Haris Khan

    Abstract: The purpose of segmentation refinement is to enhance the initial coarse masks generated by segmentation algorithms. The refined masks are expected to capture the details and contours of the target objects. Research on segmentation refinement has developed as a response to the need for high-quality initial masks. However, to our knowledge, no method has been developed that can determine the success…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 4 pages

  12. arXiv:2407.04069  [pdf, other]

    cs.CL cs.AI cs.LG

    A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations

    Authors: Md Tahmid Rahman Laskar, Sawsan Alqahtani, M Saiful Bari, Mizanur Rahman, Mohammad Abdullah Matin Khan, Haidar Khan, Israt Jahan, Amran Bhuiyan, Chee Wei Tan, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty, Jimmy Huang

    Abstract: Large Language Models (LLMs) have recently gained significant attention due to their remarkable capabilities in performing diverse tasks across various domains. However, a thorough evaluation of these models is crucial before deploying them in real-world applications to ensure they produce reliable performance. Despite the well-established importance of evaluating LLMs in the community, the comple…

    Submitted 4 July, 2024; originally announced July 2024.

  13. arXiv:2407.01440  [pdf, other]

    cs.LG

    GAT-Steiner: Rectilinear Steiner Minimal Tree Prediction Using GNNs

    Authors: Bugra Onal, Eren Dogan, Muhammad Hadir Khan, Matthew R. Guthaus

    Abstract: The Rectilinear Steiner Minimum Tree (RSMT) problem is a fundamental problem in VLSI placement and routing and is known to be NP-hard. Traditional RSMT algorithms spend a significant amount of time on finding Steiner points to reduce the total wire length or use heuristics to approximate producing sub-optimal results. We show that Graph Neural Networks (GNNs) can be used to predict optimal Steiner…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Preprint for The 2024 IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2024)

  14. arXiv:2406.17190  [pdf, other]

    cs.SD cs.LG eess.AS

    Sound Tagging in Infant-centric Home Soundscapes

    Authors: Mohammad Nur Hossain Khan, Jialu Li, Nancy L. McElwain, Mark Hasegawa-Johnson, Bashima Islam

    Abstract: Certain environmental noises have been associated with negative developmental outcomes for infants and young children. Though classifying or tagging sound events in a domestic environment is an active research area, previous studies focused on data collected from a non-stationary microphone placed in the environment or from the perspective of adults. Further, many of these works ignore infants or…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted in IEEE/ACM CHASE 2024

  15. arXiv:2406.14498  [pdf, other]

    cs.CL

    LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors

    Authors: Sheikh Asif Imran, Mohammad Nur Hossain Khan, Subrata Biswas, Bashima Islam

    Abstract: Integrating inertial measurement units (IMUs) with large language models (LLMs) advances multimodal AI by enhancing human activity understanding. We introduce SensorCaps, a dataset of 26,288 IMU-derived activity narrations, and OpenSQA, an instruction-following dataset with 257,562 question-answer pairs. Combining LIMU-BERT and Llama, we develop LLaSA, a Large Multimodal Agent capable of interpret…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Under review at ARR (for EMNLP 2024)

  16. arXiv:2406.08775  [pdf, other]

    cs.CV

    ALINA: Advanced Line Identification and Notation Algorithm

    Authors: Mohammed Abdul Hafeez Khan, Parth Ganeriwala, Siddhartha Bhattacharyya, Natasha Neogi, Raja Muthalagu

    Abstract: Labels are the cornerstone of supervised machine learning algorithms. Most visual recognition methods are fully supervised, using bounding boxes or pixel-wise segmentations for object localization. Traditional labeling methods, such as crowd-sourcing, are prohibitive due to cost, data privacy, amount of time, and potential errors on large datasets. To address these issues, we propose a novel annot…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Paper has been accepted to The 3rd CVPR Workshop on Vision Datasets Understanding, 2024

  17. arXiv:2406.06533  [pdf, other]

    cs.AR cs.AI

    Pragmatic Formal Verification Methodology for Clock Domain Crossing (CDC)

    Authors: Aman Kumar, Muhammad Ul Haque Khan, Bijitendra Mittra

    Abstract: Modern System-on-Chip (SoC) designs are becoming more and more complex due to the technology upscaling. SoC designs often operate on multiple asynchronous clock domains, further adding to the complexity of the overall design. To make the devices power efficient, designers take a Globally-Asynchronous Locally-Synchronous (GALS) approach that creates multiple asynchronous domains. These Clock Domain…

    Submitted 20 April, 2024; originally announced June 2024.

    Comments: Published in DVCon Europe 2023

  18. arXiv:2405.19700  [pdf, other]

    hep-ex nucl-ex

    Initial measurement of reactor antineutrino oscillation at SNO+

    Authors: SNO+ Collaboration: A. Allega, M. R. Anderson, S. Andringa, M. Askins, D. J. Auty, A. Bacon, J. Baker, F. Barão, N. Barros, R. Bayes, E. W. Beier, T. S. Bezerra, A. Bialek, S. D. Biller, E. Blucher, E. Caden, E. J. Callaghan, M. Chen, S. Cheng, B. Cleveland, D. Cookman, J. Corning, M. A. Cox, et al. (96 additional authors not shown)

    Abstract: The SNO+ collaboration reports its first spectral analysis of long-baseline reactor antineutrino oscillation using 114 tonne-years of data. Fitting the neutrino oscillation probability to the observed energy spectrum yields constraints on the neutrino mass-squared difference $Δm^2_{21}$. In the ranges allowed by previous measurements, the best-fit $Δm^2_{21}$ is (8.85$^{+1.10}_{-1.33}$) $\times$ 1…

    Submitted 30 May, 2024; originally announced May 2024.

  19. arXiv:2405.19292  [pdf, other]

    cs.MA

    Act Natural! Projecting Autonomous System Trajectories Into Naturalistic Behavior Sets

    Authors: Hamzah I. Khan, Adam J. Thorpe, David Fridovich-Keil

    Abstract: Autonomous agents operating around human actors must consider how their behaviors might affect those humans, even when not directly interacting with them. To this end, it is often beneficial to be predictable and appear naturalistic. Existing methods to address this problem use human actor intent modeling or imitation learning techniques, but these approaches rarely capture all possible motivation…

    Submitted 29 May, 2024; originally announced May 2024.

  20. arXiv:2405.14497  [pdf, other]

    cs.CV

    Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment

    Authors: Muhammad Sohail Danish, Muhammad Haris Khan, Muhammad Akhtar Munir, M. Saquib Sarfraz, Mohsen Ali

    Abstract: In this work, we tackle the problem of domain generalization for object detection, specifically focusing on the scenario where only a single source domain is available. We propose an effective approach that involves two key steps: diversifying the source domain and aligning detections based on class prediction confidence and localization. Firstly, we demonstrate that by carefully selecting a set o…

    Submitted 23 May, 2024; originally announced May 2024.

  21. arXiv:2405.14323  [pdf, other]

    cs.CY

    SmartCS: Enabling the Creation of ML-Powered Computer Vision Mobile Apps for Citizen Science Applications without Coding

    Authors: Fahim Hasan Khan, Akila de Silva, Gregory Dusek, James Davis, Alex Pang

    Abstract: It is undeniable that citizen science contributes to the advancement of various fields of study. There are now software tools that facilitate the development of citizen science apps. However, apps developed with these tools rely on individual human skills to correctly collect useful data. Machine learning (ML)-aided apps provide on-field guidance to citizen scientists on data collection tasks. How…

    Submitted 23 May, 2024; originally announced May 2024.

  22. arXiv:2405.13518  [pdf, other]

    cs.CV

    PerSense: Personalized Instance Segmentation in Dense Images

    Authors: Muhammad Ibraheem Siddiqui, Muhammad Umer Sheikh, Hassan Abid, Muhammad Haris Khan

    Abstract: Leveraging large-scale pre-training, vision foundational models showcase notable performance benefits. While recent years have witnessed significant advancements in segmentation algorithms, existing models still face challenges to automatically segment personalized instances in dense and crowded scenarios. The primary factor behind this limitation stems from bounding box-based detections, which ar…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Technical report of PerSense

  23. arXiv:2405.12986  [pdf]

    eess.IV cs.AI cs.CV

    A Novel Feature Map Enhancement Technique Integrating Residual CNN and Transformer for Alzheimer Diseases Diagnosis

    Authors: Saddam Hussain Khan

    Abstract: Alzheimer diseases (ADs) involve cognitive decline and abnormal brain protein accumulation, necessitating timely diagnosis for effective treatment. Therefore, CAD systems leveraging deep learning advancements have demonstrated success in AD detection but pose computational intricacies and the dataset minor contrast, structural, and texture variations. In this regard, a novel hybrid FME-Residual-H…

    Submitted 25 May, 2024; v1 submitted 30 March, 2024; originally announced May 2024.

    Comments: 28 Pages, 11 Figures, 3 Tables

  24. arXiv:2405.11829  [pdf, other]

    cs.LG cs.CV

    Adversarially Diversified Rehearsal Memory (ADRM): Mitigating Memory Overfitting Challenge in Continual Learning

    Authors: Hikmat Khan, Ghulam Rasool, Nidhal Carla Bouaynaya

    Abstract: Continual learning focuses on learning non-stationary data distribution without forgetting previous knowledge. Rehearsal-based approaches are commonly used to combat catastrophic forgetting. However, these approaches suffer from a problem called "rehearsal memory overfitting," where the model becomes too specialized on limited memory samples and loses its ability to generalize effectively. As a r…

    Submitted 20 May, 2024; originally announced May 2024.

  25. arXiv:2405.07698  [pdf, other]

    cs.CV

    oTTC: Object Time-to-Contact for Motion Estimation in Autonomous Driving

    Authors: Abdul Hannan Khan, Syed Tahseen Raza Rizvi, Dheeraj Varma Chittari Macharavtu, Andreas Dengel

    Abstract: Autonomous driving systems require a quick and robust perception of the nearby environment to carry out their routines effectively. With the aim to avoid collisions and drive safely, autonomous driving systems rely heavily on object detection. However, 2D object detections alone are insufficient; more information, such as relative velocity and distance, is required for safer planning. Monocular 3D…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 9 pages, 4 figures

  26. arXiv:2405.06919  [pdf, other]

    cs.CY cs.CL

    Automating Thematic Analysis: How LLMs Analyse Controversial Topics

    Authors: Awais Hameed Khan, Hiruni Kegalle, Rhea D'Silva, Ned Watt, Daniel Whelan-Shamy, Lida Ghahremanlou, Liam Magee

    Abstract: Large Language Models (LLMs) are promising analytical tools. They can augment human epistemic, cognitive and reasoning abilities, and support 'sensemaking', making sense of a complex environment or subject by analysing large volumes of data with a sensitivity to context and nuance absent in earlier text processing systems. This paper presents a pilot experiment that explores how LLMs can support t…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: 18 pages, 6 figures

    ACM Class: K.4.2

  27. arXiv:2404.14588  [pdf]

    cs.LG cs.CV

    Brain-Inspired Continual Learning-Robust Feature Distillation and Re-Consolidation for Class Incremental Learning

    Authors: Hikmat Khan, Nidhal Carla Bouaynaya, Ghulam Rasool

    Abstract: Artificial intelligence (AI) and neuroscience share a rich history, with advancements in neuroscience shaping the development of AI systems capable of human-like knowledge retention. Leveraging insights from neuroscience and existing research in adversarial and continual learning, we introduce a novel framework comprising two core concepts: feature distillation and re-consolidation. Our framework,…

    Submitted 22 April, 2024; originally announced April 2024.

  28. arXiv:2404.09790  [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results

    Authors: Zheng Chen, Zongwei Wu, Eduard Zamfir, Kai Zhang, Yulun Zhang, Radu Timofte, Xiaokang Yang, Hongyuan Yu, Cheng Wan, Yuxin Hong, Zhijuan Huang, Yajun Zou, Yuan Huang, Jiamin Lin, Bingnan Han, Xianyu Guan, Yongsheng Yu, Daoan Zhang, Xuanwu Yin, Kunlong Zuo, Jinhua Hao, Kai Zhao, Kun Yuan, Ming Sun, Chao Zhou, et al. (63 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge i…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 webpage: https://cvlai.net/ntire/2024. Code: https://github.com/zhengchen1999/NTIRE2024_ImageSR_x4

  29. arXiv:2404.09342  [pdf, other]

    cs.CV cs.SD eess.AS

    Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan

    Authors: Muhammad Saad Saeed, Shah Nawaz, Muhammad Salman Tahir, Rohan Kumar Das, Muhammad Zaigham Zaheer, Marta Moscati, Markus Schedl, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf

    Abstract: Advancements in technology have led to the use of multimodal systems in various real-world applications. Among them, audio-visual systems are among the most widely used. In recent years, associating the face and voice of a person has gained attention due to the presence of a unique correlation between them. The Face-voice Association in Multilingual Environments (FAME) Challenge 2…

    Submitted 22 July, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: ACM Multimedia Conference - Grand Challenge

  30. arXiv:2404.01352  [pdf, other]

    physics.flu-dyn cs.AI cs.CV cs.GR

    VortexViz: Finding Vortex Boundaries by Learning from Particle Trajectories

    Authors: Akila de Silva, Nicholas Tee, Omkar Ghanekar, Fahim Hasan Khan, Gregory Dusek, James Davis, Alex Pang

    Abstract: Vortices are studied in various scientific disciplines, offering insights into fluid flow behavior. Visualizing the boundary of vortices is crucial for understanding flow phenomena and detecting flow irregularities. This paper addresses the challenge of accurately extracting vortex boundaries using deep learning techniques. While existing methods primarily train on velocity components, we propose…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Under review

  31. arXiv:2403.16194  [pdf, other]

    cs.CV

    Pose-Guided Self-Training with Two-Stage Clustering for Unsupervised Landmark Discovery

    Authors: Siddharth Tourani, Ahmed Alwheibi, Arif Mahmood, Muhammad Haris Khan

    Abstract: Unsupervised landmarks discovery (ULD) for an object category is a challenging computer vision problem. In pursuit of developing a robust ULD framework, we explore the potential of a recent paradigm of self-supervised learning algorithms, known as diffusion models. Some recent works have shown that these models implicitly contain important correspondence cues. Towards harnessing the potential of d…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Accepted in CVPR 2024

  32. arXiv:2403.11674  [pdf, other]

    cs.CV

    Towards Generalizing to Unseen Domains with Few Labels

    Authors: Chamuditha Jayanga Galappaththige, Sanoojan Baliah, Malitha Gunawardhana, Muhammad Haris Khan

    Abstract: We approach the challenge of addressing semi-supervised domain generalization (SSDG). Specifically, our aim is to obtain a model that learns domain-generalizable features by leveraging a limited subset of labelled data alongside a substantially larger pool of unlabeled data. Existing domain generalization (DG) methods which are unable to exploit unlabeled data perform poorly compared to semi-super…

    Submitted 7 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted at CVPR 2024

  33. arXiv:2403.07019  [pdf]

    econ.GN

    Reasons behind the Water Crisis and its Potential Health Outcomes

    Authors: Md. Galib Ishraq Emran, Rhidi Barma, Akram Hussain Khan, Mrinmoy Roy

    Abstract: Globally, the water crisis has become a significant problem that affects developing and industrialized nations. Water shortage can harm public health by increasing the chance of contracting water-borne diseases, dehydration, and malnutrition. This study aims to examine the causes of the water problem and its likely effects on human health. The study scrutinizes the reasons behind the water crisis,…

    Submitted 9 March, 2024; originally announced March 2024.

  34. arXiv:2403.02782  [pdf, other]

    cs.CV

    Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos

    Authors: Kumaranage Ravindu Yasas Nagasinghe, Honglu Zhou, Malitha Gunawardhana, Martin Renqiang Min, Daniel Harari, Muhammad Haris Khan

    Abstract: In this paper, we explore the capability of an agent to construct a logical sequence of action steps, thereby assembling a strategic procedural plan. This plan is crucial for navigating from an initial visual observation to a target visual outcome, as depicted in real-life instructional videos. Existing works have attained partial success by extensively leveraging various sources of information av…

    Submitted 15 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: 8 pages, 6 figures, (supplementary material: 9 pages, 5 figures), accepted to CVPR 2024

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024, Pages 18816-18826

  35. arXiv:2402.09244  [pdf, other]

    eess.SP

    Zero-energy Devices for 6G: Technical Enablers at a Glance

    Authors: Onel López, Ritesh Kumar Singh, Dinh-Thuy Phan-Huy, Efstathios Katranaras, Nafiseh Mazloum, Riku Jäntti, Hamza Khan, Osmel Rosabal, Pavlos Alexias, Prasoon Raghuwanshi, David Ruiz-Guirola, Bikramjit Singh, Andreas Höglund, Dung Pham Van, Amirhossein Azarbahram, Jeroen Famaey

    Abstract: Low-cost, resource-constrained, maintenance-free, and energy-harvesting (EH) Internet of Things (IoT) devices, referred to as zero-energy devices (ZEDs), are rapidly attracting attention from industry and academia due to their myriad of applications. To date, such devices remain primarily unsupported by modern IoT connectivity solutions due to their intrinsic fabrication, hardware, deployment, and…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: 8 pages, 4 Figures

  36. arXiv:2402.01781  [pdf, other]

    cs.CL cs.AI cs.LG

    When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards

    Authors: Norah Alzahrani, Hisham Abdullah Alyahya, Yazeed Alnumay, Sultan Alrashed, Shaykhah Alsubaie, Yusef Almushaykeh, Faisal Mirza, Nouf Alotaibi, Nora Altwairesh, Areeb Alowisheq, M Saiful Bari, Haidar Khan

    Abstract: Large Language Model (LLM) leaderboards based on benchmark rankings are regularly used to guide practitioners in model selection. Often, the published leaderboard rankings are taken at face value - we show this is a (potentially costly) mistake. Under existing leaderboards, the relative performance of LLMs is highly sensitive to (often minute) details. We show that for popular multiple-choice ques…

    Submitted 3 July, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: updated with ACL 2024 camera ready version

  37. arXiv:2402.00128  [pdf, other]

    cs.CV

    Real-time Traffic Object Detection for Autonomous Driving

    Authors: Abdul Hannan Khan, Syed Tahseen Raza Rizvi, Andreas Dengel

    Abstract: With recent advances in computer vision, it appears that autonomous driving will be part of modern society sooner rather than later. However, there are still a significant number of concerns to address. Although modern computer vision techniques demonstrate superior performance, they tend to prioritize accuracy over efficiency, which is a crucial aspect of real-time applications. Large object dete…

    Submitted 29 February, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: © 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  38. arXiv:2401.13965  [pdf, other]

    cs.CV

    Improving Pseudo-labelling and Enhancing Robustness for Semi-Supervised Domain Generalization

    Authors: Adnan Khan, Mai A. Shaaban, Muhammad Haris Khan

    Abstract: Beyond attaining domain generalization (DG), visual recognition models should also be data-efficient during learning by leveraging limited labels. We study the problem of Semi-Supervised Domain Generalization (SSDG) which is crucial for real-world applications like automated healthcare. SSDG requires learning a cross-domain generalizable model when the given training data is only partially labelle…

    Submitted 25 January, 2024; originally announced January 2024.

  39. arXiv:2401.13785  [pdf, other]

    cs.CV

    Unified Spatio-Temporal Tri-Perspective View Representation for 3D Semantic Occupancy Prediction

    Authors: Sathira Silva, Savindu Bhashitha Wannigama, Gihan Jayatilaka, Muhammad Haris Khan, Roshan Ragel

    Abstract: Holistic understanding and reasoning in 3D scenes play a vital role in the success of autonomous driving systems. The evolution of 3D semantic occupancy prediction as a pretraining task for autonomous driving and robotic downstream tasks captures finer 3D details compared to methods like 3D detection. Existing approaches predominantly focus on spatial cues such as tri-perspective view embeddings (T…

    Submitted 4 April, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

  40. arXiv:2401.11621  [pdf]

    q-fin.ST cs.CE cs.LG

    A Novel Decision Ensemble Framework: Customized Attention-BiLSTM and XGBoost for Speculative Stock Price Forecasting

    Authors: Riaz Ud Din, Salman Ahmed, Saddam Hussain Khan

    Abstract: Forecasting speculative stock prices is essential for effective investment risk management that drives the need for the development of innovative algorithms. However, the speculative nature, volatility, and complex sequential dependencies within financial markets present inherent challenges which necessitate advanced techniques. This paper proposes a novel framework, CAB-XDE (customized attention…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: 30 pages, 16 Figures, 4 Tables

  41. arXiv:2401.11358  [pdf, other]

    cs.CV

    ANNA: A Deep Learning Based Dataset in Heterogeneous Traffic for Autonomous Vehicles

    Authors: Mahedi Kamal, Tasnim Fariha, Afrina Kabir Zinia, Md. Abu Syed, Fahim Hasan Khan, Md. Mahbubur Rahman

    Abstract: Recent breakthroughs in artificial intelligence offer tremendous promise for the development of self-driving applications. Deep Neural Networks, in particular, are being utilized to support the operation of semi-autonomous cars through object identification and semantic segmentation. To assess the inadequacy of the current dataset in the context of autonomous and semi-autonomous cars, we created a…

    Submitted 20 January, 2024; originally announced January 2024.

  42. arXiv:2401.09354  [pdf]

    eess.AS cs.AI cs.SD

    Transcending Controlled Environments: Assessing the Transferability of ASR-Robust NLU Models to Real-World Applications

    Authors: Hania Khan, Aleena Fatima Khalid, Zaryab Hassan

    Abstract: This research investigates the transferability of Automatic Speech Recognition (ASR)-robust Natural Language Understanding (NLU) models from controlled experimental conditions to practical, real-world applications. Focused on smart home automation commands in Urdu, the study assesses model performance under diverse noise profiles, linguistic variations, and ASR error scenarios. Leveraging the Urdu…

    Submitted 12 January, 2024; originally announced January 2024.

  43. arXiv:2401.06084  [pdf, other]

    gr-qc astro-ph.CO hep-ph

    Post-Newtonian effects in compact binaries with a dark matter spike: A Lagrangian approach

    Authors: Diego Montalvo, Adam Smith-Orlik, Saeed Rastgoo, Laura Sagunski, Niklas Becker, Hazkeel Khan

    Abstract: We present a simple but powerful Lagrangian method that can be used to study the post-Newtonian evolution of a compact binary system with environment, including a dark matter spike, around it, and obtain the resulting gravitational wave emission. This formalism allows one to incorporate post-Newtonian effects up to any desired known order, as well as any other environmental effect around the binar…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 16 pages, 4 figures

  44. arXiv:2312.04695  [pdf]

    econ.GN

    Foreign Capital and Economic Growth: Evidence from Bangladesh

    Authors: Ummya Salma, Md. Fazlul Huq Khan, Md. Masum Billah

    Abstract: This study aims to examine the relationship between Foreign Direct Investment (FDI), personal remittances received, and official development assistance (ODA) in the economic growth of Bangladesh. The study utilizes time series data on Bangladesh from 1976 to 2021. Additionally, this research contributes to the existing literature by introducing the Foreign Capital Depthless Index (FCDI) and explor…

    Submitted 7 December, 2023; originally announced December 2023.

  45. arXiv:2312.00634  [pdf]

    eess.IV cs.CV

    A Recent Survey of Vision Transformers for Medical Image Segmentation

    Authors: Asifullah Khan, Zunaira Rauf, Abdul Rehman Khan, Saima Rathore, Saddam Hussain Khan, Najmus Saher Shah, Umair Farooq, Hifsa Asif, Aqsa Asif, Umme Zahoora, Rafi Ullah Khalil, Suleman Qamar, Umme Hani Asif, Faiza Babar Khan, Abdul Majid, Jeonghwan Gwak

    Abstract: Medical image segmentation plays a crucial role in various healthcare applications, enabling accurate diagnosis, treatment planning, and disease monitoring. Traditionally, convolutional neural networks (CNNs) dominated this domain, excelling at local feature extraction. However, their limitations in capturing long-range dependencies across image regions pose challenges for segmenting complex, inte…

    Submitted 18 December, 2023; v1 submitted 1 December, 2023; originally announced December 2023.

  46. arXiv:2311.10754  [pdf]

    eess.IV cs.CV

    A Recent Survey of the Advancements in Deep Learning Techniques for Monkeypox Disease Detection

    Authors: Saddam Hussain Khan, Rashid Iqbal, Saeeda Naz

    Abstract: Monkeypox (MPox) is a zoonotic infectious disease caused by the MPox virus, part of the Poxviridae orthopoxvirus group, which was initially discovered in Africa and gained global attention in mid-2022 with cases reported outside endemic areas. Symptoms include headaches, chills, fever, and skin manifestations resembling smallpox, measles, and chickenpox, and the WHO officially announced MPox as a global public health…

    Submitted 23 November, 2023; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: 53 pages, 16 figures, 7 tables

  47. Smell of Fire Increases Behavioural Realism in Virtual Reality: A Case Study on a Recreated MGM Grand Hotel Fire

    Authors: Humayun Khan, Daniel Nilsson

    Abstract: Virtual reality allows creating highly immersive visual and auditory experiences, making users feel physically present in the environment. This makes it an ideal platform to simulate dangerous scenarios, including fire evacuation, and study human behaviour without exposing users to harmful elements. However, human perception of the surroundings is based on the integration of multiple sensory cues…

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: Accepted at IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2023, 9 pages

  48. arXiv:2311.09086  [pdf, other]

    cs.CL cs.AI cs.SI

    The Uli Dataset: An Exercise in Experience Led Annotation of oGBV

    Authors: Arnav Arora, Maha Jinadoss, Cheshta Arora, Denny George, Brindaalakshmi, Haseena Dawood Khan, Kirti Rawat, Div, Ritash, Seema Mathur, Shivani Yadav, Shehla Rashid Shora, Rie Raut, Sumit Pawar, Apurva Paithane, Sonia, Vivek, Dharini Priscilla, Khairunnisha, Grace Banu, Ambika Tandon, Rishav Thakker, Rahul Dev Korra, Aatman Vaidya, Tarunima Prabhakar

    Abstract: Online gender based violence has grown concomitantly with adoption of the internet and social media. Its effects are worse in the Global majority where many users use social media in languages other than English. The scale and volume of conversations on the internet has necessitated the need for automated detection of hate speech, and more specifically gendered abuse. There is, however, a lack of…

    Submitted 24 June, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

  49. arXiv:2311.06802  [pdf, other]

    physics.flu-dyn

    A hybrid discrete exterior calculus and finite difference method for anelastic convection in spherical shells

    Authors: Hamid Hassan Khan, Pankaj Jagad, Matteo Parsani

    Abstract: The present work develops, verifies, and benchmarks a hybrid discrete exterior calculus and finite difference (DEC-FD) method for density-stratified thermal convection in spherical shells. Discrete exterior calculus (DEC) is notable for its coordinate independence and structure preservation properties. The hybrid DEC-FD method for Boussinesq convection has been developed by Mantravadi et al. (Mant…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: 32 pages, 13 figures

  50. arXiv:2311.06226  [pdf, other]

    eess.SY

    MaDEVIoT: Cyberattacks on EV Charging Can Disrupt Power Grid Operation

    Authors: Samrat Acharya, Hafiz Anwar Ullah Khan, Ramesh Karri, Yury Dvorkin

    Abstract: This paper examines the feasibility of demand-side cyberattacks on power grids launched via internet-connected high-power EV Charging Stations (EVCSs). By distorting power grid frequency and voltage, these attacks can trigger system-wide outages. Our case study focuses on Manhattan, New York, and reveals that such attacks will become feasible by 2030 with increased EV adoption. With a single EVCS…

    Submitted 10 November, 2023; originally announced November 2023.

    Comments: This paper is accepted for publication in the proceedings of IEEE ISGT NA 2024 in Washington DC, USA