Showing 1–10 of 10 results for author: Khan, M S U

  1. arXiv:2406.15831  [pdf, other]

    cs.CV

    Shape2.5D: A Dataset of Texture-less Surfaces for Depth and Normals Estimation

    Authors: Muhammad Saif Ullah Khan, Muhammad Zeshan Afzal, Didier Stricker

    Abstract: Reconstructing texture-less surfaces poses unique challenges in computer vision, primarily due to the lack of specialized datasets that cater to the nuanced needs of depth and normals estimation in the absence of textural information. We introduce "Shape2.5D," a novel, large-scale dataset designed to address this gap. Comprising 364k frames spanning 2635 3D models and 48 unique objects, our datase…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: This dataset paper was originally written in 2022

  2. arXiv:2406.14370  [pdf, other]

    cs.CV

    Enhanced Bank Check Security: Introducing a Novel Dataset and Transformer-Based Approach for Detection and Verification

    Authors: Muhammad Saif Ullah Khan, Tahira Shehzadi, Rabeya Noor, Didier Stricker, Muhammad Zeshan Afzal

    Abstract: Automated signature verification on bank checks is critical for fraud prevention and ensuring transaction authenticity. This task is challenging due to the coexistence of signatures with other textual and graphical elements on real-world documents. Verification systems must first detect the signature and then validate its authenticity, a dual challenge often overlooked by current datasets and meth…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted for publication in 16th IAPR International Workshop on Document Analysis Systems 2024

  3. arXiv:2406.13439  [pdf, other]

    cs.CL

    Finding Blind Spots in Evaluator LLMs with Interpretable Checklists

    Authors: Sumanth Doddapaneni, Mohammed Safi Ur Rahman Khan, Sshubam Verma, Mitesh M. Khapra

    Abstract: Large Language Models (LLMs) are increasingly relied upon to evaluate text outputs of other LLMs, thereby influencing leaderboards and development decisions. However, concerns persist over the accuracy of these assessments and the potential for misleading conclusions. In this work, we investigate the effectiveness of LLMs as evaluators for text generation tasks. We propose FBI, a novel framework d…

    Submitted 19 June, 2024; originally announced June 2024.

  4. arXiv:2406.13302  [pdf, other]

    cs.CV

    Situational Instructions Database: Task Guidance in Dynamic Environments

    Authors: Muhammad Saif Ullah Khan, Sankalp Sinha, Didier Stricker, Muhammad Zeshan Afzal

    Abstract: The Situational Instructions Database (SID) addresses the need for enhanced situational awareness in artificial intelligence (AI) systems operating in dynamic environments. By integrating detailed scene graphs with dynamically generated, task-specific instructions, SID provides a novel dataset that allows AI systems to perform complex, real-world tasks with improved context sensitivity and operati…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 9 pages, 6 figures

  5. arXiv:2405.20084  [pdf, other]

    cs.CV

    Estimating Human Poses Across Datasets: A Unified Skeleton and Multi-Teacher Distillation Approach

    Authors: Muhammad Saif Ullah Khan, Dhavalkumar Limbachiya, Didier Stricker, Muhammad Zeshan Afzal

    Abstract: Human pose estimation is a key task in computer vision with various applications such as activity recognition and interactive systems. However, the lack of consistency in the annotated skeletons across different datasets poses challenges in developing universally applicable models. To address this challenge, we propose a novel approach integrating multi-teacher knowledge distillation with a unifie…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 15 pages (with references)

  6. arXiv:2405.03660  [pdf, other]

    cs.CV

    CICA: Content-Injected Contrastive Alignment for Zero-Shot Document Image Classification

    Authors: Sankalp Sinha, Muhammad Saif Ullah Khan, Talha Uddin Sheikh, Didier Stricker, Muhammad Zeshan Afzal

    Abstract: Zero-shot learning has been extensively investigated in the broader field of visual recognition, attracting significant interest recently. However, the current work on zero-shot learning in document image classification remains scarce. The existing studies either focus exclusively on zero-shot inference, or their evaluation does not align with the established criteria of zero-shot evaluation in th…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 18 pages, 4 figures; accepted at ICDAR 2024

  7. arXiv:2403.06904  [pdf, other]

    cs.CV

    FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks

    Authors: Muhammad Saif Ullah Khan, Muhammad Ferjad Naeem, Federico Tombari, Luc Van Gool, Didier Stricker, Muhammad Zeshan Afzal

    Abstract: We propose FocusCLIP, integrating subject-level guidance--a specialized mechanism for target-specific supervision--into the CLIP framework for improved zero-shot transfer on human-centric tasks. Our novel contributions enhance CLIP on both the vision and text sides. On the vision side, we incorporate ROI heatmaps emulating human visual attention mechanisms to emphasize subject-relevant image regio…

    Submitted 25 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  8. arXiv:2403.06350  [pdf, other]

    cs.CL

    IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages

    Authors: Mohammed Safi Ur Rahman Khan, Priyam Mehta, Ananth Sankar, Umashankar Kumaravelan, Sumanth Doddapaneni, Suriyaprasaad G, Varun Balan G, Sparsh Jain, Anoop Kunchukuttan, Pratyush Kumar, Raj Dabre, Mitesh M. Khapra

    Abstract: Despite the considerable advancements in English LLMs, the progress in building comparable models for other languages has been hindered due to the scarcity of tailored resources. Our work aims to bridge this divide by introducing an expansive suite of resources specifically designed for the development of Indic LLMs, covering 22 languages, containing a total of 251B tokens and 74.8M instruction-re…

    Submitted 10 March, 2024; originally announced March 2024.

  9. arXiv:2401.15006  [pdf, other]

    cs.CL cs.AI

    Airavata: Introducing Hindi Instruction-tuned LLM

    Authors: Jay Gala, Thanmay Jayakumar, Jaavid Aktar Husain, Aswanth Kumar M, Mohammed Safi Ur Rahman Khan, Diptesh Kanojia, Ratish Puduppully, Mitesh M. Khapra, Raj Dabre, Rudra Murthy, Anoop Kunchukuttan

    Abstract: We announce the initial release of "Airavata," an instruction-tuned LLM for Hindi. Airavata was created by fine-tuning OpenHathi with diverse, instruction-tuning Hindi datasets to make it better suited for assistive tasks. Along with the model, we also share the IndicInstruct dataset, which is a collection of diverse instruction-tuning datasets to enable further research for Indic LLMs. Additional…

    Submitted 26 February, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

    Comments: Work in progress

  10. arXiv:2104.12203  [pdf]

    cs.CV

    A novel segmentation dataset for signatures on bank checks

    Authors: Muhammad Saif Ullah Khan

    Abstract: The dataset presented provides high-resolution images of real, filled out bank checks containing various complex backgrounds, and handwritten text and signatures in the respective fields, along with both pixel-level and patch-level segmentation masks for the signatures on the checks. The images of bank checks were obtained from different sources, including other publicly available check datasets,…

    Submitted 28 April, 2021; v1 submitted 25 April, 2021; originally announced April 2021.