Showing 1–2 of 2 results for author: Chauhan, H H

  1. arXiv:2407.09879

    cs.CL

    sPhinX: Sample Efficient Multilingual Instruction Fine-Tuning Through N-shot Guided Prompting

    Authors: Sanchit Ahuja, Kumar Tanmay, Hardik Hansrajbhai Chauhan, Barun Patra, Kriti Aggarwal, Luciano Del Corro, Arindam Mitra, Tejas Indulal Dhamecha, Ahmed Awadallah, Monojit Choudhury, Vishrav Chaudhary, Sunayana Sitaram

    Abstract: Despite the remarkable success of LLMs in English, there is a significant gap in performance in non-English languages. In order to address this, we introduce a novel recipe for creating a multilingual synthetic instruction tuning dataset, sPhinX, which is created by selectively translating instruction-response pairs from English into 50 languages. We test the effectiveness of sPhinX by using it to…

    Submitted 16 July, 2024; v1 submitted 13 July, 2024; originally announced July 2024.

    Comments: 20 pages, 12 tables, 5 figures

  2. arXiv:2305.14218

    cs.CV cs.AI

    DUBLIN -- Document Understanding By Language-Image Network

    Authors: Kriti Aggarwal, Aditi Khandelwal, Kumar Tanmay, Owais Mohammed Khan, Qiang Liu, Monojit Choudhury, Hardik Hansrajbhai Chauhan, Subhojit Som, Vishrav Chaudhary, Saurabh Tiwary

    Abstract: Visual document understanding is a complex task that involves analyzing both the text and the visual elements in document images. Existing models often rely on manual feature engineering or domain-specific pipelines, which limit their generalization ability across different document types and languages. In this paper, we propose DUBLIN, which is pretrained on web pages using three novel objectives…

    Submitted 27 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    ACM Class: F.2.2; I.2.7