Showing 1–6 of 6 results for author: Thapliyal, A V

Searching in archive cs.

  1. arXiv:2311.02171  [pdf, other]

    cs.LG cs.AI

    Emergence of Abstract State Representations in Embodied Sequence Modeling

    Authors: Tian Yun, Zilai Zeng, Kunal Handa, Ashish V. Thapliyal, Bo Pang, Ellie Pavlick, Chen Sun

    Abstract: Decision making via sequence modeling aims to mimic the success of language models, where actions taken by an embodied agent are modeled as tokens to predict. Despite their promising performance, it remains unclear if embodied sequence modeling leads to the emergence of internal representations that represent the environmental state information. A model that lacks abstract state representations wo…

    Submitted 7 November, 2023; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Project webpage: https://abstract-state-seqmodel.github.io/

  2. arXiv:2209.05401  [pdf, other]

    cs.CL cs.CV

    MaXM: Towards Multilingual Visual Question Answering

    Authors: Soravit Changpinyo, Linting Xue, Michal Yarom, Ashish V. Thapliyal, Idan Szpektor, Julien Amelot, Xi Chen, Radu Soricut

    Abstract: Visual Question Answering (VQA) has been primarily studied through the lens of the English language. Yet, tackling VQA in other languages in the same manner would require a considerable amount of resources. In this paper, we propose scalable solutions to multilingual visual question answering (mVQA), on both data and modeling fronts. We first propose a translation-based framework to mVQA data gene…

    Submitted 24 October, 2023; v1 submitted 12 September, 2022; originally announced September 2022.

    Comments: EMNLP 2023 (Findings). https://github.com/google-research-datasets/maxm

  3. arXiv:2205.12522  [pdf, other]

    cs.CV cs.CL

    Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset

    Authors: Ashish V. Thapliyal, Jordi Pont-Tuset, Xi Chen, Radu Soricut

    Abstract: Research in massively multilingual image captioning has been severely hampered by a lack of high-quality evaluation datasets. In this paper we present the Crossmodal-3600 dataset (XM3600 in short), a geographically diverse set of 3600 images annotated with human-generated reference captions in 36 languages. The images were selected from across the world, covering regions where the 36 languages are…

    Submitted 10 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: EMNLP 2022

  4. arXiv:2204.08121  [pdf, other]

    cs.CV cs.CL

    End-to-end Dense Video Captioning as Sequence Generation

    Authors: Wanrong Zhu, Bo Pang, Ashish V. Thapliyal, William Yang Wang, Radu Soricut

    Abstract: Dense video captioning aims to identify the events of interest in an input video, and generate descriptive captions for each event. Previous approaches usually follow a two-stage generative process, which first proposes a segment for each event, then renders a caption for each identified segment. Recent advances in large-scale sequence generation pretraining have seen great success in unifying tas…

    Submitted 16 September, 2022; v1 submitted 17 April, 2022; originally announced April 2022.

    Comments: COLING 2022

  5. arXiv:2005.00246  [pdf, other]

    cs.CL cs.CV cs.LG

    Cross-modal Language Generation using Pivot Stabilization for Web-scale Language Coverage

    Authors: Ashish V. Thapliyal, Radu Soricut

    Abstract: Cross-modal language generation tasks such as image captioning are directly hurt in their ability to support non-English languages by the trend of data-hungry models combined with the lack of non-English annotations. We investigate potential solutions for combining existing language-generation annotations in English with translation capabilities in order to create solutions at web-scale in both do…

    Submitted 1 May, 2020; originally announced May 2020.

    Comments: ACL 2020

  6. arXiv:1909.03396  [pdf, other]

    cs.CL cs.CV

    Quality Estimation for Image Captions Based on Large-scale Human Evaluations

    Authors: Tomer Levinboim, Ashish V. Thapliyal, Piyush Sharma, Radu Soricut

    Abstract: Automatic image captioning has improved significantly over the last few years, but the problem is far from being solved, with state of the art models still often producing low quality captions when used in the wild. In this paper, we focus on the task of Quality Estimation (QE) for image captions, which attempts to model the caption quality from a human perspective and without access to ground-tru…

    Submitted 1 June, 2021; v1 submitted 8 September, 2019; originally announced September 2019.

    Comments: 10 pages, 6 figures, 3 tables. Accepted to NAACL 2021. https://www.aclweb.org/anthology/2021.naacl-main.253/