Showing 1–20 of 20 results for author: Caglayan, O

Searching in archive cs. Search in all archives.
  1. Supervised Visual Attention for Simultaneous Multimodal Machine Translation

    Authors: Veneta Haralampieva, Ozan Caglayan, Lucia Specia

    Abstract: Recently, there has been a surge in research in multimodal machine translation (MMT), where additional modalities such as images are used to improve translation quality of textual systems. A particular use for such multimodal systems is the task of simultaneous machine translation, where visual context has been shown to complement the partial information provided by the source sentence, especially…

    Submitted 29 June, 2022; v1 submitted 23 January, 2022; originally announced January 2022.

    Comments: Accepted to Journal of Artificial Intelligence Research (JAIR)

    Journal ref: Journal of Artificial Intelligence Research 74 (2022) 1059-1089

  2. arXiv:2106.03484  [pdf, other]

    cs.CL

    BERTGEN: Multi-task Generation through BERT

    Authors: Faidon Mitzalis, Ozan Caglayan, Pranava Madhyastha, Lucia Specia

    Abstract: We present BERTGEN, a novel generative, decoder-only model which extends BERT by fusing multimodal and multilingual pretrained models VL-BERT and M-BERT, respectively. BERTGEN is auto-regressively trained for language generation tasks, namely image captioning, machine translation and multimodal machine translation, under a multitask setting. With a comprehensive set of evaluations, we show that BE…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: Accepted to ACL 2021 Main Conference

  3. arXiv:2102.11387  [pdf, other]

    cs.CL

    Exploiting Multimodal Reinforcement Learning for Simultaneous Machine Translation

    Authors: Julia Ive, Andy Mingren Li, Yishu Miao, Ozan Caglayan, Pranava Madhyastha, Lucia Specia

    Abstract: This paper addresses the problem of simultaneous machine translation (SiMT) by exploring two main concepts: (a) adaptive policies to learn a good trade-off between high translation quality and low latency; and (b) visual information to support this process by providing additional (visual) contextual information which may be available before the textual input is produced. For that, we propose a mul…

    Submitted 22 February, 2021; originally announced February 2021.

    Comments: Long paper accepted to EACL 2021, Camera-ready version

  4. arXiv:2101.10044  [pdf, other]

    cs.CL cs.CV

    Cross-lingual Visual Pre-training for Multimodal Machine Translation

    Authors: Ozan Caglayan, Menekse Kuyu, Mustafa Sercan Amac, Pranava Madhyastha, Erkut Erdem, Aykut Erdem, Lucia Specia

    Abstract: Pre-trained language models have been shown to improve performance in many natural language tasks substantially. Although the early focus of such models was single language pre-training, recent advances have resulted in cross-lingual and visual pre-training methods. In this paper, we combine these two approaches to learn visually-grounded cross-lingual representations. Specifically, we extend the…

    Submitted 20 April, 2021; v1 submitted 25 January, 2021; originally announced January 2021.

    Comments: Accepted to EACL 2021 (Camera-ready version)

  5. arXiv:2012.07098  [pdf, other]

    cs.CV

    MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish

    Authors: Begum Citamak, Ozan Caglayan, Menekse Kuyu, Erkut Erdem, Aykut Erdem, Pranava Madhyastha, Lucia Specia

    Abstract: Automatic generation of video descriptions in natural language, also called video captioning, aims to understand the visual content of the video and produce a natural language sentence depicting the objects and actions in the scene. This challenging integrated vision and language problem, however, has been predominantly addressed for English. The lack of data and the linguistic properties of other…

    Submitted 13 December, 2020; originally announced December 2020.

  6. arXiv:2010.13588  [pdf, ps, other]

    cs.CL

    Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale

    Authors: Ozan Caglayan, Pranava Madhyastha, Lucia Specia

    Abstract: Automatic evaluation of language generation systems is a well-studied problem in Natural Language Processing. While novel metrics are proposed every year, a few popular metrics remain as the de facto metrics to evaluate tasks such as image captioning and machine translation, despite their known limitations. This is partly due to ease of use, and partly because researchers expect to see them and kn…

    Submitted 26 October, 2020; originally announced October 2020.

    Comments: 7 pages, accepted to COLING 2020

  7. arXiv:2009.07310  [pdf, other]

    cs.CL

    Simultaneous Machine Translation with Visual Context

    Authors: Ozan Caglayan, Julia Ive, Veneta Haralampieva, Pranava Madhyastha, Loïc Barrault, Lucia Specia

    Abstract: Simultaneous machine translation (SiMT) aims to translate a continuous input text stream into another language with the lowest latency and highest quality possible. The translation thus has to start with an incomplete source text, which is read progressively, creating the need for anticipation. In this paper, we seek to understand whether the addition of visual information can compensate for the m…

    Submitted 13 October, 2020; v1 submitted 15 September, 2020; originally announced September 2020.

    Comments: Long paper accepted to EMNLP 2020, Camera-ready version

  8. arXiv:1911.12798  [pdf, other]

    cs.CL

    Multimodal Machine Translation through Visuals and Speech

    Authors: Umut Sulubacak, Ozan Caglayan, Stig-Arne Grönroos, Aku Rouhe, Desmond Elliott, Lucia Specia, Jörg Tiedemann

    Abstract: Multimodal machine translation involves drawing information from more than one modality, based on the assumption that the additional modalities will contain useful alternative views of the input data. The most prominent tasks in this area are spoken language translation, image-guided translation, and video-guided translation, which exploit audio and visual modalities, respectively. These tasks are…

    Submitted 28 November, 2019; originally announced November 2019.

    Comments: 34 pages, 4 tables, 8 figures. Submitted (Nov 2019) to the Machine Translation journal (Springer)

  9. arXiv:1910.13215  [pdf, other]

    cs.CL

    Transformer-based Cascaded Multimodal Speech Translation

    Authors: Zixiu Wu, Ozan Caglayan, Julia Ive, Josiah Wang, Lucia Specia

    Abstract: This paper describes the cascaded multimodal speech translation systems developed by Imperial College London for the IWSLT 2019 evaluation campaign. The architecture consists of an automatic speech recognition (ASR) system followed by a Transformer-based multimodal machine translation (MMT) system. While the ASR component is identical across the experiments, the MMT model varies in terms of the wa…

    Submitted 8 November, 2019; v1 submitted 29 October, 2019; originally announced October 2019.

    Comments: Accepted to IWSLT 2019

  10. arXiv:1910.07482  [pdf, other]

    cs.CL cs.NE

    Imperial College London Submission to VATEX Video Captioning Task

    Authors: Ozan Caglayan, Zixiu Wu, Pranava Madhyastha, Josiah Wang, Lucia Specia

    Abstract: This paper describes the Imperial College London team's submission to the 2019 VATEX video captioning challenge, where we first explore two sequence-to-sequence models, namely a recurrent (GRU) model and a transformer model, which generate captions from the I3D action features. We then investigate the effect of dropping the encoder and the attention mechanism and instead conditioning the GRU deco…

    Submitted 16 October, 2019; originally announced October 2019.

  11. arXiv:1903.08678  [pdf, other]

    cs.CL

    Probing the Need for Visual Context in Multimodal Machine Translation

    Authors: Ozan Caglayan, Pranava Madhyastha, Lucia Specia, Loïc Barrault

    Abstract: Current work on multimodal machine translation (MMT) has suggested that the visual modality is either unnecessary or only marginally beneficial. We posit that this is a consequence of the very simple, short and repetitive sentences used in the only available dataset for the task (Multi30K), rendering the source text sufficient as context. In the general case, however, we believe that it is possibl…

    Submitted 2 June, 2019; v1 submitted 20 March, 2019; originally announced March 2019.

    Comments: Accepted to NAACL-HLT 2019, reviewer comments addressed, camera-ready

  12. arXiv:1811.03865  [pdf, other]

    cs.CL

    Multimodal Grounding for Sequence-to-Sequence Speech Recognition

    Authors: Ozan Caglayan, Ramon Sanabria, Shruti Palaskar, Loïc Barrault, Florian Metze

    Abstract: Humans are capable of processing speech by making use of multiple sensory modalities. For example, the environment where a conversation takes place generally provides semantic and/or acoustic context that helps us to resolve ambiguities or to recall named entities. Motivated by this, there have been many works studying the integration of visual information into the speech recognition pipeline. Spe…

    Submitted 19 February, 2019; v1 submitted 9 November, 2018; originally announced November 2018.

    Comments: ICASSP 2019

  13. arXiv:1811.00347  [pdf, other]

    cs.CL

    How2: A Large-scale Dataset for Multimodal Language Understanding

    Authors: Ramon Sanabria, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, Florian Metze

    Abstract: In this paper, we introduce How2, a multimodal collection of instructional videos with English subtitles and crowdsourced Portuguese translations. We also present integrated sequence-to-sequence baselines for machine translation, automatic speech recognition, spoken language translation, and multimodal summarization. By making available data and code for several multimodal natural language tasks,…

    Submitted 7 December, 2018; v1 submitted 1 November, 2018; originally announced November 2018.

  14. arXiv:1809.00151  [pdf, other]

    cs.CL

    LIUM-CVC Submissions for WMT18 Multimodal Translation Task

    Authors: Ozan Caglayan, Adrien Bardet, Fethi Bougares, Loïc Barrault, Kai Wang, Marc Masana, Luis Herranz, Joost van de Weijer

    Abstract: This paper describes the multimodal Neural Machine Translation systems developed by LIUM and CVC for WMT18 Shared Task on Multimodal Translation. This year we propose several modifications to our previous multimodal attention architecture in order to better integrate convolutional features and refine them using encoder-side information. Our final constrained submissions ranked first for English-Fr…

    Submitted 1 September, 2018; originally announced September 2018.

    Comments: WMT2018

  15. arXiv:1707.04499  [pdf, other]

    cs.CL

    LIUM Machine Translation Systems for WMT17 News Translation Task

    Authors: Mercedes García-Martínez, Ozan Caglayan, Walid Aransa, Adrien Bardet, Fethi Bougares, Loïc Barrault

    Abstract: This paper describes LIUM submissions to WMT17 News Translation Task for English-German, English-Turkish, English-Czech and English-Latvian language pairs. We train BPE-based attentive Neural Machine Translation systems with and without factored outputs using the open source nmtpy framework. Competitive scores were obtained by ensembling various systems and exploiting the availability of target mo…

    Submitted 14 July, 2017; originally announced July 2017.

    Comments: News Translation Task System Description paper for WMT17

  16. arXiv:1707.04481  [pdf, other]

    cs.CL

    LIUM-CVC Submissions for WMT17 Multimodal Translation Task

    Authors: Ozan Caglayan, Walid Aransa, Adrien Bardet, Mercedes García-Martínez, Fethi Bougares, Loïc Barrault, Marc Masana, Luis Herranz, Joost van de Weijer

    Abstract: This paper describes the monomodal and multimodal Neural Machine Translation systems developed by LIUM and CVC for WMT17 Shared Task on Multimodal Translation. We mainly explored two multimodal architectures where either global visual features or convolutional feature maps are integrated in order to benefit from visual context. Our final systems ranked first for both En-De and En-Fr language pairs…

    Submitted 14 July, 2017; originally announced July 2017.

    Comments: MMT System Description Paper for WMT17

  17. Sustainable computational science: the ReScience initiative

    Authors: Nicolas P. Rougier, Konrad Hinsen, Frédéric Alexandre, Thomas Arildsen, Lorena Barba, Fabien C. Y. Benureau, C. Titus Brown, Pierre de Buyl, Ozan Caglayan, Andrew P. Davison, Marc André Delsuc, Georgios Detorakis, Alexandra K. Diem, Damien Drix, Pierre Enel, Benoît Girard, Olivia Guest, Matt G. Hall, Rafael Neto Henriques, Xavier Hinaut, Kamil S Jaron, Mehdi Khamassi, Almar Klein, Tiina Manninen, Pietro Marchesi, et al. (20 additional authors not shown)

    Abstract: Computer science offers a large set of tools for prototyping, writing, running, testing, validating, sharing and reproducing results, however computational science lags behind. In the best case, authors may provide their source code as a compressed archive and they may feel confident their research is reproducible. But this is not exactly true. James Buckheit and David Donoho proposed more than tw…

    Submitted 11 November, 2017; v1 submitted 14 July, 2017; originally announced July 2017.

    Comments: 8 pages, 1 figure

    Journal ref: PeerJ Computer Science 3:e142 (2017)

  18. NMTPY: A Flexible Toolkit for Advanced Neural Machine Translation Systems

    Authors: Ozan Caglayan, Mercedes García-Martínez, Adrien Bardet, Walid Aransa, Fethi Bougares, Loïc Barrault

    Abstract: In this paper, we present nmtpy, a flexible Python toolkit based on Theano for training Neural Machine Translation and other neural sequence-to-sequence architectures. nmtpy decouples the specification of a network from the training and inference utilities to simplify the addition of a new architecture and reduce the amount of boilerplate code to be written. nmtpy has been used for LIUM's top-rank…

    Submitted 1 June, 2017; originally announced June 2017.

    Comments: 10 pages, 3 figures

  19. arXiv:1609.03976  [pdf, other]

    cs.CL cs.NE

    Multimodal Attention for Neural Machine Translation

    Authors: Ozan Caglayan, Loïc Barrault, Fethi Bougares

    Abstract: The attention mechanism is an important part of the neural machine translation (NMT) where it was reported to produce richer source representation compared to fixed-length encoding sequence-to-sequence models. Recently, the effectiveness of attention has also been explored in the context of image captioning. In this work, we assess the feasibility of a multimodal attention mechanism that simultane…

    Submitted 13 September, 2016; originally announced September 2016.

    Comments: 10 pages, under review COLING 2016

  20. arXiv:1605.09186  [pdf, other]

    cs.CL cs.LG cs.NE

    Does Multimodality Help Human and Machine for Translation and Image Captioning?

    Authors: Ozan Caglayan, Walid Aransa, Yaxing Wang, Marc Masana, Mercedes García-Martínez, Fethi Bougares, Loïc Barrault, Joost van de Weijer

    Abstract: This paper presents the systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge. We explored various comparative methods, namely phrase-based systems and attentional recurrent neural networks models trained using monomodal or multimodal data. We also performed a human evaluation in order to estimate the usefulness of multimodal data for human machine translation an…

    Submitted 16 August, 2016; v1 submitted 30 May, 2016; originally announced May 2016.

    Comments: 7 pages, 2 figures, v4: Small clarification in section 4 title and content