
Showing 1–19 of 19 results for author: Calixto, I

Searching in archive cs.
  1. arXiv:2407.12498

    cs.CL cs.CV

    Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning

    Authors: Mustafa Dogan, Ilker Kesen, Iacer Calixto, Aykut Erdem, Erkut Erdem

    Abstract: The linguistic capabilities of Multimodal Large Language Models (MLLMs) are critical for their effective application across diverse tasks. This study aims to evaluate the performance of MLLMs on the VALSE benchmark, focusing on the efficacy of few-shot In-Context Learning (ICL), and Chain-of-Thought (CoT) prompting. We conducted a comprehensive assessment of state-of-the-art MLLMs, varying in mode…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Preprint. 33 pages, 17 Figures, 3 Tables

  2. arXiv:2311.11462

    cs.CL cs.AI

    LLM aided semi-supervision for Extractive Dialog Summarization

    Authors: Nishant Mishra, Gaurav Sahu, Iacer Calixto, Ameen Abu-Hanna, Issam H. Laradji

    Abstract: Generating high-quality summaries for chat dialogs often requires large labeled datasets. We propose a method to efficiently use unlabeled data for extractive summarization of customer-agent dialogs. In our method, we frame summarization as a question-answering problem and use state-of-the-art large language models (LLMs) to generate pseudo-labels for a dialog. We then use these pseudo-labels to f…

    Submitted 23 November, 2023; v1 submitted 19 November, 2023; originally announced November 2023.

    Comments: to be published in EMNLP Findings

  3. arXiv:2311.07022

    cs.CL cs.AI cs.CV

    ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models

    Authors: Ilker Kesen, Andrea Pedrotti, Mustafa Dogan, Michele Cafagna, Emre Can Acikgoz, Letitia Parcalabescu, Iacer Calixto, Anette Frank, Albert Gatt, Aykut Erdem, Erkut Erdem

    Abstract: With the ever-increasing popularity of pretrained Video-Language Models (VidLMs), there is a pressing need to develop robust evaluation methodologies that delve deeper into their visio-linguistic capabilities. To address this challenge, we present ViLMA (Video Language Model Assessment), a task-agnostic benchmark that places the assessment of fine-grained capabilities of these models on a firm foo…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: Preprint. 48 pages, 22 figures, 10 tables

  4. arXiv:2307.00897

    cs.LG cs.AI

    Fixing confirmation bias in feature attribution methods via semantic match

    Authors: Giovanni Cinà, Daniel Fernandez-Llaneza, Ludovico Deponte, Nishant Mishra, Tabea E. Röber, Sandro Pezzelle, Iacer Calixto, Rob Goedhart, Ş. İlker Birbil

    Abstract: Feature attribution methods have become a staple method to disentangle the complex behavior of black box models. Despite their success, some scholars have argued that such methods suffer from a serious flaw: they do not allow a reliable interpretation in terms of human concepts. Simply put, visualizing an array of feature contributions is not enough for humans to conclude something about a model's…

    Submitted 26 February, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

  5. arXiv:2303.15846

    cs.CL cs.AI cs.LG

    Soft-prompt tuning to predict lung cancer using primary care free-text Dutch medical notes

    Authors: Auke Elfrink, Iacopo Vagliano, Ameen Abu-Hanna, Iacer Calixto

    Abstract: We investigate different natural language processing (NLP) approaches based on contextualised word representations for the problem of early prediction of lung cancer using free-text patient medical notes of Dutch primary care physicians. Because lung cancer has a low prevalence in primary care, we also address the problem of classification under highly imbalanced classes. Specifically, we use larg…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: A short version of this paper has been published at the 21st International Conference on Artificial Intelligence in Medicine (AIME 2023)

    ACM Class: I.2.7

  6. arXiv:2211.04576

    cs.CL cs.AI

    Detecting Euphemisms with Literal Descriptions and Visual Imagery

    Authors: İlker Kesen, Aykut Erdem, Erkut Erdem, Iacer Calixto

    Abstract: This paper describes our two-stage system for the Euphemism Detection shared task hosted by the 3rd Workshop on Figurative Language Processing in conjunction with EMNLP 2022. Euphemisms tone down expressions about sensitive or unpleasant issues like addiction and death. The ambiguous nature of euphemistic words or expressions makes it challenging to detect their actual meaning within a context. In…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: 7 pages, 1 table, 1 figure. Accepted to the 3rd Workshop on Figurative Language Processing at EMNLP 2022. https://github.com/ilkerkesen/euphemism

  7. arXiv:2206.13163

    cs.CL cs.AI

    Endowing Language Models with Multimodal Knowledge Graph Representations

    Authors: Ningyuan Huang, Yash R. Deshpande, Yibo Liu, Houda Alberts, Kyunghyun Cho, Clara Vania, Iacer Calixto

    Abstract: We propose a method to make natural language understanding models more parameter efficient by storing knowledge in an external knowledge graph (KG) and retrieving from this KG using a dense index. Given (possibly multilingual) downstream task data, e.g., sentences in German, we retrieve entities from the KG and use their multimodal representations to improve downstream task performance. We use the…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: 14 pages with appendix, 2 figures, 15 tables

    MSC Class: 68T50 ACM Class: I.2.7; I.2.10; I.2.4

  8. VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena

    Authors: Letitia Parcalabescu, Michele Cafagna, Lilitta Muradjan, Anette Frank, Iacer Calixto, Albert Gatt

    Abstract: We propose VALSE (Vision And Language Structured Evaluation), a novel benchmark designed for testing general-purpose pretrained vision and language (V&L) models for their visio-linguistic grounding capabilities on specific linguistic phenomena. VALSE offers a suite of six tests covering various linguistic constructs. Solving these requires models to ground linguistic phenomena in the visual modali…

    Submitted 14 March, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

    Comments: Paper accepted for publication at ACL 2022 Main; 28 pages, 4 figures, 11 tables

    MSC Class: 68Txx ACM Class: I.2.7; I.2.10

  9. arXiv:2112.03213

    cs.CL

    Zero-shot hashtag segmentation for multilingual sentiment analysis

    Authors: Ruan Chaves Rodrigues, Marcelo Akira Inuzuka, Juliana Resplande Sant'Anna Gomes, Acquila Santos Rocha, Iacer Calixto, Hugo Alexandre Dantas do Nascimento

    Abstract: Hashtag segmentation, also known as hashtag decomposition, is a common step in preprocessing pipelines for social media datasets. It usually precedes tasks such as sentiment analysis and hate speech detection. For sentiment analysis in medium to low-resourced languages, previous research has demonstrated that a multilingual approach that resorts to machine translation can be competitive or superio…

    Submitted 6 December, 2021; originally announced December 2021.

    Comments: 12 pages, 5 figures, 5 tables

    ACM Class: I.2.7

  10. arXiv:2012.12352

    cs.CV cs.CL

    Seeing past words: Testing the cross-modal capabilities of pretrained V&L models on counting tasks

    Authors: Letitia Parcalabescu, Albert Gatt, Anette Frank, Iacer Calixto

    Abstract: We investigate the reasoning ability of pretrained vision and language (V&L) models in two tasks that require multimodal integration: (1) discriminating a correct image-sentence pair from an incorrect one, and (2) counting entities in an image. We evaluate three pretrained V&L models on these tasks: ViLBERT, ViLBERT 12-in-1 and LXMERT, in zero-shot and finetuned settings. Our results show that mod…

    Submitted 17 June, 2021; v1 submitted 22 December, 2020; originally announced December 2020.

    Comments: Paper accepted for publication at MMSR 2021; 13 pages, 3 figures, 7 Tables

    MSC Class: 68Txx ACM Class: I.2.7; I.2.10

    Journal ref: Proceedings of the 1st Workshop on Multimodal Semantic Representations (MMSR), 2021, Groningen, Netherlands (Online), Association for Computational Linguistics, pp. 32–44

  11. arXiv:2012.01711

    cs.LG

    A Study on the Autoregressive and non-Autoregressive Multi-label Learning

    Authors: Elham J. Barezi, Iacer Calixto, Kyunghyun Cho, Pascale Fung

    Abstract: Extreme classification tasks are multi-label tasks with an extremely large number of labels (tags). These tasks are hard because the label space is usually (i) very large, e.g. thousands or millions of labels, (ii) very sparse, i.e. very few labels apply to each input document, and (iii) highly correlated, meaning that the existence of one label changes the likelihood of predicting all other label…

    Submitted 3 December, 2020; originally announced December 2020.

  12. arXiv:2009.12313

    cs.CV cs.CL

    Are scene graphs good enough to improve Image Captioning?

    Authors: Victor Milewski, Marie-Francine Moens, Iacer Calixto

    Abstract: Many top-performing image captioning models rely solely on object features computed with an object detection model to generate image descriptions. However, recent studies propose to directly use scene graphs to introduce information about object relations into captioning, hoping to better describe interactions between objects. In this work, we thoroughly investigate the use of scene graphs in imag…

    Submitted 27 October, 2020; v1 submitted 25 September, 2020; originally announced September 2020.

    Comments: Published at AACL-IJCNLP 2020. 12 pages, 5 figures

    MSC Class: 68T50; 68T45 ACM Class: I.2.7; I.2.10

  13. arXiv:2008.09152

    cs.CV cs.CL

    ImagiFilter: A resource to enable the semi-automatic mining of images at scale

    Authors: Houda Alberts, Iacer Calixto

    Abstract: Datasets (semi-)automatically collected from the web can easily scale to millions of entries, but a dataset's usefulness is directly related to how clean and high-quality its examples are. In this paper, we describe and publicly release an image dataset along with pretrained models designed to (semi-)automatically filter out undesirable images from very large image collections, possibly obtained f…

    Submitted 20 August, 2020; originally announced August 2020.

    Comments: 10 pages, 6 figures, 2 tables

    ACM Class: E.0

  14. arXiv:2008.09150

    cs.CL cs.AI cs.CV

    VisualSem: A High-quality Knowledge Graph for Vision and Language

    Authors: Houda Alberts, Teresa Huang, Yash Deshpande, Yibo Liu, Kyunghyun Cho, Clara Vania, Iacer Calixto

    Abstract: An exciting frontier in natural language understanding (NLU) and generation (NLG) calls for (vision-and-) language models that can efficiently access external structured knowledge repositories. However, many existing knowledge bases only cover limited domains, or suffer from noisy data, and most of all are typically hard to integrate into neural language pipelines. To fill this gap, we release Vis…

    Submitted 20 October, 2021; v1 submitted 20 August, 2020; originally announced August 2020.

    Comments: Accepted for publication at the 1st Multilingual Representation Learning workshop (MRL 2021) co-located with EMNLP 2021. 15 pages, 8 figures, 6 tables

    ACM Class: E.0; E.2

  15. arXiv:2005.13013

    cs.CL

    English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too

    Authors: Jason Phang, Iacer Calixto, Phu Mon Htut, Yada Pruksachatkun, Haokun Liu, Clara Vania, Katharina Kann, Samuel R. Bowman

    Abstract: Intermediate-task training (fine-tuning a pretrained model on an intermediate task before fine-tuning again on the target task) often improves model performance substantially on language understanding tasks in monolingual English settings. We investigate whether English intermediate-task training is still helpful on non-English target tasks. Using nine intermediate language-understanding tasks,…

    Submitted 30 September, 2020; v1 submitted 26 May, 2020; originally announced May 2020.

  16. arXiv:1811.00357

    cs.CL

    Latent Variable Model for Multi-modal Translation

    Authors: Iacer Calixto, Miguel Rios, Wilker Aziz

    Abstract: In this work, we propose to model the interaction between visual and textual features for multi-modal neural machine translation (MMT) through a latent variable model. This latent variable can be seen as a multi-modal stochastic embedding of an image and its description in a foreign language. It is used in a target-language decoder and also to predict image features. Importantly, our model formula…

    Submitted 16 May, 2019; v1 submitted 1 November, 2018; originally announced November 2018.

    Comments: Paper accepted at ACL 2019. Contains 8 pages (11 including references, 13 including appendix), 6 figures

    ACM Class: I.2.7

  17. arXiv:1702.01287

    cs.CL

    Doubly-Attentive Decoder for Multi-modal Neural Machine Translation

    Authors: Iacer Calixto, Qun Liu, Nick Campbell

    Abstract: We introduce a Multi-modal Neural Machine Translation model in which a doubly-attentive decoder naturally incorporates spatial visual features obtained using pre-trained convolutional neural networks, bridging the gap between image description and translation. Our decoder learns to attend to source-language words and parts of an image independently by means of two separate attention mechanisms as…

    Submitted 4 February, 2017; originally announced February 2017.

    Comments: 8 pages (11 including references), 2 figures

    ACM Class: I.2.7

  18. arXiv:1702.01101

    cs.CL

    Multilingual Multi-modal Embeddings for Natural Language Processing

    Authors: Iacer Calixto, Qun Liu, Nick Campbell

    Abstract: We propose a novel discriminative model that learns embeddings from multilingual and multi-modal data, meaning that our model can take advantage of images and descriptions in multiple languages to improve embedding quality. To that end, we introduce a modification of a pairwise contrastive estimation optimisation function as our training objective. We evaluate our embeddings on an image-sentence r…

    Submitted 3 February, 2017; originally announced February 2017.

    Comments: 4 pages (5 including references), no figures

    ACM Class: I.2.7

  19. arXiv:1701.06521

    cs.CL

    Incorporating Global Visual Features into Attention-Based Neural Machine Translation

    Authors: Iacer Calixto, Qun Liu, Nick Campbell

    Abstract: We introduce multi-modal, attention-based neural machine translation (NMT) models which incorporate visual features into different parts of both the encoder and the decoder. We utilise global image features extracted using a pre-trained convolutional neural network and incorporate them (i) as words in the source sentence, (ii) to initialise the encoder hidden state, and (iii) as additional data to…

    Submitted 23 January, 2017; originally announced January 2017.

    Comments: 8 pages (11 including references), 5 figures

    ACM Class: I.2.7