Showing 1–19 of 19 results for author: Suglia, A

  1. arXiv:2407.03967

    cs.CL cs.AI cs.RO

    Investigating the Role of Instruction Variety and Task Difficulty in Robotic Manipulation Tasks

    Authors: Amit Parekh, Nikolas Vitsakis, Alessandro Suglia, Ioannis Konstas

    Abstract: Evaluating the generalisation capabilities of multimodal models based solely on their performance on out-of-distribution data fails to capture their true robustness. This work introduces a comprehensive evaluation framework that systematically examines the role of instructions and inputs in the generalisation abilities of such models, considering architectural design, input perturbations across la…

    Submitted 4 July, 2024; originally announced July 2024.

  2. arXiv:2406.19297

    cs.CV

    Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation

    Authors: Malvina Nikandrou, Georgios Pantazopoulos, Ioannis Konstas, Alessandro Suglia

    Abstract: Continual learning focuses on incrementally training a model on a sequence of tasks with the aim of learning new tasks while minimizing performance drop on previous tasks. Existing approaches at the intersection of Continual Learning and Visual Question Answering (VQA) do not study how the multimodal nature of the input affects the learning dynamics of a model. In this paper, we demonstrate that e…

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2406.18403

    cs.CL

    LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks

    Authors: Anna Bavaresco, Raffaella Bernardi, Leonardo Bertolazzi, Desmond Elliott, Raquel Fernández, Albert Gatt, Esam Ghaleb, Mario Giulianelli, Michael Hanna, Alexander Koller, André F. T. Martins, Philipp Mondorf, Vera Neplenbroek, Sandro Pezzelle, Barbara Plank, David Schlangen, Alessandro Suglia, Aditya K Surikuchi, Ece Takmaz, Alberto Testoni

    Abstract: There is an increasing trend towards evaluating NLP models with LLM-generated judgments instead of human judgments. In the absence of a comparison against human data, this raises concerns about the validity of these evaluations; in case they are conducted with proprietary models, this also raises concerns over reproducibility. We provide JUDGE-BENCH, a collection of 20 NLP datasets with human anno…

    Submitted 26 June, 2024; originally announced June 2024.

  4. arXiv:2406.13807

    cs.CV cs.AI cs.CL

    AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding

    Authors: Alessandro Suglia, Claudio Greco, Katie Baker, Jose L. Part, Ioannis Papaioannou, Arash Eshghi, Ioannis Konstas, Oliver Lemon

    Abstract: AI personal assistants deployed via robots or wearables require embodied understanding to collaborate with humans effectively. However, current Vision-Language Models (VLMs) primarily focus on third-person view videos, neglecting the richness of egocentric perceptual experience. To address this gap, we propose three key contributions. First, we introduce the Egocentric Video Understanding Dataset…

    Submitted 21 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: Code available https://github.com/alanaai/EVUD

  5. arXiv:2405.04403

    cs.CV cs.CL

    Learning To See But Forgetting To Follow: Visual Instruction Tuning Makes LLMs More Prone To Jailbreak Attacks

    Authors: Georgios Pantazopoulos, Amit Parekh, Malvina Nikandrou, Alessandro Suglia

    Abstract: Augmenting Large Language Models (LLMs) with image-understanding capabilities has resulted in a boom of high-performing Vision-Language models (VLMs). While studying the alignment of LLMs to human values has received widespread attention, the safety of VLMs has not received the same attention. In this paper, we explore the impact of jailbreaking on three state-of-the-art VLMs, each using a distinc…

    Submitted 7 May, 2024; originally announced May 2024.

  6. arXiv:2404.13594

    cs.CV cs.AI

    Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers

    Authors: Georgios Pantazopoulos, Alessandro Suglia, Oliver Lemon, Arash Eshghi

    Abstract: An effective method for combining frozen large language models (LLM) and visual encoders involves a resampler module that creates a 'visual prompt' which is provided to the LLM, along with the textual prompt. While this approach has enabled impressive performance across many coarse-grained tasks like image captioning and visual question answering, more fine-grained tasks that require spatial under…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: NAACL 2024

  7. arXiv:2401.03321

    cs.CL

    PIXAR: Auto-Regressive Language Modeling in Pixel Space

    Authors: Yintao Tai, Xiyang Liao, Alessandro Suglia, Antonio Vergari

    Abstract: Recent work showed the possibility of building open-vocabulary large language models (LLMs) that directly operate on pixel representations. These models are implemented as autoencoders that reconstruct masked patches of rendered text. However, these pixel-based LLMs are limited to discriminative tasks (e.g., classification) and, similar to BERT, cannot be used to generate text. Therefore, they can…

    Submitted 23 February, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

  8. arXiv:2312.04736

    cs.CL cs.AI

    Is Feedback All You Need? Leveraging Natural Language Feedback in Goal-Conditioned Reinforcement Learning

    Authors: Sabrina McCallum, Max Taylor-Davies, Stefano V. Albrecht, Alessandro Suglia

    Abstract: Despite numerous successes, the field of reinforcement learning (RL) remains far from matching the impressive generalisation power of human behaviour learning. One possible way to help bridge this gap is to provide RL agents with richer, more human-like feedback expressed in natural language. To investigate this idea, we first extend BabyAI to automatically generate language feedback from the envi…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Accepted at Workshop on Goal-conditioned Reinforcement Learning, NeurIPS 2023

  9. arXiv:2312.02431

    cs.CL cs.AI

    Visually Grounded Language Learning: a review of language games, datasets, tasks, and models

    Authors: Alessandro Suglia, Ioannis Konstas, Oliver Lemon

    Abstract: In recent years, several machine learning models have been proposed. They are trained with a language modelling objective on large-scale text-only data. With such pretraining, they can achieve impressive results on many Natural Language Understanding and Generation tasks. However, many facets of meaning cannot be learned by "listening to the radio" only. In the literature, many Vision+Language (V…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Preprint for JAIR before copyediting

  10. arXiv:2311.04067

    cs.LG cs.AI cs.CV

    Multitask Multimodal Prompted Training for Interactive Embodied Task Completion

    Authors: Georgios Pantazopoulos, Malvina Nikandrou, Amit Parekh, Bhathiya Hemanthage, Arash Eshghi, Ioannis Konstas, Verena Rieser, Oliver Lemon, Alessandro Suglia

    Abstract: Interactive and embodied tasks pose at least two fundamental challenges to existing Vision & Language (VL) models, including 1) grounding language in trajectories of actions and observations, and 2) referential disambiguation. To tackle these challenges, we propose an Embodied MultiModal Agent (EMMA): a unified encoder-decoder model that reasons over images and trajectories, and casts action predi…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023

  11. arXiv:2307.15554

    cs.CL

    'What are you referring to?' Evaluating the Ability of Multi-Modal Dialogue Models to Process Clarificational Exchanges

    Authors: Javier Chiyah-Garcia, Alessandro Suglia, Arash Eshghi, Helen Hastie

    Abstract: Referential ambiguities arise in dialogue when a referring expression does not uniquely identify the intended referent for the addressee. Addressees usually detect such ambiguities immediately and work with the speaker to repair it using meta-communicative, Clarificational Exchanges (CE): a Clarification Request (CR) and a response. Here, we argue that the ability to generate and respond to CRs im…

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: Accepted at SIGDIAL'23 (upcoming). Repository with code and experiments available at https://github.com/JChiyah/what-are-you-referring-to

  12. arXiv:2211.04534

    cs.CV cs.CL

    Going for GOAL: A Resource for Grounded Football Commentaries

    Authors: Alessandro Suglia, José Lopes, Emanuele Bastianelli, Andrea Vanzo, Shubham Agarwal, Malvina Nikandrou, Lu Yu, Ioannis Konstas, Verena Rieser

    Abstract: Recent video+language datasets cover domains where the interaction is highly structured, such as instructional videos, or where the interaction is scripted, such as TV shows. Both of these properties can lead to spurious cues to be exploited by models rather than learning to ground language. In this paper, we present GrOunded footbAlL commentaries (GOAL), a novel dataset of football (or 'soccer')…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: Preprint formatted using the ACM Multimedia template (8 pages + appendix)

  13. arXiv:2210.00044

    cs.LG

    Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering

    Authors: Malvina Nikandrou, Lu Yu, Alessandro Suglia, Ioannis Konstas, Verena Rieser

    Abstract: Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previous knowledge. Although continual learning has been widely studied in computer vision, its application to Vision+Language tasks is not that straightforward, as settings can be parameterized in multiple ways according to their input modalities. In this paper, we present a detailed study of how diff…

    Submitted 20 January, 2024; v1 submitted 30 September, 2022; originally announced October 2022.

  14. arXiv:2202.12645

    cs.CL cs.AI

    Exploring Multi-Modal Representations for Ambiguity Detection & Coreference Resolution in the SIMMC 2.0 Challenge

    Authors: Javier Chiyah-Garcia, Alessandro Suglia, José Lopes, Arash Eshghi, Helen Hastie

    Abstract: Anaphoric expressions, such as pronouns and referential descriptions, are situated with respect to the linguistic context of prior turns, as well as the immediate visual environment. However, a speaker's referential descriptions do not always uniquely identify the referent, leading to ambiguities in need of resolution through subsequent clarificational exchanges. Thus, effective Ambiguity Detecti…

    Submitted 26 July, 2023; v1 submitted 25 February, 2022; originally announced February 2022.

    Comments: Accepted to AAAI 2022 DSTC10 Workshop

  15. arXiv:2108.04927

    cs.CV cs.AI cs.CL cs.LG

    Embodied BERT: A Transformer Model for Embodied, Language-guided Visual Task Completion

    Authors: Alessandro Suglia, Qiaozi Gao, Jesse Thomason, Govind Thattai, Gaurav Sukhatme

    Abstract: Language-guided robots performing home and office tasks must navigate in and interact with the world. Grounding language instructions against visual observations and actions to take in an environment is an open challenge. We present Embodied BERT (EmBERT), a transformer-based model which can attend to high-dimensional, multi-modal inputs across long temporal horizons for language-conditioned task…

    Submitted 4 November, 2021; v1 submitted 10 August, 2021; originally announced August 2021.

    Comments: Accepted at Novel Ideas in Learning-to-Learn through Interaction (NILLI) workshop @ EMNLP 2021

  16. arXiv:2102.00424

    cs.CL cs.CV cs.LG

    An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games

    Authors: Alessandro Suglia, Yonatan Bisk, Ioannis Konstas, Antonio Vergari, Emanuele Bastianelli, Andrea Vanzo, Oliver Lemon

    Abstract: Guessing games are a prototypical instance of the "learning by interacting" paradigm. This work investigates how well an artificial agent can benefit from playing guessing games when later asked to perform on novel NLP downstream tasks such as Visual Question Answering (VQA). We propose two ways to exploit playing guessing games: 1) a supervised learning scenario in which the agent learns to mimic…

    Submitted 31 January, 2021; originally announced February 2021.

    Comments: Accepted paper for the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021)

  17. arXiv:2011.02917

    cs.CL cs.CV cs.LG

    Imagining Grounded Conceptual Representations from Perceptual Information in Situated Guessing Games

    Authors: Alessandro Suglia, Antonio Vergari, Ioannis Konstas, Yonatan Bisk, Emanuele Bastianelli, Andrea Vanzo, Oliver Lemon

    Abstract: In visual guessing games, a Guesser has to identify a target object in a scene by asking questions to an Oracle. An effective strategy for the players is to learn conceptual representations of objects that are both discriminative and expressive enough to ask questions and guess correctly. However, as shown by Suglia et al. (2020), existing models fail to learn truly multi-modal representations, re…

    Submitted 5 November, 2020; originally announced November 2020.

    Comments: Accepted to the International Conference on Computational Linguistics (COLING) 2020

  18. arXiv:2006.02174

    cs.CL cs.AI cs.LG

    CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning

    Authors: Alessandro Suglia, Ioannis Konstas, Andrea Vanzo, Emanuele Bastianelli, Desmond Elliott, Stella Frank, Oliver Lemon

    Abstract: Approaches to Grounded Language Learning typically focus on a single task-based final performance measure that may not depend on desirable properties of the learned hidden representations, such as their ability to predict salient attributes or to generalise to unseen situations. To remedy this, we present GROLLA, an evaluation framework for Grounded Language Learning with Attributes with three sub…

    Submitted 3 June, 2020; originally announced June 2020.

    Comments: Accepted to the Annual Conference of the Association for Computational Linguistics (ACL) 2020

  19. arXiv:1702.02367

    cs.CL

    Iterative Multi-document Neural Attention for Multiple Answer Prediction

    Authors: Claudio Greco, Alessandro Suglia, Pierpaolo Basile, Gaetano Rossiello, Giovanni Semeraro

    Abstract: People have information needs of varying complexity, which can be solved by an intelligent agent able to answer questions formulated in a proper way, eventually considering user context and preferences. In a scenario in which the user profile can be considered as a question, intelligent agents able to answer questions can be used to find the most relevant answers for a given user. In this work we…

    Submitted 8 February, 2017; originally announced February 2017.

    Comments: Paper accepted and presented at the Deep Understanding and Reasoning: A challenge for Next-generation Intelligent Agents (URANIA) workshop, held in the context of the AI*IA 2016 conference