Showing 1–11 of 11 results for author: Geigle, G

  1. arXiv:2406.14496

    cs.CV cs.CL

    African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification

    Authors: Gregor Geigle, Radu Timofte, Goran Glavaš

    Abstract: Recent Large Vision-Language Models (LVLMs) demonstrate impressive abilities on numerous image understanding and reasoning tasks. The task of fine-grained object classification (e.g., distinction between animal species), however, has been probed insufficiently, despite its downstream importance. We fill this evaluation gap by creating FOCI (Fine-grained Object …

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.14492

    cs.CV cs.CL

    Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models?

    Authors: Gregor Geigle, Radu Timofte, Goran Glavaš

    Abstract: Large vision-language models (LVLMs) have recently dramatically pushed the state of the art in image captioning and many image understanding tasks (e.g., visual question answering). LVLMs, however, often hallucinate and produce captions that mention concepts that cannot be found in the image. These hallucinations erode the trustworthiness of LVLMs and are arguably among the main obstacles…

    Submitted 20 June, 2024; originally announced June 2024.

  3. arXiv:2401.16468

    cs.CV cs.LG eess.IV

    InstructIR: High-Quality Image Restoration Following Human Instructions

    Authors: Marcos V. Conde, Gregor Geigle, Radu Timofte

    Abstract: Image restoration is a fundamental problem that involves recovering a high-quality clean image from its degraded observation. All-In-One image restoration models can effectively restore images from various types and levels of degradation using degradation-specific information as prompts to guide the restoration model. In this work, we present the first approach that uses human-written instructions…

    Submitted 7 July, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: European Conference on Computer Vision (ECCV) 2024

  4. arXiv:2307.06930

    cs.CV cs.CL

    mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs

    Authors: Gregor Geigle, Abhay Jain, Radu Timofte, Goran Glavaš

    Abstract: Modular vision-language models (Vision-LLMs) align pretrained image encoders with (frozen) large language models (LLMs) and post-hoc condition LLMs to 'understand' the image input. With the abundance of readily available high-quality English image-text data as well as strong monolingual English LLMs, the research focus has been on English-only Vision-LLMs. Multilingual vision-language models are s…

    Submitted 20 June, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: ALVR Workshop 2024

  5. arXiv:2306.08658

    cs.CL cs.CV

    Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations

    Authors: Gregor Geigle, Radu Timofte, Goran Glavaš

    Abstract: Vision-and-language (VL) models with separate encoders for each modality (e.g., CLIP) have become the go-to models for zero-shot image classification and image-text retrieval. They are, however, mostly evaluated in English as multilingual benchmarks are limited in availability. We introduce Babel-ImageNet, a massively multilingual benchmark that offers (partial) translations of ImageNet labels to…

    Submitted 12 June, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: Accepted to ACL 2024

  6. arXiv:2210.06379

    cs.CV cs.CL

    One does not fit all! On the Complementarity of Vision Encoders for Vision and Language Tasks

    Authors: Gregor Geigle, Chen Cecilia Liu, Jonas Pfeiffer, Iryna Gurevych

    Abstract: Current multimodal models, aimed at solving Vision and Language (V+L) tasks, predominantly repurpose Vision Encoders (VE) as feature extractors. While many VEs (of different architectures, trained on different data and objectives) are publicly available, they are not designed for the downstream V+L tasks. Nonetheless, most current work assumes that a single pre-trained VE can serve as…

    Submitted 8 June, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: Repl4NLP 2023

  7. arXiv:2203.13693

    cs.CL cs.IR

    UKP-SQUARE: An Online Platform for Question Answering Research

    Authors: Tim Baumgärtner, Kexin Wang, Rachneet Sachdeva, Max Eichler, Gregor Geigle, Clifton Poth, Hannah Sterz, Haritz Puerto, Leonardo F. R. Ribeiro, Jonas Pfeiffer, Nils Reimers, Gözde Gül Şahin, Iryna Gurevych

    Abstract: Recent advances in NLP and information retrieval have given rise to a diverse set of question answering tasks that are of different formats (e.g., extractive, abstractive), require different model architectures (e.g., generative, discriminative), and setups (e.g., with or without retrieval). Despite having a large number of powerful, specialized QA pipelines (which we refer to as Skills) that cons…

    Submitted 28 March, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

    Comments: Accepted at ACL 2022 Demo Track

  8. arXiv:2109.06082

    cs.CL

    xGQA: Cross-Lingual Visual Question Answering

    Authors: Jonas Pfeiffer, Gregor Geigle, Aishwarya Kamath, Jan-Martin O. Steitz, Stefan Roth, Ivan Vulić, Iryna Gurevych

    Abstract: Recent advances in multimodal vision and language modeling have predominantly focused on the English language, mostly due to the lack of multilingual multimodal datasets to steer modeling efforts. In this work, we address this gap and provide xGQA, a new multilingual evaluation benchmark for the visual question answering task. We extend the established English GQA dataset to 7 typologically divers…

    Submitted 17 March, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: Findings of ACL 2022

  9. arXiv:2104.07081

    cs.CL

    TWEAC: Transformer with Extendable QA Agent Classifiers

    Authors: Gregor Geigle, Nils Reimers, Andreas Rücklé, Iryna Gurevych

    Abstract: Question answering systems should help users to access knowledge on a broad range of topics and to answer a wide array of different questions. Most systems fall short of this expectation as they are only specialized in one particular setting, e.g., answering factual questions with Wikipedia data. To overcome this limitation, we propose composing multiple QA agents within a meta-QA system. We argue…

    Submitted 16 September, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

  10. arXiv:2103.11920

    cs.CV cs.CL

    Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval

    Authors: Gregor Geigle, Jonas Pfeiffer, Nils Reimers, Ivan Vulić, Iryna Gurevych

    Abstract: Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image. While offering unmatched retrieval performance, such models: 1) are typically pretrained from scratch and thus less scalable, 2) suffer from huge retrieval latency and ineff…

    Submitted 18 February, 2022; v1 submitted 22 March, 2021; originally announced March 2021.

    Comments: TACL 2022

  11. arXiv:2010.11918

    cs.LG cs.CL

    AdapterDrop: On the Efficiency of Adapters in Transformers

    Authors: Andreas Rücklé, Gregor Geigle, Max Glockner, Tilman Beck, Jonas Pfeiffer, Nils Reimers, Iryna Gurevych

    Abstract: Massively pre-trained transformer models are computationally expensive to fine-tune, slow for inference, and have large storage requirements. Recent approaches tackle these shortcomings by training smaller models, dynamically reducing the model size, and by training light-weight adapters. In this paper, we propose AdapterDrop, removing adapters from lower transformer layers during training and inf…

    Submitted 5 October, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: EMNLP 2021