Showing 1–7 of 7 results for author: Mañas, O

  1. arXiv:2406.10429

    cs.CV cs.AI

    Consistency-diversity-realism Pareto fronts of conditional image generative models

    Authors: Pietro Astolfi, Marlene Careil, Melissa Hall, Oscar Mañas, Matthew Muckley, Jakob Verbeek, Adriana Romero Soriano, Michal Drozdzal

    Abstract: Building world models that accurately and comprehensively represent the real world is the utmost aspiration for conditional image generative models as it would enable their use as world simulators. For these models to be successful world models, they should not only excel at image quality and prompt-image consistency but also ensure high representation diversity. However, current research in gener…

    Submitted 14 June, 2024; originally announced June 2024.

  2. arXiv:2405.17247

    cs.LG

    An Introduction to Vision-Language Modeling

    Authors: Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li, Adrien Bardes, Suzanne Petryk, Oscar Mañas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, Mark Ibrahim, Melissa Hall, Yunyang Xiong, Jonathan Lebensold, Candace Ross, Srihari Jayakumar, Chuan Guo, Diane Bouchacourt, Haider Al-Tahan, Karthik Padthe, Vasu Sharma, Hu Xu, Xiaoqing Ellen Tan, Megan Richards, Samuel Lavoie, et al. (16 additional authors not shown)

    Abstract: Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models that produce images using only a high-level text description, the vision-language model (VLM) applications will significantly impact our relationship with technol…

    Submitted 27 May, 2024; originally announced May 2024.

  3. arXiv:2403.17804

    cs.CV cs.CL

    Improving Text-to-Image Consistency via Automatic Prompt Optimization

    Authors: Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, Michal Drozdzal

    Abstract: Impressive advances in text-to-image (T2I) generative models have yielded a plethora of high performing models which are able to generate aesthetically appealing, photorealistic images. Despite the progress, these models still struggle to produce images that are consistent with the input prompt, oftentimes failing to capture object quantities, relations and attributes properly. Existing solutions…

    Submitted 26 March, 2024; originally announced March 2024.

  4. arXiv:2310.02567

    cs.CV cs.AI cs.CL cs.LG

    Improving Automatic VQA Evaluation Using Large Language Models

    Authors: Oscar Mañas, Benno Krojer, Aishwarya Agrawal

    Abstract: 8 years after the visual question answering (VQA) task was proposed, accuracy remains the primary metric for automatic evaluation. VQA Accuracy has been effective so far in the IID evaluation setting. However, our community is undergoing a shift towards open-ended generative models and OOD evaluation. In this new paradigm, the existing VQA Accuracy metric is overly stringent and underestimates the…

    Submitted 10 January, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted at AAAI 2024 (main track)

  5. arXiv:2210.07179

    cs.CV cs.AI cs.CL cs.LG

    MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting

    Authors: Oscar Mañas, Pau Rodriguez, Saba Ahmadi, Aida Nematzadeh, Yash Goyal, Aishwarya Agrawal

    Abstract: Large pre-trained models have proved to be remarkable zero- and (prompt-based) few-shot learners in unimodal vision and language tasks. We propose MAPL, a simple and parameter-efficient method that reuses frozen pre-trained unimodal models and leverages their strong generalization capabilities in multimodal vision-language (VL) settings. MAPL learns a lightweight mapping between the representation…

    Submitted 14 March, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted at EACL 2023 (main track); 26 pages, 21 figures, 6 tables; Pau Rodriguez and Saba Ahmadi had equal contributions

  6. arXiv:2103.16607

    cs.CV

    Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data

    Authors: Oscar Mañas, Alexandre Lacoste, Xavier Giro-i-Nieto, David Vazquez, Pau Rodriguez

    Abstract: Remote sensing and automatic earth monitoring are key to solve global-scale challenges such as disaster prevention, land use monitoring, or tackling climate change. Although there exist vast amounts of remote sensing data, most of it remains unlabeled and thus inaccessible for supervised learning algorithms. Transfer learning approaches can reduce the data requirements of deep learning algorithms.…

    Submitted 3 May, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

  7. arXiv:2007.02180

    eess.IV cs.CV

    A Weakly Supervised Consistency-based Learning Method for COVID-19 Segmentation in CT Images

    Authors: Issam Laradji, Pau Rodriguez, Oscar Mañas, Keegan Lensink, Marco Law, Lironne Kurzman, William Parker, David Vazquez, Derek Nowrouzezahrai

    Abstract: Coronavirus Disease 2019 (COVID-19) has spread aggressively across the world causing an existential health crisis. Thus, having a system that automatically detects COVID-19 in tomography (CT) images can assist in quantifying the severity of the illness. Unfortunately, labelling chest CT scans requires significant domain expertise, time, and effort. We address these labelling challenges by only req…

    Submitted 7 July, 2020; v1 submitted 4 July, 2020; originally announced July 2020.