Showing 1–20 of 20 results for author: Romero-Soriano, A

Searching in archive cs.
  1. arXiv:2406.11988  [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    Decomposed evaluations of geographic disparities in text-to-image models

    Authors: Abhishek Sureddy, Dishant Padalia, Nandhinee Periyakaruppa, Oindrila Saha, Adina Williams, Adriana Romero-Soriano, Megan Richards, Polina Kirichenko, Melissa Hall

    Abstract: Recent work has identified substantial disparities in generated images of different geographic regions, including stereotypical depictions of everyday objects like houses and cars. However, existing measures for these disparities have been limited to either human evaluations, which are time-consuming and costly, or automatic metrics evaluating full images, which are unable to attribute these dispa…

    Submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2406.04551  [pdf, other]

    cs.CV cs.AI cs.LG

    Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance

    Authors: Reyhane Askari Hemmat, Melissa Hall, Alicia Sun, Candace Ross, Michal Drozdzal, Adriana Romero-Soriano

    Abstract: With the growing popularity of text-to-image generative models, there has been increasing focus on understanding their risks and biases. Recent work has found that state-of-the-art models struggle to depict everyday objects with the true diversity of the real world and have notable gaps between geographic regions. In this work, we aim to increase the diversity of generated images of common objects…

    Submitted 2 August, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  3. arXiv:2403.17804  [pdf, other]

    cs.CV cs.CL

    Improving Text-to-Image Consistency via Automatic Prompt Optimization

    Authors: Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, Michal Drozdzal

    Abstract: Impressive advances in text-to-image (T2I) generative models have yielded a plethora of high performing models which are able to generate aesthetically appealing, photorealistic images. Despite the progress, these models still struggle to produce images that are consistent with the input prompt, oftentimes failing to capture object quantities, relations and attributes properly. Existing solutions…

    Submitted 26 March, 2024; originally announced March 2024.

  4. arXiv:2403.14421  [pdf, other]

    cs.LG cs.CR cs.CV

    DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning

    Authors: Jonathan Lebensold, Maziar Sanjabi, Pietro Astolfi, Adriana Romero-Soriano, Kamalika Chaudhuri, Mike Rabbat, Chuan Guo

    Abstract: Text-to-image diffusion models have been shown to suffer from sample-level memorization, possibly reproducing near-perfect replicas of images that they are trained on, which may be undesirable. To remedy this issue, we develop the first differentially private (DP) retrieval-augmented generation algorithm that is capable of generating high-quality image samples while providing provable privacy guara…

    Submitted 13 May, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

  5. arXiv:2401.01990  [pdf, other]

    cs.CV cs.AI cs.LG

    GPS-SSL: Guided Positive Sampling to Inject Prior Into Self-Supervised Learning

    Authors: Aarash Feizi, Randall Balestriero, Adriana Romero-Soriano, Reihaneh Rabbany

    Abstract: We propose Guided Positive Sampling Self-Supervised Learning (GPS-SSL), a general method to inject a priori knowledge into Self-Supervised Learning (SSL) positive samples selection. Current SSL methods leverage Data-Augmentations (DA) for generating positive samples and incorporate prior knowledge - an incorrect, or too weak DA will drastically reduce the quality of the learned representation. GPS…

    Submitted 9 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

  6. arXiv:2312.08578  [pdf, other]

    cs.CV

    A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions

    Authors: Jack Urbanek, Florian Bordes, Pietro Astolfi, Mary Williamson, Vasu Sharma, Adriana Romero-Soriano

    Abstract: Curation methods for massive vision-language datasets trade off between dataset size and quality. However, even the highest quality of available curated captions are far too short to capture the rich visual detail in an image. To show the value of dense and highly-aligned image-text pairs, we collect the Densely Captioned Images (DCI) dataset, containing 7805 natural images human-annotated with ma…

    Submitted 17 June, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

  7. arXiv:2310.00158  [pdf, other]

    cs.CV cs.AI cs.LG

    Feedback-guided Data Synthesis for Imbalanced Classification

    Authors: Reyhane Askari Hemmat, Mohammad Pezeshki, Florian Bordes, Michal Drozdzal, Adriana Romero-Soriano

    Abstract: The current status quo in machine learning is to use static datasets of real images for training, which often come from long-tailed distributions. With the recent advances in generative models, researchers have started augmenting these static datasets with synthetic data, reporting moderate performance improvements on classification tasks. We hypothesize that these performance gains are limited by the…

    Submitted 29 September, 2023; originally announced October 2023.

  8. arXiv:2305.17589  [pdf, other]

    cs.LG cs.AI

    Graph Inductive Biases in Transformers without Message Passing

    Authors: Liheng Ma, Chen Lin, Derek Lim, Adriana Romero-Soriano, Puneet K. Dokania, Mark Coates, Philip Torr, Ser-Nam Lim

    Abstract: Transformers for graph data are increasingly widely studied and successful in numerous learning tasks. Graph inductive biases are crucial for Graph Transformers, and previous works incorporate them using message-passing modules and/or positional encodings. However, Graph Transformers that use message-passing inherit known issues of message-passing, and differ significantly from Transformers used i…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: Published as a conference paper at ICML 2023; 17 pages

    Journal ref: PMLR 202 (2023) 23321-23337

  9. arXiv:2305.08675  [pdf]

    cs.CV

    Improved baselines for vision-language pre-training

    Authors: Enrico Fini, Pietro Astolfi, Adriana Romero-Soriano, Jakob Verbeek, Michal Drozdzal

    Abstract: Contrastive learning has emerged as an efficient framework to learn multimodal representations. CLIP, a seminal work in this area, achieved impressive results by training on paired image-text data using the contrastive loss. Recent work claims improvements over CLIP using additional non-contrastive losses inspired from self-supervised learning. However, it is sometimes hard to disentangle the cont…

    Submitted 4 November, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: TMLR, featured certification; changelog at https://openreview.net/forum?id=a7nvXxNmdV

    Journal ref: Transactions on Machine Learning Research, 10/2023, issn 2835-8856

  10. arXiv:2304.13722  [pdf, other]

    cs.CV

    Controllable Image Generation via Collage Representations

    Authors: Arantxa Casanova, Marlène Careil, Adriana Romero-Soriano, Christopher J. Pal, Jakob Verbeek, Michal Drozdzal

    Abstract: Recent advances in conditional generative image models have enabled impressive results. On the one hand, text-based conditional models have achieved remarkable generation quality, by leveraging large-scale datasets of image-text pairs. To enable fine-grained controllability, however, text-based models require long prompts, whose details may be ignored by the model. On the other hand, layout-based…

    Submitted 26 April, 2023; originally announced April 2023.

  11. arXiv:2303.09677  [pdf, other]

    cs.CV

    Instance-Conditioned GAN Data Augmentation for Representation Learning

    Authors: Pietro Astolfi, Arantxa Casanova, Jakob Verbeek, Pascal Vincent, Adriana Romero-Soriano, Michal Drozdzal

    Abstract: Data augmentation has become a crucial component to train state-of-the-art visual representation models. However, handcrafting combinations of transformations that lead to improved performances is a laborious task, which can result in visually unrealistic samples. To overcome these limitations, recent works have explored the use of generative models as learnable data augmentation tools, showing pr…

    Submitted 16 March, 2023; originally announced March 2023.

    Comments: TMLR reviews at https://openreview.net/forum?id=1n7q9mxG3T&referrer=%5BTMLR%5D(%2Fgroup%3Fid%3DTMLR)

  12. arXiv:2302.07960  [pdf, other]

    cs.LG cs.HC

    Learning to Substitute Ingredients in Recipes

    Authors: Bahare Fatemi, Quentin Duval, Rohit Girdhar, Michal Drozdzal, Adriana Romero-Soriano

    Abstract: Recipe personalization through ingredient substitution has the potential to help people meet their dietary needs and preferences, avoid potential allergens, and ease culinary exploration in everyone's kitchen. To address ingredient substitution, we build a benchmark, composed of a dataset of substitution pairs with standardized splits, evaluation metrics, and baselines. We further introduce Graph-…

    Submitted 15 February, 2023; originally announced February 2023.

  13. arXiv:2301.00512  [pdf, other]

    cs.LG

    On the Challenges of using Reinforcement Learning in Precision Drug Dosing: Delay and Prolongedness of Action Effects

    Authors: Sumana Basu, Marc-André Legault, Adriana Romero-Soriano, Doina Precup

    Abstract: Drug dosing is an important application of AI, which can be formulated as a Reinforcement Learning (RL) problem. In this paper, we identify two major challenges of using RL for drug dosing: delayed and prolonged effects of administering medications, which break the Markov assumption of the RL framework. We focus on prolongedness and define PAE-POMDP (Prolonged Action Effect-Partially Observable Ma…

    Submitted 1 January, 2023; originally announced January 2023.

    Comments: Accepted to AAAI 2023

  14. arXiv:2210.00978  [pdf, other]

    cs.CV

    Uncertainty-Driven Active Vision for Implicit Scene Reconstruction

    Authors: Edward J. Smith, Michal Drozdzal, Derek Nowrouzezahrai, David Meger, Adriana Romero-Soriano

    Abstract: Multi-view implicit scene reconstruction methods have become increasingly popular due to their ability to represent complex scene details. Recent efforts have been devoted to improving the representation of input information and to reducing the number of views required to obtain high quality reconstructions. Yet, perhaps surprisingly, the study of which views to select to maximally improve scene u…

    Submitted 3 October, 2022; originally announced October 2022.

  15. arXiv:2207.10200  [pdf, other]

    cs.CV cs.DB

    Revisiting Hotels-50K and Hotel-ID

    Authors: Aarash Feizi, Arantxa Casanova, Adriana Romero-Soriano, Reihaneh Rabbany

    Abstract: In this paper, we propose revisited versions for two recent hotel recognition datasets: Hotels50K and Hotel-ID. The revisited versions provide evaluation setups with different levels of difficulty to better align with the intended real-world application, i.e. countering human trafficking. Real-world scenarios involve hotels and locations that are not captured in the current data sets, therefore it…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: ICML 2022 DataPerf Workshop

  16. arXiv:2203.16392  [pdf, other]

    eess.IV cs.CV

    On learning adaptive acquisition policies for undersampled multi-coil MRI reconstruction

    Authors: Tim Bakker, Matthew Muckley, Adriana Romero-Soriano, Michal Drozdzal, Luis Pineda

    Abstract: Most current approaches to undersampled multi-coil MRI reconstruction focus on learning the reconstruction model for a fixed, equidistant acquisition trajectory. In this paper, we study the problem of joint learning of the reconstruction model together with acquisition policies. To this end, we extend the End-to-End Variational Network with learnable acquisition policies that can adapt to differen…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted to MIDL 2022 as conference paper

  17. arXiv:2110.13100  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Parameter Prediction for Unseen Deep Architectures

    Authors: Boris Knyazev, Michal Drozdzal, Graham W. Taylor, Adriana Romero-Soriano

    Abstract: Deep learning has been successful in automating the design of features in machine learning pipelines. However, the algorithms optimizing neural network parameters remain largely hand-designed and computationally inefficient. We study if we can use deep learning to directly predict these parameters by exploiting the past knowledge of training other networks. We introduce a large-scale dataset of di…

    Submitted 25 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021 camera ready, the code is available at https://github.com/facebookresearch/ppuda

  18. arXiv:2109.05070  [pdf, other]

    cs.CV cs.LG

    Instance-Conditioned GAN

    Authors: Arantxa Casanova, Marlène Careil, Jakob Verbeek, Michal Drozdzal, Adriana Romero-Soriano

    Abstract: Generative Adversarial Networks (GANs) can generate near photo realistic images in narrow domains such as human faces. Yet, modeling complex distributions of datasets such as ImageNet and COCO-Stuff remains challenging in unconditional settings. In this paper, we take inspiration from kernel density estimation techniques and introduce a non-parametric approach to modeling distributions of complex…

    Submitted 4 November, 2021; v1 submitted 10 September, 2021; originally announced September 2021.

    Comments: Accepted at NeurIPS 2021

  19. arXiv:2105.04037  [pdf, ps, other]

    cs.LG

    Graph Attention Networks with Positional Embeddings

    Authors: Liheng Ma, Reihaneh Rabbany, Adriana Romero-Soriano

    Abstract: Graph Neural Networks (GNNs) are deep learning methods which provide the current state of the art performance in node classification tasks. GNNs often assume homophily (neighboring nodes having similar features and labels), and therefore may not be at their full potential when dealing with non-homophilic graphs. In this work, we focus on addressing this limitation and enable Graph Attention Net…

    Submitted 24 October, 2021; v1 submitted 9 May, 2021; originally announced May 2021.

  20. arXiv:2012.04027  [pdf, other]

    cs.CV cs.AI

    Generating unseen complex scenes: are we there yet?

    Authors: Arantxa Casanova, Michal Drozdzal, Adriana Romero-Soriano

    Abstract: Although recent complex scene conditional generation models generate increasingly appealing scenes, it is very hard to assess which models perform better and why. This is often due to models being trained to fit different data splits, and defining their own experimental setups. In this paper, we propose a methodology to compare complex scene conditional generation models, and provide an in-depth a…

    Submitted 7 December, 2020; originally announced December 2020.