Showing 1–50 of 120 results for author: Akata, Z

Searching in archive cs.
  1. arXiv:2407.10910  [pdf, other]

    cs.CV cs.LG

    DataDream: Few-shot Guided Dataset Generation

    Authors: Jae Myung Kim, Jessica Bader, Stephan Alaniz, Cordelia Schmid, Zeynep Akata

    Abstract: While text-to-image diffusion models have been shown to achieve state-of-the-art results in image synthesis, they have yet to prove their effectiveness in downstream applications. Previous work has proposed to generate data for image classifier training given limited real data access. However, these methods struggle to generate in-distribution images or depict fine-grained features, thereby hinder…

    Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  2. arXiv:2407.07829  [pdf, other]

    cs.LG cs.CV stat.ML

    Disentangled Representation Learning through Geometry Preservation with the Gromov-Monge Gap

    Authors: Théo Uscidda, Luca Eyring, Karsten Roth, Fabian Theis, Zeynep Akata, Marco Cuturi

    Abstract: Learning disentangled representations in an unsupervised manner is a fundamental challenge in machine learning. Solving it may unlock other problems, such as generalization, interpretability, or fairness. While remarkably difficult to solve in general, recent works have shown that disentanglement is provably achievable under additional assumptions that can leverage geometrical constraints, such as…

    Submitted 10 July, 2024; originally announced July 2024.

  3. arXiv:2407.03004  [pdf, other]

    cs.CL cs.AI

    SemioLLM: Assessing Large Language Models for Semiological Analysis in Epilepsy Research

    Authors: Meghal Dani, Muthu Jeyanthi Prakash, Zeynep Akata, Stefanie Liebe

    Abstract: Large Language Models have shown promising results in their ability to encode general medical knowledge in standard medical question-answering datasets. However, their potential application in clinical practice requires evaluation in domain-specific tasks, where benchmarks are largely missing. In this study semioLLM, we test the ability of state-of-the-art LLMs (GPT-3.5, GPT-4, Mixtral 8x7B, and Q…

    Submitted 3 July, 2024; originally announced July 2024.

  4. arXiv:2406.09384  [pdf, other]

    cs.LG cs.CV

    Reflecting on the State of Rehearsal-free Continual Learning with Pretrained Models

    Authors: Lukas Thede, Karsten Roth, Olivier J. Hénaff, Matthias Bethge, Zeynep Akata

    Abstract: With the advent and recent ubiquity of foundation models, continual learning (CL) has recently shifted from continual training from scratch to the continual adaptation of pretrained models, seeing particular success on rehearsal-free CL benchmarks (RFCL). To achieve this, most proposed methods adapt and restructure parameter-efficient finetuning techniques (PEFT) to suit the continual nature of th…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 3rd Conference on Lifelong Learning Agents (CoLLAs) 2024

  5. arXiv:2406.04312  [pdf, other]

    cs.CV

    ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization

    Authors: Luca Eyring, Shyamgopal Karthik, Karsten Roth, Alexey Dosovitskiy, Zeynep Akata

    Abstract: Text-to-Image (T2I) models have made significant advancements in recent years, but they still struggle to accurately capture intricate details specified in complex compositional prompts. While fine-tuning T2I models with reward objectives has shown promise, it suffers from "reward hacking" and may not generalize well to unseen prompt distributions. In this work, we propose Reward-based Noise Optim…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Preprint

  6. arXiv:2405.20271  [pdf, other]

    cs.LG cs.CL cs.CV

    ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections

    Authors: Massimo Bini, Karsten Roth, Zeynep Akata, Anna Khoreva

    Abstract: Parameter-efficient finetuning (PEFT) has become ubiquitous to adapt foundation models to downstream task requirements while retaining their generalization ability. However, the amount of additionally introduced parameters and compute for successful adaptation and hyperparameter searches can explode quickly, especially when deployed at scale to serve numerous individual requests. To ensure effecti…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024. Code available at https://github.com/mwbini/ether

  7. arXiv:2405.01531  [pdf, other]

    cs.LG cs.AI cs.CV

    Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models

    Authors: Nishad Singhi, Jae Myung Kim, Karsten Roth, Zeynep Akata

    Abstract: Concept Bottleneck Models (CBMs) ground image classification on human-understandable concepts to allow for interpretable model decisions. Crucially, the CBM design inherently allows for human interventions, in which expert users are given the ability to modify potentially misaligned concept choices to influence the decision behavior of the model in an interpretable fashion. However, existing appro…

    Submitted 2 May, 2024; originally announced May 2024.

  8. arXiv:2404.06309  [pdf, other]

    cs.CV

    Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models

    Authors: David Kurzendörfer, Otniel-Bogdan Mercea, A. Sophia Koepke, Zeynep Akata

    Abstract: Audio-visual zero-shot learning methods commonly build on features extracted from pre-trained models, e.g. video or audio classification models. However, existing benchmarks predate the popularization of large multi-modal models, such as CLIP and CLAP. In this work, we explore such large pre-trained models to obtain features, i.e. CLIP for visual features, and CLAP for audio features. Furthermore,…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: CVPRw 2024 (L3D-IVU)

  9. arXiv:2402.13791  [pdf, other]

    cs.LG

    Opening the Black-Box: A Systematic Review on Explainable AI in Remote Sensing

    Authors: Adrian Höhl, Ivica Obadic, Miguel Ángel Fernández Torres, Hiba Najjar, Dario Oliveira, Zeynep Akata, Andreas Dengel, Xiao Xiang Zhu

    Abstract: In recent years, black-box machine learning approaches have become a dominant modeling paradigm for knowledge extraction in Remote Sensing. Despite the potential benefits of uncovering the inner workings of these models with explainable AI, a comprehensive overview summarizing the used explainable AI methods and their objectives, findings, and challenges in Remote Sensing applications is still mis…

    Submitted 21 February, 2024; originally announced February 2024.

  10. arXiv:2312.03759  [pdf, ps, other]

    cs.CL cs.AI cs.CY cs.DL

    How should the advent of large language models affect the practice of science?

    Authors: Marcel Binz, Stephan Alaniz, Adina Roskies, Balazs Aczel, Carl T. Bergstrom, Colin Allen, Daniel Schad, Dirk Wulff, Jevin D. West, Qiong Zhang, Richard M. Shiffrin, Samuel J. Gershman, Ven Popov, Emily M. Bender, Marco Marelli, Matthew M. Botvinick, Zeynep Akata, Eric Schulz

    Abstract: Large language models (LLMs) are being increasingly incorporated into scientific workflows. However, we have yet to fully grasp the implications of this integration. How should the advent of large language models affect the practice of science? For this opinion piece, we have invited four diverse groups of scientists to reflect on this query, sharing their perspectives and engaging in debate. Schu…

    Submitted 5 December, 2023; originally announced December 2023.

  11. arXiv:2311.15100  [pdf, other]

    cs.CV cs.AI cs.LG

    Unbalancedness in Neural Monge Maps Improves Unpaired Domain Translation

    Authors: Luca Eyring, Dominik Klein, Théo Uscidda, Giovanni Palla, Niki Kilbertus, Zeynep Akata, Fabian Theis

    Abstract: In optimal transport (OT), a Monge map is known as a mapping that transports a source distribution to a target distribution in the most cost-efficient way. Recently, multiple neural estimators for Monge maps have been developed and applied in diverse unpaired domain translation tasks, e.g. in single-cell biology and computer vision. However, the classic OT framework enforces mass conservation, whi…

    Submitted 11 March, 2024; v1 submitted 25 November, 2023; originally announced November 2023.

    Comments: ICLR 2024

  12. arXiv:2311.08396  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Zero-shot audio captioning with audio-language model guidance and audio context keywords

    Authors: Leonard Salewski, Stefan Fauth, A. Sophia Koepke, Zeynep Akata

    Abstract: Zero-shot audio captioning aims at automatically generating descriptive textual captions for audio content without prior training for this task. Different from speech recognition which translates audio content that contains spoken language into text, audio captioning is commonly concerned with ambient sounds, or sounds produced by a human performing an action. Inspired by zero-shot image captionin…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: NeurIPS 2023 - Machine Learning for Audio Workshop (Oral)

  13. arXiv:2311.05043  [pdf, other]

    cs.CV cs.AI cs.CL

    Zero-shot Translation of Attention Patterns in VQA Models to Natural Language

    Authors: Leonard Salewski, A. Sophia Koepke, Hendrik P. A. Lensch, Zeynep Akata

    Abstract: Converting a model's internals to text can yield human-understandable insights about the model. Inspired by the recent success of training-free approaches for image captioning, we propose ZS-A2T, a zero-shot framework that translates the transformer attention of a given model into natural language without requiring any training. We consider this in the context of Visual Question Answering (VQA). Z…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: Published in GCPR 2023

  14. arXiv:2310.17653  [pdf, other]

    cs.LG cs.CV

    Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model

    Authors: Karsten Roth, Lukas Thede, Almut Sophia Koepke, Oriol Vinyals, Olivier Hénaff, Zeynep Akata

    Abstract: Training deep networks requires various design decisions regarding for instance their architecture, data augmentation, or optimization. In this work, we find these training variations to result in networks learning unique feature sets from the data. Using public model libraries comprising thousands of models trained on canonical datasets like ImageNet, we observe that for arbitrary pairings of pre…

    Submitted 26 February, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: ICLR 2024 (spotlight)

  15. arXiv:2310.15999  [pdf, other]

    cs.CV cs.LG

    Transitivity Recovering Decompositions: Interpretable and Robust Fine-Grained Relationships

    Authors: Abhra Chaudhuri, Massimiliano Mancini, Zeynep Akata, Anjan Dutta

    Abstract: Recent advances in fine-grained representation learning leverage local-to-global (emergent) relationships for achieving state-of-the-art results. The relational representations relied upon by such methods, however, are abstract. We aim to deconstruct this abstraction by expressing them as interpretable graphs over image views. We begin by theoretically showing that abstract relational representati…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Neural Information Processing Systems (NeurIPS) 2023

  16. arXiv:2310.09291  [pdf, other]

    cs.CV

    Vision-by-Language for Training-Free Compositional Image Retrieval

    Authors: Shyamgopal Karthik, Karsten Roth, Massimiliano Mancini, Zeynep Akata

    Abstract: Given an image and a target modification (e.g. an image of the Eiffel tower and the text "without people and at night-time"), Compositional Image Retrieval (CIR) aims to retrieve the relevant target image in a database. While supervised approaches rely on annotating triplets that is costly (i.e. query image, textual modification, and target image), recent research sidesteps this need by using large…

    Submitted 26 February, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  17. arXiv:2309.15086  [pdf, other]

    cs.CV

    Video-adverb retrieval with compositional adverb-action embeddings

    Authors: Thomas Hummel, Otniel-Bogdan Mercea, A. Sophia Koepke, Zeynep Akata

    Abstract: Retrieving adverbs that describe an action in a video poses a crucial step towards fine-grained video understanding. We propose a framework for video-to-adverb retrieval (and vice versa) that aligns video embeddings with their matching compositional adverb-action text embedding in a joint embedding space. The compositional adverb-action text embedding is learned using a residual gating mechanism,…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: BMVC 2023 (Oral)

  18. arXiv:2309.03869  [pdf, other]

    cs.CV

    Text-to-feature diffusion for audio-visual few-shot learning

    Authors: Otniel-Bogdan Mercea, Thomas Hummel, A. Sophia Koepke, Zeynep Akata

    Abstract: Training deep learning models for video classification from audio-visual data commonly requires immense amounts of labeled training data collected via a costly process. A challenging and underexplored, yet much cheaper, setup is few-shot learning from video data. In particular, the inherently multi-modal nature of video data with sound and visual information has not been leveraged extensively for…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: DAGM GCPR 2023

  19. arXiv:2309.03173  [pdf, other]

    cs.CV

    PDiscoNet: Semantically consistent part discovery for fine-grained recognition

    Authors: Robert van der Klis, Stephan Alaniz, Massimiliano Mancini, Cassio F. Dantas, Dino Ienco, Zeynep Akata, Diego Marcos

    Abstract: Fine-grained classification often requires recognizing specific object parts, such as beak shape and wing patterns for birds. Encouraging a fine-grained classification model to first detect such parts and then using them to infer the class could help us gauge whether the model is indeed looking at the right details better than with interpretability methods that provide a single attribution map. We…

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: 9 pages, 8 figures, ICCV

  20. arXiv:2309.02102  [pdf, other]

    cs.CV cs.AI cs.LG

    Iterative Superquadric Recomposition of 3D Objects from Multiple Views

    Authors: Stephan Alaniz, Massimiliano Mancini, Zeynep Akata

    Abstract: Humans are good at recomposing novel objects, i.e. they can identify commonalities between unknown objects from general structure to finer detail, an ability difficult to replicate by machines. We propose a framework, ISCO, to recompose an object using 3D superquadrics as semantic parts directly from 2D views without training a model that uses 3D supervision. To achieve this, we optimize the super…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: Accepted at ICCV 2023

  21. arXiv:2309.01617  [pdf, other]

    cs.CV cs.AI cs.LG

    DeViL: Decoding Vision features into Language

    Authors: Meghal Dani, Isabel Rio-Torto, Stephan Alaniz, Zeynep Akata

    Abstract: Post-hoc explanation methods have often been criticised for abstracting away the decision-making process of deep neural networks. In this work, we would like to provide natural language descriptions for what different layers of a vision backbone have learned. Our DeViL method decodes vision features into language, not only highlighting the attribution locations but also generating textual descript…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: Accepted at GCPR 2023 (Oral)

  22. arXiv:2308.10599  [pdf, other]

    cs.CV cs.LG

    Image-free Classifier Injection for Zero-Shot Classification

    Authors: Anders Christensen, Massimiliano Mancini, A. Sophia Koepke, Ole Winther, Zeynep Akata

    Abstract: Zero-shot learning models achieve remarkable results on image classification for samples from classes that were not seen during training. However, such models must be trained from scratch with specialised methods: therefore, access to a training dataset is required when the need for zero-shot classification arises. In this paper, we aim to equip pre-trained models with zero-shot classification cap…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV 2023

  23. arXiv:2307.10865  [pdf, other]

    cs.LG stat.ML

    Addressing caveats of neural persistence with deep graph persistence

    Authors: Leander Girrbach, Anders Christensen, Ole Winther, Zeynep Akata, A. Sophia Koepke

    Abstract: Neural Persistence is a prominent measure for quantifying neural network complexity, proposed in the emerging field of topological data analysis in deep learning. In this work, however, we find both theoretically and empirically that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence. Whilst this captures useful informatio…

    Submitted 20 November, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: Transactions on Machine Learning Research (TMLR), 2023

  24. arXiv:2307.00398  [pdf, other]

    cs.CV cs.AI cs.LG

    ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models

    Authors: Uddeshya Upadhyay, Shyamgopal Karthik, Massimiliano Mancini, Zeynep Akata

    Abstract: Large-scale vision-language models (VLMs) like CLIP successfully find correspondences between images and text. Through the standard deterministic mapping process, an image or a text sample is mapped to a single vector in the embedding space. This is problematic: as multiple samples (images or text) can abstract the same concept in the physical world, deterministic embeddings do not reflect the inh…

    Submitted 28 September, 2023; v1 submitted 1 July, 2023; originally announced July 2023.

    Comments: ICCV 2023

  25. arXiv:2306.07282  [pdf, other]

    cs.CV cs.LG

    Waffling around for Performance: Visual Classification with Random Words and Broad Concepts

    Authors: Karsten Roth, Jae Myung Kim, A. Sophia Koepke, Oriol Vinyals, Cordelia Schmid, Zeynep Akata

    Abstract: The visual classification performance of vision-language models such as CLIP has been shown to benefit from additional semantic knowledge from large language models (LLMs) such as GPT-3. In particular, averaging over LLM-generated class descriptors, e.g. "waffle, which has a round shape", can notably improve generalization performance. In this work, we critically study this behavior and propose Wa…

    Submitted 16 August, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

    Comments: Accepted to ICCV 2023. Main paper with 9 pages

  26. arXiv:2305.17520  [pdf, other]

    cs.CV cs.AI

    USIM-DAL: Uncertainty-aware Statistical Image Modeling-based Dense Active Learning for Super-resolution

    Authors: Vikrant Rangnekar, Uddeshya Upadhyay, Zeynep Akata, Biplab Banerjee

    Abstract: Dense regression is a widely used approach in computer vision for tasks such as image super-resolution, enhancement, depth estimation, etc. However, the high cost of annotation and labeling makes it challenging to achieve accurate results. We propose incorporating active learning into dense regression models to address this problem. Active learning allows models to select the most informative samp…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: Accepted at UAI 2023

  27. arXiv:2305.14930  [pdf, other]

    cs.AI cs.CL cs.LG

    In-Context Impersonation Reveals Large Language Models' Strengths and Biases

    Authors: Leonard Salewski, Stephan Alaniz, Isabel Rio-Torto, Eric Schulz, Zeynep Akata

    Abstract: In everyday conversations, humans can take on different roles and adapt their vocabulary to their chosen roles. We explore whether LLMs can take on, that is impersonate, different roles when they generate text in-context. We ask LLMs to assume different personas before solving vision and language tasks. We do this by prefixing the prompt with a persona that is associated either with a social ident…

    Submitted 26 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Published in NeurIPS 2023 (Spotlight)

  28. arXiv:2305.13308  [pdf, other]

    cs.CV

    If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection

    Authors: Shyamgopal Karthik, Karsten Roth, Massimiliano Mancini, Zeynep Akata

    Abstract: Despite their impressive capabilities, diffusion-based text-to-image (T2I) models can lack faithfulness to the text prompt, where generated images may not contain all the mentioned objects, attributes or relations. To alleviate these issues, recent works proposed post-hoc methods to improve model faithfulness without costly retraining, by modifying how the model utilizes the input prompt. In this…

    Submitted 22 May, 2023; originally announced May 2023.

  29. arXiv:2305.12907  [pdf, other]

    cs.CL cs.AI cs.LG

    Meta-in-context learning in large language models

    Authors: Julian Coda-Forno, Marcel Binz, Zeynep Akata, Matthew Botvinick, Jane X. Wang, Eric Schulz

    Abstract: Large language models have shown tremendous performance in a variety of tasks. In-context learning -- the ability to improve at a task after being provided with a number of demonstrations -- is seen as one of the main contributors to their success. In the present paper, we demonstrate that the in-context learning abilities of large language models can be recursively improved via in-context learnin…

    Submitted 22 May, 2023; originally announced May 2023.

  30. arXiv:2304.11111  [pdf, other]

    cs.CL cs.AI cs.LG

    Inducing anxiety in large language models increases exploration and bias

    Authors: Julian Coda-Forno, Kristin Witte, Akshay K. Jagadish, Marcel Binz, Zeynep Akata, Eric Schulz

    Abstract: Large language models are transforming research on machine learning while galvanizing public debates. Understanding not only when these models work well and succeed but also why they fail and misbehave is of great societal relevance. We propose to turn the lens of computational psychiatry, a framework used to computationally describe and modify aberrant behavior, to the outputs produced by these m…

    Submitted 21 April, 2023; originally announced April 2023.

  31. arXiv:2304.03391  [pdf, other]

    cs.CV

    Exposing and Mitigating Spurious Correlations for Cross-Modal Retrieval

    Authors: Jae Myung Kim, A. Sophia Koepke, Cordelia Schmid, Zeynep Akata

    Abstract: Cross-modal retrieval methods are the preferred tool to search databases for the text that best matches a query image and vice versa. However, image-text retrieval models commonly learn to memorize spurious correlations in the training data, such as frequent object co-occurrence, instead of looking at the actual underlying reasons for the prediction in the image. For image-text retrieval, this man…

    Submitted 6 April, 2023; originally announced April 2023.

    Comments: CVPR'23 MULA Workshop

  32. arXiv:2304.01804  [pdf, other]

    cs.CV

    Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification

    Authors: Youngwook Kim, Jae Myung Kim, Jieun Jeong, Cordelia Schmid, Zeynep Akata, Jungwoo Lee

    Abstract: Due to the expensive costs of collecting labels in multi-label classification datasets, partially annotated multi-label classification has become an emerging field in computer vision. One baseline approach to this task is to assume unobserved labels as negative labels, but this assumption induces label noise as a form of false negative. To understand the negative impact caused by false negative la…

    Submitted 4 April, 2023; originally announced April 2023.

    Comments: CVPR2023 Camera-ready

  33. arXiv:2302.11012  [pdf, other]

    cs.LG cs.AI cs.CV

    Likelihood Annealing: Fast Calibrated Uncertainty for Regression

    Authors: Uddeshya Upadhyay, Jae Myung Kim, Cordelia Schmidt, Bernhard Schölkopf, Zeynep Akata

    Abstract: Recent advances in deep learning have shown that uncertainty estimation is becoming increasingly important in applications such as medical imaging, natural language processing, and autonomous systems. However, accurately quantifying uncertainty remains a challenging problem, especially in regression tasks where the output space is continuous. Deep learning approaches that allow uncertainty estimat…

    Submitted 2 July, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

  34. arXiv:2212.07911  [pdf, other]

    cs.CV

    Urban Scene Semantic Segmentation with Low-Cost Coarse Annotation

    Authors: Anurag Das, Yongqin Xian, Yang He, Zeynep Akata, Bernt Schiele

    Abstract: For best performance, today's semantic segmentation methods use large and carefully labeled datasets, requiring expensive annotation budgets. In this work, we show that coarse annotation is a low-cost but highly effective alternative for training semantic segmentation models. Considering the urban scene segmentation scenario, we leverage cheap coarse annotations for real-world captured data, as we…

    Submitted 15 December, 2022; originally announced December 2022.

    Comments: Accepted at WACV 2023

  35. arXiv:2211.13264  [pdf, other]

    cs.CV cs.LG

    Distilling Knowledge from Self-Supervised Teacher by Embedding Graph Alignment

    Authors: Yuchen Ma, Yanbei Chen, Zeynep Akata

    Abstract: Recent advances have indicated the strengths of self-supervised pre-training for improving representation learning on downstream tasks. Existing works often utilize self-supervised pre-trained models by fine-tuning on downstream tasks. However, fine-tuning does not generalize to the case when one needs to build a customized model architecture different from the self-supervised model. In this work,…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: British Machine Vision Conference (BMVC 2022)

  36. arXiv:2211.03186  [pdf, ps, other]

    cs.LG cs.CV

    Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning

    Authors: Zafir Stojanovski, Karsten Roth, Zeynep Akata

    Abstract: Large pre-trained, zero-shot capable models have shown considerable success both for standard transfer and adaptation tasks, with particular robustness towards distribution shifts. In addition, subsequent fine-tuning can considerably improve performance on a selected downstream task. However, through naive fine-tuning, these zero-shot models lose their generalizability and robustness towards distr…

    Submitted 6 November, 2022; originally announced November 2022.

    Comments: First Workshop on Interpolation Regularizers and Beyond, NeurIPS 2022 (Spotlight) and Workshop on Distribution Shifts, NeurIPS 2022

  37. arXiv:2210.14222  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    PlanT: Explainable Planning Transformers via Object-Level Representations

    Authors: Katrin Renz, Kashyap Chitta, Otniel-Bogdan Mercea, A. Sophia Koepke, Zeynep Akata, Andreas Geiger

    Abstract: Planning an optimal route in a complex environment requires efficient reasoning about the surrounding scene. While human drivers prioritize important objects and ignore details not relevant to the decision, learning-based planners typically extract features from dense, high-dimensional grid representations containing all vehicle and road context information. In this paper, we propose PlanT, a nove…

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: CoRL 2022. Project Page: https://www.katrinrenz.de/plant/

  38. arXiv:2210.10486  [pdf, other]

    cs.CV cs.LG

    Cross-Modal Fusion Distillation for Fine-Grained Sketch-Based Image Retrieval

    Authors: Abhra Chaudhuri, Massimiliano Mancini, Yanbei Chen, Zeynep Akata, Anjan Dutta

    Abstract: Representation learning for sketch-based image retrieval has mostly been tackled by learning embeddings that discard modality-specific information. As instances from different modalities can often provide complementary information describing the underlying concept, we propose a cross-attention framework for Vision Transformers (XModalViT) that fuses modality-specific information instead of discard…

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: British Machine Vision Conference (BMVC) 2022

  39. arXiv:2210.07347  [pdf, other]

    cs.LG stat.ML

    Disentanglement of Correlated Factors via Hausdorff Factorized Support

    Authors: Karsten Roth, Mark Ibrahim, Zeynep Akata, Pascal Vincent, Diane Bouchacourt

    Abstract: A grand goal in deep learning research is to learn representations capable of generalizing across distribution shifts. Disentanglement is one promising direction aimed at aligning a model's representation with the underlying factors generating the data (e.g. color or background). Existing disentanglement methods, however, rely on an often unrealistic assumption: that factors are statistically inde…

    Submitted 25 February, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted to ICLR 2023

  40. arXiv:2210.02149  [pdf, other]

    cs.CV cs.LG

    Relational Proxies: Emergent Relationships as Fine-Grained Discriminators

    Authors: Abhra Chaudhuri, Massimiliano Mancini, Zeynep Akata, Anjan Dutta

    Abstract: Fine-grained categories that largely share the same set of parts cannot be discriminated based on part information alone, as they mostly differ in the way the local parts relate to the overall global structure of the object. We propose Relational Proxies, a novel approach that leverages the relational information between the global and local views of an object for encoding its semantic label. Star…

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: Neural Information Processing Systems (NeurIPS) 2022

  41. arXiv:2209.02536  [pdf, other]

    cs.CV cs.AI

    Semantic Image Synthesis with Semantically Coupled VQ-Model

    Authors: Stephan Alaniz, Thomas Hummel, Zeynep Akata

    Abstract: Semantic image synthesis enables control over unconditional image generation by allowing guidance on what is being generated. We conditionally synthesize the latent space from a vector quantized model (VQ-model) pre-trained to autoencode images. Instead of training an autoregressive Transformer on separately learned conditioning latents and image latents, we find that jointly learning the conditio…

    Submitted 6 September, 2022; originally announced September 2022.

    Comments: ICLR 2022 DGM4HSD

  42. arXiv:2208.11296  [pdf, other]

    cs.CV cs.AI cs.LG

    Semi-Supervised and Unsupervised Deep Visual Learning: A Survey

    Authors: Yanbei Chen, Massimiliano Mancini, Xiatian Zhu, Zeynep Akata

    Abstract: State-of-the-art deep learning models are often trained with a large amount of costly labeled training data. However, requiring exhaustive manual annotations may degrade the model's generalizability in the limited-label regime. Semi-supervised learning and unsupervised learning offer promising paradigms to learn from an abundance of unlabeled visual data. Recent progress in these paradigms has ind…

    Submitted 24 August, 2022; originally announced August 2022.

    Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

  43. arXiv:2207.13543  [pdf, other]

    cs.CV cs.AI

    Abstracting Sketches through Simple Primitives

    Authors: Stephan Alaniz, Massimiliano Mancini, Anjan Dutta, Diego Marcos, Zeynep Akata

    Abstract: Humans show high-level of abstraction capabilities in games that require quickly communicating object information. They decompose the message content into multiple parts and communicate them in an interpretable protocol. Toward equipping machines with such capabilities, we propose the Primitive-based Sketch Abstraction task where the goal is to represent sketches using a fixed set of drawing primi…

    Submitted 27 July, 2022; originally announced July 2022.

    Comments: European Conference on Computer Vision (ECCV) 2022

  44. arXiv:2207.09966  [pdf, other]

    cs.CV

    Temporal and cross-modal attention for audio-visual zero-shot learning

    Authors: Otniel-Bogdan Mercea, Thomas Hummel, A. Sophia Koepke, Zeynep Akata

    Abstract: Audio-visual generalised zero-shot learning for video classification requires understanding the relations between the audio and visual information in order to be able to recognise samples from novel, previously unseen classes at test time. The natural semantic and temporal alignment between audio and visual data in video data can be exploited to learn powerful representations that generalise to un…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: ECCV 2022

  45. arXiv:2207.06873  [pdf, other]

    cs.CV cs.AI

    BayesCap: Bayesian Identity Cap for Calibrated Uncertainty in Frozen Neural Networks

    Authors: Uddeshya Upadhyay, Shyamgopal Karthik, Yanbei Chen, Massimiliano Mancini, Zeynep Akata

    Abstract: High-quality calibrated uncertainty estimates are crucial for numerous real-world applications, especially for deep learning-based deployed ML systems. While Bayesian deep learning techniques allow uncertainty estimation, training them with large-scale datasets is an expensive process that does not always yield models competitive with non-Bayesian counterparts. Moreover, many of the high-performin…

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: Accepted at ECCV 2022. Code is available at https://github.com/ExplainableML/BayesCap

  46. arXiv:2207.03784  [pdf, other]

    cs.LG stat.ML

    A Non-isotropic Probabilistic Take on Proxy-based Deep Metric Learning

    Authors: Michael Kirchhof, Karsten Roth, Zeynep Akata, Enkelejda Kasneci

    Abstract: Proxy-based Deep Metric Learning (DML) learns deep representations by embedding images close to their class representatives (proxies), commonly with respect to the angle between them. However, this disregards the embedding norm, which can carry additional beneficial context such as class- or image-intrinsic uncertainty. In addition, proxy-based DML struggles to learn class-internal structures. To…

    Submitted 8 July, 2022; originally announced July 2022.

    Comments: Accepted as conference paper at ECCV 2022

  47. arXiv:2206.07387  [pdf, other]

    cs.LG cs.CV

    The Manifold Hypothesis for Gradient-Based Explanations

    Authors: Sebastian Bordt, Uddeshya Upadhyay, Zeynep Akata, Ulrike von Luxburg

    Abstract: When do gradient-based explanation algorithms provide perceptually-aligned explanations? We propose a criterion: the feature attributions need to be aligned with the tangent space of the data manifold. To provide evidence for this hypothesis, we introduce a framework based on variational autoencoders that allows to estimate and generate image manifolds. Through experiments across a range of differ…

    Submitted 15 July, 2024; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: Extended version of a CVPR Workshop paper, available at https://openaccess.thecvf.com/content/CVPR2023W/XAI4CV/papers/Bordt_The_Manifold_Hypothesis_for_Gradient-Based_Explanations_CVPRW_2023_paper.pdf

  48. arXiv:2206.06404  [pdf, other]

    cs.CV cs.AI cs.LG

    Compositional Mixture Representations for Vision and Text

    Authors: Stephan Alaniz, Marco Federici, Zeynep Akata

    Abstract: Learning a common representation space between vision and language allows deep networks to relate objects in the image to the corresponding semantic meaning. We present a model that learns a shared Gaussian mixture representation imposing the compositionality of the text onto the visual domain without having explicit location supervision. By combining the spatial transformer with a representation…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: Workshop on Learning with Limited Labelled Data for Image and Video Understanding (L3D-IVU), CVPR 2022

  49. arXiv:2206.03740  [pdf, other]

    cs.CV

    Large Loss Matters in Weakly Supervised Multi-Label Classification

    Authors: Youngwook Kim, Jae Myung Kim, Zeynep Akata, Jungwoo Lee

    Abstract: Weakly supervised multi-label classification (WSML) task, which is to learn a multi-label classification using partially observed labels per image, is becoming increasingly important due to its huge annotation cost. In this work, we first regard unobserved labels as negative labels, casting the WSML task into noisy multi-label classification. From this point of view, we empirically observe that me…

    Submitted 8 June, 2022; originally announced June 2022.

    Comments: CVPR 2022. First two authors contributed equally

  50. arXiv:2205.06784  [pdf, other]

    cs.CV

    KG-SP: Knowledge Guided Simple Primitives for Open World Compositional Zero-Shot Learning

    Authors: Shyamgopal Karthik, Massimiliano Mancini, Zeynep Akata

    Abstract: The goal of open-world compositional zero-shot learning (OW-CZSL) is to recognize compositions of state and objects in images, given only a subset of them during training and no prior on the unseen compositions. In this setting, models operate on a huge output space, containing all possible state-object compositions. While previous works tackle the problem by learning embeddings for the compositio…

    Submitted 13 May, 2022; originally announced May 2022.

    Comments: CVPR 2022