Showing 1–50 of 90 results for author: Ferrari, V

Searching in archive cs.
  1. arXiv:2407.20034 [pdf, other]

    cs.CV

    MaskInversion: Localized Embeddings via Optimization of Explainability Maps

    Authors: Walid Bousselham, Sofian Chaybouti, Christian Rupprecht, Vittorio Ferrari, Hilde Kuehne

    Abstract: Vision-language foundation models such as CLIP have achieved tremendous results in global vision-language alignment, but still show some limitations in creating representations for specific image regions. To address this problem, we propose MaskInversion, a method that leverages the feature representations of pre-trained foundation models, such as CLIP, to generate a context-aware embedding for…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Project page: https://walidbousselham.com/MaskInversion

  2. arXiv:2407.10730 [pdf, other]

    cs.CV cs.PF

    ConvBench: A Comprehensive Benchmark for 2D Convolution Primitive Evaluation

    Authors: Lucas Alvarenga, Victor Ferrari, Rafael Souza, Marcio Pereira, Guido Araujo

    Abstract: Convolution is a compute-intensive operation placed at the heart of Convolutional Neural Networks (CNNs). It has led to the development of many high-performance algorithms, such as Im2col-GEMM, Winograd, and Direct-Convolution. However, the comparison of different convolution algorithms is an error-prone task as it requires specific data layouts and system resources. Failure to address these require…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 5 pages, 3 figures, presented at the MLArchSys workshop of ISCA 2024

  3. arXiv:2404.05465 [pdf, other]

    cs.CV cs.LG

    HAMMR: HierArchical MultiModal React agents for generic VQA

    Authors: Lluis Castrejon, Thomas Mensink, Howard Zhou, Vittorio Ferrari, Andre Araujo, Jasper Uijlings

    Abstract: Combining Large Language Models (LLMs) with external specialized tools (LLMs+tools) is a recent paradigm to solve multimodal tasks such as Visual Question Answering (VQA). While this approach was demonstrated to work well when optimized and evaluated for each individual benchmark, in practice it is crucial for the next generation of real-world AI systems to handle a broad range of multimodal probl…

    Submitted 8 April, 2024; originally announced April 2024.

  4. arXiv:2312.00878 [pdf, other]

    cs.CV cs.AI

    Grounding Everything: Emerging Localization Properties in Vision-Language Transformers

    Authors: Walid Bousselham, Felix Petersen, Vittorio Ferrari, Hilde Kuehne

    Abstract: Vision-language foundation models have shown remarkable performance in various zero-shot settings such as image retrieval, classification, or captioning. But so far, those models seem to fall behind when it comes to zero-shot localization of referential expressions and objects in images. As a result, they need to be fine-tuned for this task. In this paper, we show that pretrained vision-language (…

    Submitted 14 December, 2023; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: Code available at https://github.com/WalBouss/GEM

  5. arXiv:2312.00357 [pdf]

    eess.IV cs.CV cs.LG

    A Generalizable Deep Learning System for Cardiac MRI

    Authors: Rohan Shad, Cyril Zakka, Dhamanpreet Kaur, Robyn Fong, Ross Warren Filice, John Mongan, Kimberly Kalianos, Nishith Khandwala, David Eng, Matthew Leipzig, Walter Witschey, Alejandro de Feria, Victor Ferrari, Euan Ashley, Michael A. Acker, Curtis Langlotz, William Hiesinger

    Abstract: Cardiac MRI allows for a comprehensive assessment of myocardial structure, function, and tissue characteristics. Here we describe a foundational vision system for cardiac MRI, capable of representing the breadth of human cardiovascular disease and health. Our deep learning model is trained via self-supervised contrastive learning, by which visual concepts in cine-sequence cardiac MRI scans are lea…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: 21 page main manuscript, 4 figures. Supplementary Appendix and code will be made available on publication

    ACM Class: I.2.10

  6. arXiv:2311.04587 [pdf, other]

    cs.SE

    Log Statements Generation via Deep Learning: Widening the Support Provided to Developers

    Authors: Antonio Mastropaolo, Valentina Ferrari, Luca Pascarella, Gabriele Bavota

    Abstract: Logging assists in monitoring events that transpire during the execution of software. Previous research has highlighted the challenges confronted by developers when it comes to logging, including dilemmas such as where to log, what data to record, and which log level to employ (e.g., info, fatal). In this context, we introduced LANCE, an approach rooted in deep learning (DL) that has demonstrated…

    Submitted 8 November, 2023; originally announced November 2023.

  7. arXiv:2308.16139 [pdf, other]

    cs.CV cs.DB cs.LG

    MedShapeNet -- A Large-Scale Dataset of 3D Medical Shapes for Computer Vision

    Authors: Jianning Li, Zongwei Zhou, Jiancheng Yang, Antonio Pepe, Christina Gsaxner, Gijs Luijten, Chongyu Qu, Tiezheng Zhang, Xiaoxi Chen, Wenxuan Li, Marek Wodzinski, Paul Friedrich, Kangxian Xie, Yuan Jin, Narmada Ambigapathy, Enrico Nasca, Naida Solak, Gian Marco Melito, Viet Duc Vu, Afaque R. Memon, Christopher Schlachta, Sandrine De Ribaupierre, Rajnikant Patel, Roy Eagleson, Xiaojun Chen, et al. (132 additional authors not shown)

    Abstract: Prior to the deep learning era, shape was commonly used to describe the objects. Nowadays, state-of-the-art (SOTA) algorithms in medical imaging are predominantly diverging from computer vision, where voxel grids, meshes, point clouds, and implicit surface models are used. This is seen from numerous shape-related publications in premier vision conferences as well as the growing popularity of Shape…

    Submitted 12 December, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: 16 pages

    MSC Class: 68T01

  8. arXiv:2308.11606 [pdf, other]

    cs.CV cs.CL

    StoryBench: A Multifaceted Benchmark for Continuous Story Visualization

    Authors: Emanuele Bugliarello, Hernan Moraldo, Ruben Villegas, Mohammad Babaeizadeh, Mohammad Taghi Saffar, Han Zhang, Dumitru Erhan, Vittorio Ferrari, Pieter-Jan Kindermans, Paul Voigtlaender

    Abstract: Generating video stories from text prompts is a complex task. In addition to having high visual quality, videos need to realistically adhere to a sequence of text prompts whilst being consistent throughout the frames. Creating a benchmark for video generation requires data annotated over time, which contrasts with the single caption used often in video datasets. To fill this gap, we collect compre…

    Submitted 12 October, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

    Comments: NeurIPS D&B 2023

  9. arXiv:2306.09224 [pdf, other]

    cs.CV

    Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories

    Authors: Thomas Mensink, Jasper Uijlings, Lluis Castrejon, Arushi Goel, Felipe Cadar, Howard Zhou, Fei Sha, André Araujo, Vittorio Ferrari

    Abstract: We propose Encyclopedic-VQA, a large scale visual question answering (VQA) dataset featuring visual questions about detailed properties of fine-grained categories and instances. It contains 221k unique question+answer pairs each matched with (up to) 5 images, resulting in a total of 1M VQA samples. Moreover, our dataset comes with a controlled knowledge base derived from Wikipedia, marking the evi…

    Submitted 24 July, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: ICCV'23

  10. arXiv:2306.09109 [pdf, other]

    cs.CV

    NAVI: Category-Agnostic Image Collections with High-Quality 3D Shape and Pose Annotations

    Authors: Varun Jampani, Kevis-Kokitsi Maninis, Andreas Engelhardt, Arjun Karpur, Karen Truong, Kyle Sargent, Stefan Popov, André Araujo, Ricardo Martin-Brualla, Kaushal Patel, Daniel Vlasic, Vittorio Ferrari, Ameesh Makadia, Ce Liu, Yuanzhen Li, Howard Zhou

    Abstract: Recent advances in neural reconstruction enable high-quality 3D object reconstruction from casually captured image collections. Current techniques mostly analyze their progress on relatively simple image collections where Structure-from-Motion (SfM) techniques can provide ground-truth (GT) camera poses. We note that SfM techniques tend to fail on in-the-wild image collections such as image search…

    Submitted 13 October, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 camera ready. Project page: https://navidataset.github.io

  11. arXiv:2306.09077 [pdf, other]

    cs.CV cs.GR

    Estimating Generic 3D Room Structures from 2D Annotations

    Authors: Denys Rozumnyi, Stefan Popov, Kevis-Kokitsi Maninis, Matthias Nießner, Vittorio Ferrari

    Abstract: Indoor rooms are among the most common use cases in 3D scene understanding. Current state-of-the-art methods for this task are driven by large annotated datasets. Room layouts are especially important, consisting of structural elements in 3D, such as wall, floor, and ceiling. However, they are difficult to annotate, especially on pure RGB video. We propose a novel method to produce generic 3D room…

    Submitted 21 December, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: https://github.com/google-research/cad-estate | Accepted at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks

  12. arXiv:2306.09011 [pdf, other]

    cs.CV

    CAD-Estate: Large-scale CAD Model Annotation in RGB Videos

    Authors: Kevis-Kokitsi Maninis, Stefan Popov, Matthias Nießner, Vittorio Ferrari

    Abstract: We propose a method for annotating videos of complex multi-object scenes with a globally-consistent 3D representation of the objects. We annotate each object with a CAD model from a database, and place it in the 3D coordinate frame of the scene with a 9-DoF pose transformation. Our method is semi-automatic and works on commonly-available RGB videos, without requiring a depth sensor. Many steps are…

    Submitted 14 August, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Project page: https://github.com/google-research/cad-estate

  13. arXiv:2304.06419 [pdf, other]

    cs.CV cs.GR

    Tracking by 3D Model Estimation of Unknown Objects in Videos

    Authors: Denys Rozumnyi, Jiri Matas, Marc Pollefeys, Vittorio Ferrari, Martin R. Oswald

    Abstract: Most model-free visual object tracking methods formulate the tracking task as object location estimation given by a 2D segmentation or a bounding box in each video frame. We argue that this representation is limited and instead propose to guide and improve 2D tracking with an explicit object representation, namely the textured 3D shape and 6DoF pose in each video frame. Our representation tackles…

    Submitted 13 April, 2023; originally announced April 2023.

  14. arXiv:2303.04739 [pdf, other]

    cs.CV cs.AR cs.LG cs.PF

    Advancing Direct Convolution using Convolution Slicing Optimization and ISA Extensions

    Authors: Victor Ferrari, Rafael Sousa, Marcio Pereira, João P. L. de Carvalho, José Nelson Amaral, José Moreira, Guido Araujo

    Abstract: Convolution is one of the most computationally intensive operations that must be performed for machine-learning model inference. A traditional approach to compute convolutions is known as the Im2Col + BLAS method. This paper proposes SConv: a direct-convolution algorithm based on an MLIR/LLVM code-generation toolchain that can be integrated into machine-learning compilers. This algorithm introduce…

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: 15 pages, 11 figures

  15. arXiv:2302.12948 [pdf, other]

    cs.LG cs.AI cs.CV

    Agile Modeling: From Concept to Classifier in Minutes

    Authors: Otilia Stretcu, Edward Vendrow, Kenji Hata, Krishnamurthy Viswanathan, Vittorio Ferrari, Sasan Tavakkol, Wenlei Zhou, Aditya Avinash, Enming Luo, Neil Gordon Alldrin, MohammadHossein Bateni, Gabriel Berger, Andrew Bunner, Chun-Ta Lu, Javier A Rey, Giulia DeSalvo, Ranjay Krishna, Ariel Fuxman

    Abstract: The application of computer vision to nuanced subjective use cases is growing. While crowdsourcing has served the vision community well for most objective tasks (such as labeling a "zebra"), it now falters on tasks where there is substantial subjectivity in the concept (such as identifying "gourmet tuna"). However, empowering any user to develop a classifier for their concept is technically diffic…

    Submitted 12 May, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

  16. arXiv:2302.11217 [pdf, other]

    cs.CV

    Connecting Vision and Language with Video Localized Narratives

    Authors: Paul Voigtlaender, Soravit Changpinyo, Jordi Pont-Tuset, Radu Soricut, Vittorio Ferrari

    Abstract: We propose Video Localized Narratives, a new form of multimodal video annotations connecting vision and language. In the original Localized Narratives, annotators speak and move their mouse simultaneously on an image, thus grounding each word with a mouse trace segment. However, this is challenging on a video. Our new protocol empowers annotators to tell the story of a video with Localized Narrati…

    Submitted 15 March, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

    Comments: Accepted at CVPR 2023

  17. Evaluation of the potential of Near Infrared Hyperspectral Imaging for monitoring the invasive brown marmorated stink bug

    Authors: Veronica Ferrari, Rosalba Calvini, Bas Boom, Camilla Menozzi, Aravind Krishnaswamy Rangarajan, Lara Maistrello, Peter Offermans, Alessandro Ulrici

    Abstract: The brown marmorated stink bug (BMSB), Halyomorpha halys, is an invasive insect pest of global importance that damages several crops, compromising agri-food production. Field monitoring procedures are fundamental to perform risk assessment operations, in order to promptly face crop infestations and avoid economical losses. To improve pest management, spectral cameras mounted on Unmanned Aerial Veh…

    Submitted 19 January, 2023; originally announced January 2023.

    Comments: Accepted manuscript

    Journal ref: Chemometrics and Intelligent Laboratory Systems, 2023, 234, 104751

  18. arXiv:2212.11920 [pdf, other]

    cs.CV

    Beyond SOT: Tracking Multiple Generic Objects at Once

    Authors: Christoph Mayer, Martin Danelljan, Ming-Hsuan Yang, Vittorio Ferrari, Luc Van Gool, Alina Kuznetsova

    Abstract: Generic Object Tracking (GOT) is the problem of tracking target objects, specified by bounding boxes in the first frame of a video. While the task has received much attention in the last decades, researchers have almost exclusively focused on the single object setting. Multi-object GOT benefits from a wider applicability, rendering it more attractive in real-world applications. We attribute the la…

    Submitted 25 February, 2024; v1 submitted 22 December, 2022; originally announced December 2022.

    Comments: accepted by WACV'24

  19. arXiv:2210.14142 [pdf, other]

    cs.CV

    From colouring-in to pointillism: revisiting semantic segmentation supervision

    Authors: Rodrigo Benenson, Vittorio Ferrari

    Abstract: The prevailing paradigm for producing semantic segmentation training data relies on densely labelling each pixel of each image in the training set, akin to colouring-in books. This approach becomes a bottleneck when scaling up in the number of images, classes, and annotators. Here we propose instead a pointillist approach for semantic segmentation annotation, where only point-wise yes/no questions…

    Submitted 17 November, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

    Comments: Open Images V7 available at https://g.co/dataset/open-images

  20. arXiv:2210.07670 [pdf, other]

    cs.CV

    Multi-View Photometric Stereo Revisited

    Authors: Berk Kaya, Suryansh Kumar, Carlos Oliveira, Vittorio Ferrari, Luc Van Gool

    Abstract: Multi-view photometric stereo (MVPS) is a preferred method for detailed and precise 3D acquisition of an object from images. Although popular methods for MVPS can provide outstanding results, they are often complex to execute and limited to isotropic material objects. To address such limitations, we present a simple, practical approach to MVPS, which works well for isotropic as well as other objec…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: Accepted for publication at IEEE/CVF WACV 2023. Draft info: 10 pages, 5 figures, and 3 tables

  21. arXiv:2206.04453 [pdf, other]

    cs.CV

    The Missing Link: Finding label relations across datasets

    Authors: Jasper Uijlings, Thomas Mensink, Vittorio Ferrari

    Abstract: Computer vision is driven by the many datasets available for training or evaluating novel methods. However, each dataset has a different set of class labels, visual definition of classes, images following a specific distribution, annotation protocols, etc. In this paper we explore the automatic discovery of visual-semantic relations between labels across datasets. We aim to understand how instance…

    Submitted 9 August, 2022; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: ECCV 2022

  22. arXiv:2204.01403 [pdf, other]

    cs.CV

    How stable are Transferability Metrics evaluations?

    Authors: Andrea Agostinelli, Michal Pándy, Jasper Uijlings, Thomas Mensink, Vittorio Ferrari

    Abstract: Transferability metrics is a maturing field with increasing interest, which aims at providing heuristics for selecting the most suitable source models to transfer to a given target dataset, without fine-tuning them all. However, existing works rely on custom experimental setups which differ across papers, leading to inconsistent conclusions about which transferability metrics work best. In this pa…

    Submitted 20 October, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: ECCV 2022

  23. arXiv:2203.13296 [pdf, other]

    cs.CV

    RayTran: 3D pose estimation and shape reconstruction of multiple objects from videos with ray-traced transformers

    Authors: Michał J. Tyszkiewicz, Kevis-Kokitsi Maninis, Stefan Popov, Vittorio Ferrari

    Abstract: We propose a transformer-based neural network architecture for multi-object 3D reconstruction from RGB videos. It relies on two alternative ways to represent its knowledge: as a global 3D grid of features and an array of view-specific 2D grids. We progressively exchange information between the two with a dedicated bidirectional attention mechanism. We exploit knowledge about the image formation pr…

    Submitted 26 August, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: ECCV 2022 camera ready

  24. arXiv:2202.13071 [pdf, other]

    cs.CV

    Uncertainty-Aware Deep Multi-View Photometric Stereo

    Authors: Berk Kaya, Suryansh Kumar, Carlos Oliveira, Vittorio Ferrari, Luc Van Gool

    Abstract: This paper presents a simple and effective solution to the longstanding classical multi-view photometric stereo (MVPS) problem. It is well-known that photometric stereo (PS) is excellent at recovering high-frequency surface details, whereas multi-view stereo (MVS) can help remove the low-frequency distortion due to PS and retain the global geometry of the shape. This paper proposes an approach tha…

    Submitted 28 March, 2022; v1 submitted 26 February, 2022; originally announced February 2022.

    Comments: Accepted for publication in IEEE/CVF CVPR 2022. (11 Pages, 6 Figures, 3 Tables)

  25. arXiv:2111.14643 [pdf, other]

    cs.CV cs.GR

    Urban Radiance Fields

    Authors: Konstantinos Rematas, Andrew Liu, Pratul P. Srinivasan, Jonathan T. Barron, Andrea Tagliasacchi, Thomas Funkhouser, Vittorio Ferrari

    Abstract: The goal of this work is to perform 3D reconstruction and novel view synthesis from data captured by scanning platforms commonly deployed for world mapping in urban outdoor environments (e.g., Street View). Given a sequence of posed RGB images and lidar sweeps acquired by cameras and scanners moving through an outdoor scene, we produce a model from which 3D surfaces can be extracted and novel RGB…

    Submitted 29 November, 2021; originally announced November 2021.

    Comments: Project: https://urban-radiance-fields.github.io/

  26. arXiv:2111.14465 [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Motion-from-Blur: 3D Shape and Motion Estimation of Motion-blurred Objects in Videos

    Authors: Denys Rozumnyi, Martin R. Oswald, Vittorio Ferrari, Marc Pollefeys

    Abstract: We propose a method for jointly estimating the 3D motion, 3D shape, and appearance of highly motion-blurred objects from a video. To this end, we model the blurred appearance of a fast moving object in a generative fashion by parametrizing its 3D position, rotation, velocity, acceleration, bounces, shape, and texture over the duration of a predefined time window spanning multiple frames. Using dif…

    Submitted 7 April, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

    Comments: CVPR 2022 camera-ready

    Journal ref: 2022 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  27. arXiv:2111.13011 [pdf, other]

    cs.CV

    Transferability Metrics for Selecting Source Model Ensembles

    Authors: Andrea Agostinelli, Jasper Uijlings, Thomas Mensink, Vittorio Ferrari

    Abstract: We address the problem of ensemble selection in transfer learning: Given a large pool of source models we want to select an ensemble of models which, after fine-tuning on the target training set, yields the best performance on the target test set. Since fine-tuning all possible ensembles is computationally prohibitive, we aim at predicting performance on the target dataset using a computationally…

    Submitted 31 March, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

  28. arXiv:2111.12780 [pdf, other]

    cs.CV

    Transferability Estimation using Bhattacharyya Class Separability

    Authors: Michal Pándy, Andrea Agostinelli, Jasper Uijlings, Vittorio Ferrari, Thomas Mensink

    Abstract: Transfer learning has become a popular method for leveraging pre-trained models in computer vision. However, without performing computationally expensive fine-tuning, it is difficult to quantify which pre-trained source models are suitable for a specific target task, or, conversely, to which tasks a pre-trained source model can be easily adapted. In this work, we propose Gaussian Bhattacharyya…

    Submitted 11 April, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

    Comments: Accepted for CVPR 2022

  29. arXiv:2110.05621 [pdf, other]

    cs.CV

    Neural Architecture Search for Efficient Uncalibrated Deep Photometric Stereo

    Authors: Francesco Sarno, Suryansh Kumar, Berk Kaya, Zhiwu Huang, Vittorio Ferrari, Luc Van Gool

    Abstract: We present an automated machine learning approach for uncalibrated photometric stereo (PS). Our work aims at discovering lightweight and computationally efficient PS neural networks with excellent surface normal accuracy. Unlike previous uncalibrated deep PS networks, which are handcrafted and carefully tuned, we leverage differentiable neural architecture search (NAS) strategy to find uncalibrate…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: Accepted for publication at IEEE/CVF, WACV 2022. (11 pages)

  30. arXiv:2110.05594 [pdf, other]

    cs.CV

    Neural Radiance Fields Approach to Deep Multi-View Photometric Stereo

    Authors: Berk Kaya, Suryansh Kumar, Francesco Sarno, Vittorio Ferrari, Luc Van Gool

    Abstract: We present a modern solution to the multi-view photometric stereo problem (MVPS). Our work suitably exploits the image formation model in an MVPS experimental setup to recover the dense 3D reconstruction of an object from images. We procure the surface orientation using a photometric stereo (PS) image formation model and blend it with a multi-view neural radiance field representation to recover the…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: Accepted for publication at IEEE/CVF WACV 2022. 18 pages

  31. arXiv:2106.08762 [pdf, other]

    cs.CV

    Shape from Blur: Recovering Textured 3D Shape and Motion of Fast Moving Objects

    Authors: Denys Rozumnyi, Martin R. Oswald, Vittorio Ferrari, Marc Pollefeys

    Abstract: We address the novel task of jointly reconstructing the 3D shape, texture, and motion of an object from a single motion-blurred image. While previous approaches address the deblurring problem only in the 2D image domain, our proposed rigorous modeling of all object properties in the 3D domain enables the correct description of arbitrary object motion. This leads to significantly better image decom…

    Submitted 26 October, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

    Comments: Accepted to 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

    Journal ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  32. A Step Toward More Inclusive People Annotations for Fairness

    Authors: Candice Schumann, Susanna Ricco, Utsav Prabhu, Vittorio Ferrari, Caroline Pantofaru

    Abstract: The Open Images Dataset contains approximately 9 million images and is a widely accepted dataset for computer vision research. As is common practice for large datasets, the annotations are not exhaustive, with bounding boxes and attribute labels for only a subset of the classes in each image. In this paper, we present a new set of annotations on a subset of the Open Images dataset called the MIAP…

    Submitted 5 May, 2021; originally announced May 2021.

    Journal ref: AIES (2021)

  33. arXiv:2103.13318 [pdf, other]

    cs.CV

    Factors of Influence for Transfer Learning across Diverse Appearance Domains and Task Types

    Authors: Thomas Mensink, Jasper Uijlings, Alina Kuznetsova, Michael Gygli, Vittorio Ferrari

    Abstract: Transfer learning enables the re-use of knowledge learned on a source task to help learning a target task. A simple form of transfer learning is common in current state-of-the-art computer vision models, i.e. pre-training a model for image classification on the ILSVRC dataset, and then fine-tuning on any target task. However, previous systematic studies of transfer learning have been limited and the cir…

    Submitted 20 November, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: Accepted for future publication in TPAMI

  34. arXiv:2102.08860 [pdf, other]

    cs.CV cs.GR

    ShaRF: Shape-conditioned Radiance Fields from a Single View

    Authors: Konstantinos Rematas, Ricardo Martin-Brualla, Vittorio Ferrari

    Abstract: We present a method for estimating neural scene representations of objects given only a single image. The core of our method is the estimation of a geometric scaffold for the object and its use as a guide for the reconstruction of the underlying radiance field. Our formulation is based on a generative process that first maps a latent code to a voxelized shape, and then renders it to an image, wit…

    Submitted 23 June, 2021; v1 submitted 17 February, 2021; originally announced February 2021.

    Comments: Project page: http://www.krematas.com/sharf/index.html

  35. arXiv:2102.04980 [pdf, other]

    cs.CV cs.CL

    Telling the What while Pointing to the Where: Multimodal Queries for Image Retrieval

    Authors: Soravit Changpinyo, Jordi Pont-Tuset, Vittorio Ferrari, Radu Soricut

    Abstract: Most existing image retrieval systems use text queries as a way for the user to express what they are looking for. However, fine-grained image retrieval often requires the ability to also express where in the image the content they are looking for is. The text modality can only cumbersomely express such localization preferences, whereas pointing is a more natural fit. In this paper, we propose an…

    Submitted 24 August, 2021; v1 submitted 9 February, 2021; originally announced February 2021.

    Comments: IEEE/CVF International Conference on Computer Vision (ICCV 2021)

  36. arXiv:2012.12554 [pdf, other]

    cs.CV cs.HC

    Efficient video annotation with visual interpolation and frame selection guidance

    Authors: A. Kuznetsova, A. Talati, Y. Luo, K. Simmons, V. Ferrari

    Abstract: We introduce a unified framework for generic video annotation with bounding boxes. Video annotation is a longstanding problem, as it is a tedious and time-consuming process. We tackle two important challenges of video annotation: (1) automatic temporal interpolation and extrapolation of bounding boxes provided by a human annotator on a subset of all frames, and (2) automatic selection of frames to…

    Submitted 23 December, 2020; originally announced December 2020.

    Comments: accepted to WACV 2021

  37. arXiv:2012.11575 [pdf, other]

    cs.CV

    From Points to Multi-Object 3D Reconstruction

    Authors: Francis Engelmann, Konstantinos Rematas, Bastian Leibe, Vittorio Ferrari

    Abstract: We propose a method to detect and reconstruct multiple 3D objects from a single RGB image. The key idea is to optimize for detection, alignment and shape jointly over all objects in the RGB image, while focusing on realistic and physically plausible reconstructions. To this end, we propose a keypoint detector that localizes objects as center points and directly predicts all object properties, incl…

    Submitted 21 June, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

    Comments: CVPR2021 - Project Page: https://francisengelmann.github.io/points2objects/

  38. arXiv:2012.06777 [pdf, other]

    cs.CV

    Uncalibrated Neural Inverse Rendering for Photometric Stereo of General Surfaces

    Authors: Berk Kaya, Suryansh Kumar, Carlos Oliveira, Vittorio Ferrari, Luc Van Gool

    Abstract: This paper presents an uncalibrated deep neural network framework for the photometric stereo problem. For training models to solve the problem, existing neural network-based methods either require exact light directions or ground-truth surface normals of the object or both. However, in practice, it is challenging to procure both pieces of information precisely, which restricts the broader adoption o…

    Submitted 17 April, 2021; v1 submitted 12 December, 2020; originally announced December 2020.

    Comments: Accepted for publication at CVPR 2021. Document info: 18 pages, 21 Figures, 5 tables. (Minor typo corrected)

  39. arXiv:2012.04641 [pdf, other]

    cs.CV

    Vid2CAD: CAD Model Alignment using Multi-View Constraints from Videos

    Authors: Kevis-Kokitsi Maninis, Stefan Popov, Matthias Nießner, Vittorio Ferrari

    Abstract: We address the task of aligning CAD models to a video sequence of a complex scene containing multiple objects. Our method can process arbitrary videos and fully automatically recover the 9 DoF pose for each object appearing in it, thus aligning them in a common 3D coordinate frame. The core idea of our method is to integrate neural network predictions from individual frames with a temporally globa…

    Submitted 25 January, 2022; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: T-PAMI 2022 | Video: https://www.youtube.com/watch?v=R1cXg0vpwe4 | Project page: https://www.kmaninis.com/vid2cad/

  40. DeFMO: Deblurring and Shape Recovery of Fast Moving Objects

    Authors: Denys Rozumnyi, Martin R. Oswald, Vittorio Ferrari, Jiri Matas, Marc Pollefeys

    Abstract: Objects moving at high speed appear significantly blurred when captured with cameras. The blurry appearance is especially ambiguous when the object has complex shape or texture. In such cases, classical methods, or even humans, are unable to recover the object's appearance and motion. We propose a method that, given a single image with its estimated background, outputs the object's appearance and…

    Submitted 30 March, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

    Comments: CVPR 2021 camera-ready

    Journal ref: 2021 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  41. arXiv:2007.08173  [pdf, other]

    cs.CV

    Efficient Full Image Interactive Segmentation by Leveraging Within-image Appearance Similarity

    Authors: Mykhaylo Andriluka, Stefano Pellegrini, Stefan Popov, Vittorio Ferrari

    Abstract: We propose a new approach to interactive full-image semantic segmentation which enables quickly collecting training data for new datasets with previously unseen semantic classes (A demo is available at https://youtu.be/yUk8D5gEX-o). We leverage a key observation: propagation from labeled to unlabeled pixels does not necessarily require class-specific knowledge, but can be done purely based on appe…

    Submitted 16 July, 2020; originally announced July 2020.

  42. arXiv:2004.12989  [pdf, other]

    cs.CV

    CoReNet: Coherent 3D scene reconstruction from a single RGB image

    Authors: Stefan Popov, Pablo Bauszat, Vittorio Ferrari

    Abstract: Advances in deep learning techniques have allowed recent work to reconstruct the shape of a single object given only one RGB image as input. Building on common encoder-decoder architectures for this task, we propose three extensions: (1) ray-traced skip connections that propagate local 2D information to the output 3D volume in a physically correct manner; (2) a hybrid 3D volume representation that…

    Submitted 5 August, 2020; v1 submitted 27 April, 2020; originally announced April 2020.

    Comments: ECCV 2020, camera ready, oral

  43. arXiv:2004.03898  [pdf, other]

    cs.LG cs.CV stat.ML

    Towards Reusable Network Components by Learning Compatible Representations

    Authors: Michael Gygli, Jasper Uijlings, Vittorio Ferrari

    Abstract: This paper proposes to make a first step towards compatible and hence reusable network components. Rather than training networks for different tasks independently, we adapt the training process to produce network components that are compatible across tasks. In particular, we split a network into two components, a feature extractor and a target task head, and propose various approaches to accompli…

    Submitted 16 December, 2020; v1 submitted 8 April, 2020; originally announced April 2020.

    Comments: Preprint; To be presented at AAAI 2021

  44. arXiv:1912.07009  [pdf, other]

    cs.CV

    C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds

    Authors: Albert Pumarola, Stefan Popov, Francesc Moreno-Noguer, Vittorio Ferrari

    Abstract: Flow-based generative models have highly desirable properties like exact log-likelihood evaluation and exact latent-variable inference; however, they are still in their infancy and have not received as much attention as alternative generative models. In this paper, we introduce C-Flow, a novel conditioning scheme that brings normalizing flows to an entirely new scenario with great possibilities for…

    Submitted 3 April, 2020; v1 submitted 15 December, 2019; originally announced December 2019.

  45. arXiv:1912.04591  [pdf, other]

    cs.CV cs.GR

    Neural Voxel Renderer: Learning an Accurate and Controllable Rendering Tool

    Authors: Konstantinos Rematas, Vittorio Ferrari

    Abstract: We present a neural rendering framework that maps a voxelized scene into a high quality image. Highly-textured objects and scene element interactions are realistically rendered by our method, despite having a rough representation as an input. Moreover, our approach allows controllable rendering: geometric and appearance modifications in the input are accurately propagated to the output. The user c…

    Submitted 6 April, 2020; v1 submitted 10 December, 2019; originally announced December 2019.

    Comments: Additional results: http://www.krematas.com/nvr/index.html

  46. arXiv:1912.03098  [pdf, other]

    cs.CV

    Connecting Vision and Language with Localized Narratives

    Authors: Jordi Pont-Tuset, Jasper Uijlings, Soravit Changpinyo, Radu Soricut, Vittorio Ferrari

    Abstract: We propose Localized Narratives, a new form of multimodal image annotations connecting vision and language. We ask annotators to describe an image with their voice while simultaneously hovering their mouse over the region they are describing. Since the voice and the mouse pointer are synchronized, we can localize every single word in the description. This dense visual grounding takes the form of a…

    Submitted 20 July, 2020; v1 submitted 6 December, 2019; originally announced December 2019.

    Comments: ECCV 2020 Camera Ready

  47. arXiv:1912.00384  [pdf, other]

    cs.CV

    Training Object Detectors from Few Weakly-Labeled and Many Unlabeled Images

    Authors: Zhaohui Yang, Miaojing Shi, Chao Xu, Vittorio Ferrari, Yannis Avrithis

    Abstract: Weakly-supervised object detection attempts to limit the amount of supervision by dispensing with the need for bounding boxes, but still assumes image-level labels on the entire training set. In this work, we study the problem of training an object detector from one or few images with image-level labels and a larger set of completely unlabeled images. This is an extreme case of semi-supervised learning…

    Submitted 20 July, 2021; v1 submitted 1 December, 2019; originally announced December 2019.

    Comments: Accepted by Pattern Recognition

  48. arXiv:1911.12709  [pdf, other]

    cs.CV

    Continuous Adaptation for Interactive Object Segmentation by Learning from Corrections

    Authors: Theodora Kontogianni, Michael Gygli, Jasper Uijlings, Vittorio Ferrari

    Abstract: In interactive object segmentation a user collaborates with a computer vision model to segment an object. Recent works employ convolutional neural networks for this task: Given an image and a set of corrections made by the user as input, they output a segmentation mask. These approaches achieve strong performance by training on large datasets but they keep the model parameters unchanged at test ti…

    Submitted 8 November, 2020; v1 submitted 28 November, 2019; originally announced November 2019.

    Comments: ECCV 2020 Camera Ready

  49. arXiv:1906.06798  [pdf, other]

    cs.CV

    Panoptic Image Annotation with a Collaborative Assistant

    Authors: Jasper R. R. Uijlings, Mykhaylo Andriluka, Vittorio Ferrari

    Abstract: This paper aims to reduce the time to annotate images for panoptic segmentation, which requires annotating segmentation masks and class labels for all object instances and stuff regions. We formulate our approach as a collaborative process between an annotator and an automated assistant who take turns to jointly annotate an image using a predefined pool of segments. Actions performed by the annota…

    Submitted 15 December, 2020; v1 submitted 16 June, 2019; originally announced June 2019.

  50. arXiv:1906.01542  [pdf, other]

    cs.CV

    Natural Vocabulary Emerges from Free-Form Annotations

    Authors: Jordi Pont-Tuset, Michael Gygli, Vittorio Ferrari

    Abstract: We propose an approach for annotating object classes using free-form text written by undirected and untrained annotators. Free-form labeling is natural for annotators: they intuitively provide very specific and exhaustive labels, and no training stage is necessary. We first collect 729 labels on 15k images using 124 different annotators. Then we automatically enrich the structure of these free-for…

    Submitted 4 June, 2019; originally announced June 2019.