Showing 1–50 of 210 results for author: Sebe, N

Searching in archive cs.
  1. arXiv:2407.13372  [pdf, other]

    cs.CV

    Any Image Restoration with Efficient Automatic Degradation Adaptation

    Authors: Bin Ren, Eduard Zamfir, Yawei Li, Zongwei Wu, Danda Pani Paudel, Radu Timofte, Nicu Sebe, Luc Van Gool

    Abstract: With the emergence of mobile devices, there is a growing demand for an efficient model to restore any degraded image for better perceptual quality. However, existing models often require specific learning modules tailored for each degradation, resulting in complex architectures and high computation costs. Different from previous work, in this paper, we propose a unified manner to achieve joint emb…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Efficient Any Image Restoration

  2. arXiv:2407.10484  [pdf, other]

    cs.CV cs.LG

    Understanding Matrix Function Normalizations in Covariance Pooling through the Lens of Riemannian Geometry

    Authors: Ziheng Chen, Yue Song, Xiao-Jun Wu, Gaowen Liu, Nicu Sebe

    Abstract: Global Covariance Pooling (GCP) has been demonstrated to improve the performance of Deep Neural Networks (DNNs) by exploiting second-order statistics of high-level representations. GCP typically performs classification of the covariance matrices by applying matrix function normalization, such as matrix logarithm or power, followed by a Euclidean classifier. However, covariance matrices inherently…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 24 pages, 3 figures

  3. arXiv:2407.09826  [pdf, other]

    cs.CV

    3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance

    Authors: Xiaoxu Xu, Yitian Yuan, Jinlong Li, Qiudan Zhang, Zequn Jie, Lin Ma, Hao Tang, Nicu Sebe, Xu Wang

    Abstract: In this paper, we propose 3DSS-VLG, a weakly supervised approach for 3D Semantic Segmentation with 2D Vision-Language Guidance, an alternative approach that a 3D model predicts dense-embedding for each point which is co-embedded with both the aligned image and text spaces from the 2D vision-language model. Specifically, our method exploits the superior generalization ability of the 2D vision-langu…

    Submitted 13 July, 2024; originally announced July 2024.

  4. arXiv:2407.08374  [pdf, other]

    cs.CV

    Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization

    Authors: Jinlong Li, Zequn Jie, Elisa Ricci, Lin Ma, Nicu Sebe

    Abstract: Efficient finetuning of vision-language models (VLMs) like CLIP for specific downstream tasks is gaining significant attention. Previous works primarily focus on prompt learning to adapt the CLIP into a variety of downstream tasks, however, suffering from task overfitting when finetuned on a small data set. In this paper, we introduce an orthogonal finetuning method for efficiently updating pretra…

    Submitted 15 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  5. arXiv:2407.05862  [pdf, other]

    cs.CV

    Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning

    Authors: Bin Ren, Guofeng Mei, Danda Pani Paudel, Weijie Wang, Yawei Li, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Nicu Sebe

    Abstract: Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones. However, in 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant. This raises the question: Can we take the best of both worlds? To answer this question, we first empirically validate that integrating MAE-ba…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning

  6. arXiv:2407.02607  [pdf, other]

    math.DG cs.LG math.MG

    Product Geometries on Cholesky Manifolds with Applications to SPD Manifolds

    Authors: Ziheng Chen, Yue Song, Xiao-Jun Wu, Nicu Sebe

    Abstract: This paper presents two new metrics on the Symmetric Positive Definite (SPD) manifold via the Cholesky manifold, i.e., the space of lower triangular matrices with positive diagonal elements. We first unveil that the existing popular Riemannian metric on the Cholesky manifold can be generally characterized as the product metric of a Euclidean metric and a Riemannian metric on the space of n-dimensi…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

    MSC Class: 47A64; 26E60; 53C22; 15B48; 58D17; 53C20; 58B20

  7. arXiv:2407.01375  [pdf, other]

    cs.CV

    TransferAttn: Transferable-guided Attention Is All You Need for Video Domain Adaptation

    Authors: André Sacilotti, Samuel Felipe dos Santos, Nicu Sebe, Jurandy Almeida

    Abstract: Unsupervised domain adaptation (UDA) in videos is a challenging task that remains not well explored compared to image-based UDA techniques. Although vision transformers (ViT) achieve state-of-the-art performance in many computer vision tasks, their use in video domain adaptation has still been little explored. Our key idea is to use the transformer layers as a feature encoder and incorporate spati…

    Submitted 1 July, 2024; originally announced July 2024.

  8. arXiv:2406.06813  [pdf, other]

    cs.CV

    Stable Neighbor Denoising for Source-free Domain Adaptive Segmentation

    Authors: Dong Zhao, Shuang Wang, Qi Zang, Licheng Jiao, Nicu Sebe, Zhun Zhong

    Abstract: We study source-free unsupervised domain adaptation (SFUDA) for semantic segmentation, which aims to adapt a source-trained model to the target domain without accessing the source data. Many works have been proposed to address this challenging problem, among which uncertainty-based self-training is a predominant approach. However, without comprehensive denoising mechanisms, they still largely fall…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 2024 Conference on Computer Vision and Pattern Recognition

    Journal ref: (2024 Conference on Computer Vision and Pattern Recognition)

  9. arXiv:2405.20008  [pdf, other]

    cs.CV

    Sharing Key Semantics in Transformer Makes Efficient Image Restoration

    Authors: Bin Ren, Yawei Li, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Ming-Hsuan Yang, Nicu Sebe

    Abstract: Image Restoration (IR), a classic low-level vision task, has witnessed significant advancements through deep models that effectively model global information. Notably, the Vision Transformers (ViTs) emergence has further propelled these advancements. When computing, the self-attention mechanism, a cornerstone of ViTs, tends to encompass all global cues, even those from semantically unrelated objec…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 9 pages

  10. arXiv:2405.13637  [pdf, other]

    cs.CV cs.AI cs.LG

    Curriculum Direct Preference Optimization for Diffusion and Consistency Models

    Authors: Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Nicu Sebe, Mubarak Shah

    Abstract: Direct Preference Optimization (DPO) has been proposed as an effective and efficient alternative to reinforcement learning from human feedback (RLHF). In this paper, we propose a novel and enhanced version of DPO based on curriculum learning for text-to-image generation. Our method is divided into two training stages. First, a ranking of the examples generated for each prompt is obtained by employ…

    Submitted 24 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  11. arXiv:2405.07801  [pdf, other]

    cs.CV

    Deep Learning-Based Object Pose Estimation: A Comprehensive Survey

    Authors: Jian Liu, Wei Sun, Hui Yang, Zhiwen Zeng, Chongpei Liu, Jin Zheng, Xingyu Liu, Hossein Rahmani, Nicu Sebe, Ajmal Mian

    Abstract: Object pose estimation is a fundamental computer vision problem with broad applications in augmented reality and robotics. Over the past decade, deep learning models, due to their superior accuracy and robustness, have increasingly supplanted conventional algorithms reliant on engineered point pair features. Nevertheless, several challenges persist in contemporary methods, including their dependen…

    Submitted 31 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: 27 pages, 7 figures

  12. arXiv:2404.14568  [pdf, other]

    cs.CV

    UVMap-ID: A Controllable and Personalized UV Map Generative Model

    Authors: Weijie Wang, Jichao Zhang, Chang Liu, Xia Li, Xingqian Xu, Humphrey Shi, Nicu Sebe, Bruno Lepri

    Abstract: Recently, diffusion models have made significant strides in synthesizing realistic 2D human images based on provided text prompts. Building upon this, researchers have extended 2D text-to-image diffusion models into the 3D domain for generating human textures (UV Maps). However, some important problems about UV Map Generative models are still not solved, i.e., how to generate personalized texture…

    Submitted 22 April, 2024; originally announced April 2024.

  13. arXiv:2404.07990  [pdf, other]

    cs.CV cs.AI

    OpenBias: Open-set Bias Detection in Text-to-Image Generative Models

    Authors: Moreno D'Incà, Elia Peruzzo, Massimiliano Mancini, Dejia Xu, Vidit Goel, Xingqian Xu, Zhangyang Wang, Humphrey Shi, Nicu Sebe

    Abstract: Text-to-image generative models are becoming increasingly popular and accessible to the general public. As these models see large-scale deployments, it is necessary to deeply investigate their safety and fairness to not disseminate and perpetuate any kind of biases. However, existing works focus on detecting closed sets of biases defined a priori, limiting the studies to well-known concepts. In th…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Highlight - Code: https://github.com/Picsart-AI-Research/OpenBias

  14. arXiv:2404.07560  [pdf, other]

    cs.RO cs.AI

    Socially Pertinent Robots in Gerontological Healthcare

    Authors: Xavier Alameda-Pineda, Angus Addlesee, Daniel Hernández García, Chris Reinke, Soraya Arias, Federica Arrigoni, Alex Auternaud, Lauriane Blavette, Cigdem Beyan, Luis Gomez Camara, Ohad Cohen, Alessandro Conti, Sébastien Dacunha, Christian Dondrup, Yoav Ellinson, Francesco Ferro, Sharon Gannot, Florian Gras, Nancie Gunson, Radu Horaud, Moreno D'Incà, Imad Kimouche, Séverin Lemaignan, Oliver Lemon, Cyril Liotard, et al. (19 additional authors not shown)

    Abstract: Despite the many recent achievements in developing and deploying social robotics, there are still many underexplored environments and applications for which systematic evaluation of such systems by end-users is necessary. While several robotic platforms have been used in gerontological healthcare, the question of whether or not a social interactive robot with multi-modal conversational capabilitie…

    Submitted 11 April, 2024; originally announced April 2024.

  15. arXiv:2403.11261  [pdf, ps, other]

    cs.LG cs.AI cs.MS

    A Lie Group Approach to Riemannian Batch Normalization

    Authors: Ziheng Chen, Yue Song, Yunmei Liu, Nicu Sebe

    Abstract: Manifold-valued measurements exist in numerous applications within computer vision and machine learning. Recent studies have extended Deep Neural Networks (DNNs) to manifolds, and concomitantly, normalization techniques have also been adapted to several manifolds, referred to as Riemannian normalization. Nonetheless, most of the existing Riemannian normalization methods have been derived in an ad…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted by ICLR 2024

  16. arXiv:2403.07369  [pdf, other]

    cs.CV

    Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized Visual Class Discovery

    Authors: Haiyang Zheng, Nan Pu, Wenjing Li, Nicu Sebe, Zhun Zhong

    Abstract: In this paper, we study the problem of Generalized Category Discovery (GCD), which aims to cluster unlabeled data from both known and unknown categories using the knowledge of labeled data from known categories. Current GCD methods rely on only visual cues, which however neglect the multi-modality perceptive nature of human cognitive processes in discovering novel visual categories. To address thi…

    Submitted 12 March, 2024; originally announced March 2024.

  17. arXiv:2403.07028  [pdf, other]

    cs.LG cs.AI math.OC

    An Efficient Learning-based Solver Comparable to Metaheuristics for the Capacitated Arc Routing Problem

    Authors: Runze Guo, Feng Xue, Anlong Ming, Nicu Sebe

    Abstract: Recently, neural networks (NN) have made great strides in combinatorial optimization. However, they face challenges when solving the capacitated arc routing problem (CARP) which is to find the minimum-cost tour covering all required edges on a graph, while within capacity constraints. In tackling CARP, NN-based approaches tend to lag behind advanced metaheuristics, since they lack directed arc mod…

    Submitted 10 March, 2024; originally announced March 2024.

  18. arXiv:2402.02634  [pdf, other]

    cs.CV cs.LG eess.IV

    Key-Graph Transformer for Image Restoration

    Authors: Bin Ren, Yawei Li, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Nicu Sebe

    Abstract: While it is crucial to capture global information for effective image restoration (IR), integrating such cues into transformer-based methods becomes computationally expensive, especially with high input resolution. Furthermore, the self-attention mechanism in transformers is prone to considering unnecessary global cues from unrelated objects or regions, introducing computational inefficiencies. In…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 9 pages, 6 figures

  19. arXiv:2402.02339  [pdf, other]

    cs.CV cs.AI cs.LG

    Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation

    Authors: Ti Wang, Mengyuan Liu, Hong Liu, Bin Ren, Yingxuan You, Wenhao Li, Nicu Sebe, Xia Li

    Abstract: Although data-driven methods have achieved success in 3D human pose estimation, they often suffer from domain gaps and exhibit limited generalization. In contrast, optimization-based methods excel in fine-tuning for specific cases but are generally inferior to data-driven methods in overall performance. We observe that previous optimization-based methods commonly rely on projection constraint, whi…

    Submitted 3 February, 2024; originally announced February 2024.

  20. arXiv:2401.13837  [pdf, other]

    cs.CV

    Democratizing Fine-grained Visual Recognition with Large Language Models

    Authors: Mingxuan Liu, Subhankar Roy, Wenjing Li, Zhun Zhong, Nicu Sebe, Elisa Ricci

    Abstract: Identifying subordinate-level categories from images is a longstanding task in computer vision and is referred to as fine-grained visual recognition (FGVR). It has tremendous significance in real-world applications since an average layperson does not excel at differentiating species of birds or mushrooms due to subtle differences among the species. A major bottleneck in developing FGVR systems is…

    Submitted 10 March, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted as a conference paper at ICLR 2024; Project page: https://projfiner.github.io/

  21. arXiv:2401.07721  [pdf, other]

    cs.CV

    Graph Transformer GANs with Graph Masked Modeling for Architectural Layout Generation

    Authors: Hao Tang, Ling Shao, Nicu Sebe, Luc Van Gool

    Abstract: We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations in an end-to-end fashion for challenging graph-constrained architectural layout generation tasks. The proposed graph-Transformer-based generator includes a novel graph Transformer encoder that combines graph convolutions and self-attentions in a Transformer to model both local and gl…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: Accepted to TPAMI, an extended version of a paper published in CVPR2023. arXiv admin note: substantial text overlap with arXiv:2303.08225

  22. arXiv:2401.03407  [pdf, other]

    cs.CV

    Bilateral Reference for High-Resolution Dichotomous Image Segmentation

    Authors: Peng Zheng, Dehong Gao, Deng-Ping Fan, Li Liu, Jorma Laaksonen, Wanli Ouyang, Nicu Sebe

    Abstract: We introduce a novel bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS). It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef). The LM aids in object localization using global semantic information. Within the RM, we utilize BiRef for the reconstruction proce…

    Submitted 25 June, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

    Comments: Version 5, with updated DIS performance, accuracy-efficiency comparison, and 3rd-party applications

  23. arXiv:2401.02473  [pdf, other]

    cs.CV

    VASE: Object-Centric Appearance and Shape Manipulation of Real Videos

    Authors: Elia Peruzzo, Vidit Goel, Dejia Xu, Xingqian Xu, Yifan Jiang, Zhangyang Wang, Humphrey Shi, Nicu Sebe

    Abstract: Recently, several works tackled the video editing task fostered by the success of large-scale text-to-image generative models. However, most of these methods holistically edit the frame using the text, exploiting the prior given by foundation diffusion models and focusing on improving the temporal consistency across frames. In this work, we introduce a framework that is object-centric and is desig…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: Project Page https://helia95.github.io/vase-website/

  24. arXiv:2312.06331  [pdf, other]

    cs.CV

    Semantic Connectivity-Driven Pseudo-labeling for Cross-domain Segmentation

    Authors: Dong Zhao, Ruizhi Yang, Shuang Wang, Qi Zang, Yang Hu, Licheng Jiao, Nicu Sebe, Zhun Zhong

    Abstract: Presently, self-training stands as a prevailing approach in cross-domain semantic segmentation, enhancing model efficacy by training with pixels assigned with reliable pseudo-labels. However, we find two critical limitations in this paradigm. (1) The majority of reliable pixels exhibit a speckle-shaped pattern and are primarily located in the central semantic region. This presents challenges for t…

    Submitted 11 December, 2023; originally announced December 2023.

  25. arXiv:2312.03046  [pdf, other]

    cs.CV

    Diversified in-domain synthesis with efficient fine-tuning for few-shot classification

    Authors: Victor G. Turrisi da Costa, Nicola Dall'Asen, Yiming Wang, Nicu Sebe, Elisa Ricci

    Abstract: Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class. A recent research direction for improving few-shot classifiers involves augmenting the labelled samples with synthetic images created by state-of-the-art text-to-image generation models. Following this trend, we propose Diversified In-domain Synthesis with Efficient Fine-tuning (DI…

    Submitted 6 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: 14 pages, 6 figures, 8 tables

  26. arXiv:2312.03032  [pdf, other]

    cs.CV

    Zero-Shot Point Cloud Registration

    Authors: Weijie Wang, Guofeng Mei, Bin Ren, Xiaoshui Huang, Fabio Poiesi, Luc Van Gool, Nicu Sebe, Bruno Lepri

    Abstract: Learning-based point cloud registration approaches have significantly outperformed their traditional counterparts. However, they typically require extensive training on specific datasets. In this paper, we propose ZeroReg, the first zero-shot point cloud registration approach that eliminates the need for training on point cloud datasets. The cornerstone of ZeroReg is the novel transfer of image features…

    Submitted 8 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

  27. arXiv:2311.13959  [pdf, other]

    cs.LG cs.CV

    RankFeat&RankWeight: Rank-1 Feature/Weight Removal for Out-of-distribution Detection

    Authors: Yue Song, Nicu Sebe, Wei Wang

    Abstract: The task of out-of-distribution (OOD) detection is crucial for deploying machine learning models in real-world settings. In this paper, we observe that the singular value distributions of the in-distribution (ID) and OOD features are quite different: the OOD feature matrix tends to have a larger dominant singular value than the ID feature, and the class predictions of OOD samples are largely deter…

    Submitted 27 November, 2023; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: submitted to T-PAMI. arXiv admin note: substantial text overlap with arXiv:2209.08590

  28. arXiv:2311.12028  [pdf, other]

    cs.CV cs.AI cs.LG

    Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation

    Authors: Wenhao Li, Mengyuan Liu, Hong Liu, Pichao Wang, Jialun Cai, Nicu Sebe

    Abstract: Transformers have been successfully applied in the field of video-based 3D human pose estimation. However, the high computational costs of these video pose transformers (VPTs) make them impractical on resource-constrained devices. In this paper, we present a plug-and-play pruning-and-recovering framework, called Hourglass Tokenizer (HoT), for efficient transformer-based 3D human pose estimation fr…

    Submitted 27 March, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: Accepted by CVPR 2024, Open Sourced

  29. arXiv:2311.01573  [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    Improving Fairness using Vision-Language Driven Image Augmentation

    Authors: Moreno D'Incà, Christos Tzelepis, Ioannis Patras, Nicu Sebe

    Abstract: Fairness is crucial when training a deep-learning discriminative model, especially in the facial domain. Models tend to correlate specific characteristics (such as age and skin color) with unrelated attributes (downstream tasks), resulting in biases which do not correspond to reality. It is common knowledge that these correlations are present in the data and are then transferred to the models duri…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted for publication in WACV 2024

  30. arXiv:2309.13167  [pdf, other]

    cs.LG cs.CV

    Flow Factorized Representation Learning

    Authors: Yue Song, T. Anderson Keller, Nicu Sebe, Max Welling

    Abstract: A prominent goal of representation learning research is to achieve representations which are factorized in a useful manner with respect to the ground truth factors of variation. The fields of disentangled and equivariant representation learning have approached this ideal from a range of complementary perspectives; however, to date, most approaches have proven to either be ill-specified or insuffic…

    Submitted 22 September, 2023; originally announced September 2023.

    Comments: NeurIPS23

  31. arXiv:2309.11464  [pdf, other]

    cs.CV

    Budget-Aware Pruning: Handling Multiple Domains with Less Parameters

    Authors: Samuel Felipe dos Santos, Rodrigo Berriel, Thiago Oliveira-Santos, Nicu Sebe, Jurandy Almeida

    Abstract: Deep learning has achieved state-of-the-art performance on several computer vision tasks and domains. Nevertheless, it still has a high computational cost and demands a significant amount of parameters. Such requirements hinder the use in resource-limited environments and demand both software and hardware optimization. Another limitation is that deep models are usually specialized into a single do…

    Submitted 3 July, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2210.08101

  32. arXiv:2309.11417   

    cs.CV

    CNNs for JPEGs: A Study in Computational Cost

    Authors: Samuel Felipe dos Santos, Nicu Sebe, Jurandy Almeida

    Abstract: Convolutional neural networks (CNNs) have achieved astonishing advances over the past decade, defining state-of-the-art in several computer vision tasks. CNNs are capable of learning robust representations of the data directly from the RGB pixels. However, most image data are usually available in compressed format, from which the JPEG is the most widely used due to transmission and storage purpose…

    Submitted 22 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: A previous version of this work had already been submitted to ArXiv and is available at arXiv:2012.14426. Instead of maintaining two different submissions, we decided to submit a replacement for the previous submission

  33. arXiv:2309.08964  [pdf, other]

    cs.CV

    Tightening Classification Boundaries in Open Set Domain Adaptation through Unknown Exploitation

    Authors: Lucas Fernando Alvarenga e Silva, Nicu Sebe, Jurandy Almeida

    Abstract: Convolutional Neural Networks (CNNs) have brought revolutionary advances to many research areas due to their capacity of learning from raw data. However, when those methods are applied to non-controllable environments, many different factors can degrade the model's expected performance, such as unlabeled datasets with different levels of domain shift and category shift. Particularly, when both iss…

    Submitted 16 September, 2023; originally announced September 2023.

    Journal ref: 36th SIBGRAPI - Conference on Graphics, Patterns, and Images (SIBGRAPI'23), 2023, pp. 1-6

  34. arXiv:2309.01104  [pdf, other]

    cs.CV cs.CR cs.LG cs.MM

    Turn Fake into Real: Adversarial Head Turn Attacks Against Deepfake Detection

    Authors: Weijie Wang, Zhengyu Zhao, Nicu Sebe, Bruno Lepri

    Abstract: Malicious use of deepfakes leads to serious public concerns and reduces people's trust in digital media. Although effective deepfake detectors have been proposed, they are substantially vulnerable to adversarial attacks. To evaluate the detector's robustness, recent studies have explored various attacks. However, all existing attacks are limited to 2D image perturbations, which are hard to transla…

    Submitted 3 September, 2023; originally announced September 2023.

  35. arXiv:2308.14619  [pdf, other]

    cs.CV

    Compositional Semantic Mix for Domain Adaptation in Point Cloud Segmentation

    Authors: Cristiano Saltori, Fabio Galasso, Giuseppe Fiameni, Nicu Sebe, Fabio Poiesi, Elisa Ricci

    Abstract: Deep-learning models for 3D point cloud semantic segmentation exhibit limited generalization capabilities when trained and tested on data captured with different sensors or in varying environments due to domain shift. Domain adaptation methods can be employed to mitigate this domain shift, for instance, by simulating sensor noise, developing domain-agnostic generators, or training point cloud comp…

    Submitted 29 August, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: TPAMI. arXiv admin note: text overlap with arXiv:2207.09778

  36. Interactive Neural Painting

    Authors: Elia Peruzzo, Willi Menapace, Vidit Goel, Federica Arrigoni, Hao Tang, Xingqian Xu, Arman Chopikyan, Nikita Orlov, Yuxiao Hu, Humphrey Shi, Nicu Sebe, Elisa Ricci

    Abstract: In the last few years, Neural Painting (NP) techniques became capable of producing extremely realistic artworks. This paper advances the state of the art in this emerging research domain by proposing the first approach for Interactive NP. Considering a setting where a user looks at a scene and tries to reproduce it on a painting, our objective is to develop a computational framework to assist the…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: This is a preprint version of the paper to appear at Computer Vision and Image Understanding (CVIU). The final journal version will be available at https://www.sciencedirect.com/science/article/pii/S1077314223001583

    Journal ref: 10.1016/j.cviu.2023.103778

  37. arXiv:2307.12084  [pdf, other]

    cs.CV

    Edge Guided GANs with Multi-Scale Contrastive Learning for Semantic Image Synthesis

    Authors: Hao Tang, Guolei Sun, Nicu Sebe, Luc Van Gool

    Abstract: We propose a novel ECGAN for the challenging semantic image synthesis task. Although considerable improvements have been achieved by the community in the recent period, the quality of synthesized images is far from satisfactory due to three largely unresolved challenges. 1) The semantic labels do not provide detailed structural information, making it challenging to synthesize local details and str…

    Submitted 22 July, 2023; originally announced July 2023.

    Comments: Accepted to TPAMI, an extended version of a paper published in ICLR2023. arXiv admin note: substantial text overlap with arXiv:2003.13898

  38. arXiv:2307.09416  [pdf, other]

    cs.CV cs.CL

    Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation

    Authors: Federico Betti, Jacopo Staiano, Lorenzo Baraldi, Lorenzo Baraldi, Rita Cucchiara, Nicu Sebe

    Abstract: Research in Image Generation has recently made significant progress, particularly boosted by the introduction of Vision-Language models which are able to produce high-quality visual content based on textual inputs. Despite ongoing advancements in terms of generation quality and realism, no methodical frameworks have been defined yet to quantitatively measure the quality of the generated content an…

    Submitted 19 July, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

    Comments: Accepted as oral at ACM MultiMedia 2023 (Brave New Ideas track)

  39. arXiv:2307.08012  [pdf, other]

    cs.CV

    Householder Projector for Unsupervised Latent Semantics Discovery

    Authors: Yue Song, Jichao Zhang, Nicu Sebe, Wei Wang

    Abstract: Generative Adversarial Networks (GANs), especially the recent style-based generators (StyleGANs), have versatile semantics in the structured latent space. Latent semantics discovery methods emerge to move around the latent code such that only one factor varies during the traversal. Recently, an unsupervised method proposed a promising direction to directly use the eigenvectors of the projection ma…

    Submitted 16 July, 2023; originally announced July 2023.

    Comments: ICCV23

  40. arXiv:2305.15753  [pdf, other]

    cs.CV

    T2TD: Text-3D Generation Model based on Prior Knowledge Guidance

    Authors: Weizhi Nie, Ruidong Chen, Weijie Wang, Bruno Lepri, Nicu Sebe

    Abstract: In recent years, 3D models have been utilized in many applications, such as auto-driver, 3D reconstruction, VR, and AR. However, the scarcity of 3D model data does not meet its practical demands. Thus, generating high-quality 3D models efficiently from textual descriptions is a promising but challenging way to solve this problem. In this paper, inspired by the ability of human beings to complement…

    Submitted 25 May, 2023; originally announced May 2023.

  41. arXiv:2305.14107  [pdf, other]

    cs.CV

    Federated Generalized Category Discovery

    Authors: Nan Pu, Zhun Zhong, Xinyuan Ji, Nicu Sebe

    Abstract: Generalized category discovery (GCD) aims at grouping unlabeled samples from known and unknown classes, given labeled data of known classes. To meet the recent decentralization trend in the community, we introduce a practical yet challenging task, namely Federated GCD (Fed-GCD), where the training data are distributively stored in local clients and cannot be shared among clients. The goal of Fed-G…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: 17 pages, 3 figures

  42. arXiv:2305.11288  [pdf, other]

    cs.LG

    Riemannian Multinomial Logistics Regression for SPD Neural Networks

    Authors: Ziheng Chen, Yue Song, Gaowen Liu, Ramana Rao Kompella, Xiaojun Wu, Nicu Sebe

    Abstract: Deep neural networks for learning Symmetric Positive Definite (SPD) matrices are gaining increasing attention in machine learning. Despite the significant progress, most existing SPD networks use traditional Euclidean classifiers on an approximated space rather than intrinsic classifiers that accurately capture the geometry of SPD manifolds. Inspired by Hyperbolic Neural Networks (HNNs), we propos…

    Submitted 20 March, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted to CVPR 2024

  43. arXiv:2304.12944  [pdf, other]

    cs.LG cs.CV

    Latent Traversals in Generative Models as Potential Flows

    Authors: Yue Song, T. Anderson Keller, Nicu Sebe, Max Welling

    Abstract: Despite the significant recent progress in deep generative models, the underlying structure of their latent spaces is still poorly understood, thereby making the task of performing semantically meaningful latent traversals an open research challenge. Most prior work has aimed to solve this challenge by modeling latent structures linearly, and finding corresponding linear directions which result in…

    Submitted 1 July, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: ICML 2023

  44. arXiv:2303.17546  [pdf, other]

    cs.CV cs.AI cs.LG

    PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor

    Authors: Vidit Goel, Elia Peruzzo, Yifan Jiang, Dejia Xu, Xingqian Xu, Nicu Sebe, Trevor Darrell, Zhangyang Wang, Humphrey Shi

    Abstract: Generative image editing has recently witnessed extremely fast-paced growth. Some works use high-level conditioning such as text, while others use low-level conditioning. Nevertheless, most of them lack fine-grained control over the properties of the different objects present in the image, i.e. object-level image editing. In this work, we tackle the task by perceiving the images as an amalgamation…

    Submitted 8 April, 2024; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: Accepted in CVPR 2024, Project page https://vidit98.github.io/publication/conference-paper/pair_diff.html

  45. arXiv:2303.17393  [pdf, ps, other]

    cs.CV

    Dynamic Conceptional Contrastive Learning for Generalized Category Discovery

    Authors: Nan Pu, Zhun Zhong, Nicu Sebe

    Abstract: Generalized category discovery (GCD) is a recently proposed open-world problem, which aims to automatically cluster partially labeled data. The main challenge is that the unlabeled data contain instances that are not only from known categories of the labeled data but also from novel categories. This leads traditional novel category discovery (NCD) methods to be incapacitated for GCD, due to their…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: 10 pages, 5 figures, accepted by CVPR2023

  46. arXiv:2303.15975  [pdf, other]

    cs.CV cs.LG

    Large-scale Pre-trained Models are Surprisingly Strong in Incremental Novel Class Discovery

    Authors: Mingxuan Liu, Subhankar Roy, Zhun Zhong, Nicu Sebe, Elisa Ricci

    Abstract: Discovering novel concepts in unlabelled datasets and in a continuous manner is an important desideratum of lifelong learners. In the literature such problems have been partially addressed under very restricted settings, where novel classes are learned by jointly accessing a related labelled set (e.g., NCD) or by leveraging only a supervisedly pre-trained model (e.g., class-iNCD). In this work we…

    Submitted 3 July, 2024; v1 submitted 28 March, 2023; originally announced March 2023.

    Comments: Accepted as a conference paper to ICPR 2024

  47. arXiv:2303.15477  [pdf, other]

    cs.LG

    Adaptive Riemannian Metrics on SPD Manifolds

    Authors: Ziheng Chen, Yue Song, Tianyang Xu, Zhiwu Huang, Xiao-Jun Wu, Nicu Sebe

    Abstract: Symmetric Positive Definite (SPD) matrices have received wide attention in machine learning due to their intrinsic capacity of encoding underlying structural correlation in data. To reflect the non-Euclidean geometry of SPD manifolds, many successful Riemannian metrics have been proposed. However, existing fixed metric tensors might lead to sub-optimal performance for SPD matrices learning, especi…

    Submitted 18 May, 2023; v1 submitted 26 March, 2023; originally announced March 2023.

  48. arXiv:2303.11296  [pdf, other]

    cs.CV

    Attribute-preserving Face Dataset Anonymization via Latent Code Optimization

    Authors: Simone Barattin, Christos Tzelepis, Ioannis Patras, Nicu Sebe

    Abstract: This work addresses the problem of anonymizing the identity of faces in a dataset of images, such that the privacy of those depicted is not violated, while at the same time the dataset is useful for downstream tasks such as training machine learning models. To the best of our knowledge, we are the first to explicitly address this issue and deal with two major drawbacks of the existing state-of-…

    Submitted 20 March, 2023; originally announced March 2023.

    Comments: Accepted for publication in CVPR 2023

  49. arXiv:2303.09270  [pdf, other]

    cs.CV

    SpectralCLIP: Preventing Artifacts in Text-Guided Style Transfer from a Spectral Perspective

    Authors: Zipeng Xu, Songlong Xing, Enver Sangineto, Nicu Sebe

    Abstract: Owing to the power of vision-language foundation models, e.g., CLIP, the area of image synthesis has seen recent important advances. Particularly, for style transfer, CLIP enables transferring more general and abstract styles without collecting the style images in advance, as the style can be efficiently described with natural language, and the result is optimized by minimizing the CLIP similarity…

    Submitted 2 November, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: WACV 2024

  50. arXiv:2303.09268  [pdf, other]

    cs.CV

    StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model

    Authors: Zipeng Xu, Enver Sangineto, Nicu Sebe

    Abstract: Despite the progress made in the style transfer task, most previous work focuses on transferring only relatively simple features like color or texture, while missing more abstract concepts such as overall art expression or painter-specific traits. However, these abstract semantics can be captured by models like DALL-E or CLIP, which have been trained using huge datasets of images and textual documen…

    Submitted 9 October, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: ICCV 2023