
Showing 1–50 of 227 results for author: Sebe, N

  1. arXiv:2408.16700  [pdf, other]

    cs.CV

    GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models

    Authors: Moreno D'Incà, Elia Peruzzo, Massimiliano Mancini, Xingqian Xu, Humphrey Shi, Nicu Sebe

    Abstract: Recent progress in Text-to-Image (T2I) generative models has enabled high-quality image generation. As performance and accessibility increase, these models are gaining significant attraction and popularity: ensuring their fairness and safety is a priority to prevent the dissemination and perpetuation of biases. However, existing studies in bias detection focus on closed sets of predefined biases (…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Under review. Code: https://github.com/Moreno98/GradBias

  2. arXiv:2408.14600  [pdf, other]

    cs.CV

    PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection

    Authors: Yidi Li, Jiahao Wen, Bin Ren, Wenhao Li, Zhenhuan Xu, Hao Guo, Hong Liu, Nicu Sebe

    Abstract: The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection. However, this combination often struggles with capturing semantic information effectively. Moreover, relying solely on point features within regions of interest can lead to information loss and limitations in local feature representation. To tackle these challenges, we propose a novel two…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 3D Object Detection

  3. arXiv:2408.14585  [pdf, other]

    cs.CV cs.SD eess.AS

    Global-Local Distillation Network-Based Audio-Visual Speaker Tracking with Incomplete Modalities

    Authors: Yidi Li, Yihan Li, Yixin Guo, Bin Ren, Zhenhuan Xu, Hao Guo, Hong Liu, Nicu Sebe

    Abstract: In speaker tracking research, integrating and complementing multi-modal data is a crucial strategy for improving the accuracy and robustness of tracking systems. However, tracking with incomplete modalities remains a challenging issue due to noisy observations caused by occlusion, acoustic noise, and sensor failures. Especially when there is missing data in multiple modalities, the performance of…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Audio-Visual Speaker Tracking with Incomplete Modalities

  4. arXiv:2408.10906  [pdf, other]

    cs.CV

    ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining

    Authors: Qi Ma, Yue Li, Bin Ren, Nicu Sebe, Ender Konukoglu, Theo Gevers, Luc Van Gool, Danda Pani Paudel

    Abstract: 3D Gaussian Splatting (3DGS) has become the de facto method of 3D representation in many vision tasks. This calls for the 3D understanding directly in this representation space. To facilitate the research in this direction, we first build a large-scale dataset of 3DGS using the commonly used ShapeNet and ModelNet datasets. Our dataset ShapeSplat consists of 65K objects from 87 unique categories, w…

    Submitted 20 August, 2024; originally announced August 2024.

  5. arXiv:2408.10703  [pdf, other]

    cs.CV

    Large Language Models for Multimodal Deformable Image Registration

    Authors: Mingrui Ma, Weijie Wang, Jie Ning, Jianfeng He, Nicu Sebe, Bruno Lepri

    Abstract: The challenge of Multimodal Deformable Image Registration (MDIR) lies in the conversion and alignment of features between images of different modalities. Generative models (GMs) cannot retain the necessary information enough from the source modality to the target one, while non-GMs struggle to align features across these two modalities. In this paper, we propose a novel coarse-to-fine MDIR framewo…

    Submitted 20 August, 2024; originally announced August 2024.

  6. arXiv:2408.08093  [pdf, other]

    cs.CV cs.MM

    When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding

    Authors: Pingping Zhang, Jinlong Li, Meng Wang, Nicu Sebe, Sam Kwong, Shiqi Wang

    Abstract: Existing codecs are designed to eliminate intrinsic redundancies to create a compact representation for compression. However, strong external priors from Multimodal Large Language Models (MLLMs) have not been explicitly explored in video compression. Herein, we introduce a unified paradigm for Cross-Modality Video Coding (CMVC), which is a pioneering approach to explore multimodality representatio…

    Submitted 15 August, 2024; originally announced August 2024.

  7. arXiv:2408.06687  [pdf, other]

    cs.CV cs.AI cs.LG

    Masked Image Modeling: A Survey

    Authors: Vlad Hondru, Florinel Alin Croitoru, Shervin Minaee, Radu Tudor Ionescu, Nicu Sebe

    Abstract: In this work, we survey recent studies on masked image modeling (MIM), an approach that emerged as a powerful self-supervised learning technique in computer vision. The MIM task involves masking some information, e.g. pixels, patches, or even latent representations, and training a model, usually an autoencoder, to predict the missing information by using the context available in the visible par…

    Submitted 13 August, 2024; originally announced August 2024.

  8. Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning

    Authors: Xuri Ge, Junchen Fu, Fuhai Chen, Shan An, Nicu Sebe, Joemon M. Jose

    Abstract: Facial action units (AUs), as defined in the Facial Action Coding System (FACS), have received significant research interest owing to their diverse range of applications in facial state analysis. Current mainstream FAU recognition models have a notable limitation, i.e., focusing only on the accuracy of AU recognition and overlooking explanations of corresponding AU states. In this paper, we propos…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures, 4 tables

    Journal ref: ACM Multimedia 2024

  9. arXiv:2407.20175  [pdf, other]

    cs.CV

    Towards Localized Fine-Grained Control for Facial Expression Generation

    Authors: Tuomas Varanka, Huai-Qian Khor, Yante Li, Mengting Wei, Hanwei Kung, Nicu Sebe, Guoying Zhao

    Abstract: Generative models have surged in popularity recently due to their ability to produce high-quality images and video. However, steering these models to produce images with specific attributes and precise control remains challenging. Humans, particularly their faces, are central to content generation due to their ability to convey rich expressions and intent. Current generative models mostly generate…

    Submitted 25 July, 2024; originally announced July 2024.

  10. arXiv:2407.13372  [pdf, other]

    cs.CV

    Any Image Restoration with Efficient Automatic Degradation Adaptation

    Authors: Bin Ren, Eduard Zamfir, Yawei Li, Zongwei Wu, Danda Pani Paudel, Radu Timofte, Nicu Sebe, Luc Van Gool

    Abstract: With the emergence of mobile devices, there is a growing demand for an efficient model to restore any degraded image for better perceptual quality. However, existing models often require specific learning modules tailored for each degradation, resulting in complex architectures and high computation costs. Different from previous work, in this paper, we propose a unified manner to achieve joint emb…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Efficient Any Image Restoration

  11. arXiv:2407.10484  [pdf, other]

    cs.CV cs.LG

    Understanding Matrix Function Normalizations in Covariance Pooling through the Lens of Riemannian Geometry

    Authors: Ziheng Chen, Yue Song, Xiao-Jun Wu, Gaowen Liu, Nicu Sebe

    Abstract: Global Covariance Pooling (GCP) has been demonstrated to improve the performance of Deep Neural Networks (DNNs) by exploiting second-order statistics of high-level representations. GCP typically performs classification of the covariance matrices by applying matrix function normalization, such as matrix logarithm or power, followed by a Euclidean classifier. However, covariance matrices inherently…

    Submitted 20 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 24 pages, 3 figures

  12. arXiv:2407.09826  [pdf, other]

    cs.CV

    3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance

    Authors: Xiaoxu Xu, Yitian Yuan, Jinlong Li, Qiudan Zhang, Zequn Jie, Lin Ma, Hao Tang, Nicu Sebe, Xu Wang

    Abstract: In this paper, we propose 3DSS-VLG, a weakly supervised approach for 3D Semantic Segmentation with 2D Vision-Language Guidance, an alternative approach that a 3D model predicts dense-embedding for each point which is co-embedded with both the aligned image and text spaces from the 2D vision-language model. Specifically, our method exploits the superior generalization ability of the 2D vision-langu…

    Submitted 29 August, 2024; v1 submitted 13 July, 2024; originally announced July 2024.

  13. arXiv:2407.08374  [pdf, other]

    cs.CV

    Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization

    Authors: Jinlong Li, Zequn Jie, Elisa Ricci, Lin Ma, Nicu Sebe

    Abstract: Efficient finetuning of vision-language models (VLMs) like CLIP for specific downstream tasks is gaining significant attention. Previous works primarily focus on prompt learning to adapt the CLIP into a variety of downstream tasks, however, suffering from task overfitting when finetuned on a small data set. In this paper, we introduce an orthogonal finetuning method for efficiently updating pretra…

    Submitted 15 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  14. arXiv:2407.05862  [pdf, other]

    cs.CV

    Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning

    Authors: Bin Ren, Guofeng Mei, Danda Pani Paudel, Weijie Wang, Yawei Li, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Nicu Sebe

    Abstract: Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones. However, in 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant. This raises the question: Can we take the best of both worlds? To answer this question, we first empirically validate that integrating MAE-ba…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning

  15. arXiv:2407.02607  [pdf, other]

    math.DG cs.LG math.MG

    Product Geometries on Cholesky Manifolds with Applications to SPD Manifolds

    Authors: Ziheng Chen, Yue Song, Xiao-Jun Wu, Nicu Sebe

    Abstract: This paper presents two new metrics on the Symmetric Positive Definite (SPD) manifold via the Cholesky manifold, i.e., the space of lower triangular matrices with positive diagonal elements. We first unveil that the existing popular Riemannian metric on the Cholesky manifold can be generally characterized as the product metric of a Euclidean metric and a Riemannian metric on the space of n-dimensi…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

    MSC Class: 47A64; 26E60; 53C22; 15B48; 58D17; 53C20; 58B20

  16. arXiv:2407.01375  [pdf, other]

    cs.CV

    TransferAttn: Transferable-guided Attention Is All You Need for Video Domain Adaptation

    Authors: André Sacilotti, Samuel Felipe dos Santos, Nicu Sebe, Jurandy Almeida

    Abstract: Unsupervised domain adaptation (UDA) in videos is a challenging task that remains not well explored compared to image-based UDA techniques. Although vision transformers (ViT) achieve state-of-the-art performance in many computer vision tasks, their use in video domain adaptation has still been little explored. Our key idea is to use the transformer layers as a feature encoder and incorporate spati…

    Submitted 1 July, 2024; originally announced July 2024.

  17. arXiv:2406.06813  [pdf, other]

    cs.CV

    Stable Neighbor Denoising for Source-free Domain Adaptive Segmentation

    Authors: Dong Zhao, Shuang Wang, Qi Zang, Licheng Jiao, Nicu Sebe, Zhun Zhong

    Abstract: We study source-free unsupervised domain adaptation (SFUDA) for semantic segmentation, which aims to adapt a source-trained model to the target domain without accessing the source data. Many works have been proposed to address this challenging problem, among which uncertainty-based self-training is a predominant approach. However, without comprehensive denoising mechanisms, they still largely fall…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 2024 Conference on Computer Vision and Pattern Recognition

    Journal ref: (2024 Conference on Computer Vision and Pattern Recognition)

  18. arXiv:2405.20008  [pdf, other]

    cs.CV

    Sharing Key Semantics in Transformer Makes Efficient Image Restoration

    Authors: Bin Ren, Yawei Li, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Ming-Hsuan Yang, Nicu Sebe

    Abstract: Image Restoration (IR), a classic low-level vision task, has witnessed significant advancements through deep models that effectively model global information. Notably, the Vision Transformers (ViTs) emergence has further propelled these advancements. When computing, the self-attention mechanism, a cornerstone of ViTs, tends to encompass all global cues, even those from semantically unrelated objec…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 9 pages

  19. arXiv:2405.13637  [pdf, other]

    cs.CV cs.AI cs.LG

    Curriculum Direct Preference Optimization for Diffusion and Consistency Models

    Authors: Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Nicu Sebe, Mubarak Shah

    Abstract: Direct Preference Optimization (DPO) has been proposed as an effective and efficient alternative to reinforcement learning from human feedback (RLHF). In this paper, we propose a novel and enhanced version of DPO based on curriculum learning for text-to-image generation. Our method is divided into two training stages. First, a ranking of the examples generated for each prompt is obtained by employ…

    Submitted 24 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  20. arXiv:2405.07801  [pdf, other]

    cs.CV

    Deep Learning-Based Object Pose Estimation: A Comprehensive Survey

    Authors: Jian Liu, Wei Sun, Hui Yang, Zhiwen Zeng, Chongpei Liu, Jin Zheng, Xingyu Liu, Hossein Rahmani, Nicu Sebe, Ajmal Mian

    Abstract: Object pose estimation is a fundamental computer vision problem with broad applications in augmented reality and robotics. Over the past decade, deep learning models, due to their superior accuracy and robustness, have increasingly supplanted conventional algorithms reliant on engineered point pair features. Nevertheless, several challenges persist in contemporary methods, including their dependen…

    Submitted 31 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: 27 pages, 7 figures

  21. arXiv:2404.14568  [pdf, other]

    cs.CV

    UVMap-ID: A Controllable and Personalized UV Map Generative Model

    Authors: Weijie Wang, Jichao Zhang, Chang Liu, Xia Li, Xingqian Xu, Humphrey Shi, Nicu Sebe, Bruno Lepri

    Abstract: Recently, diffusion models have made significant strides in synthesizing realistic 2D human images based on provided text prompts. Building upon this, researchers have extended 2D text-to-image diffusion models into the 3D domain for generating human textures (UV Maps). However, some important problems about UV Map Generative models are still not solved, i.e., how to generate personalized texture…

    Submitted 9 August, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted to ACMMM2024

  22. arXiv:2404.07990  [pdf, other]

    cs.CV cs.AI

    OpenBias: Open-set Bias Detection in Text-to-Image Generative Models

    Authors: Moreno D'Incà, Elia Peruzzo, Massimiliano Mancini, Dejia Xu, Vidit Goel, Xingqian Xu, Zhangyang Wang, Humphrey Shi, Nicu Sebe

    Abstract: Text-to-image generative models are becoming increasingly popular and accessible to the general public. As these models see large-scale deployments, it is necessary to deeply investigate their safety and fairness to not disseminate and perpetuate any kind of biases. However, existing works focus on detecting closed sets of biases defined a priori, limiting the studies to well-known concepts. In th…

    Submitted 5 August, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Highlight - Code: https://github.com/Picsart-AI-Research/OpenBias

  23. arXiv:2404.07560  [pdf, other]

    cs.RO cs.AI

    Socially Pertinent Robots in Gerontological Healthcare

    Authors: Xavier Alameda-Pineda, Angus Addlesee, Daniel Hernández García, Chris Reinke, Soraya Arias, Federica Arrigoni, Alex Auternaud, Lauriane Blavette, Cigdem Beyan, Luis Gomez Camara, Ohad Cohen, Alessandro Conti, Sébastien Dacunha, Christian Dondrup, Yoav Ellinson, Francesco Ferro, Sharon Gannot, Florian Gras, Nancie Gunson, Radu Horaud, Moreno D'Incà, Imad Kimouche, Séverin Lemaignan, Oliver Lemon, Cyril Liotard, et al. (19 additional authors not shown)

    Abstract: Despite the many recent achievements in developing and deploying social robotics, there are still many underexplored environments and applications for which systematic evaluation of such systems by end-users is necessary. While several robotic platforms have been used in gerontological healthcare, the question of whether or not a social interactive robot with multi-modal conversational capabilitie…

    Submitted 11 April, 2024; originally announced April 2024.

  24. arXiv:2403.11261  [pdf, ps, other]

    cs.LG cs.AI cs.MS

    A Lie Group Approach to Riemannian Batch Normalization

    Authors: Ziheng Chen, Yue Song, Yunmei Liu, Nicu Sebe

    Abstract: Manifold-valued measurements exist in numerous applications within computer vision and machine learning. Recent studies have extended Deep Neural Networks (DNNs) to manifolds, and concomitantly, normalization techniques have also been adapted to several manifolds, referred to as Riemannian normalization. Nonetheless, most of the existing Riemannian normalization methods have been derived in an ad…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted by ICLR 2024

  25. arXiv:2403.08556  [pdf, other]

    cs.CV cs.AI

    SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model

    Authors: Yihao Liu, Feng Xue, Anlong Ming, Mingshuai Zhao, Huadong Ma, Nicu Sebe

    Abstract: In the last year, universal monocular metric depth estimation (universal MMDE) has gained considerable attention, serving as the foundation model for various multimedia tasks, such as video and image editing. Nonetheless, current approaches face challenges in maintaining consistent accuracy across diverse scenes without scene-specific parameters and pre-training, hindering the practicality of MMDE…

    Submitted 14 August, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted by ACM MultiMedia 24, Project Page: xuefeng-cvr.github.io/SM4Depth

  26. arXiv:2403.07369  [pdf, other]

    cs.CV

    Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized Visual Class Discovery

    Authors: Haiyang Zheng, Nan Pu, Wenjing Li, Nicu Sebe, Zhun Zhong

    Abstract: In this paper, we study the problem of Generalized Category Discovery (GCD), which aims to cluster unlabeled data from both known and unknown categories using the knowledge of labeled data from known categories. Current GCD methods rely on only visual cues, which however neglect the multi-modality perceptive nature of human cognitive processes in discovering novel visual categories. To address thi…

    Submitted 12 March, 2024; originally announced March 2024.

  27. arXiv:2403.07028  [pdf, other]

    cs.LG cs.AI math.OC

    An Efficient Learning-based Solver Comparable to Metaheuristics for the Capacitated Arc Routing Problem

    Authors: Runze Guo, Feng Xue, Anlong Ming, Nicu Sebe

    Abstract: Recently, neural networks (NN) have made great strides in combinatorial optimization. However, they face challenges when solving the capacitated arc routing problem (CARP) which is to find the minimum-cost tour covering all required edges on a graph, while within capacity constraints. In tackling CARP, NN-based approaches tend to lag behind advanced metaheuristics, since they lack directed arc mod…

    Submitted 10 March, 2024; originally announced March 2024.

  28. arXiv:2402.02634  [pdf, other]

    cs.CV cs.LG eess.IV

    Key-Graph Transformer for Image Restoration

    Authors: Bin Ren, Yawei Li, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Nicu Sebe

    Abstract: While it is crucial to capture global information for effective image restoration (IR), integrating such cues into transformer-based methods becomes computationally expensive, especially with high input resolution. Furthermore, the self-attention mechanism in transformers is prone to considering unnecessary global cues from unrelated objects or regions, introducing computational inefficiencies. In…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 9 pages, 6 figures

  29. arXiv:2402.02339  [pdf, other]

    cs.CV cs.AI cs.LG

    Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation

    Authors: Ti Wang, Mengyuan Liu, Hong Liu, Bin Ren, Yingxuan You, Wenhao Li, Nicu Sebe, Xia Li

    Abstract: Although data-driven methods have achieved success in 3D human pose estimation, they often suffer from domain gaps and exhibit limited generalization. In contrast, optimization-based methods excel in fine-tuning for specific cases but are generally inferior to data-driven methods in overall performance. We observe that previous optimization-based methods commonly rely on projection constraint, whi…

    Submitted 3 February, 2024; originally announced February 2024.

  30. arXiv:2402.01288  [pdf, other]

    math.OC

    Induced Norm Analysis of Linear Systems for Nonnegative Input Signals

    Authors: Yoshio Ebihara, Noboru Sebe, Hayato Waki, Dimitri Peaucelle, Sophie Tarbouriech, Victor Magron, Tomomichi Hagiwara

    Abstract: This paper is concerned with the analysis of the $L_p\ (p\in[1,\infty), p=\infty)$ induced norms of continuous-time linear systems where input signals are restricted to be nonnegative. This norm is referred to as the $L_{p+}$ induced norm in this paper. It has been shown recently that the $L_{2+}$ induced norm is effective for the stability analysis of nonlinear feedback systems where the nonlinea…

    Submitted 5 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: 12 pages, 3 figures. A preliminary version of this paper was presented at ECC 2022 (arXiv:2401.03242) and IFAC WC 2023

  31. arXiv:2401.13837  [pdf, other]

    cs.CV

    Democratizing Fine-grained Visual Recognition with Large Language Models

    Authors: Mingxuan Liu, Subhankar Roy, Wenjing Li, Zhun Zhong, Nicu Sebe, Elisa Ricci

    Abstract: Identifying subordinate-level categories from images is a longstanding task in computer vision and is referred to as fine-grained visual recognition (FGVR). It has tremendous significance in real-world applications since an average layperson does not excel at differentiating species of birds or mushrooms due to subtle differences among the species. A major bottleneck in developing FGVR systems is…

    Submitted 10 March, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted as a conference paper at ICLR 2024; Project page: https://projfiner.github.io/

  32. arXiv:2401.07721  [pdf, other]

    cs.CV

    Graph Transformer GANs with Graph Masked Modeling for Architectural Layout Generation

    Authors: Hao Tang, Ling Shao, Nicu Sebe, Luc Van Gool

    Abstract: We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations in an end-to-end fashion for challenging graph-constrained architectural layout generation tasks. The proposed graph-Transformer-based generator includes a novel graph Transformer encoder that combines graph convolutions and self-attentions in a Transformer to model both local and gl…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: Accepted to TPAMI, an extended version of a paper published in CVPR2023. arXiv admin note: substantial text overlap with arXiv:2303.08225

  33. arXiv:2401.03407  [pdf, other]

    cs.CV

    Bilateral Reference for High-Resolution Dichotomous Image Segmentation

    Authors: Peng Zheng, Dehong Gao, Deng-Ping Fan, Li Liu, Jorma Laaksonen, Wanli Ouyang, Nicu Sebe

    Abstract: We introduce a novel bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS). It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef). The LM aids in object localization using global semantic information. Within the RM, we utilize BiRef for the reconstruction proce…

    Submitted 24 July, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

    Comments: Version 6, the final version of the journal with a fixed institute

  34. arXiv:2401.03242  [pdf, other]

    math.OC

    $L_{2+}$ Induced Norm Analysis of Continuous-Time LTI Systems Using Positive Filters and Copositive Programming

    Authors: Yoshio Ebihara, Hayato Waki, Noboru Sebe, Victor Magron, Dimitri Peaucelle, Sophie Tarbouriech

    Abstract: This paper is concerned with the analysis of the $L_{2}$ induced norm of continuous-time LTI systems where the input signals are restricted to be nonnegative. This induced norm is referred to as the $L_{2+}$ induced norm in this paper. It has been shown very recently that the $L_{2+}$ induced norm is particularly useful for the stability analysis of nonlinear feedback systems constructed from line…

    Submitted 6 January, 2024; originally announced January 2024.

    Comments: 9 pages, 3 figures, Proceedings of the European Control Conference (ECC) 2022

  35. arXiv:2401.02473  [pdf, other]

    cs.CV

    VASE: Object-Centric Appearance and Shape Manipulation of Real Videos

    Authors: Elia Peruzzo, Vidit Goel, Dejia Xu, Xingqian Xu, Yifan Jiang, Zhangyang Wang, Humphrey Shi, Nicu Sebe

    Abstract: Recently, several works tackled the video editing task fostered by the success of large-scale text-to-image generative models. However, most of these methods holistically edit the frame using the text, exploiting the prior given by foundation diffusion models and focusing on improving the temporal consistency across frames. In this work, we introduce a framework that is object-centric and is desig…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: Project Page https://helia95.github.io/vase-website/

  36. arXiv:2312.06331  [pdf, other]

    cs.CV

    Semantic Connectivity-Driven Pseudo-labeling for Cross-domain Segmentation

    Authors: Dong Zhao, Ruizhi Yang, Shuang Wang, Qi Zang, Yang Hu, Licheng Jiao, Nicu Sebe, Zhun Zhong

    Abstract: Presently, self-training stands as a prevailing approach in cross-domain semantic segmentation, enhancing model efficacy by training with pixels assigned with reliable pseudo-labels. However, we find two critical limitations in this paradigm. (1) The majority of reliable pixels exhibit a speckle-shaped pattern and are primarily located in the central semantic region. This presents challenges for t…

    Submitted 11 December, 2023; originally announced December 2023.

  37. arXiv:2312.03046  [pdf, other]

    cs.CV

    Diversified in-domain synthesis with efficient fine-tuning for few-shot classification

    Authors: Victor G. Turrisi da Costa, Nicola Dall'Asen, Yiming Wang, Nicu Sebe, Elisa Ricci

    Abstract: Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class. A recent research direction for improving few-shot classifiers involves augmenting the labelled samples with synthetic images created by state-of-the-art text-to-image generation models. Following this trend, we propose Diversified In-domain Synthesis with Efficient Fine-tuning (DI…

    Submitted 6 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: 14 pages, 6 figures, 8 tables

  38. arXiv:2312.03032  [pdf, other]

    cs.CV

    Zero-Shot Point Cloud Registration

    Authors: Weijie Wang, Guofeng Mei, Bin Ren, Xiaoshui Huang, Fabio Poiesi, Luc Van Gool, Nicu Sebe, Bruno Lepri

    Abstract: Learning-based point cloud registration approaches have significantly outperformed their traditional counterparts. However, they typically require extensive training on specific datasets. In this paper, we propose ZeroReg, the first zero-shot point cloud registration approach that eliminates the need for training on point cloud datasets. The cornerstone of ZeroReg is the novel transfer of image features…

    Submitted 8 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

  39. arXiv:2311.13959  [pdf, other]

    cs.LG cs.CV

    RankFeat&RankWeight: Rank-1 Feature/Weight Removal for Out-of-distribution Detection

    Authors: Yue Song, Nicu Sebe, Wei Wang

    Abstract: The task of out-of-distribution (OOD) detection is crucial for deploying machine learning models in real-world settings. In this paper, we observe that the singular value distributions of the in-distribution (ID) and OOD features are quite different: the OOD feature matrix tends to have a larger dominant singular value than the ID feature, and the class predictions of OOD samples are largely deter…

    Submitted 27 November, 2023; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: submitted to T-PAMI. arXiv admin note: substantial text overlap with arXiv:2209.08590

  40. arXiv:2311.12028  [pdf, other]

    cs.CV cs.AI cs.LG

    Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation

    Authors: Wenhao Li, Mengyuan Liu, Hong Liu, Pichao Wang, Jialun Cai, Nicu Sebe

    Abstract: Transformers have been successfully applied in the field of video-based 3D human pose estimation. However, the high computational costs of these video pose transformers (VPTs) make them impractical on resource-constrained devices. In this paper, we present a plug-and-play pruning-and-recovering framework, called Hourglass Tokenizer (HoT), for efficient transformer-based 3D human pose estimation fr…

    Submitted 27 March, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: Accepted by CVPR 2024, Open Sourced

  41. arXiv:2311.01573  [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    Improving Fairness using Vision-Language Driven Image Augmentation

    Authors: Moreno D'Incà, Christos Tzelepis, Ioannis Patras, Nicu Sebe

    Abstract: Fairness is crucial when training a deep-learning discriminative model, especially in the facial domain. Models tend to correlate specific characteristics (such as age and skin color) with unrelated attributes (downstream tasks), resulting in biases which do not correspond to reality. It is common knowledge that these correlations are present in the data and are then transferred to the models duri…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted for publication in WACV 2024

  42. arXiv:2309.13167  [pdf, other]

    cs.LG cs.CV

    Flow Factorized Representation Learning

    Authors: Yue Song, T. Anderson Keller, Nicu Sebe, Max Welling

    Abstract: A prominent goal of representation learning research is to achieve representations which are factorized in a useful manner with respect to the ground truth factors of variation. The fields of disentangled and equivariant representation learning have approached this ideal from a range of complementary perspectives; however, to date, most approaches have proven to either be ill-specified or insuffic…

    Submitted 22 September, 2023; originally announced September 2023.

    Comments: NeurIPS23

  43. arXiv:2309.11464  [pdf, other]

    cs.CV

    Budget-Aware Pruning: Handling Multiple Domains with Less Parameters

    Authors: Samuel Felipe dos Santos, Rodrigo Berriel, Thiago Oliveira-Santos, Nicu Sebe, Jurandy Almeida

    Abstract: Deep learning has achieved state-of-the-art performance on several computer vision tasks and domains. Nevertheless, it still has a high computational cost and demands a significant amount of parameters. Such requirements hinder the use in resource-limited environments and demand both software and hardware optimization. Another limitation is that deep models are usually specialized into a single do…

    Submitted 3 July, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2210.08101

  44. arXiv:2309.11417   

    cs.CV

    CNNs for JPEGs: A Study in Computational Cost

    Authors: Samuel Felipe dos Santos, Nicu Sebe, Jurandy Almeida

    Abstract: Convolutional neural networks (CNNs) have achieved astonishing advances over the past decade, defining state-of-the-art in several computer vision tasks. CNNs are capable of learning robust representations of the data directly from the RGB pixels. However, most image data are usually available in compressed format, from which the JPEG is the most widely used due to transmission and storage purpose…

    Submitted 22 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: A previous version of this work had already been submitted to ArXiv and is available at arXiv:2012.14426. Instead of maintaining two different submissions, we decided to submit a replacement for the previous submission

  45. arXiv:2309.08964  [pdf, other]

    cs.CV

    Tightening Classification Boundaries in Open Set Domain Adaptation through Unknown Exploitation

    Authors: Lucas Fernando Alvarenga e Silva, Nicu Sebe, Jurandy Almeida

    Abstract: Convolutional Neural Networks (CNNs) have brought revolutionary advances to many research areas due to their capacity of learning from raw data. However, when those methods are applied to non-controllable environments, many different factors can degrade the model's expected performance, such as unlabeled datasets with different levels of domain shift and category shift. Particularly, when both iss…

    Submitted 16 September, 2023; originally announced September 2023.

    Journal ref: 36th SIBGRAPI - Conference on Graphics, Patterns, and Images (SIBGRAPI'23), 2023, pp. 1-6

  46. arXiv:2309.01104  [pdf, other]

    cs.CV cs.CR cs.LG cs.MM

    Turn Fake into Real: Adversarial Head Turn Attacks Against Deepfake Detection

    Authors: Weijie Wang, Zhengyu Zhao, Nicu Sebe, Bruno Lepri

    Abstract: Malicious use of deepfakes leads to serious public concerns and reduces people's trust in digital media. Although effective deepfake detectors have been proposed, they are substantially vulnerable to adversarial attacks. To evaluate the detector's robustness, recent studies have explored various attacks. However, all existing attacks are limited to 2D image perturbations, which are hard to transla…

    Submitted 3 September, 2023; originally announced September 2023.

  47. arXiv:2308.14619  [pdf, other]

    cs.CV

    Compositional Semantic Mix for Domain Adaptation in Point Cloud Segmentation

    Authors: Cristiano Saltori, Fabio Galasso, Giuseppe Fiameni, Nicu Sebe, Fabio Poiesi, Elisa Ricci

    Abstract: Deep-learning models for 3D point cloud semantic segmentation exhibit limited generalization capabilities when trained and tested on data captured with different sensors or in varying environments due to domain shift. Domain adaptation methods can be employed to mitigate this domain shift, for instance, by simulating sensor noise, developing domain-agnostic generators, or training point cloud comp…

    Submitted 29 August, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: TPAMI. arXiv admin note: text overlap with arXiv:2207.09778

  48. Interactive Neural Painting

    Authors: Elia Peruzzo, Willi Menapace, Vidit Goel, Federica Arrigoni, Hao Tang, Xingqian Xu, Arman Chopikyan, Nikita Orlov, Yuxiao Hu, Humphrey Shi, Nicu Sebe, Elisa Ricci

    Abstract: In the last few years, Neural Painting (NP) techniques became capable of producing extremely realistic artworks. This paper advances the state of the art in this emerging research domain by proposing the first approach for Interactive NP. Considering a setting where a user looks at a scene and tries to reproduce it on a painting, our objective is to develop a computational framework to assist the…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: This is a preprint version of the paper to appear at Computer Vision and Image Understanding (CVIU). The final journal version will be available at https://www.sciencedirect.com/science/article/pii/S1077314223001583

    Journal ref: 10.1016/j.cviu.2023.103778

  49. arXiv:2307.12084  [pdf, other]

    cs.CV

    Edge Guided GANs with Multi-Scale Contrastive Learning for Semantic Image Synthesis

    Authors: Hao Tang, Guolei Sun, Nicu Sebe, Luc Van Gool

    Abstract: We propose a novel ECGAN for the challenging semantic image synthesis task. Although considerable improvements have been achieved by the community in the recent period, the quality of synthesized images is far from satisfactory due to three largely unresolved challenges. 1) The semantic labels do not provide detailed structural information, making it challenging to synthesize local details and str…

    Submitted 22 July, 2023; originally announced July 2023.

    Comments: Accepted to TPAMI, an extended version of a paper published in ICLR2023. arXiv admin note: substantial text overlap with arXiv:2003.13898

  50. arXiv:2307.09416  [pdf, other]

    cs.CV cs.CL

    Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation

    Authors: Federico Betti, Jacopo Staiano, Lorenzo Baraldi, Lorenzo Baraldi, Rita Cucchiara, Nicu Sebe

    Abstract: Research in Image Generation has recently made significant progress, particularly boosted by the introduction of Vision-Language models which are able to produce high-quality visual content based on textual inputs. Despite ongoing advancements in terms of generation quality and realism, no methodical frameworks have been defined yet to quantitatively measure the quality of the generated content an…

    Submitted 19 July, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

    Comments: Accepted as oral at ACM MultiMedia 2023 (Brave New Ideas track)