Swiss DINO: Efficient and Versatile Vision Framework for On-device Personal Object Search

Authors: Kirill Paramonov, Jia-Xing Zhong, Umberto Michieli, Jijoong Moon, Mete Ozay

Abstract: In this paper, we address a recent trend in robotic home appliances to include vision systems on personal devices, capable of personalizing the appliances on the fly. In particular, we formulate and address an important technical task of personal object search, which involves localization and identification of personal items of interest on images captured by robotic appliances, with each item referenced only by a few annotated images. The task is crucial for robotic home appliances and mobile systems, which need to process personal visual scenes or to operate with particular personal objects (e.g., for grasping or navigation). In practice, personal object search presents two main technical challenges. First, a robot vision system needs to be able to distinguish between many fine-grained classes, in the presence of occlusions and clutter. Second, the strict resource requirements for the on-device system restrict the usage of most state-of-the-art methods for few-shot learning and often prevent on-device adaptation. In this work, we propose Swiss DINO: a simple yet effective framework for one-shot personal object search based on the recent DINOv2 transformer model, which was shown to have strong zero-shot generalization properties. Swiss DINO handles challenging on-device personalized scene understanding requirements and does not require any adaptation training. We show significant improvement (up to 55%) in segmentation and recognition accuracy compared to the common lightweight solutions, and significant footprint reduction of backbone inference time (up to 100x) and GPU consumption (up to 10x) compared to the heavy transformer-based solutions.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 8 pages, 2 figures, accepted to IROS2024

arXiv:2407.06450 [pdf, other]

Enhanced Model Robustness to Input Corruptions by Per-corruption Adaptation of Normalization Statistics

Authors: Elena Camuffo, Umberto Michieli, Simone Milani, Jijoong Moon, Mete Ozay

Abstract: Developing a reliable vision system is a fundamental challenge for robotic technologies (e.g., indoor service robots and outdoor autonomous robots) which can ensure reliable navigation even in challenging environments such as adverse weather conditions (e.g., fog, rain), poor lighting conditions (e.g., over/under exposure), or sensor degradation (e.g., blurring, noise), and can guarantee high performance in safety-critical functions. Current solutions proposed to improve model robustness usually rely on generic data augmentation techniques or employ costly test-time adaptation methods. In addition, most approaches focus on addressing a single vision task (typically, image recognition) utilising synthetic data. In this paper, we introduce Per-corruption Adaptation of Normalization statistics (PAN) to enhance the model robustness of vision systems. Our approach entails three key components: (i) a corruption type identification module, (ii) dynamic adjustment of normalization layer statistics based on identified corruption type, and (iii) real-time update of these statistics according to input data. PAN can integrate seamlessly with any convolutional model for enhanced accuracy in several robot vision tasks. In our experiments, PAN obtains robust performance improvement on challenging real-world corrupted image datasets (e.g., OpenLoris, ExDark, ACDC), where most of the current solutions tend to fail. Moreover, PAN outperforms the baseline models by 20-30% on synthetic benchmarks in object recognition tasks.

Submitted 8 July, 2024; originally announced July 2024.

Journal ref: International Conference on Intelligent Robots and Systems (IROS), 2024

arXiv:2407.01193 [pdf, other]

Cross-Architecture Auxiliary Feature Space Translation for Efficient Few-Shot Personalized Object Detection

Authors: Francesco Barbato, Umberto Michieli, Jijoong Moon, Pietro Zanuttigh, Mete Ozay

Abstract: Recent years have seen object detection robotic systems deployed in several personal devices (e.g., home robots and appliances). This has highlighted a challenge in their design, i.e., they cannot efficiently update their knowledge to distinguish between general classes and user-specific instances (e.g., a dog vs. user's dog). We refer to this challenging task as Instance-level Personalized Object Detection (IPOD). The personalization task requires many samples for model tuning and optimization in a centralized server, raising privacy concerns. An alternative is provided by approaches based on recent large-scale Foundation Models, but their compute costs preclude on-device applications. In our work we tackle both problems at the same time, designing a Few-Shot IPOD strategy called AuXFT. We introduce a conditional coarse-to-fine few-shot learner to refine the coarse predictions made by an efficient object detector, showing that using an off-the-shelf model leads to poor personalization due to neural collapse. Therefore, we introduce a Translator block that generates an auxiliary feature space where features generated by a self-supervised model (e.g., DINOv2) are distilled without impacting the performance of the detector. We validate AuXFT on three publicly available datasets and one in-house benchmark designed for the IPOD task, achieving remarkable gains in all considered scenarios with excellent time-complexity trade-off: AuXFT reaches a performance of 80% of its upper bound at just 32% of the inference time, 13% of VRAM and 19% of the model size.

Submitted 1 July, 2024; originally announced July 2024.

Comments: Accepted at IROS 2024, 8 pages, 4 figures, 6 tables

arXiv:2406.14563 [pdf, other]

Model Merging and Safety Alignment: One Bad Model Spoils the Bunch

Authors: Hasan Abed Al Kader Hammoud, Umberto Michieli, Fabio Pizzati, Philip Torr, Adel Bibi, Bernard Ghanem, Mete Ozay

Abstract: Merging Large Language Models (LLMs) is a cost-effective technique for combining multiple expert LLMs into a single versatile model, retaining the expertise of the original ones. However, current approaches often overlook the importance of safety alignment during merging, leading to highly misaligned models. This work investigates the effects of model merging on alignment. We evaluate several popular model merging techniques, demonstrating that existing methods do not only transfer domain expertise but also propagate misalignment. We propose a simple two-step approach to address this problem: (i) generating synthetic safety and domain-specific data, and (ii) incorporating these generated data into the optimization process of existing data-aware model merging techniques. This allows us to treat alignment as a skill that can be maximized in the resulting merged LLM. Our experiments illustrate the effectiveness of integrating alignment-related data during merging, resulting in models that excel in both domain expertise and alignment.

Submitted 20 June, 2024; originally announced June 2024.

Comments: Under review

arXiv:2404.01397 [pdf, other]

Object-conditioned Bag of Instances for Few-Shot Personalized Instance Recognition

Authors: Umberto Michieli, Jijoong Moon, Daehyun Kim, Mete Ozay

Abstract: Nowadays, users demand increased personalization of vision systems to localize and identify personal instances of objects (e.g., my dog rather than dog) from a few-shot dataset only. Despite outstanding results of deep networks on classical label-abundant benchmarks (e.g., those of the latest YOLOv8 model for standard object detection), they struggle to maintain within-class variability to represent different instances rather than object categories only. We construct an Object-conditioned Bag of Instances (OBoI) based on multi-order statistics of extracted features, where generic object detection models are extended to search and identify personal instances from the OBoI's metric space, without need for backpropagation. By relying on multi-order statistics, OBoI achieves consistently superior accuracy in distinguishing different instances. In the results, we achieve 77.1% personal object recognition accuracy in the case of 18 personal instances, showing about 12% relative gain over the state of the art.

Submitted 1 April, 2024; originally announced April 2024.

Comments: ICASSP 2024. Copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other

arXiv:2403.14335 [pdf, other]

FFT-based Selection and Optimization of Statistics for Robust Recognition of Severely Corrupted Images

Authors: Elena Camuffo, Umberto Michieli, Jijoong Moon, Daehyun Kim, Mete Ozay

Abstract: Improving model robustness in case of corrupted images is among the key challenges to enable robust vision systems on smart devices, such as robotic agents. Particularly, robust test-time performance is imperative for most of the applications. This paper presents a novel approach to improve robustness of any classification model, especially on severely corrupted images. Our method (FROST) employs high-frequency features to detect input image corruption type, and select layer-wise feature normalization statistics. FROST provides the state-of-the-art results for different models and datasets, outperforming competitors on ImageNet-C by up to 37.1% relative gain, improving baseline of 40.9% mCE on severe corruptions.

Submitted 21 March, 2024; originally announced March 2024.

Comments: ICASSP 2024. Copyright 2024 IEEE.

Journal ref: International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

arXiv:2402.18614 [pdf, other]

Deep Neural Network Models Trained With A Fixed Random Classifier Transfer Better Across Domains

Authors: Hafiz Tiomoko Ali, Umberto Michieli, Ji Joong Moon, Daehyun Kim, Mete Ozay

Abstract: The recently discovered Neural collapse (NC) phenomenon states that the last-layer weights of Deep Neural Networks (DNN) converge to the so-called Equiangular Tight Frame (ETF) simplex at the terminal phase of their training. This ETF geometry is equivalent to vanishing within-class variability of the last layer activations. Inspired by NC properties, we explore in this paper the transferability of DNN models trained with their last layer weight fixed according to ETF. This enforces class separation by eliminating class covariance information, effectively providing implicit regularization. We show that DNN models trained with such a fixed classifier significantly improve transfer performance, particularly on out-of-domain datasets. On a broad range of fine-grained image classification datasets, our approach outperforms i) baseline methods that do not perform any covariance regularization (up to 22%), as well as ii) methods that explicitly whiten covariance of activations throughout training (up to 19%). Our findings suggest that DNNs trained with fixed ETF classifiers offer a powerful mechanism for improving transfer learning across domains.

Submitted 28 February, 2024; originally announced February 2024.

Comments: ICASSP 2024. Copyright 2024 IEEE.
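
Illustrative note on the entry above: the simplex ETF geometry mentioned in the abstract can be sketched in a few lines. This is a minimal sketch of the canonical construction for K classes in K dimensions, not the authors' code; the function names `simplex_etf` and `dot` are hypothetical, and the paper may additionally rotate or project the frame into the feature dimension of the network.

```python
import math

def simplex_etf(num_classes):
    """Return K class vectors forming a simplex ETF in R^K.

    Canonical form: sqrt(K / (K - 1)) * (I - (1/K) * ones), one row per class.
    Assumption: no extra rotation or projection is applied here.
    """
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]

def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

etf = simplex_etf(4)
# Every class vector has unit norm, and every pair of distinct class
# vectors has the same inner product, -1 / (K - 1).
norm_sq = dot(etf[0], etf[0])
pair_cos = dot(etf[0], etf[1])
```

A classifier whose last-layer weights are frozen to such vectors only lets the backbone learn features; the equal pairwise angles are what make the frame "equiangular".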

arXiv:2402.18449 [pdf, other]

HOP to the Next Tasks and Domains for Continual Learning in NLP

Authors: Umberto Michieli, Mete Ozay

Abstract: Continual Learning (CL) aims to learn a sequence of problems (i.e., tasks and domains) by transferring knowledge acquired on previous problems, whilst avoiding forgetting of past ones. Different from previous approaches which focused on CL for one NLP task or domain in a specific use-case, in this paper, we address a more general CL setting to learn from a sequence of problems in a unique framework. Our method, HOP, makes it possible to hop across tasks and domains by addressing the CL problem along three directions: (i) we employ a set of adapters to generalize a large pre-trained model to unseen problems, (ii) we compute high-order moments over the distribution of embedded representations to distinguish independent and correlated statistics across different tasks and domains, (iii) we process this enriched information with auxiliary heads specialized for each end problem. An extensive experimental campaign on 4 NLP applications, 5 benchmarks and 2 CL setups demonstrates the effectiveness of our HOP.

Submitted 28 February, 2024; originally announced February 2024.

Comments: AAAI 2024. Main + supplementary

arXiv:2402.18402 [pdf, other]

doi 10.1145/3625468.3647623

A Modular System for Enhanced Robustness of Multimedia Understanding Networks via Deep Parametric Estimation

Authors: Francesco Barbato, Umberto Michieli, Mehmet Kerim Yucel, Pietro Zanuttigh, Mete Ozay

Abstract: In multimedia understanding tasks, corrupted samples pose a critical challenge, because when fed to machine learning models they lead to performance degradation. In the past, three groups of approaches have been proposed to handle noisy data: i) enhancer and denoiser modules to improve the quality of the noisy data, ii) data augmentation approaches, and iii) domain adaptation strategies. All the aforementioned approaches come with drawbacks that limit their applicability; the first has high computational costs and requires pairs of clean-corrupted data for training, while the others only allow deployment of the same task/network they were trained on (i.e., when upstream and downstream task/network are the same). In this paper, we propose SyMPIE to solve these shortcomings. To this end, we design a small, modular, and efficient (just 2GFLOPs to process a Full HD image) system to enhance input data for robust downstream multimedia understanding with minimal computational cost. Our SyMPIE is pre-trained on an upstream task/network that should not match the downstream ones and does not need paired clean-corrupted samples. Our key insight is that most input corruptions found in real-world tasks can be modeled through global operations on color channels of images or spatial filters with small kernels. We validate our approach on multiple datasets and tasks, such as image classification (on ImageNetC, ImageNetC-Bar, VizWiz, and a newly proposed mixed corruption benchmark named ImageNetC-mixed) and semantic segmentation (on Cityscapes, ACDC, and DarkZurich) with consistent improvements of about 5% relative accuracy gain across the board. The code of our approach and the new ImageNetC-mixed benchmark will be made available upon publication.

Submitted 29 February, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: Accepted at ACM MMSys'24. 10 pages, 7 figures, 8 tables

arXiv:2309.10479 [pdf, other]

RECALL+: Adversarial Web-based Replay for Continual Learning in Semantic Segmentation

Authors: Chang Liu, Giulia Rizzoli, Francesco Barbato, Andrea Maracani, Marco Toldo, Umberto Michieli, Yi Niu, Pietro Zanuttigh

Abstract: Catastrophic forgetting of previous knowledge is a critical issue in continual learning typically handled through various regularization strategies. However, existing methods struggle especially when several incremental steps are performed. In this paper, we extend our previous approach (RECALL) and tackle forgetting by exploiting unsupervised web-crawled data to retrieve examples of old classes from online databases. In contrast to the original methodology, which did not incorporate an assessment of web-based data, the present work proposes two advanced techniques: an adversarial approach and an adaptive threshold strategy. These methods are utilized to meticulously choose samples from web data that exhibit strong statistical congruence with the no longer available training data. Furthermore, we improved the pseudo-labeling scheme to achieve a more accurate labeling of web data that also considers classes being learned in the current step. Experimental results show that this enhanced approach achieves remarkable results, particularly when the incremental scenario spans multiple steps.

Submitted 16 February, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

arXiv:2307.12660 [pdf, other]

Online Continual Learning in Keyword Spotting for Low-Resource Devices via Pooling High-Order Temporal Statistics

Authors: Umberto Michieli, Pablo Peso Parada, Mete Ozay

Abstract: Keyword Spotting (KWS) models on embedded devices should adapt fast to new user-defined words without forgetting previous ones. Embedded devices have limited storage and computational resources, thus, they cannot save samples or update large models. We consider the setup of embedded online continual learning (EOCL), where KWS models with frozen backbone are trained to incrementally recognize new words from a non-repeated stream of samples, seen one at a time. To this end, we propose Temporal Aware Pooling (TAP) which constructs an enriched feature space computing high-order moments of speech features extracted by a pre-trained backbone. Our method, TAP-SLDA, updates a Gaussian model for each class on the enriched feature space to effectively use audio representations. In experimental analyses, TAP-SLDA outperforms competitors on several setups, backbones, and baselines, bringing a relative average gain of 11.3% on the GSC dataset.

Submitted 24 July, 2023; originally announced July 2023.

Comments: INTERSPEECH 2023
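
Illustrative note on the entry above: pooling high-order moments over time, as the abstract describes, can be sketched as follows. This is a hedged sketch under stated assumptions, not the TAP implementation: the choice of moments (mean, standard deviation, skewness, kurtosis), the population normalization, and the names `pool_moments` and `max_order` are all hypothetical, since the abstract does not specify them.

```python
import math

def pool_moments(frames, max_order=4):
    """Pool a (time x feature) sequence into one enriched vector.

    For each feature dimension, emit the mean, the standard deviation,
    and standardized central moments of order 3..max_order, in that order.
    Assumption: constant dimensions (std == 0) get 0.0 for higher orders.
    """
    t = len(frames)
    d = len(frames[0])
    pooled = []
    for j in range(d):
        col = [frame[j] for frame in frames]
        mean = sum(col) / t
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / t)
        feats = [mean]
        if max_order >= 2:
            feats.append(std)
        for order in range(3, max_order + 1):
            if std > 0:
                feats.append(sum(((x - mean) / std) ** order for x in col) / t)
            else:
                feats.append(0.0)
        pooled.extend(feats)
    return pooled

# Two time steps, two feature dimensions; the second dimension is constant.
frames = [[0.0, 1.0], [2.0, 1.0]]
pooled = pool_moments(frames)
```

The pooled vector has a fixed length (feature dimension times number of moments) regardless of the utterance length, which is what allows a per-class Gaussian model to be updated one sample at a time on top of a frozen backbone.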

arXiv:2307.12659 [pdf, other]

A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization

Authors: Edward Fish, Umberto Michieli, Mete Ozay

Abstract: Recent advancement in Automatic Speech Recognition (ASR) has produced large AI models, which become impractical for deployment in mobile devices. Model quantization is effective to produce compressed general-purpose models, however such models may only be deployed to a restricted sub-domain of interest. We show that ASR models can be personalized during quantization while relying on just a small set of unlabelled samples from the target domain. To this end, we propose myQASR, a mixed-precision quantization method that generates tailored quantization schemes for diverse users under any memory requirement with no fine-tuning. myQASR automatically evaluates the quantization sensitivity of network layers by analysing the full-precision activation values. We are then able to generate a personalised mixed-precision quantization scheme for any pre-determined memory budget. Results for large-scale ASR models show how myQASR improves performance for specific genders, languages, and speakers.

Submitted 11 February, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

Comments: INTERSPEECH 2023. Code is available at https://github.com/SamsungLabs/myQASR

arXiv:2307.09827 [pdf, other]

Online Continual Learning for Robust Indoor Object Recognition

Authors: Umberto Michieli, Mete Ozay

Abstract: Vision systems mounted on home robots need to interact with unseen classes in changing environments. Robots have limited computational resources, labelled data and storage capability. These requirements pose some unique challenges: models should adapt without forgetting past knowledge in a data- and parameter-efficient way. We characterize the problem as few-shot (FS) online continual learning (OCL), where robotic agents learn from a non-repeated stream of few-shot data updating only a few model parameters. Additionally, such models experience variable conditions at test time, where objects may appear in different poses (e.g., horizontal or vertical) and environments (e.g., day or night). To improve robustness of CL agents, we propose RobOCLe, which: 1) constructs an enriched feature space computing high order statistical moments from the embedded features of samples; and 2) computes similarity between high order statistics of the samples on the enriched feature space, and predicts their class labels. We evaluate robustness of CL models to train/test augmentations in various cases. We show that different moments allow RobOCLe to capture different properties of deformations, providing higher robustness with no decrease of inference speed.

Submitted 19 July, 2023; originally announced July 2023.

Comments: IROS 2023

arXiv:2301.11145 [pdf, other]

Learning from Mistakes: Self-Regularizing Hierarchical Representations in Point Cloud Semantic Segmentation

Authors: Elena Camuffo, Umberto Michieli, Simone Milani

Abstract: Recent advances in autonomous robotic technologies have highlighted the growing need for precise environmental analysis. LiDAR semantic segmentation has gained attention to accomplish fine-grained scene understanding by acting directly on raw content provided by sensors. Recent solutions showed how different learning techniques can be used to improve the performance of the model, without any architectural or dataset change. Following this trend, we present a coarse-to-fine setup that LEArns from classification mistaKes (LEAK) derived from a standard model. First, classes are clustered into macro groups according to mutual prediction errors; then, the learning process is regularized by: (1) aligning class-conditional prototypical feature representation for both fine and coarse classes, (2) weighting instances with a per-class fairness index. Our LEAK approach is very general and can be seamlessly applied on top of any segmentation architecture; indeed, experimental results showed that it enables state-of-the-art performances on different architectures, datasets and tasks, while ensuring more balanced class-wise results and faster convergence.

Submitted 19 December, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

Journal ref: IEEE Transactions on Multimedia (TMM), 2023

arXiv:2210.07016 [pdf, other]

Learning with Style: Continual Semantic Segmentation Across Tasks and Domains

Authors: Marco Toldo, Umberto Michieli, Pietro Zanuttigh

Abstract: Deep learning models dealing with image understanding in real-world settings must be able to adapt to a wide variety of tasks across different domains. Domain adaptation and class incremental learning deal with domain and task variability separately, whereas their unified solution is still an open problem. We tackle both facets of the problem together, taking into account the semantic shift within both input and label spaces. We start by formally introducing continual learning under task and domain shift. Then, we address the proposed setup by using style transfer techniques to extend knowledge across domains when learning incremental tasks and a robust distillation framework to effectively recollect task knowledge under incremental domain shift. The devised framework (LwS, Learning with Style) is able to generalize incrementally acquired task knowledge across all the domains encountered, proving to be robust against catastrophic forgetting. Extensive experimental evaluation on multiple autonomous driving datasets shows how the proposed method outperforms existing approaches, which prove to be ill-equipped to deal with continual semantic segmentation under both task and domain shift.

Submitted 13 October, 2022; originally announced October 2022.

Comments: 16 pages, 7 figures

arXiv:2210.02326 [pdf, other]

Learning Across Domains and Devices: Style-Driven Source-Free Domain Adaptation in Clustered Federated Learning

Authors: Donald Shenaj, Eros Fanì, Marco Toldo, Debora Caldarola, Antonio Tavera, Umberto Michieli, Marco Ciccone, Pietro Zanuttigh, Barbara Caputo

Abstract: Federated Learning (FL) has recently emerged as a possible way to tackle the domain shift in real-world Semantic Segmentation (SS) without compromising the private nature of the collected data. However, most of the existing works on FL unrealistically assume labeled data in the remote clients. Here we propose a novel task (FFREEDA) in which the clients' data is unlabeled and the server accesses a source labeled dataset for pre-training only. To solve FFREEDA, we propose LADD, which leverages the knowledge of the pre-trained model by employing self-supervision with ad-hoc regularization techniques for local training and introducing a novel federated clustered aggregation scheme based on the clients' style. Our experiments show that our algorithm is able to efficiently tackle the new task outperforming existing approaches. The code is available at https://github.com/Erosinho13/LADD.

Submitted 5 October, 2022; originally announced October 2022.

Comments: WACV 2023; 11 pages manuscript, 6 pages supplemental material

arXiv:2204.09788 [pdf, other]

SELMA: SEmantic Large-scale Multimodal Acquisitions in Variable Weather, Daytime and Viewpoints

Authors: Paolo Testolina, Francesco Barbato, Umberto Michieli, Marco Giordani, Pietro Zanuttigh, Michele Zorzi

Abstract: Accurate scene understanding from multiple sensors mounted on cars is a key requirement for autonomous driving systems. Nowadays, this task is mainly performed through data-hungry deep learning techniques that need very large amounts of data to be trained. Due to the high cost of performing segmentation labeling, many synthetic datasets have been proposed. However, most of them miss the multi-sensor nature of the data, and do not capture the significant changes introduced by the variation of daytime and weather conditions. To fill these gaps, we introduce SELMA, a novel synthetic dataset for semantic segmentation that contains more than 30K unique waypoints acquired from 24 different sensors including RGB, depth, semantic cameras and LiDARs, in 27 different atmospheric and daytime conditions, for a total of more than 20M samples. SELMA is based on CARLA, an open-source simulator for generating synthetic data in autonomous driving scenarios, that we modified to increase the variability and the diversity in the scenes and class sets, and to align it with other benchmark datasets. As shown by the experimental evaluation, SELMA allows the efficient training of standard and multi-modal deep learning architectures, and achieves remarkable results on real-world data. SELMA is free and publicly available, thus supporting open science and research.

Submitted 20 April, 2022; originally announced April 2022.

Comments: 14 figures, 14 tables. This paper has been submitted to IEEE. Copyright may change without notice

arXiv:2201.06974 [pdf, other]

doi 10.1016/j.imavis.2022.104426

Continual Coarse-to-Fine Domain Adaptation in Semantic Segmentation

Authors: Donald Shenaj, Francesco Barbato, Umberto Michieli, Pietro Zanuttigh

Abstract: Deep neural networks are typically trained in a single shot for a specific task and data distribution, but in real world settings both the task and the domain of application can change. The problem becomes even more challenging in dense predictive tasks, such as semantic segmentation, and furthermore most approaches tackle the two problems separately. In this paper we introduce the novel task of coarse-to-fine learning of semantic segmentation architectures in presence of domain shift. We consider subsequent learning stages progressively refining the task at the semantic level; i.e., the finer set of semantic labels at each learning step is hierarchically derived from the coarser set of the previous step. We propose a new approach (CCDA) to tackle this scenario. First, we employ the maximum squares loss to align source and target domains and, at the same time, to balance the gradients between well-classified and harder samples. Second, we introduce a novel coarse-to-fine knowledge distillation constraint to transfer network capabilities acquired on a coarser set of labels to a set of finer labels. Finally, we design a coarse-to-fine weight initialization rule to spread the importance from each coarse class to the respective finer classes. To evaluate our approach, we design two benchmarks where source knowledge is extracted from the GTA5 dataset and it is transferred to either the Cityscapes or the IDD datasets, and we show how it outperforms the main competitors.

Submitted 18 January, 2022; originally announced January 2022.

Comments: 24 pages, 9 figures, 6 tables, under submission

arXiv:2108.03673 [pdf, other]

RECALL: Replay-based Continual Learning in Semantic Segmentation

Authors: Andrea Maracani, Umberto Michieli, Marco Toldo, Pietro Zanuttigh

Abstract: Deep networks allow to obtain outstanding results in semantic segmentation, however they need to be trained in a single shot with a large amount of data. Continual learning settings where new classes are learned in incremental steps and previous training data is no longer available are challenging due to the catastrophic forgetting phenomenon. Existing approaches typically fail when several incremental steps are performed or in presence of a distribution shift of the background class. We tackle these issues by recreating no longer available data for the old classes and outlining a content inpainting scheme on the background class. We propose two sources for replay data. The first resorts to a generative adversarial network to sample from the class space of past learning steps. The second relies on web-crawled data to retrieve images containing examples of old classes from online databases. In both scenarios no samples of past steps are stored, thus avoiding privacy concerns. Replay data are then blended with new samples during the incremental steps. Our approach, RECALL, outperforms state-of-the-art methods.

Submitted 19 September, 2021; v1 submitted 8 August, 2021; originally announced August 2021.

Comments: Accepted by ICCV 2021

arXiv:2108.03021 [pdf, other]

Road Scenes Segmentation Across Different Domains by Disentangling Latent Representations

Authors: Francesco Barbato, Umberto Michieli, Marco Toldo, Pietro Zanuttigh

Abstract: Deep learning models obtain impressive accuracy in road scenes understanding, however they need a large quantity of labeled samples for their training. Additionally, such models do not generalise well to environments where the statistical properties of data do not perfectly match those of training scenes, and this can be a significant problem for intelligent vehicles. Hence, domain adaptation approaches have been introduced to transfer knowledge acquired on a label-abundant source domain to a related label-scarce target domain. In this work, we design and carefully analyse multiple latent space-shaping regularisation strategies that work together to reduce the domain shift. More in detail, we devise a feature clustering strategy to increase domain alignment, a feature perpendicularity constraint to space apart features belonging to different semantic classes, including those not present in the current batch, and a feature norm alignment strategy to separate active and inactive channels. In addition, we propose a novel evaluation metric to capture the relative performance of an adapted model with respect to supervised training. We validate our framework in driving scenarios, considering both synthetic-to-real and real-to-real adaptation, outperforming previous feature-level state-of-the-art methods on multiple road scenes benchmarks.

Submitted 27 October, 2021; v1 submitted 6 August, 2021; originally announced August 2021.

Comments: 10 pages, 3 supplementary pages, 10 figures, 3 supplementary figures, 2 tables, 1 supplementary table

arXiv:2105.08982 [pdf, other]

Prototype Guided Federated Learning of Visual Feature Representations

Authors: Umberto Michieli, Mete Ozay

Abstract: Federated Learning (FL) is a framework which enables distributed model training using a large corpus of decentralized training data. Existing methods aggregate models disregarding their internal representations, which are crucial for training models in vision tasks. System and statistical heterogeneity (e.g., highly imbalanced and non-i.i.d. data) further harm model training. To this end, we introduce a method, called FedProto, which computes client deviations using margins of prototypical representations learned on distributed data, and applies them to drive federated optimization via an attention mechanism. In addition, we propose three methods to analyse statistical properties of feature representations learned in FL, in order to elucidate the relationship between accuracy, margins and feature discrepancy of FL models. In experimental analyses, FedProto demonstrates state-of-the-art accuracy and convergence rate across image classification and semantic segmentation benchmarks by enabling maximum margin training of FL models. Moreover, FedProto reduces uncertainty of predictions of FL models compared to the baseline. To our knowledge, this is the first work evaluating FL models in dense prediction tasks, such as semantic segmentation.

Submitted 19 May, 2021; originally announced May 2021.

Comments: 11 pages manuscript, 6 pages supplemental material

arXiv:2104.02633 [pdf, other]

Latent Space Regularization for Unsupervised Domain Adaptation in Semantic Segmentation

Authors: Francesco Barbato, Marco Toldo, Umberto Michieli, Pietro Zanuttigh

Abstract: Deep convolutional neural networks for semantic segmentation achieve outstanding accuracy, however they also have a couple of major drawbacks: first, they do not generalize well to distributions slightly different from the one of the training data; second, they require a huge amount of labeled data for their optimization. In this paper, we introduce feature-level space-shaping regularization strategies to reduce the domain discrepancy in semantic segmentation. In particular, for this purpose we jointly enforce a clustering objective, a perpendicularity constraint and a norm alignment goal on the feature vectors corresponding to source and target samples. Additionally, we propose a novel measure able to capture the relative efficacy of an adaptation strategy compared to supervised training. We verify the effectiveness of such methods in the autonomous driving setting achieving state-of-the-art results in multiple synthetic-to-real road scenes benchmarks.

Submitted 7 July, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

Comments: Accepted at CVPR-WAD 2021, 11 pages, 7 figures, 1 table

arXiv:2103.06342 [pdf, other]

Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations

Authors: Umberto Michieli, Pietro Zanuttigh

Abstract: Deep neural networks suffer from the major limitation of catastrophic forgetting old tasks when learning new ones. In this paper we focus on class incremental continual learning in semantic segmentation, where new categories are made available over time while previous training data is not retained. The proposed continual learning scheme shapes the latent space to reduce forgetting whilst improving the recognition of novel classes. Our framework is driven by three novel components which we also combine on top of existing techniques effortlessly. First, prototypes matching enforces latent space consistency on old classes, constraining the encoder to produce similar latent representation for previously seen classes in the subsequent steps. Second, features sparsification allows to make room in the latent space to accommodate novel classes. Finally, contrastive learning is employed to cluster features according to their semantics while tearing apart those of different classes. Extensive evaluation on the Pascal VOC2012 and ADE20K datasets demonstrates the effectiveness of our approach, significantly outperforming state-of-the-art methods.

Submitted 24 November, 2021; v1 submitted 10 March, 2021; originally announced March 2021.

Comments: CVPR 2021. 22 pages, 10 figures, 11 tables

arXiv:2011.12616 [pdf, other]

Unsupervised Domain Adaptation in Semantic Segmentation via Orthogonal and Clustered Embeddings

Authors: Marco Toldo, Umberto Michieli, Pietro Zanuttigh

Abstract: Deep learning frameworks allowed for a remarkable advancement in semantic segmentation, but the data hungry nature of convolutional networks has rapidly raised the demand for adaptation techniques able to transfer learned knowledge from label-abundant domains to unlabeled ones. In this paper we propose an effective Unsupervised Domain Adaptation (UDA) strategy, based on a feature clustering method that captures the different semantic modes of the feature distribution and groups features of the same class into tight and well-separated clusters. Furthermore, we introduce two novel learning objectives to enhance the discriminative clustering performance: an orthogonality loss forces spaced out individual representations to be orthogonal, while a sparsity loss reduces class-wise the number of active feature channels. The joint effect of these modules is to regularize the structure of the feature space. Extensive evaluations in the synthetic-to-real scenario show that we achieve state-of-the-art performance.

Submitted 25 November, 2020; originally announced November 2020.

Comments: Accepted at WACV 2021

arXiv:2007.09073 [pdf, other]

GMNet: Graph Matching Network for Large Scale Part Semantic Segmentation in the Wild

Authors: Umberto Michieli, Edoardo Borsato, Luca Rossi, Pietro Zanuttigh

Abstract: The semantic segmentation of parts of objects in the wild is a challenging task in which multiple instances of objects and multiple parts within those objects must be detected in the scene. This problem remains nowadays very marginally explored, despite its fundamental importance towards detailed object understanding. In this work, we propose a novel framework combining higher object-level context conditioning and part-level spatial relationships to address the task. To tackle object-level ambiguity, a class-conditioning module is introduced to retain class-level semantics when learning parts-level semantics. In this way, mid-level features carry also this information prior to the decoding stage. To tackle part-level ambiguity and localization we propose a novel adjacency graph-based module that aims at matching the relative spatial relationships between ground truth and predicted parts. The experimental evaluation on the Pascal-Part dataset shows that we achieve state-of-the-art results on this task.

Submitted 17 July, 2020; originally announced July 2020.

Comments: ECCV 2020

arXiv:2005.10876 [pdf, other]

Unsupervised Domain Adaptation in Semantic Segmentation: a Review

Authors: Marco Toldo, Andrea Maracani, Umberto Michieli, Pietro Zanuttigh

Abstract: The aim of this paper is to give an overview of the recent advancements in the Unsupervised Domain Adaptation (UDA) of deep networks for semantic segmentation. This task is attracting a wide interest, since semantic segmentation models require a huge amount of labeled data and the lack of data fitting specific requirements is the main limitation in the deployment of these techniques. This problem has been recently explored and has rapidly grown with a large number of ad-hoc approaches. This motivates us to build a comprehensive overview of the proposed methodologies and to provide a clear categorization. In this paper, we start by introducing the problem, its formulation and the various scenarios that can be considered. Then, we introduce the different levels at which adaptation strategies may be applied: namely, at the input (image) level, at the internal features representation and at the output level. Furthermore, we present a detailed overview of the literature in the field, dividing previous methods based on the following (non mutually exclusive) categories: adversarial learning, generative-based, analysis of the classifier discrepancies, self-teaching, entropy minimization, curriculum learning and multi-task learning. Novel research directions are also briefly introduced to give a hint of interesting open problems in the field. Finally, a comparison of the performance of the various methods in the widely used autonomous driving scenario is presented.

Submitted 21 May, 2020; originally announced May 2020.

Comments: 34 pages, 7 figures, 2 tables

arXiv:2004.12724 [pdf, other]

Unsupervised Domain Adaptation with Multiple Domain Discriminators and Adaptive Self-Training

Authors: Teo Spadotto, Marco Toldo, Umberto Michieli, Pietro Zanuttigh

Abstract: Unsupervised Domain Adaptation (UDA) aims at improving the generalization capability of a model trained on a source domain to perform well on a target domain for which no labeled data is available. In this paper, we consider the semantic segmentation of urban scenes and we propose an approach to adapt a deep neural network trained on synthetic data to real scenes addressing the domain shift between the two different data distributions. We introduce a novel UDA framework where a standard supervised loss on labeled synthetic data is supported by an adversarial module and a self-training strategy aiming at aligning the two domain distributions. The adversarial module is driven by a couple of fully convolutional discriminators dealing with different domains: the first discriminates between ground truth and generated maps, while the second between segmentation maps coming from synthetic or real world data. The self-training module exploits the confidence estimated by the discriminators on unlabeled data to select the regions used to reinforce the learning process. Furthermore, the confidence is thresholded with an adaptive mechanism based on the per-class overall confidence. Experimental results prove the effectiveness of the proposed strategy in adapting a segmentation network trained on synthetic datasets like GTA5 and SYNTHIA, to real world datasets like Cityscapes and Mapillary.

Submitted 27 April, 2020; originally announced April 2020.

Comments: 8 pages, 3 figures, 2 tables

arXiv:2001.04692 [pdf, other]

doi 10.1016/j.imavis.2020.103889

Unsupervised Domain Adaptation for Mobile Semantic Segmentation based on Cycle Consistency and Feature Alignment

Authors: Marco Toldo, Umberto Michieli, Gianluca Agresti, Pietro Zanuttigh

Abstract: The supervised training of deep networks for semantic segmentation requires a huge amount of labeled real world data. To solve this issue, a commonly exploited workaround is to use synthetic data for training, but deep networks show a critical performance drop when analyzing data with slightly different statistical properties with respect to the training set. In this work, we propose a novel Unsupervised Domain Adaptation (UDA) strategy to address the domain shift issue between real world and synthetic representations. An adversarial model, based on the cycle consistency framework, performs the mapping between the synthetic and real domain. The data is then fed to a MobileNet-v2 architecture that performs the semantic segmentation task. An additional couple of discriminators, working at the feature level of the MobileNet-v2, allows to better align the features of the two domain distributions and to further improve the performance. Finally, the consistency of the semantic maps is exploited. After an initial supervised training on synthetic data, the whole UDA architecture is trained end-to-end considering all its components at once. Experimental results show how the proposed strategy is able to obtain impressive performance in adapting a segmentation network trained on synthetic data to real world scenarios. The usage of the lightweight MobileNet-v2 architecture allows its deployment on devices with limited computational resources as the ones employed in autonomous vehicles.

Submitted 12 March, 2020; v1 submitted 14 January, 2020; originally announced January 2020.

Comments: 11 pages, 3 figures, 3 tables

Journal ref: Image and Vision Computing, Volume 95, March 2020

arXiv:1911.03462 [pdf, other]

Knowledge Distillation for Incremental Learning in Semantic Segmentation

Authors: Umberto Michieli, Pietro Zanuttigh

Abstract: Deep learning architectures have shown remarkable results in scene understanding problems, however they exhibit a critical drop of performances when they are required to learn incrementally new tasks without forgetting old ones. This catastrophic forgetting phenomenon impacts on the deployment of artificial intelligence in real world scenarios where systems need to learn new and different representations over time. Current approaches for incremental learning deal only with image classification and object detection tasks, while in this work we formally introduce incremental learning for semantic segmentation. We tackle the problem applying various knowledge distillation techniques on the previous model. In this way, we retain the information about learned classes, whilst updating the current model to learn the new ones. We developed four main methodologies of knowledge distillation working on both output layers and internal feature representations. We do not store any image belonging to previous training stages and only the last model is used to preserve high accuracy on previously learned classes. Extensive experimental results on the Pascal VOC2012 and MSRC-v2 datasets show the effectiveness of the proposed approaches in several incremental learning scenarios.

Submitted 20 January, 2021; v1 submitted 8 November, 2019; originally announced November 2019.

Comments: Computer Vision and Image Understanding (CVIU), 2021. arXiv admin note: text overlap with arXiv:1907.13372

arXiv:1909.00781 [pdf, other]

Adversarial Learning and Self-Teaching Techniques for Domain Adaptation in Semantic Segmentation

Authors: Umberto Michieli, Matteo Biasetton, Gianluca Agresti, Pietro Zanuttigh

Abstract: Deep learning techniques have been widely used in autonomous driving systems for the semantic understanding of urban scenes. However, they need a huge amount of labeled data for training, which is difficult and expensive to acquire. A recently proposed workaround is to train deep networks using synthetic data, but the domain shift between real world and synthetic representations limits the performance. In this work, a novel Unsupervised Domain Adaptation (UDA) strategy is introduced to solve this issue. The proposed learning strategy is driven by three components: a standard supervised learning loss on labeled synthetic data; an adversarial learning module that exploits both labeled synthetic data and unlabeled real data; finally, a self-teaching strategy applied to unlabeled data. The last component exploits a region growing framework guided by the segmentation confidence. Furthermore, we weighted this component on the basis of the class frequencies to enhance the performance on less common classes. Experimental results prove the effectiveness of the proposed strategy in adapting a segmentation network trained on synthetic datasets, like GTA5 and SYNTHIA, to real world datasets like Cityscapes and Mapillary.

Submitted 2 March, 2020; v1 submitted 2 September, 2019; originally announced September 2019.

Comments: Accepted at IEEE Transactions on Intelligent Vehicles (T-IV) 10 pages, 2 figures, 7 tables

arXiv:1907.13372 [pdf, other]

Incremental Learning Techniques for Semantic Segmentation

Authors: Umberto Michieli, Pietro Zanuttigh

Abstract: Deep learning architectures exhibit a critical drop of performance due to catastrophic forgetting when they are required to incrementally learn new tasks. Contemporary incremental learning frameworks focus on image classification and object detection while in this work we formally introduce the incremental learning problem for semantic segmentation in which a pixel-wise labeling is considered. To tackle this task we propose to distill the knowledge of the previous model to retain the information about previously learned classes, whilst updating the current model to learn the new ones. We propose various approaches working both on the output logits and on intermediate features. In opposition to some recent frameworks, we do not store any image from previously learned classes and only the last model is needed to preserve high accuracy on these classes. The experimental evaluation on the Pascal VOC2012 dataset shows the effectiveness of the proposed approaches.

Submitted 17 September, 2019; v1 submitted 31 July, 2019; originally announced July 2019.

Comments: 8 pages, 3 figures, 4 tables

Journal ref: International Conference on Computer Vision (ICCV), Workshop on Transferring and Adapting Source Knowledge in Computer Vision (TASK-CV) 2019

arXiv:1804.08138 [pdf, other]

Complex Network Analysis of Men Single ATP Tennis Matches

Authors: Umberto Michieli

Abstract: Who are the most significant players in the history of men tennis? Is the official ATP ranking system fair in evaluating players scores? Which players deserved the most contemplation looking at their match records? Which players have never faced yet and are likely to play against in the future? Those are just some of the questions developed in this paper supported by data updated at April 2018. In order to give an answer to the aforementioned questions, complex network science techniques have been applied to some representations of the network of men singles tennis matches. Additionally, a new predictive algorithm is proposed in order to forecast the winner of a match.

Submitted 22 April, 2018; originally announced April 2018.

Comments: 12 pages, 15 figures, 6 tables. Dataset: https://drive.google.com/open?id=1mCxZfkkpIC9o-nxZ1yW3GBBdvBOPW6mQ

arXiv:1803.02393 [pdf, other]

Game Theoretic Analysis of Road User Safety Scenarios Involving Autonomous Vehicles

Authors: Umberto Michieli, Leonardo Badia

Abstract: Interactions between pedestrians, bikers, and human-driven vehicles have been a major concern in traffic safety over the years. The upcoming age of autonomous vehicles will further raise major problems on whether self-driving cars can accurately avoid accidents; on the other hand, usability issues arise on whether human-driven cars and pedestrians can dominate the road at the expense of the autonomous vehicles which will be programmed to avoid accidents. This paper proposes some game theoretical models applied to related traffic scenarios. In the first two games the reciprocal influence between a pedestrian and a vehicle (either autonomous or not) is analyzed, while the third game investigates the intersection of two vehicles, possibly autonomous. The games have been simulated in order to demonstrate the theoretical analysis and the predicted behaviors. These investigations can shed new lights on how novel urban traffic regulations could be required to allow for a better interaction of vehicles and a general improved management of traffic and communication vehicular networks.

Submitted 24 June, 2018; v1 submitted 6 March, 2018; originally announced March 2018.

Comments: Accepted at 'IEEE International Symposium on Personal, Indoor and Mobile Radio Communications' 9-12 September 2018 - Bologna, Italy. Special Session on 'Wireless Technologies for Connected and Autonomous Vehicles'. 7 pages, 5 figures

arXiv:1707.09496 [pdf]

Local-ring network automata and the impact of hyperbolic geometry in complex network link-prediction

Authors: Alessandro Muscoloni, Umberto Michieli, Carlo Vittorio Cannistraci

Abstract: Topological link-prediction can exploit the entire network topology (global methods) or only the neighbourhood (local methods) of the link to predict. Global methods are believed the best. Is this common belief well-founded? Stochastic-Block-Model (SBM) is a global method believed as one of the best link-predictors, therefore it is considered a reference for comparison. But, our results suggest that SBM, whose computational time is high, cannot in general overcome the Cannistraci-Hebb (CH) network automaton model that is a simple local-learning-rule of topological self-organization proved as the current best local-based and parameter-free deterministic rule for link-prediction. To elucidate the reasons of this unexpected result, we formally introduce the notion of local-ring network automata models and their relation with the nature of common-neighbours' definition in complex network theory. After extensive tests, we recommend Structural-Perturbation-Method (SPM) as the new best global method baseline. However, even SPM overall does not outperform CH and in several evaluation frameworks we astonishingly found the opposite. In particular, CH was the best predictor for synthetic networks generated by the Popularity-Similarity-Optimization (PSO) model, and its performance in PSO networks with community structure was even better than using the original internode-hyperbolic-distance as link-predictor.
Interestingly, when tested on non-hyperbolic synthetic networks the performance of CH significantly dropped down indicating that this rule of network self-organization could be strongly associated to the rise of hyperbolic geometry in complex networks. The superiority of global methods seems a "misleading belief" caused by a latent geometry bias of the few small networks used as benchmark in previous studies. We propose to found a latent geometry theory of link-prediction in complex networks.

Submitted 29 August, 2018; v1 submitted 29 July, 2017; originally announced July 2017.

Showing 1–34 of 34 results for author: Michieli, U