Showing 1–27 of 27 results for author: Berretti, S

  1. arXiv:2408.01627

    cs.CV

    JambaTalk: Speech-Driven 3D Talking Head Generation Based on Hybrid Transformer-Mamba Model

    Authors: Farzaneh Jafari, Stefano Berretti, Anup Basu

    Abstract: In recent years, talking head generation has become a focal point for researchers. Considerable effort is being made to refine lip-sync motion, capture expressive facial expressions, generate natural head poses, and achieve high video quality. However, no single model has yet achieved equivalence across all these metrics. This paper aims to animate a 3D face using Jamba, a hybrid Transformers-Mamb…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 12 pages with 3 figures

  2. arXiv:2405.07680

    cs.CV cs.LG

    Establishing a Unified Evaluation Framework for Human Motion Generation: A Comparative Analysis of Metrics

    Authors: Ali Ismail-Fawaz, Maxime Devanne, Stefano Berretti, Jonathan Weber, Germain Forestier

    Abstract: The development of generative artificial intelligence for human motion generation has expanded rapidly, necessitating a unified evaluation framework. This paper presents a detailed review of eight evaluation metrics for human motion generation, highlighting their unique features and shortcomings. We propose standardized practices through a unified evaluation setup to facilitate consistent model co…

    Submitted 13 May, 2024; originally announced May 2024.

  3. arXiv:2403.12886

    cs.CV

    EmoVOCA: Speech-Driven Emotional 3D Talking Heads

    Authors: Federico Nocentini, Claudio Ferrari, Stefano Berretti

    Abstract: The domain of 3D talking head generation has witnessed significant progress in recent years. A notable challenge in this field consists in blending speech-related motions with expression dynamics, which is primarily caused by the lack of comprehensive 3D datasets that combine diversity in spoken sentences with a variety of facial expressions. Whereas literature works attempted to exploit 2D video…

    Submitted 19 March, 2024; originally announced March 2024.

  4. arXiv:2403.10942

    cs.CV

    ScanTalk: 3D Talking Heads from Unregistered Scans

    Authors: Federico Nocentini, Thomas Besnier, Claudio Ferrari, Sylvain Arguillere, Stefano Berretti, Mohamed Daoudi

    Abstract: Speech-driven 3D talking heads generation has emerged as a significant area of interest among researchers, presenting numerous challenges. Existing methods are constrained by animating faces with fixed topologies, wherein point-wise correspondence is established, and the number and order of points remain consistent across all identities the model can animate. In this work, we present ScanTalk, a…

    Submitted 19 March, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

  5. arXiv:2311.14534

    cs.LG

    Finding Foundation Models for Time Series Classification with a PreText Task

    Authors: Ali Ismail-Fawaz, Maxime Devanne, Stefano Berretti, Jonathan Weber, Germain Forestier

    Abstract: Over the past decade, Time Series Classification (TSC) has gained increasing attention. While various methods were explored, deep learning, particularly through Convolutional Neural Networks (CNNs), stands out as an effective approach. However, due to the limited availability of training data, defining a foundation model for TSC that overcomes the overfitting problem is still a challenging task…

    Submitted 28 February, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

  6. arXiv:2309.16353

    cs.LG

    ShapeDBA: Generating Effective Time Series Prototypes using ShapeDTW Barycenter Averaging

    Authors: Ali Ismail-Fawaz, Hassan Ismail Fawaz, François Petitjean, Maxime Devanne, Jonathan Weber, Stefano Berretti, Geoffrey I. Webb, Germain Forestier

    Abstract: Time series data can be found in almost every domain, ranging from the medical field to manufacturing and wireless communication. Generating realistic and useful exemplars and prototypes is a fundamental data analysis task. In this paper, we investigate a novel approach to generating realistic and useful exemplars and prototypes for time series data. Our approach uses a new form of time series ave…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: Published in AALTD workshop at ECML/PKDD 2023

  7. arXiv:2306.01415

    cs.CV

    Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation

    Authors: Federico Nocentini, Claudio Ferrari, Stefano Berretti

    Abstract: This paper presents a novel approach for generating 3D talking heads from raw audio inputs. Our method grounds on the idea that speech related movements can be comprehensively and efficiently described by the motion of a few control points located on the movable parts of the face, i.e., landmarks. The underlying musculoskeletal structure then allows us to learn how their motion influences the geom…

    Submitted 26 July, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Journal ref: International Conference on Image Analysis and Processing (ICIAP) 2023

  8. arXiv:2306.01081

    cs.CV cs.AI cs.MM

    4DSR-GCN: 4D Video Point Cloud Upsampling using Graph Convolutional Networks

    Authors: Lorenzo Berlincioni, Stefano Berretti, Marco Bertini, Alberto Del Bimbo

    Abstract: Time varying sequences of 3D point clouds, or 4D point clouds, are now being acquired at an increasing pace in several applications (e.g., LiDAR in autonomous or assisted driving). In many cases, such volume of data is transmitted, thus requiring that proper compression tools are applied to either reduce the resolution or the bandwidth. In this paper, we propose a new solution for upscaling and re…

    Submitted 1 June, 2023; originally announced June 2023.

  9. arXiv:2305.11921

    stat.ME cs.AI cs.LG cs.PF

    An Approach to Multiple Comparison Benchmark Evaluations that is Stable Under Manipulation of the Comparate Set

    Authors: Ali Ismail-Fawaz, Angus Dempster, Chang Wei Tan, Matthieu Herrmann, Lynn Miller, Daniel F. Schmidt, Stefano Berretti, Jonathan Weber, Maxime Devanne, Germain Forestier, Geoffrey I. Webb

    Abstract: The measurement of progress using benchmark evaluations is ubiquitous in computer science and machine learning. However, common approaches to analyzing and presenting the results of benchmark comparisons of multiple algorithms over multiple datasets, such as the critical difference diagram introduced by Demšar (2006), have important shortcomings and, we show, are open to both inadvertent and inte…

    Submitted 19 May, 2023; originally announced May 2023.

  10. arXiv:2211.02366

    cs.SD cs.CV eess.AS

    SPEAKER VGG CCT: Cross-corpus Speech Emotion Recognition with Speaker Embedding and Vision Transformers

    Authors: A. Arezzo, S. Berretti

    Abstract: In recent years, Speech Emotion Recognition (SER) has been investigated mainly transforming the speech signal into spectrograms that are then classified using Convolutional Neural Networks pretrained on generic images and fine tuned with spectrograms. In this paper, we start from the general idea above and develop a new learning solution for SER, which is based on Compact Convolutional Transformer…

    Submitted 4 November, 2022; originally announced November 2022.

  11. arXiv:2210.16815

    cs.CV

    CAD 3D Model classification by Graph Neural Networks: A new approach based on STEP format

    Authors: L. Mandelli, S. Berretti

    Abstract: In this paper, we introduce a new approach for retrieval and classification of 3D models that directly performs in the Computer-Aided Design (CAD) format without any conversion to other representations like point clouds or meshes, thus avoiding any loss of information. Among the various CAD formats, we consider the widely used STEP extension, which represents a standard for product manufacturing i…

    Submitted 30 October, 2022; originally announced October 2022.

  12. arXiv:2210.16807

    cs.CV

    The Florence 4D Facial Expression Dataset

    Authors: F. Principi, S. Berretti, C. Ferrari, N. Otberdout, M. Daoudi, A. Del Bimbo

    Abstract: Human facial expressions change dynamically, so their recognition / analysis should be conducted by accounting for the temporal evolution of face deformations either in 2D or 3D. While abundant 2D video data do exist, this is not the case in 3D, where few 3D dynamic (4D) datasets were released for public use. The negative consequence of this scarcity of data is amplified by current deep learning b…

    Submitted 30 October, 2022; originally announced October 2022.

  13. arXiv:2209.01813

    cs.CV

    Automatic Estimation of Self-Reported Pain by Trajectory Analysis in the Manifold of Fixed Rank Positive Semi-Definite Matrices

    Authors: Benjamin Szczapa, Mohamed Daoudi, Stefano Berretti, Pietro Pala, Alberto Del Bimbo, Zakia Hammal

    Abstract: We propose an automatic method to estimate self-reported pain based on facial landmarks extracted from videos. For each video sequence, we decompose the face into four different regions and the pain intensity is measured by modeling the dynamics of facial movement using the landmarks of these regions. A formulation based on Gram matrices is used for representing the trajectory of landmarks on the…

    Submitted 17 September, 2022; v1 submitted 5 September, 2022; originally announced September 2022.

    Comments: To appear in IEEE Transactions On Affective Computing, it is an extension of our paper arXiv:2006.13882

  14. arXiv:2208.00050

    cs.CV

    Generating Multiple 4D Expression Transitions by Learning Face Landmark Trajectories

    Authors: Naima Otberdout, Claudio Ferrari, Mohamed Daoudi, Stefano Berretti, Alberto Del Bimbo

    Abstract: In this work, we address the problem of 4D facial expressions generation. This is usually addressed by animating a neutral 3D face to reach an expression peak, and then get back to the neutral state. In the real world though, people show more complex expressions, and switch from one expression to another. We thus propose a new model that generates transitions between different expressions, and syn…

    Submitted 18 May, 2023; v1 submitted 29 July, 2022; originally announced August 2022.

    Comments: This preprint is an extension of CVPR 2022 paper arXiv:2105.07463

  15. arXiv:2206.11759

    cs.CV cs.GR

    What makes you, you? Analyzing Recognition by Swapping Face Parts

    Authors: Claudio Ferrari, Matteo Serpentoni, Stefano Berretti, Alberto Del Bimbo

    Abstract: Deep learning advanced face recognition to an unprecedented accuracy. However, understanding how local parts of the face affect the overall recognition performance is still mostly unclear. Among others, face swap has been experimented to this end, but just for the entire face. In this paper, we propose to swap facial parts as a way to disentangle the recognition relevance of different face parts,…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: Accepted for publication at the 26th International Conference on Pattern Recognition (ICPR), 2022

  16. arXiv:2105.07463

    cs.CV cs.AI

    Sparse to Dense Dynamic 3D Facial Expression Generation

    Authors: Naima Otberdout, Claudio Ferrari, Mohamed Daoudi, Stefano Berretti, Alberto Del Bimbo

    Abstract: In this paper, we propose a solution to the task of generating dynamic 3D facial expressions from a neutral 3D face and an expression label. This involves solving two sub-problems: (i) modeling the temporal dynamics of expressions, and (ii) deforming the neutral mesh to obtain the expressive counterpart. We represent the temporal evolution of expressions using the motion of a sparse set of 3D landm…

    Submitted 3 March, 2022; v1 submitted 16 May, 2021; originally announced May 2021.

    Comments: paper accepted at CVPR 2022

  17. arXiv:2006.13895

    cs.CV

    Modelling the Statistics of Cyclic Activities by Trajectory Analysis on the Manifold of Positive-Semi-Definite Matrices

    Authors: Ettore Maria Celozzi, Luca Ciabini, Luca Cultrera, Pietro Pala, Stefano Berretti, Mohamed Daoudi, Alberto Del Bimbo

    Abstract: In this paper, a model is presented to extract statistical summaries to characterize the repetition of a cyclic body action, for instance a gym exercise, for the purpose of checking the compliance of the observed action to a template one and highlighting the parts of the action that are not correctly executed (if any). The proposed system relies on a Riemannian metric to compute the distance betwe…

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: accepted at 15th IEEE International Conference on Automatic Face and Gesture Recognition 2020

  18. arXiv:2006.13882

    cs.CV

    Automatic Estimation of Self-Reported Pain by Interpretable Representations of Motion Dynamics

    Authors: Benjamin Szczapa, Mohamed Daoudi, Stefano Berretti, Pietro Pala, Alberto Del Bimbo, Zakia Hammal

    Abstract: We propose an automatic method for pain intensity measurement from video. For each video, pain intensity was measured using the dynamics of facial movement using 66 facial points. Gram matrices formulation was used for facial points trajectory representations on the Riemannian manifold of symmetric positive semi-definite matrices of fixed rank. Curve fitting and temporal alignment were then used t…

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: accepted at ICPR 2020 Conference

  19. A Sparse and Locally Coherent Morphable Face Model for Dense Semantic Correspondence Across Heterogeneous 3D Faces

    Authors: Claudio Ferrari, Stefano Berretti, Pietro Pala, Alberto Del Bimbo

    Abstract: The 3D Morphable Model (3DMM) is a powerful statistical tool for representing 3D face shapes. To build a 3DMM, a training set of face scans in full point-to-point correspondence is required, and its modeling capabilities directly depend on the variability contained in the training data. Thus, to increase the descriptive power of the 3DMM, establishing a dense correspondence across heterogeneous sc…

    Submitted 24 June, 2021; v1 submitted 6 June, 2020; originally announced June 2020.

    Comments: Accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  20. arXiv:1908.00646

    cs.CV

    Fitting, Comparison, and Alignment of Trajectories on Positive Semi-Definite Matrices with Application to Action Recognition

    Authors: Benjamin Szczapa, Mohamed Daoudi, Stefano Berretti, Alberto Del Bimbo, Pietro Pala, Estelle Massart

    Abstract: In this paper, we tackle the problem of action recognition using body skeletons extracted from video sequences. Our approach lies in the continuity of recent works representing video frames by Gramian matrices that describe a trajectory on the Riemannian manifold of positive-semidefinite matrices of fixed rank. In comparison with previous works, the manifold of fixed-rank positive-semidefinite mat…

    Submitted 9 September, 2019; v1 submitted 1 August, 2019; originally announced August 2019.

    Comments: Updated version of the paper published in the workshop HBU2019. The differences with the published version are a few small corrections, mainly misleading notations for the distance function on p. 4, and missing square root in the expression for "d", in the Thm. on p. 4. Noticeable changes w. r. t. v1 and v2 on arxiv, please use this version instead

  21. Dynamic Facial Expression Generation on Hilbert Hypersphere with Conditional Wasserstein Generative Adversarial Nets

    Authors: Naima Otberdout, Mohamed Daoudi, Anis Kacem, Lahoucine Ballihi, Stefano Berretti

    Abstract: In this work, we propose a novel approach for generating videos of the six basic facial expressions given a neutral face image. We propose to exploit the face geometry by modeling the facial landmarks motion as curves encoded as points on a hypersphere. By proposing a conditional version of manifold-valued Wasserstein generative adversarial network (GAN) for motion generation on the hypersphere, w…

    Submitted 28 May, 2020; v1 submitted 23 July, 2019; originally announced July 2019.

  22. arXiv:1904.04297

    cs.CV

    Learned 3D Shape Representations Using Fused Geometrically Augmented Images: Application to Facial Expression and Action Unit Detection

    Authors: Bilal Taha, Munawar Hayat, Stefano Berretti, Naoufel Werghi

    Abstract: This paper proposes an approach to learn generic multi-modal mesh surface representations using a novel scheme for fusing texture and geometric data. Our approach defines an inverse mapping between different geometric descriptors computed on the mesh surface or its down-sampled version, and the corresponding 2D texture image of the mesh, allowing the construction of fused geometrically augmented i…

    Submitted 8 April, 2019; originally announced April 2019.

  23. arXiv:1902.03804

    cs.CV

    Additional Baseline Metrics for the paper "Extended YouTube Faces: a Dataset for Heterogeneous Open-Set Face Identification"

    Authors: Claudio Ferrari, Stefano Berretti, Alberto Del Bimbo

    Abstract: In this report, we provide additional and corrected results for the paper "Extended YouTube Faces: a Dataset for Heterogeneous Open-Set Face Identification". After further investigations, we discovered and corrected wrongly labeled images and incorrect identities. This forced us to re-generate the evaluation protocol for the new data; in doing so, we also reproduced and extended the experimental r…

    Submitted 11 February, 2019; originally announced February 2019.

    Comments: 3 pages, 2 figures

  24. arXiv:1810.11392

    cs.CV

    Automatic Analysis of Facial Expressions Based on Deep Covariance Trajectories

    Authors: Naima Otberdout, Anis Kacem, Mohamed Daoudi, Lahoucine Ballihi, Stefano Berretti

    Abstract: In this paper, we propose a new approach for facial expression recognition using deep covariance descriptors. The solution is based on the idea of encoding local and global Deep Convolutional Neural Network (DCNN) features extracted from still images, in compact local and global covariance descriptors. The space geometry of the covariance matrices is that of Symmetric Positive Definite (SPD) matri…

    Submitted 4 December, 2019; v1 submitted 25 October, 2018; originally announced October 2018.

    Comments: A preliminary version of this work appeared in "Otberdout N, Kacem A, Daoudi M, Ballihi L, Berretti S. Deep Covariance Descriptors for Facial Expression Recognition, in British Machine Vision Conference 2018, BMVC 2018, Northumbria University, Newcastle, UK, September 3-6, 2018. ; 2018 :159." arXiv admin note: substantial text overlap with arXiv:1805.03869

  25. arXiv:1807.00676

    cs.CV

    A Novel Geometric Framework on Gram Matrix Trajectories for Human Behavior Understanding

    Authors: Anis Kacem, Mohamed Daoudi, Boulbaba Ben Amor, Stefano Berretti, Juan Carlos Alvarez-Paiva

    Abstract: In this paper, we propose a novel space-time geometric representation of human landmark configurations and derive tools for comparison and classification. We model the temporal evolution of landmarks as parametrized trajectories on the Riemannian manifold of positive semidefinite matrices of fixed-rank. Our representation has the benefit to bring naturally a second desirable quantity when comparin…

    Submitted 29 June, 2018; originally announced July 2018.

    Comments: Under minor revisions in IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI). A preliminary version of this work appeared in ICCV 17 (A Kacem, M Daoudi, BB Amor, JC Alvarez-Paiva, A Novel Space-Time Representation on the Positive Semidefinite Cone for Facial Expression Recognition, ICCV 17). arXiv admin note: substantial text overlap with arXiv:1707.06440

  26. arXiv:1805.03869

    cs.CV

    Deep Covariance Descriptors for Facial Expression Recognition

    Authors: Naima Otberdout, Anis Kacem, Mohamed Daoudi, Lahoucine Ballihi, Stefano Berretti

    Abstract: In this paper, covariance matrices are exploited to encode the deep convolutional neural networks (DCNN) features for facial expression recognition. The space geometry of the covariance matrices is that of Symmetric Positive Definite (SPD) matrices. By performing the classification of the facial expressions using Gaussian kernel on SPD manifold, we show that the covariance descriptors computed on…

    Submitted 10 May, 2018; originally announced May 2018.

  27. arXiv:1707.07180

    cs.CV

    Emotion Recognition by Body Movement Representation on the Manifold of Symmetric Positive Definite Matrices

    Authors: Mohamed Daoudi, Stefano Berretti, Pietro Pala, Yvonne Delevoye, Alberto Del Bimbo

    Abstract: Emotion recognition is attracting great interest for its potential application in a multitude of real-life situations. Much of the Computer Vision research in this field has focused on relating emotions to facial expressions, with investigations rarely including more than upper body. In this work, we propose a new scenario, for which emotional states are related to 3D dynamics of the whole body mo…

    Submitted 22 July, 2017; originally announced July 2017.

    Comments: Accepted at the 19th International Conference on Image Analysis and Processing (ICIAP), 11-15 September 2017, Catania, Italy