-
Semantic segmentation of SEM images of lower bainitic and tempered martensitic steels
Authors:
Xiaohan Bie,
Manoj Arthanari,
Evelin Barbosa de Melo,
Juancheng Li,
Stephen Yue,
Salim Brahimi,
Jun Song
Abstract:
This study employs deep learning techniques to segment scanning electron microscope images, enabling a quantitative analysis of carbide precipitates in lower bainite and tempered martensite steels with comparable strength. Following segmentation, carbides are investigated, and their volume percentage, size distribution, and orientations are probed within the image dataset. Our findings reveal that…
▽ More
This study employs deep learning techniques to segment scanning electron microscope images, enabling a quantitative analysis of carbide precipitates in lower bainite and tempered martensite steels with comparable strength. Following segmentation, carbides are investigated, and their volume percentage, size distribution, and orientations are probed within the image dataset. Our findings reveal that lower bainite and tempered martensite exhibit comparable volume percentages of carbides, albeit with a more uniform distribution of carbides in tempered martensite. Carbides in lower bainite demonstrate a tendency for better alignment than those in tempered martensite, aligning with the observations of other researchers. However, both microstructures display a scattered carbide orientation, devoid of any discernible pattern. Comparative analysis of aspect ratios and sizes of carbides in lower bainite and tempered martensite unveils striking similarities. The deep learning model achieves an impressive pixelwise accuracy of 98.0% in classifying carbide/iron matrix at the individual pixel level. The semantic segmentation derived from deep learning extends its applicability to the analysis of secondary phases in various materials, offering a time-efficient, versatile AI-powered workflow for quantitative microstructure analysis.
△ Less
Submitted 2 December, 2023;
originally announced December 2023.
-
Speech Modeling with a Hierarchical Transformer Dynamical VAE
Authors:
Xiaoyu Lin,
Xiaoyu Bie,
Simon Leglaive,
Laurent Girin,
Xavier Alameda-Pineda
Abstract:
The dynamical variational autoencoders (DVAEs) are a family of latent-variable deep generative models that extends the VAE to model a sequence of observed data and a corresponding sequence of latent vectors. In almost all the DVAEs of the literature, the temporal dependencies within each sequence and across the two sequences are modeled with recurrent neural networks. In this paper, we propose to…
▽ More
The dynamical variational autoencoders (DVAEs) are a family of latent-variable deep generative models that extends the VAE to model a sequence of observed data and a corresponding sequence of latent vectors. In almost all the DVAEs of the literature, the temporal dependencies within each sequence and across the two sequences are modeled with recurrent neural networks. In this paper, we propose to model speech signals with the Hierarchical Transformer DVAE (HiT-DVAE), which is a DVAE with two levels of latent variable (sequence-wise and frame-wise) and in which the temporal dependencies are implemented with the Transformer architecture. We show that HiT-DVAE outperforms several other DVAEs for speech spectrogram modeling, while enabling a simpler training procedure, revealing its high potential for downstream low-level speech processing tasks such as speech enhancement.
△ Less
Submitted 10 May, 2023; v1 submitted 7 March, 2023;
originally announced March 2023.
-
HiT-DVAE: Human Motion Generation via Hierarchical Transformer Dynamical VAE
Authors:
Xiaoyu Bie,
Wen Guo,
Simon Leglaive,
Lauren Girin,
Francesc Moreno-Noguer,
Xavier Alameda-Pineda
Abstract:
Studies on the automatic processing of 3D human pose data have flourished in the recent past. In this paper, we are interested in the generation of plausible and diverse future human poses following an observed 3D pose sequence. Current methods address this problem by injecting random variables from a single latent space into a deterministic motion prediction framework, which precludes the inheren…
▽ More
Studies on the automatic processing of 3D human pose data have flourished in the recent past. In this paper, we are interested in the generation of plausible and diverse future human poses following an observed 3D pose sequence. Current methods address this problem by injecting random variables from a single latent space into a deterministic motion prediction framework, which precludes the inherent multi-modality in human motion generation. In addition, previous works rarely explore the use of attention to select which frames are to be used to inform the generation process up to our knowledge. To overcome these limitations, we propose Hierarchical Transformer Dynamical Variational Autoencoder, HiT-DVAE, which implements auto-regressive generation with transformer-like attention mechanisms. HiT-DVAE simultaneously learns the evolution of data and latent space distribution with time correlated probabilistic dependencies, thus enabling the generative model to learn a more complex and time-varying latent space as well as diverse and realistic human motions. Furthermore, the auto-regressive generation brings more flexibility on observation and prediction, i.e. one can have any length of observation and predict arbitrary large sequences of poses with a single pre-trained model. We evaluate the proposed method on HumanEva-I and Human3.6M with various evaluation methods, and outperform the state-of-the-art methods on most of the metrics.
△ Less
Submitted 4 April, 2022;
originally announced April 2022.
-
Unsupervised Speech Enhancement using Dynamical Variational Auto-Encoders
Authors:
Xiaoyu Bie,
Simon Leglaive,
Xavier Alameda-Pineda,
Laurent Girin
Abstract:
Dynamical variational autoencoders (DVAEs) are a class of deep generative models with latent variables, dedicated to model time series of high-dimensional data. DVAEs can be considered as extensions of the variational autoencoder (VAE) that include temporal dependencies between successive observed and/or latent vectors. Previous work has shown the interest of using DVAEs over the VAE for speech sp…
▽ More
Dynamical variational autoencoders (DVAEs) are a class of deep generative models with latent variables, dedicated to model time series of high-dimensional data. DVAEs can be considered as extensions of the variational autoencoder (VAE) that include temporal dependencies between successive observed and/or latent vectors. Previous work has shown the interest of using DVAEs over the VAE for speech spectrograms modeling. Independently, the VAE has been successfully applied to speech enhancement in noise, in an unsupervised noise-agnostic set-up that requires neither noise samples nor noisy speech samples at training time, but only requires clean speech signals. In this paper, we extend these works to DVAE-based single-channel unsupervised speech enhancement, hence exploiting both speech signals unsupervised representation learning and dynamics modeling. We propose an unsupervised speech enhancement algorithm that combines a DVAE speech prior pre-trained on clean speech signals with a noise model based on nonnegative matrix factorization, and we derive a variational expectation-maximization (VEM) algorithm to perform speech enhancement. The algorithm is presented with the most general DVAE formulation and is then applied with three specific DVAE models to illustrate the versatility of the framework. Experimental results show that the proposed DVAE-based approach outperforms its VAE-based counterpart, as well as several supervised and unsupervised noise-dependent baselines, especially when the noise type is unseen during training.
△ Less
Submitted 30 September, 2022; v1 submitted 23 June, 2021;
originally announced June 2021.
-
A Benchmark of Dynamical Variational Autoencoders applied to Speech Spectrogram Modeling
Authors:
Xiaoyu Bie,
Laurent Girin,
Simon Leglaive,
Thomas Hueber,
Xavier Alameda-Pineda
Abstract:
The Variational Autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, input data vectors are processed independently. In recent years, a series of papers have presented different extensions of the VAE to process sequential data, th…
▽ More
The Variational Autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, input data vectors are processed independently. In recent years, a series of papers have presented different extensions of the VAE to process sequential data, that not only model the latent space, but also model the temporal dependencies within a sequence of data vectors and corresponding latent vectors, relying on recurrent neural networks. We recently performed a comprehensive review of those models and unified them into a general class called Dynamical Variational Autoencoders (DVAEs). In the present paper, we present the results of an experimental benchmark comparing six of those DVAE models on the speech analysis-resynthesis task, as an illustration of the high potential of DVAEs for speech modeling.
△ Less
Submitted 14 June, 2021; v1 submitted 11 June, 2021;
originally announced June 2021.
-
Multi-Person Extreme Motion Prediction
Authors:
Wen Guo,
Xiaoyu Bie,
Xavier Alameda-Pineda,
Francesc Moreno-Noguer
Abstract:
Human motion prediction aims to forecast future poses given a sequence of past 3D skeletons. While this problem has recently received increasing attention, it has mostly been tackled for single humans in isolation. In this paper, we explore this problem when dealing with humans performing collaborative tasks, we seek to predict the future motion of two interacted persons given two sequences of the…
▽ More
Human motion prediction aims to forecast future poses given a sequence of past 3D skeletons. While this problem has recently received increasing attention, it has mostly been tackled for single humans in isolation. In this paper, we explore this problem when dealing with humans performing collaborative tasks, we seek to predict the future motion of two interacted persons given two sequences of their past skeletons. We propose a novel cross interaction attention mechanism that exploits historical information of both persons, and learns to predict cross dependencies between the two pose sequences. Since no dataset to train such interactive situations is available, we collected ExPI (Extreme Pose Interaction), a new lab-based person interaction dataset of professional dancers performing Lindy-hop dancing actions, which contains 115 sequences with 30K frames annotated with 3D body poses and shapes. We thoroughly evaluate our cross interaction network on ExPI and show that both in short- and long-term predictions, it consistently outperforms state-of-the-art methods for single-person motion prediction.
△ Less
Submitted 19 June, 2022; v1 submitted 18 May, 2021;
originally announced May 2021.
-
Dynamical Variational Autoencoders: A Comprehensive Review
Authors:
Laurent Girin,
Simon Leglaive,
Xiaoyu Bie,
Julien Diard,
Thomas Hueber,
Xavier Alameda-Pineda
Abstract:
Variational autoencoders (VAEs) are powerful deep generative models widely used to represent high-dimensional complex data through a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, the input data vectors are processed independently. Recently, a series of papers have presented different extensions of the VAE to process sequential data, which model not only…
▽ More
Variational autoencoders (VAEs) are powerful deep generative models widely used to represent high-dimensional complex data through a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, the input data vectors are processed independently. Recently, a series of papers have presented different extensions of the VAE to process sequential data, which model not only the latent space but also the temporal dependencies within a sequence of data vectors and corresponding latent vectors, relying on recurrent neural networks or state-space models. In this paper, we perform a literature review of these models. We introduce and discuss a general class of models, called dynamical variational autoencoders (DVAEs), which encompasses a large subset of these temporal VAE extensions. Then, we present in detail seven recently proposed DVAE models, with an aim to homogenize the notations and presentation lines, as well as to relate these models with existing classical temporal models. We have reimplemented those seven DVAE models and present the results of an experimental benchmark conducted on the speech analysis-resynthesis task (the PyTorch code is made publicly available). The paper concludes with a discussion on important issues concerning the DVAE class of models and future research guidelines.
△ Less
Submitted 4 July, 2022; v1 submitted 28 August, 2020;
originally announced August 2020.