Data augmentation for Human Activity Recognition with Generative Adversarial Networks

Marcos Lupion; Federico Cruciani; Ian Cleland; Chris Nugent; Pilar M Ortigosa

doi:10.1109/JBHI.2024.3364910

IEEE J Biomed Health Inform. 2024 Feb 12:PP. doi: 10.1109/JBHI.2024.3364910. Online ahead of print.

PMID: 38345954

Abstract

Human Activity Recognition (HAR) applications currently need a large volume of data to generalize to new users and environments. However, labeled data is usually scarce, and recording new data is costly and time-consuming. Synthetically enlarging datasets with Generative Adversarial Networks (GANs) has been proposed and has outperformed cropping, time-warping, and jittering techniques applied to raw signals. Incorporating GAN-generated synthetic data into datasets has been shown to improve the accuracy of trained models. Nevertheless, there is currently no optimal GAN architecture for generating accelerometry signals, nor a proper evaluation methodology for assessing signal quality or the accuracy obtained with synthetic data. This work is the first to propose conditional Wasserstein Generative Adversarial Networks (cWGANs) to generate synthetic HAR accelerometry signals. Furthermore, we calculate quality metrics from the literature and study the impact of synthetic data on a large HAR dataset involving 395 users. Results show that i) cWGAN outperforms the original Conditional Generative Adversarial Networks (cGANs), with 1D convolutional layers being appropriate for generating accelerometry signals, ii) the performance improvement from incorporating synthetic data becomes more significant as the dataset size decreases, and iii) the quantity of synthetic data required is inversely proportional to the quantity of real data.
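
For readers unfamiliar with the baseline techniques named in the abstract, the following is a minimal Python sketch of jittering, cropping, and time-warping applied to one raw accelerometry window. It is an illustration only, not the authors' implementation: the function names, the window shape (samples x axes), and all parameter values (`sigma`, `crop_len`, `max_speed_change`, `n_knots`) are assumptions chosen for the example and do not come from the paper.

```python
import numpy as np


def jitter(signal, sigma=0.05, rng=None):
    """Add zero-mean Gaussian noise to every sample (sigma is an assumed value)."""
    rng = np.random.default_rng() if rng is None else rng
    return signal + rng.normal(0.0, sigma, size=signal.shape)


def random_crop(signal, crop_len, rng=None):
    """Return a contiguous sub-window of crop_len samples at a random offset."""
    rng = np.random.default_rng() if rng is None else rng
    start = rng.integers(0, signal.shape[0] - crop_len + 1)
    return signal[start:start + crop_len]


def time_warp(signal, max_speed_change=0.2, n_knots=4, rng=None):
    """Resample the window along a smoothly distorted time axis.

    The output keeps the input length; the first and last samples are preserved.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = signal.shape[0]
    t = np.arange(n)
    # Draw a random, strictly positive local speed at a few knots
    # and interpolate it over the whole window.
    knot_pos = np.linspace(0, n - 1, n_knots + 2)
    knot_speed = rng.uniform(1 - max_speed_change, 1 + max_speed_change,
                             size=n_knots + 2)
    speed = np.interp(t, knot_pos, knot_speed)
    # Accumulating the speed gives monotonically increasing warped timestamps,
    # which are rescaled to span the original range [0, n - 1].
    warped_t = np.cumsum(speed)
    warped_t = (warped_t - warped_t[0]) / (warped_t[-1] - warped_t[0]) * (n - 1)
    # Sample each axis of the original signal at the warped timestamps.
    return np.stack(
        [np.interp(warped_t, t, signal[:, c]) for c in range(signal.shape[1])],
        axis=1,
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical window: 128 samples of tri-axial accelerometry.
    window = rng.normal(size=(128, 3))
    print(jitter(window, rng=rng).shape)
    print(random_crop(window, 96, rng=rng).shape)
    print(time_warp(window, rng=rng).shape)
```

Each function produces a perturbed copy of a real window, so the activity label of the source window carries over unchanged. A GAN-based approach such as the one proposed here instead learns to generate new windows conditioned on the label.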