Search | arXiv e-print repository

doi 10.1145/3664647.3689004

MultiMediate'24: Multi-Domain Engagement Estimation

Authors: Philipp Müller, Michal Balazia, Tobias Baur, Michael Dietz, Alexander Heimerl, Anna Penzkofer, Dominik Schiller, François Brémond, Jan Alexandersson, Elisabeth André, Andreas Bulling

Abstract: Estimating the momentary level of participant's engagement is an important prerequisite for assistive systems that support human interactions. Previous work has addressed this task in within-domain evaluation scenarios, i.e. training and testing on the same dataset. This is in contrast to real-life scenarios where domain shifts between training and testing data frequently occur. With MultiMediate'24, we present the first challenge addressing multi-domain engagement estimation. As training data, we utilise the NOXI database of dyadic novice-expert interactions. In addition to within-domain test data, we add two new test domains. First, we introduce recordings following the NOXI protocol but covering languages that are not present in the NOXI training data. Second, we collected novel engagement annotations on the MPIIGroupInteraction dataset which consists of group discussions between three to four people. In this way, MultiMediate'24 evaluates the ability of approaches to generalise across factors such as language and cultural background, group size, task, and screen-mediated vs. face-to-face interaction. This paper describes the MultiMediate'24 challenge and presents baseline results. In addition, we discuss selected challenge solutions.

Submitted 29 August, 2024; originally announced August 2024.

Comments: arXiv admin note: text overlap with arXiv:2308.08256

arXiv:2408.05562

What Matters in Autonomous Driving Anomaly Detection: A Weakly Supervised Horizon

Authors: Utkarsh Tiwari, Snehashis Majhi, Michal Balazia, François Brémond

Abstract: Video anomaly detection (VAD) in autonomous driving scenarios is an important task; however, it involves several challenges due to the ego-centric views and moving camera. Due to this, it remains largely under-explored. While recent developments in weakly-supervised VAD methods have shown remarkable progress in detecting critical real-world anomalies in static camera scenarios, the development and validation of such methods are yet to be explored for moving camera VAD. This is mainly due to existing datasets like DoTA not following training pre-conditions of weakly-supervised learning. In this paper, we aim to promote weakly-supervised method development for autonomous driving VAD. We reorganize the DoTA dataset and aim to validate recent powerful weakly-supervised VAD methods on moving camera scenarios. Further, we provide a detailed analysis of what modifications on state-of-the-art methods can significantly improve the detection performance. Towards this, we propose a "feature transformation block" and through experimentation we show that our propositions can empower existing weakly-supervised VAD methods significantly in improving the VAD in autonomous driving. Our codes/dataset/demo will be released at github.com/ut21/WSAD-Driving

Submitted 10 August, 2024; originally announced August 2024.

arXiv:2308.08256

doi 10.1145/3581783.3613851

MultiMediate'23: Engagement Estimation and Bodily Behaviour Recognition in Social Interactions

Authors: Philipp Müller, Michal Balazia, Tobias Baur, Michael Dietz, Alexander Heimerl, Dominik Schiller, Mohammed Guermal, Dominike Thomas, François Brémond, Jan Alexandersson, Elisabeth André, Andreas Bulling

Abstract: Automatic analysis of human behaviour is a fundamental prerequisite for the creation of machines that can effectively interact with and support humans in social interactions. In MultiMediate'23, we address two key human social behaviour analysis tasks for the first time in a controlled challenge: engagement estimation and bodily behaviour recognition in social interactions. This paper describes the MultiMediate'23 challenge and presents novel sets of annotations for both tasks. For engagement estimation we collected novel annotations on the NOvice eXpert Interaction (NOXI) database. For bodily behaviour recognition, we annotated test recordings of the MPIIGroupInteraction corpus with the BBSI annotation scheme. In addition, we present baseline results for both challenge tasks.

Submitted 16 August, 2023; originally announced August 2023.

Comments: ACM MultiMedia'23

arXiv:2212.03968

Multimodal Vision Transformers with Forced Attention for Behavior Analysis

Authors: Tanay Agrawal, Michal Balazia, Philipp Müller, François Brémond

Abstract: Human behavior understanding requires looking at minute details in the large context of a scene containing multiple input modalities. It is necessary as it allows the design of more human-like machines. While transformer approaches have shown great improvements, they face multiple challenges such as lack of data or background noise. To tackle these, we introduce the Forced Attention (FAt) Transformer which utilizes forced attention with a modified backbone for input encoding and the use of additional inputs. In addition to improving the performance on different tasks and inputs, the modification requires less time and memory resources. We provide a model for a generalised feature extraction for tasks concerning social signals and behavior analysis. Our focus is on understanding behavior in videos where people are interacting with each other or talking into the camera which simulates the first person point of view in social interaction. FAt Transformers are applied to two downstream tasks: personality recognition and body language recognition. We achieve state-of-the-art results for Udiva v0.5, First Impressions v2 and MPII Group Interaction datasets. We further provide an extensive ablation study of the proposed architecture.

Submitted 7 December, 2022; originally announced December 2022.

Comments: Preprint. Full paper accepted at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, USA, Jan 2023. 11 pages

MSC Class: 68T05; 68T10 ACM Class: I.5

arXiv:2207.12817

doi 10.1145/3503161.3548363

Bodily Behaviors in Social Interaction: Novel Annotations and State-of-the-Art Evaluation

Authors: Michal Balazia, Philipp Müller, Ákos Levente Tánczos, August von Liechtenstein, François Brémond

Abstract: Body language is an eye-catching social signal and its automatic analysis can significantly advance artificial intelligence systems to understand and actively participate in social interactions. While computer vision has made impressive progress in low-level tasks like head and body pose estimation, the detection of more subtle behaviors such as gesturing, grooming, or fumbling is not well explored. In this paper we present BBSI, the first set of annotations of complex Bodily Behaviors embedded in continuous Social Interactions in a group setting. Based on previous work in psychology, we manually annotated 26 hours of spontaneous human behavior in the MPIIGroupInteraction dataset with 15 distinct body language classes. We present comprehensive descriptive statistics on the resulting dataset as well as results of annotation quality evaluations. For automatic detection of these behaviors, we adapt the Pyramid Dilated Attention Network (PDAN), a state-of-the-art approach for human action detection. We perform experiments using four variants of spatial-temporal features as input to PDAN: Two-Stream Inflated 3D CNN, Temporal Segment Networks, Temporal Shift Module and Swin Transformer. Results are promising and indicate great room for improvement in this difficult task. Representing a key piece in the puzzle towards automatic understanding of social behavior, BBSI is fully available to the research community.

Submitted 7 December, 2022; v1 submitted 26 July, 2022; originally announced July 2022.

Comments: Preprint. Full paper accepted at the ACM International Conference on Multimedia (ACMMM), Lisbon, Portugal, October 2022. 10 pages

MSC Class: 68T05; 68T10 ACM Class: I.5

arXiv:2206.06714

Interpretable Gait Recognition by Granger Causality

Authors: Michal Balazia, Katerina Hlavackova-Schindler, Petr Sojka, Claudia Plant

Abstract: Which joint interactions in the human gait cycle can be used as biometric characteristics? Most current methods on gait recognition suffer from the lack of interpretability. We propose an interpretable feature representation of gait sequences by the graphical Granger causal inference. Gait sequence of a person in the standardized motion capture format, constituting a set of 3D joint spatial trajectories, is envisaged as a causal system of joints interacting in time. We apply the graphical Granger model (GGM) to obtain the so-called Granger causal graph among joints as a discriminative and visually interpretable representation of a person's gait. We evaluate eleven distance functions in the GGM feature space by established classification and class-separability evaluation metrics. Our experiments indicate that, depending on the metric, the most appropriate distance functions for the GGM are the total norm distance and the Ky-Fan 1-norm distance. Experiments also show that the GGM is able to detect the most discriminative joint interactions and that it outperforms five related interpretable models in correct classification rate and in Davies-Bouldin index. The proposed GGM model can serve as a complementary tool for gait analysis in kinesiology or for gait recognition in video surveillance.

Submitted 7 December, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: Preprint. Full paper accepted at the IEEE/IAPR International Conference on Pattern Recognition (ICPR), Montreal, Canada, August 2022. 7 pages

MSC Class: 68T05; 68T10 ACM Class: I.5

arXiv:2112.12180

doi 10.5220/0010841400003124

Multimodal Personality Recognition using Cross-Attention Transformer and Behaviour Encoding

Authors: Tanay Agrawal, Dhruv Agarwal, Michal Balazia, Neelabh Sinha, Francois Bremond

Abstract: Personality computing and affective computing have gained recent interest in many research areas. The datasets for the task generally have multiple modalities like video, audio, language and bio-signals. In this paper, we propose a flexible model for the task which exploits all available data. The task involves complex relations and to avoid using a large model for video processing specifically, we propose the use of behaviour encoding which boosts performance with minimal change to the model. Cross-attention using transformers has become popular in recent times and is utilised for fusion of different modalities. Since long term relations may exist, breaking the input into chunks is not desirable, thus the proposed model processes the entire input together. Our experiments show the importance of each of the above contributions.

Submitted 12 January, 2023; v1 submitted 22 December, 2021; originally announced December 2021.

Comments: Preprint. Final paper accepted at the 17th International Conference on Computer Vision Theory and Applications (VISAPP), virtual, February, 2022. 8 pages

MSC Class: 68T05; 68T10 ACM Class: I.5

arXiv:2110.04828

doi 10.1109/AVSS52988.2021.9663816

FLAME: Facial Landmark Heatmap Activated Multimodal Gaze Estimation

Authors: Neelabh Sinha, Michal Balazia, Francois Bremond

Abstract: 3D gaze estimation is about predicting the line of sight of a person in 3D space. Person-independent models for the same lack precision due to anatomical differences of subjects, whereas person-specific calibrated techniques add strict constraints on scalability. To overcome these issues, we propose a novel technique, Facial Landmark Heatmap Activated Multimodal Gaze Estimation (FLAME), as a way of combining eye anatomical information using eye landmark heatmaps to obtain precise gaze estimation without any person-specific calibration. Our evaluation demonstrates a competitive performance of about 10% improvement on benchmark datasets ColumbiaGaze and EYEDIAP. We also conduct an ablation study to validate our method.

Submitted 7 December, 2022; v1 submitted 10 October, 2021; originally announced October 2021.

Comments: Preprint. Final paper accepted at the 17th IEEE International Conference on Advanced Video and Signal-based Surveillance (AVSS), virtual, November 2021. 8 pages

MSC Class: 68T05; 68T10 ACM Class: I.5

arXiv:2102.04965

doi 10.1109/ICPR48806.2021.9412446

How Unique Is a Face: An Investigative Study

Authors: Michal Balazia, S L Happy, Francois Bremond, Antitza Dantcheva

Abstract: Face recognition has been widely accepted as a means of identification in applications ranging from border control to security in the banking sector. Surprisingly, while widely accepted, we still lack the understanding of uniqueness or distinctiveness of faces as biometric modality. In this work, we study the impact of factors such as image resolution, feature representation, database size, age and gender on uniqueness denoted by the Kullback-Leibler divergence between genuine and impostor distributions. Towards understanding the impact, we present experimental results on the datasets AT&T, LFW, IMDb-Face, as well as ND-TWINS, with the feature extraction algorithms VGGFace, VGG16, ResNet50, InceptionV3, MobileNet and DenseNet121, that reveal the quantitative impact of the named factors. While these are early results, our findings indicate the need for a better understanding of the concept of biometric uniqueness and its implication on face recognition.

Submitted 7 December, 2022; v1 submitted 9 February, 2021; originally announced February 2021.

Comments: Preprint. Full paper accepted at the IEEE/IAPR International Conference on Pattern Recognition (ICPR), Milan, Italy, January 2021. 6 pages

MSC Class: 68T05; 68T10 ACM Class: I.5

arXiv:1708.07755

doi 10.1145/3152124

Gait Recognition from Motion Capture Data

Authors: Michal Balazia, Petr Sojka

Abstract: Gait recognition from motion capture data, as a pattern classification discipline, can be improved by the use of machine learning. This paper contributes to the state-of-the-art with a statistical approach for extracting robust gait features directly from raw data by a modification of Linear Discriminant Analysis with Maximum Margin Criterion. Experiments on the CMU MoCap database show that the suggested method outperforms thirteen relevant methods based on geometric features and a method to learn the features by a combination of Principal Component Analysis and Linear Discriminant Analysis. The methods are evaluated in terms of the distribution of biometric templates in respective feature spaces expressed in a number of class separability coefficients and classification metrics. Results also indicate a high portability of learned features, that means, we can learn what aspects of walk people generally differ in and extract those as general gait features. Recognizing people without needing group-specific features is convenient as particular people might not always provide annotated learning data. As a contribution to reproducible research, our evaluation framework and database have been made publicly available. This research makes motion capture technology directly applicable for human recognition.

Submitted 7 December, 2022; v1 submitted 24 August, 2017; originally announced August 2017.

Comments: Preprint. Full paper accepted at the ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), special issue on Representation, Analysis and Recognition of 3D Humans, February 2018. 18 pages. arXiv admin note: substantial text overlap with arXiv:1701.00995, arXiv:1609.04392, arXiv:1609.06936

MSC Class: 68T05; 68T10 ACM Class: I.5

arXiv:1706.09443

doi 10.1109/BTAS.2017.8272700

You Are How You Walk: Uncooperative MoCap Gait Identification for Video Surveillance with Incomplete and Noisy Data

Authors: Michal Balazia, Petr Sojka

Abstract: This work offers a design of a video surveillance system based on a soft biometric -- gait identification from MoCap data. The main focus is on two substantial issues of the video surveillance scenario: (1) the walkers do not cooperate in providing learning data to establish their identities and (2) the data are often noisy or incomplete. We show that only a few examples of human gait cycles are required to learn a projection of raw MoCap data onto a low-dimensional sub-space where the identities are well separable. Latent features learned by Maximum Margin Criterion (MMC) method discriminate better than any collection of geometric features. The MMC method is also highly robust to noisy data and works properly even with only a fraction of joints tracked. The overall workflow of the design is directly applicable for a day-to-day operation based on the available MoCap technology and algorithms for gait analysis. In the concept we introduce, a walker's identity is represented by a cluster of gait data collected at their incidents within the surveillance system: They are how they walk.

Submitted 7 December, 2022; v1 submitted 28 June, 2017; originally announced June 2017.

Comments: Preprint. Full paper accepted at the IEEE/IAPR International Joint Conference on Biometrics (IJCB), Denver, USA, October 2017. 8 pages

MSC Class: 68T05; 68T10 ACM Class: I.5

arXiv:1701.00995

doi 10.1007/978-3-319-56414-2_3

An Evaluation Framework and Database for MoCap-Based Gait Recognition Methods

Authors: Michal Balazia, Petr Sojka

Abstract: As a contribution to reproducible research, this paper presents a framework and a database to improve the development, evaluation and comparison of methods for gait recognition from motion capture (MoCap) data. The evaluation framework provides implementation details and source codes of state-of-the-art human-interpretable geometric features as well as our own approaches where gait features are learned by a modification of Fisher's Linear Discriminant Analysis with the Maximum Margin Criterion, and by a combination of Principal Component Analysis and Linear Discriminant Analysis. It includes a description and source codes of a mechanism for evaluating four class separability coefficients of feature space and four rank-based classifier performance metrics. This framework also contains a tool for learning a custom classifier and for classifying a custom query on a custom gallery. We provide an experimental database along with source codes for its extraction from the general CMU MoCap database.

Submitted 7 December, 2022; v1 submitted 4 January, 2017; originally announced January 2017.

Comments: Preprint. Full paper published at the 1st IAPR Workshop on Proceedings of Reproducible Research in Pattern Recognition (RRPR), Cancun, Mexico, December 2016. 13 pages. arXiv admin note: text overlap with arXiv:1609.06936

MSC Class: 68T05; 68T10 ACM Class: I.5

arXiv:1609.06936

doi 10.1007/978-3-319-49055-7_28

Walker-Independent Features for Gait Recognition from Motion Capture Data

Authors: Michal Balazia, Petr Sojka

Abstract: MoCap-based human identification, as a pattern recognition discipline, can be optimized using a machine learning approach. Yet in some applications such as video surveillance new identities can appear on the fly and labeled data for all encountered people may not always be available. This work introduces the concept of learning walker-independent gait features directly from raw joint coordinates by a modification of the Fisher Linear Discriminant Analysis with Maximum Margin Criterion. Our new approach shows not only that these features can discriminate different people than who they are learned on, but also that the number of learning identities can be much smaller than the number of walkers encountered in the real operation.

Submitted 7 December, 2022; v1 submitted 22 September, 2016; originally announced September 2016.

Comments: Preprint. Full paper published at the Joint IAPR International Workshops on Structural and Syntactic Pattern Recognition and Statistical Techniques in Pattern Recognition (S+SSPR), Merida, Mexico, November 2016. 11 pages. arXiv admin note: substantial text overlap with arXiv:1609.04392

MSC Class: 68T05; 68T10 ACM Class: I.5

arXiv:1609.04392

doi 10.1109/ICPR.2016.7899750

Learning Robust Features for Gait Recognition by Maximum Margin Criterion

Authors: Michal Balazia, Petr Sojka

Abstract: In the field of gait recognition from motion capture data, designing human-interpretable gait features is a common practice of many fellow researchers. To refrain from ad-hoc schemes and to find maximally discriminative features we may need to explore beyond the limits of human interpretability. This paper contributes to the state-of-the-art with a machine learning approach for extracting robust gait features directly from raw joint coordinates. The features are learned by a modification of Linear Discriminant Analysis with Maximum Margin Criterion so that the identities are maximally separated and, in combination with an appropriate classifier, used for gait recognition. Experiments on the CMU MoCap database show that this method outperforms eight other relevant methods in terms of the distribution of biometric templates in respective feature spaces expressed in four class separability coefficients. Additional experiments indicate that this method is a leading concept for rank-based classifier systems.

Submitted 7 December, 2022; v1 submitted 14 September, 2016; originally announced September 2016.

Comments: Preprint. Full paper published at the 23rd IEEE/IAPR International Conference on Pattern Recognition (ICPR), Cancun, Mexico, December 2016. 6 pages

MSC Class: 68T05; 68T10 ACM Class: I.5

Showing 1–14 of 14 results for author: Balazia, M