-
Audio Enhancement for Computer Audition -- An Iterative Training Paradigm Using Sample Importance
Authors:
Manuel Milling,
Shuo Liu,
Andreas Triantafyllopoulos,
Ilhan Aslan,
Björn W. Schuller
Abstract:
Neural network models for audio tasks, such as automatic speech recognition (ASR) and acoustic scene classification (ASC), are susceptible to noise contamination in real-life applications. To improve audio quality, an enhancement module, which can be developed independently, is explicitly used at the front end of the target audio applications. In this paper, we present an end-to-end learning solution to jointly optimise the models for audio enhancement (AE) and the subsequent applications. To guide the optimisation of the AE module towards a target application, and especially to overcome difficult samples, we use the sample-wise performance measure as an indication of sample importance. In experiments, we consider four representative applications to evaluate our training paradigm, namely ASR, speech command recognition (SCR), speech emotion recognition (SER), and ASC. These applications cover speech and non-speech tasks concerning semantic and non-semantic features as well as transient and global information. The experimental results indicate that our proposed approach can considerably boost the noise robustness of the models, especially at low signal-to-noise ratios (SNRs), for a wide range of computer audition tasks in everyday noisy environments.
Submitted 12 August, 2024;
originally announced August 2024.
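The abstract above uses a sample-wise performance measure as an indication of sample importance when training the AE module. It does not state the exact weighting rule, so the following is only a minimal Python sketch under explicit assumptions: per-sample downstream scores are mapped to normalised weights by a softmax over negated scores (samples the downstream model handles worse get larger weights), and these weights scale the per-sample enhancement losses. The helper names `importance_weights` and `weighted_enhancement_loss` and the `temperature` parameter are hypothetical, not taken from the paper.

```python
import numpy as np


def importance_weights(scores, higher_is_better=True, temperature=1.0):
    """Hypothetical mapping from per-sample performance to importance weights.

    Assumption: worse downstream performance -> larger weight, via a softmax
    over negated scores. Weights are normalised to sum to 1.
    """
    s = np.asarray(scores, dtype=float)
    if not higher_is_better:
        s = -s  # e.g. WER, where lower is better: flip so higher means better
    logits = -s / temperature  # worse performance -> larger logit
    logits -= logits.max()  # subtract max for numerical stability
    w = np.exp(logits)
    return w / w.sum()


def weighted_enhancement_loss(per_sample_losses, weights):
    """Hypothetical weighted sum of per-sample AE losses (weights sum to 1)."""
    return float(np.sum(np.asarray(per_sample_losses, dtype=float) * weights))


# Made-up downstream accuracies for three samples after one training round
acc = [0.9, 0.5, 0.2]
w = importance_weights(acc)
loss = weighted_enhancement_loss([0.1, 0.1, 0.1], w)
```

In an iterative setup, the scores would be recomputed with the current downstream model after each round and the new weights passed to the next round of AE training; how often the paper updates them is not stated in the abstract.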
-
Modeling Emotional Trajectories in Written Stories Utilizing Transformers and Weakly-Supervised Learning
Authors:
Lukas Christ,
Shahin Amiriparian,
Manuel Milling,
Ilhan Aslan,
Björn W. Schuller
Abstract:
Telling stories is an integral part of human communication that can evoke emotions and influence the affective states of the audience. Automatically modeling emotional trajectories in stories has thus attracted considerable scholarly interest. However, as most existing works have been limited to unsupervised dictionary-based approaches, there is no benchmark for this task. We address this gap by introducing continuous valence and arousal labels for an existing dataset of children's stories originally annotated with discrete emotion categories. We collect additional annotations for this data and map the categorical labels to the continuous valence and arousal space. To predict the resulting emotionality signals, we fine-tune a DeBERTa model and improve upon this baseline via a weakly supervised learning approach. The best configuration achieves a Concordance Correlation Coefficient (CCC) of $.8221$ for valence and $.7125$ for arousal on the test set, demonstrating the efficacy of our proposed approach. A detailed analysis shows the extent to which the results vary depending on factors such as the author, the individual story, or the section within the story. In addition, we uncover the weaknesses of our approach by investigating examples that prove to be difficult to predict.
Submitted 4 June, 2024;
originally announced June 2024.
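The abstract above reports results as the Concordance Correlation Coefficient (CCC). For reference, Lin's CCC between gold labels $x$ and predictions $y$ is $\rho_c = \frac{2\sigma_{xy}}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}$; unlike Pearson correlation, it also penalises differences in mean and scale. Below is a minimal NumPy sketch, assuming population (ddof=0) moments throughout; the function name `ccc` is our own and not taken from the paper's code.

```python
import numpy as np


def ccc(y_true, y_pred):
    """Lin's Concordance Correlation Coefficient with population moments."""
    x = np.asarray(y_true, dtype=float)
    y = np.asarray(y_pred, dtype=float)
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))  # population covariance (ddof=0)
    # np.var defaults to ddof=0, consistent with the covariance above
    return 2 * cov / (x.var() + y.var() + (mx - my) ** 2)
```

For example, predictions that track the labels perfectly but are offset by a constant 1, as in `ccc([1, 2, 3], [2, 3, 4])`, still have a Pearson correlation of 1, yet their CCC is 4/7 ≈ 0.571.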
-
How to Compliment a Human -- Designing Affective and Well-being Promoting Conversational Things
Authors:
Ilhan Aslan,
Dominik Neu,
Daniela Neupert,
Stefan Grafberger,
Nico Weise,
Pascal Pfeil,
Maximilian Kuschewski
Abstract:
With today's technologies it seems easier than ever to augment everyday things with the ability to perceive their environment and to talk to users. Considering conversational user interfaces, tremendous progress has already been made in designing and evaluating task-oriented conversational interfaces, such as voice assistants for ordering food or booking a flight. However, it is still very challenging to design smart things that can hold informal conversations and emotional exchanges with their users, which requires the smart thing to master social everyday utterances, the use of irony and sarcasm, the delivery of good compliments, etc. In this paper, we focus on the experience design of compliments and on the Complimenting Mirror design. The paper reports in detail on three phases of a human-centered design process, including a Wizard of Oz study in the lab with 24 participants to explore and identify the effect of different compliment types on user experiences, and a subsequent field study with 105 users in an architecture museum with a fully functional installation of the Complimenting Mirror. In our analyses, we argue why and how a "smart" mirror should compliment users and provide a theorization applicable to affective interaction design with things more generally. We focus on subjective user feedback, including user concerns and prepositions of receiving compliments from a thing, and on observations of real user behavior in the field, i.e., transitions of bodily affective expressions, comparing affective user states before, during, and after compliment delivery. Our research shows that compliment design matters significantly: by using the right type of compliments in our final design in the field test, we succeeded in eliciting reactive expressions of positive emotions, "sincere" smiles, and laughter, even from the seemingly sternest users.
Submitted 2 February, 2023;
originally announced February 2023.
-
Classification of vertices on social networks by multiple approaches
Authors:
Hacı İsmail Aslan,
Chang Choi,
Hoon Ko
Abstract:
With the advent of data representations other than tabular formats, topological structures that make samples interrelated have come into prominence. Such networks can represent, for example, social connections, dataflow maps, citation influence graphs, and protein bindings. In the case of social networks, however, it is crucial to determine the labels of discrete communities. The motivation for such a study is the considerable importance of analyzing graph networks to partition the vertices using solely the topological features of the network graphs. As test-bench repositories for these interaction-based entities, a social graph, a mailing dataset, and two citation datasets are selected. This paper not only identifies the most valuable method but also examines how graph neural networks perform and where they need to improve relative to non-neural-network approaches, which are faster and computationally cheaper. Finally, using the topological features of the tested networks, the paper establishes a performance limit that prospective graph neural network variants would need to exceed.
Submitted 13 January, 2023;
originally announced January 2023.
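The abstract above contrasts graph neural networks with faster, non-neural approaches that partition vertices using only the network topology. It does not name those baselines, so as one illustrative example of such an approach (an assumption, not necessarily a method evaluated in the paper), here is a minimal Python sketch of label propagation: unlabelled vertices repeatedly adopt the most frequent label among their labelled neighbours until no label changes. The function name and the tie-breaking rule are our own choices.

```python
from collections import Counter


def label_propagation(adj, seed_labels, max_iter=100):
    """Classify vertices from graph topology alone (illustrative sketch).

    adj: dict mapping each vertex to an iterable of neighbouring vertices.
    seed_labels: dict of vertices with known labels; these never change.
    """
    labels = dict(seed_labels)
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):  # fixed visiting order for reproducibility
            if v in seed_labels:
                continue
            counts = Counter(labels[u] for u in adj[v] if u in labels)
            if not counts:
                continue  # no labelled neighbour yet; try again next sweep
            # Most frequent neighbour label; ties go to the smallest label
            best = max(sorted(counts), key=counts.__getitem__)
            if labels.get(v) != best:
                labels[v] = best
                changed = True
        if not changed:
            break
    return labels


# Two triangles joined by the edge 2-3; one seed on the left, two on the right
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
result = label_propagation(graph, {0: "A", 4: "B", 5: "B"})
```

Here vertices 1 and 2 end up with label "A" and vertex 3 with label "B", following the community structure of the two triangles.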
-
Automatic Emotion Modelling in Written Stories
Authors:
Lukas Christ,
Shahin Amiriparian,
Manuel Milling,
Ilhan Aslan,
Björn W. Schuller
Abstract:
Telling stories is an integral part of human communication which can evoke emotions and influence the affective states of the audience. Automatically modelling emotional trajectories in stories has thus attracted considerable scholarly interest. However, as most existing works have been limited to unsupervised dictionary-based approaches, there is no labelled benchmark for this task. We address this gap by introducing continuous valence and arousal annotations for an existing dataset of children's stories annotated with discrete emotion categories. We collect additional annotations for this data and map the originally categorical labels to the valence and arousal space. Leveraging recent advances in Natural Language Processing, we propose a set of novel Transformer-based methods for predicting valence and arousal signals over the course of written stories. We explore several strategies for fine-tuning a pretrained ELECTRA model and study the benefits of considering a sentence's context when inferring its emotionality. Moreover, we experiment with additional LSTM and Transformer layers. The best configuration achieves a Concordance Correlation Coefficient (CCC) of .7338 for valence and .6302 for arousal on the test set, demonstrating the suitability of our proposed approach. Our code and additional annotations are made available at https://github.com/lc0197/emotion_modelling_stories.
Submitted 21 December, 2022;
originally announced December 2022.
-
An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion
Authors:
Zijiang Yang,
Xin Jing,
Andreas Triantafyllopoulos,
Meishu Song,
Ilhan Aslan,
Björn W. Schuller
Abstract:
Emotional voice conversion (EVC) focuses on converting a speech utterance from a source to a target emotion; it can thus be a key enabling technology for human-computer interaction applications and beyond. However, EVC remains an unsolved research problem with several challenges. In particular, as speech rate and rhythm are two key factors of emotional conversion, models have to generate output sequences of differing length. Sequence-to-sequence modelling has recently emerged as a competitive paradigm for models that can overcome those challenges. To stimulate further research in this promising new direction, we systematically investigate and review recent sequence-to-sequence EVC papers from six perspectives: their motivation, training strategies, model architectures, datasets, model inputs, and evaluation methods. This information is organised to provide the research community with an easily digestible overview of the current state of the art. Finally, we discuss existing challenges of sequence-to-sequence EVC.
Submitted 29 March, 2022;
originally announced March 2022.
-
A Case Study to Reveal if an Area of Interest has a Trend in Ongoing Tweets Using Word and Sentence Embeddings
Authors:
İsmail Aslan,
Yücel Topçu
Abstract:
In the field of Natural Language Processing, information extraction from texts has been the objective of many researchers for years. Many different techniques have been applied to reveal the opinion that a tweet might express, and thus to understand the sentiment of these short texts of up to 280 characters. Beyond determining the sentiment of a tweet, a study can also focus on finding the correlation of tweets with a certain area of interest, which constitutes the purpose of this study. To reveal whether an area of interest has a trend in ongoing tweets, we propose an easily applicable automated methodology in which Daily Mean Similarity Scores, expressing the similarity between the daily tweet corpus and the target words representing our area of interest, are calculated using a naïve correlation-based technique without training any machine learning model. The Daily Mean Similarity Scores are mainly based on cosine similarity and word/sentence embeddings computed by the Multilingual Universal Sentence Encoder, and they show the main opinion stream of the tweets with respect to a certain area of interest, indicating that an ongoing trend of a specific subject on Twitter can easily be captured in almost real time using the proposed methodology. We have also compared the effectiveness of word versus sentence embeddings in our methodology and found that both give almost the same results, whereas word embeddings require less computation time than sentence embeddings and are thus more efficient. This paper begins with an introduction followed by background information on the basics, then explains the proposed methodology, and finally interprets the results and concludes the findings.
Submitted 2 October, 2021;
originally announced October 2021.
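The abstract above computes Daily Mean Similarity Scores from cosine similarities between tweet embeddings and target-word embeddings, without training a model. Below is a minimal Python sketch assuming that a day's score is the mean cosine similarity over all (tweet, target word) pairs for that day; the exact aggregation in the paper may differ. Real embeddings would come from an encoder such as the Multilingual Universal Sentence Encoder; the toy 2-D vectors, day keys, and function names here are hypothetical.

```python
import numpy as np


def cosine_matrix(a, b):
    """Pairwise cosine similarities between rows of a and rows of b.

    Assumes no all-zero rows (their norm would be 0).
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def daily_mean_similarity(tweet_embs_by_day, target_embs):
    """Hypothetical score: mean cosine similarity of each day's tweets to the targets."""
    targets = np.asarray(target_embs, dtype=float)
    scores = {}
    for day, embs in tweet_embs_by_day.items():
        sims = cosine_matrix(np.asarray(embs, dtype=float), targets)
        scores[day] = float(sims.mean())  # average over all tweet-target pairs
    return scores


# Toy embeddings: one target word, two days of two tweets each
targets = [[1.0, 0.0]]
days = {"d1": [[1.0, 0.0], [2.0, 0.0]], "d2": [[0.0, 1.0], [1.0, 0.0]]}
scores = daily_mean_similarity(days, targets)
```

A rising score across days would then suggest that the area of interest is trending; no model is trained at any point, matching the training-free setup described in the abstract.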
-
On the Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for the Deep Learning Era
Authors:
Shahin Amiriparian,
Artem Sokolov,
Ilhan Aslan,
Lukas Christ,
Maurice Gerczuk,
Tobias Hübner,
Dmitry Lamanov,
Manuel Milling,
Sandra Ottl,
Ilya Poduremennykh,
Evgeniy Shuranov,
Björn W. Schuller
Abstract:
Text encodings from automatic speech recognition (ASR) transcripts and audio representations have long shown promise in speech emotion recognition (SER). Yet, it is challenging to explain the effect of each information stream on SER systems. Further, the impact of ASR's word error rate (WER) on linguistic emotion recognition, both per se and in the context of fusion with acoustic information, requires more clarification in the age of deep ASR systems. To tackle these issues, we create transcripts from the original speech by applying three modern ASR systems: an end-to-end model trained with recurrent neural network-transducer loss, a model with connectionist temporal classification loss, and a wav2vec framework for self-supervised learning. Afterwards, we use pre-trained textual models to extract text representations from the ASR outputs and the gold standard. For extraction and learning of acoustic speech features, we utilise openSMILE, openXBoW, DeepSpectrum, and auDeep. Finally, we conduct decision-level fusion on both information streams, acoustics and linguistics. Using the best development configuration, we achieve state-of-the-art unweighted average recall values of $73.6\,\%$ and $73.8\,\%$ on the speaker-independent development and test partitions of IEMOCAP, respectively.
Submitted 20 April, 2021;
originally announced April 2021.
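The abstract above relies on two standard metrics. Word error rate (WER) is the word-level edit distance (substitutions + deletions + insertions) between an ASR hypothesis and the reference transcript, divided by the number of reference words. Unweighted average recall (UAR) is the mean of the per-class recalls, so majority classes do not dominate it. Below is a minimal pure-Python sketch of both, assuming whitespace tokenisation and a non-empty reference; the function names are our own.

```python
def word_error_rate(reference, hypothesis):
    """(S + D + I) / N via word-level Levenshtein distance; N = len(reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference and first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,  # deletion
                d[i][j - 1] + 1,  # insertion
                d[i - 1][j - 1] + sub,  # match or substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion, giving 2/6 ≈ 0.333. With `y_true = ["ang", "ang", "neu", "neu", "neu", "neu"]` and a classifier that predicts "neu" everywhere except the first sample, accuracy is 5/6 but UAR is (0.5 + 1.0) / 2 = 0.75.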
-
Towards Tool-Support for Interactive-Machine Learning Applications in the Android Ecosystem
Authors:
Muhammad Mehran Sunny,
Moritz Berghofer,
Ilhan Aslan
Abstract:
Consumer applications are becoming increasingly smart, and most of them have to run on device ecosystems. Potential benefits include, for example, cross-device interaction and seamless user experiences. Machine learning models are essential for today's high-performance smart solutions. However, these models are often developed separately by AI engineers for one specific device and do not consider the challenges and potential associated with a device ecosystem in which their models have to run. We believe that there is a need for tool support for AI engineers to address the challenges of implementing, testing, and deploying machine learning models for the next generation of smart interactive consumer applications. This paper presents preliminary results of a series of inquiries, including interviews with AI engineers and experiments for an interactive machine learning use case with a smartwatch and a smartphone. We identified themes through the interviews and hands-on experience with our use case, and proposed features to serve as tool support for AI engineers, such as data collection from sensors and easy testing of the resource consumption of running pre-processing code on the target device.
Submitted 27 March, 2021;
originally announced March 2021.
-
Towards Somaesthetics Inspired Games: Exploring the Influence of a Mirror Effect on Self-Presentation in a Public Setting
Authors:
Fiona Guerin,
Alice Rey,
Enis Caliskan,
Erik Kynast,
Andreas Zimmerer,
Ilhan Aslan,
Elisabeth André
Abstract:
We report on an initial user study that explores how players of an augmented mirror game self-style or self-present themselves when they are allowed to see themselves in the mirror compared to when they do not see themselves. To this end, we customized an open-source fruit-slicing game into an interactive installation for an architecture museum and conducted a field study with 36 visitors. Based on an analysis of video recordings of participants, we identified, for example, significant differences in how often participants smile. Ultimately, presenting a self-image to gamers in a social setting resulted in behavior change, which we argue could be carefully utilized from a Somaesthetics perspective as an experience design feature in future games.
Submitted 11 October, 2020;
originally announced October 2020.
-
Drawing with AI -- Exploring Collaborative Inking Experiences Based on Mid-air Pointing and Reinforcement Learning
Authors:
Franziska Geiger,
Michelle Martin,
Monika Pichlmair,
Ilhan Aslan,
Hannes Ritschel,
Björn Bittner,
Elisabeth André
Abstract:
Digitalization is changing the nature of the tools and materials used in artistic practices in professional and non-professional settings. For example, today it is common that even children express their ideas and explore their creativity by drawing on tablets as digital canvases. While many software-based tools resemble traditional tools, such as various forms of virtual brushes and erasers, software-based tools and digital canvases, unlike traditional materials, can potentially be augmented with artificial intelligence. Curious about how it would feel to interact with a digital canvas that, in contrast to a traditional canvas, is dynamic, responsive, and potentially able to continuously adapt to its user's input, we developed a drawing application and conducted a qualitative study with 14 users. In this paper, we describe details of our design process, which led to using a k-armed bandit as a simple form of reinforcement learning and a LeapMotion sensor to allow people from all walks of life, old and young, to draw on pervasive displays, small and large, positioned near or far.
Submitted 10 October, 2020;
originally announced October 2020.
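The abstract above mentions a k-armed bandit as a simple form of reinforcement learning. Neither the action set nor the reward signal is described there, so the following is a generic epsilon-greedy bandit with incremental sample-average value estimates, sketched in Python. The class name, the `epsilon` value, and the idea of arms as alternative drawing behaviours rewarded by user feedback are hypothetical and not taken from the paper.

```python
import random


class EpsilonGreedyBandit:
    """Generic k-armed bandit: epsilon-greedy selection, sample-average estimates."""

    def __init__(self, k, epsilon=0.1, seed=None):
        self.k = k
        self.epsilon = epsilon
        self.counts = [0] * k  # number of times each arm was pulled
        self.values = [0.0] * k  # running mean reward per arm
        self.rng = random.Random(seed)

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.k)  # explore: uniformly random arm
        return self.values.index(max(self.values))  # exploit; lowest index on ties

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental mean: Q <- Q + (R - Q) / n
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


if __name__ == "__main__":
    # Simulated Bernoulli rewards; the success rates per arm are made up
    true_p = [0.2, 0.8, 0.5]
    bandit = EpsilonGreedyBandit(k=3, epsilon=0.1, seed=0)
    env = random.Random(1)
    for _ in range(2000):
        arm = bandit.select()
        bandit.update(arm, 1.0 if env.random() < true_p[arm] else 0.0)
    print(bandit.counts, [round(v, 2) for v in bandit.values])
```

Epsilon-greedy keeps a small, fixed share of exploratory choices, which lets such a canvas keep trying alternatives while mostly repeating what users have responded to best so far.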
-
Resonating Experiences of Self and Others enabled by a Tangible Somaesthetic Design
Authors:
Ilhan Aslan,
Andreas Seiderer,
Chi Tai Dang,
Simon Rädler,
Elisabeth André
Abstract:
Digitalization is penetrating every aspect of everyday life, including the beating of the human heart, which can easily be sensed by wearable sensors and displayed for others to see, feel, and potentially "bodily resonate" with. Previous work studying human interactions and interaction designs with physiological data, such as a heart's pulse rate, has argued that feeding such data back to users may, for example, support users' mindfulness and self-awareness during various everyday activities and ultimately support their wellbeing. Inspired by Somaesthetics as a discipline, which focuses on an appreciation of the living body's role in all our experiences, we designed and explored mobile tangible heart beat displays, which enable rich forms of bodily experiencing oneself and others in social proximity. In this paper, we first report on the design process of tangible heart displays and then present results of a field study with 30 pairs of participants. Participants were asked to use the tangible heart displays while watching movies together and to report their experience in three different heart display conditions (i.e., displaying their own heart beat, displaying their partner's heart beat, and watching a movie without a heart display). We found, for example, that participants reported significant effects in experiencing sensory immersion when they felt their own heart beats compared to the condition without any heart beat display, and that feeling their partner's heart beats resulted in significant effects on social experience. We refer to resonance theory to discuss the results, highlighting how ubiquitous technology could utilize physiological data to provide resonance in a modern society facing social acceleration.
Submitted 5 May, 2020;
originally announced May 2020.