Search | arXiv e-print repository

doi 10.1007/s11390-024-2934-x

Audio Enhancement for Computer Audition -- An Iterative Training Paradigm Using Sample Importance

Authors: Manuel Milling, Shuo Liu, Andreas Triantafyllopoulos, Ilhan Aslan, Björn W. Schuller

Abstract: Neural network models for audio tasks, such as automatic speech recognition (ASR) and acoustic scene classification (ASC), are susceptible to noise contamination in real-life applications. To improve audio quality, an enhancement module, which can be developed independently, is explicitly used at the front end of the target audio applications. In this paper, we present an end-to-end learning solution to jointly optimise the models for audio enhancement (AE) and the subsequent applications. To guide the optimisation of the AE module towards a target application, and especially to overcome difficult samples, we use the sample-wise performance measure as an indication of sample importance. In experiments, we consider four representative applications to evaluate our training paradigm, namely ASR, speech command recognition (SCR), speech emotion recognition (SER), and ASC. These applications cover speech and non-speech tasks concerning semantic and non-semantic features as well as transient and global information. The experimental results indicate that our proposed approach can considerably boost the noise robustness of the models, especially at low signal-to-noise ratios (SNRs), for a wide range of computer audition tasks in everyday noisy environments.

Submitted 12 August, 2024; originally announced August 2024.
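The abstract does not specify how the sample-wise performance measure becomes an importance weight. The sketch below assumes one simple scheme, which is not necessarily the authors': each sample's enhancement loss is scaled by its difficulty, taken as 1 minus a performance score in [0, 1], with the weights rescaled to average 1. The function names and the weighting rule are hypothetical.

```python
import numpy as np

def importance_weights(sample_scores, eps=1e-8):
    """Map per-sample performance scores in [0, 1] (higher is better) to
    importance weights that emphasise difficult samples.

    Hypothetical rule: weight proportional to (1 - score), rescaled so the
    weights average to 1 over the batch.
    """
    scores = np.asarray(sample_scores, dtype=float)
    difficulty = 1.0 - scores + eps  # eps keeps weights positive for perfect samples
    return difficulty * len(scores) / difficulty.sum()

def weighted_enhancement_loss(per_sample_losses, sample_scores):
    """Mean of per-sample enhancement losses, each scaled by its importance weight."""
    weights = importance_weights(sample_scores)
    losses = np.asarray(per_sample_losses, dtype=float)
    return float(np.mean(weights * losses))

# Three samples; the last one is hardest for the downstream task.
weights = importance_weights([1.0, 0.5, 0.0])
```

For an error-type measure such as word error rate, the score would first have to be mapped into [0, 1] (for example, 1 - min(WER, 1)); that mapping is likewise an assumption made for illustration.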

arXiv:2406.02251

Modeling Emotional Trajectories in Written Stories Utilizing Transformers and Weakly-Supervised Learning

Authors: Lukas Christ, Shahin Amiriparian, Manuel Milling, Ilhan Aslan, Björn W. Schuller

Abstract: Telling stories is an integral part of human communication that can evoke emotions and influence the affective states of the audience. Automatically modeling emotional trajectories in stories has thus attracted considerable scholarly interest. However, as most existing works have been limited to unsupervised dictionary-based approaches, there is no benchmark for this task. We address this gap by introducing continuous valence and arousal labels for an existing dataset of children's stories originally annotated with discrete emotion categories. We collect additional annotations for this data and map the categorical labels to the continuous valence and arousal space. To predict the resulting emotionality signals, we fine-tune a DeBERTa model and improve upon this baseline via a weakly supervised learning approach. The best configuration achieves a Concordance Correlation Coefficient (CCC) of .8221 for valence and .7125 for arousal on the test set, demonstrating the efficacy of our proposed approach. A detailed analysis shows the extent to which the results vary depending on factors such as the author, the individual story, or the section within the story. In addition, we uncover the weaknesses of our approach by investigating examples that prove to be difficult to predict.

Submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted to ACL 2024 Findings. arXiv admin note: text overlap with arXiv:2212.11382
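Both this entry and arXiv:2212.11382 report results as the Concordance Correlation Coefficient (CCC). For readers unfamiliar with the metric, here is a minimal NumPy sketch of Lin's CCC, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))²), using population (ddof=0) statistics. Whether the papers use exactly this variant is an assumption. Unlike Pearson correlation, CCC also penalises differences in mean and scale between predictions and the gold standard.

```python
import numpy as np

def ccc(y_true, y_pred):
    """Lin's Concordance Correlation Coefficient between two 1-D signals.

    Uses population statistics (ddof=0) for means, variances and covariance.
    """
    x = np.asarray(y_true, dtype=float)
    y = np.asarray(y_pred, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mean_x) * (y - mean_y))
    return float(2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2))

gold = [1.0, 2.0, 3.0]
perfect = ccc(gold, gold)
shifted = ccc(gold, [2.0, 3.0, 4.0])
```

With gold = [1, 2, 3] and predictions shifted by +1, CCC drops to 4/7 ≈ 0.571 even though the Pearson correlation remains 1, which is why CCC is a stricter measure for continuous valence and arousal signals.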

arXiv:2302.00984

doi 10.55612/s-5002-058-007

How to Compliment a Human -- Designing Affective and Well-being Promoting Conversational Things

Authors: Ilhan Aslan, Dominik Neu, Daniela Neupert, Stefan Grafberger, Nico Weise, Pascal Pfeil, Maximilian Kuschewski

Abstract: With today's technologies, it seems easier than ever to augment everyday things with the ability to perceive their environment and to talk to users. Considering conversational user interfaces, tremendous progress has already been made in designing and evaluating task-oriented conversational interfaces, such as voice assistants for ordering food, booking a flight, etc. However, it is still very challenging to design smart things that can hold an informal conversation and emotional exchange with their users, which requires the smart thing to master everyday social utterances, the use of irony and sarcasm, the delivery of good compliments, etc. In this paper, we focus on the experience design of compliments and the Complimenting Mirror design. The paper reports in detail on three phases of a human-centered design process, including a Wizard of Oz study in the lab with 24 participants to explore and identify the effect of different compliment types on user experiences, and a subsequent field study with 105 users in an architecture museum with a fully functional installation of the Complimenting Mirror. In our analyses, we argue why and how a "smart" mirror should compliment users and provide a theorization applicable to affective interaction design with things more generally. We focus on subjective user feedback, including user concerns and prepositions of receiving compliments from a thing, and on observations of real user behavior in the field, i.e., transitions of bodily affective expressions comparing affective user states before, during, and after compliment delivery. Our research shows that compliment design matters significantly: by using the right type of compliments in our final design in the field test, we succeeded in achieving reactive expressions of positive emotions, "sincere" smiles and laughter, even from the seemingly sternest users.

Submitted 2 February, 2023; originally announced February 2023.

Comments: 28 pages and about 10 figures, journal format for a future submission

Journal ref: International Journal on Interaction Design & Architecture(s) - IxD&A 2024

arXiv:2301.11288

doi 10.3934/mbe.2022565

Classification of vertices on social networks by multiple approaches

Authors: Hacı İsmail Aslan, Chang Choi, Hoon Ko

Abstract: With the advent of data representations beyond tabular formats, topological structures that make samples interrelated have come to prominence. Such networks can represent social connections, dataflow maps, citation influence graphs, protein bindings, etc. In the case of social networks, however, it is particularly important to determine the labels of discrete communities. This study is motivated by the non-negligible importance of analyzing graph networks to partition vertices using solely the topological features of the network graphs. As testbeds for these interaction-based entities, a social graph, a mailing dataset, and two citation datasets are selected. This paper not only identifies the most effective method but also examines how graph neural networks work and where they need to improve relative to non-neural approaches, which are faster and computationally cheaper. It also establishes, using the topological features of the networks tested, a performance limit that prospective graph neural network variants should aim to exceed.

Submitted 13 January, 2023; originally announced January 2023.

Comments: This is a paper whose final and definite form is published open access by 'Mathematical Biosciences and Engineering' (ISSN: 1551-0018)

Journal ref: Math. Biosci. Eng. 19 (2022), no. 12, 12146-12159

arXiv:2212.11382

Automatic Emotion Modelling in Written Stories

Authors: Lukas Christ, Shahin Amiriparian, Manuel Milling, Ilhan Aslan, Björn W. Schuller

Abstract: Telling stories is an integral part of human communication which can evoke emotions and influence the affective states of the audience. Automatically modelling emotional trajectories in stories has thus attracted considerable scholarly interest. However, as most existing works have been limited to unsupervised dictionary-based approaches, there is no labelled benchmark for this task. We address this gap by introducing continuous valence and arousal annotations for an existing dataset of children's stories annotated with discrete emotion categories. We collect additional annotations for this data and map the originally categorical labels to the valence and arousal space. Leveraging recent advances in Natural Language Processing, we propose a set of novel Transformer-based methods for predicting valence and arousal signals over the course of written stories. We explore several strategies for fine-tuning a pretrained ELECTRA model and study the benefits of considering a sentence's context when inferring its emotionality. Moreover, we experiment with additional LSTM and Transformer layers. The best configuration achieves a Concordance Correlation Coefficient (CCC) of .7338 for valence and .6302 for arousal on the test set, demonstrating the suitability of our proposed approach. Our code and additional annotations are made available at https://github.com/lc0197/emotion_modelling_stories.

Submitted 21 December, 2022; originally announced December 2022.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2203.15873

An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion

Authors: Zijiang Yang, Xin Jing, Andreas Triantafyllopoulos, Meishu Song, Ilhan Aslan, Björn W. Schuller

Abstract: Emotional voice conversion (EVC) focuses on converting a speech utterance from a source to a target emotion; it can thus be a key enabling technology for human-computer interaction applications and beyond. However, EVC remains an unsolved research problem with several challenges. In particular, as speech rate and rhythm are two key factors of emotional conversion, models have to generate output sequences of differing length. Sequence-to-sequence modelling has recently emerged as a competitive paradigm for models that can overcome those challenges. In an attempt to stimulate further research in this promising new direction, recent sequence-to-sequence EVC papers were systematically investigated and reviewed from six perspectives: their motivation, training strategies, model architectures, datasets, model inputs, and evaluation methods. This information is organised to provide the research community with an easily digestible overview of the current state of the art. Finally, we discuss existing challenges of sequence-to-sequence EVC.

Submitted 29 March, 2022; originally announced March 2022.

Comments: Submitted to INTERSPEECH 2022

arXiv:2110.00866

A Case Study to Reveal if an Area of Interest has a Trend in Ongoing Tweets Using Word and Sentence Embeddings

Authors: İsmail Aslan, Yücel Topçu

Abstract: In the field of Natural Language Processing, information extraction from texts has been the objective of many researchers for years. Many different techniques have been applied to reveal the opinion a tweet might express, and thus to understand the sentiment of a short text of up to 280 characters. Beyond determining the sentiment of a tweet, a study can also focus on finding the correlation of tweets with a certain area of interest, which is the purpose of this study. To reveal whether an area of interest is trending in ongoing tweets, we propose an easily applicable automated methodology in which Daily Mean Similarity Scores, which measure the similarity between the daily tweet corpus and the target words representing our area of interest, are calculated using a naïve correlation-based technique without training any machine learning model. The Daily Mean Similarity Scores are mainly based on cosine similarity and word/sentence embeddings computed by the Multilanguage Universal Sentence Encoder, and they show the main opinion stream of the tweets with respect to a certain area of interest. This demonstrates that an ongoing trend of a specific subject on Twitter can easily be captured in almost real time using the proposed methodology. We have also compared the effectiveness of word versus sentence embeddings in our methodology and found that both give almost the same results, whereas word embeddings require less computational time than sentence embeddings and are thus more efficient. This paper starts with an introduction, followed by background information on the basics, then explains the proposed methodology, and finally interprets the results and concludes the findings.

Submitted 2 October, 2021; originally announced October 2021.

Comments: 25 pages, 7 figures

ACM Class: I.2.7; I.5.4; J.4
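The abstract describes the Daily Mean Similarity Score as an average of cosine similarities between a day's tweets and the target words, computed from word or sentence embeddings. The sketch below assumes the embeddings have already been produced (for example by the Multilanguage Universal Sentence Encoder, which is not called here) and that the score is the mean over all tweet/target-word pairs. That aggregation rule and all function names are assumptions, not the paper's specification.

```python
import numpy as np

def cosine_matrix(a, b):
    """Pairwise cosine similarities between rows of a (n x d) and rows of b (m x d)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def daily_mean_similarity(tweet_embeddings, target_embeddings):
    """Mean cosine similarity over every (tweet, target word) pair for one day."""
    return float(cosine_matrix(tweet_embeddings, target_embeddings).mean())

def daily_scores(tweets_by_day, target_embeddings):
    """Map each day (sorted by key) to its Daily Mean Similarity Score."""
    return {
        day: daily_mean_similarity(embeddings, target_embeddings)
        for day, embeddings in sorted(tweets_by_day.items())
    }
```

A score that rises over consecutive days would then suggest that tweets are moving toward the area of interest. No model is trained at any point, in line with the training-free approach the abstract describes.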

arXiv:2104.10121

On the Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for the Deep Learning Era

Authors: Shahin Amiriparian, Artem Sokolov, Ilhan Aslan, Lukas Christ, Maurice Gerczuk, Tobias Hübner, Dmitry Lamanov, Manuel Milling, Sandra Ottl, Ilya Poduremennykh, Evgeniy Shuranov, Björn W. Schuller

Abstract: Text encodings from automatic speech recognition (ASR) transcripts and audio representations have long shown promise in speech emotion recognition (SER). Yet, it is challenging to explain the effect of each information stream on SER systems. Further, the impact of ASR's word error rate (WER) on linguistic emotion recognition, both per se and in fusion with acoustic information, needs more clarification in the age of deep ASR systems. To tackle these issues, we create transcripts from the original speech by applying three modern ASR systems: an end-to-end model trained with a recurrent neural network-transducer loss, a model with a connectionist temporal classification loss, and a wav2vec framework for self-supervised learning. Afterwards, we use pre-trained textual models to extract text representations from the ASR outputs and the gold standard. For extraction and learning of acoustic speech features, we utilise openSMILE, openXBoW, DeepSpectrum, and auDeep. Finally, we conduct decision-level fusion on both information streams, acoustics and linguistics. Using the best development configuration, we achieve state-of-the-art unweighted average recall values of 73.6% and 73.8% on the speaker-independent development and test partitions of IEMOCAP, respectively.

Submitted 20 April, 2021; originally announced April 2021.

Comments: 5 pages, 1 figure

ACM Class: I.2.7; I.5.0
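The results above are reported as unweighted average recall (UAR), the mean of the per-class recalls, which is insensitive to class imbalance, and they rely on decision-level fusion of the acoustic and linguistic streams. A minimal sketch of both follows. The weighted averaging of class probabilities is one common form of decision-level fusion and is assumed here; the abstract does not state which fusion rule was used, and the function names are illustrative.

```python
import numpy as np

def unweighted_average_recall(y_true, y_pred, n_classes):
    """Mean of per-class recalls; classes absent from y_true are skipped."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [
        np.mean(y_pred[y_true == c] == c)
        for c in range(n_classes)
        if np.any(y_true == c)
    ]
    return float(np.mean(recalls))

def late_fusion(prob_acoustic, prob_linguistic, weight=0.5):
    """Decision-level fusion: weighted average of per-class probabilities, then argmax."""
    acoustic = np.asarray(prob_acoustic, dtype=float)
    linguistic = np.asarray(prob_linguistic, dtype=float)
    fused = weight * acoustic + (1.0 - weight) * linguistic
    return fused.argmax(axis=1)
```

For instance, a classifier that always predicts the majority class on labels [0, 0, 0, 1] reaches 75% accuracy but only 50% UAR, which is why UAR is the usual headline metric on imbalanced emotion corpora such as IEMOCAP.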

arXiv:2103.14852

Towards Tool-Support for Interactive-Machine Learning Applications in the Android Ecosystem

Authors: Muhammad Mehran Sunny, Moritz Berghofer, Ilhan Aslan

Abstract: Consumer applications are becoming increasingly smart, and most of them have to run on device ecosystems. Potential benefits include, for example, enabling cross-device interaction and seamless user experiences. Machine learning models are essential for today's high-performance smart solutions. However, these models are often developed separately by AI engineers for one specific device and do not consider the challenges and potentials associated with a device ecosystem in which their models have to run. We believe that there is a need for tool-support for AI engineers to address the challenges of implementing, testing, and deploying machine learning models for the next generation of smart interactive consumer applications. This paper presents preliminary results of a series of inquiries, including interviews with AI engineers and experiments for an interactive machine learning use case with a smartwatch and a smartphone. We identified themes through the interviews and hands-on experience with our use case, and proposed features, such as data collection from sensors and easy testing of the resource consumption of running pre-processing code on the target device, which will serve as tool-support for AI engineers.

Submitted 27 March, 2021; originally announced March 2021.

Comments: 4 pages

arXiv:2010.05204

Towards Somaesthetics Inspired Games: Exploring the Influence of a Mirror Effect on Self-Presentation in a Public Setting

Authors: Fiona Guerin, Alice Rey, Enis Caliskan, Erik Kynast, Andreas Zimmerer, Ilhan Aslan, Elisabeth André

Abstract: We report on an initial user study that explores how players of an augmented mirror game self-style or self-present themselves when they are allowed to see themselves in the mirror compared to when they do not see themselves. To this end, we customized an open-source fruit-slicing game into an interactive installation for an architecture museum and conducted a field study with 36 visitors. Based on an analysis of video recordings of participants, we identified, for example, significant differences in how often participants smile. Ultimately, presenting a self-image to gamers in a social setting resulted in behavior change, which we argue could be utilized carefully from a Somaesthetics perspective as an experience design feature in future games.

Submitted 11 October, 2020; originally announced October 2020.

Comments: 11 pages

arXiv:2010.05047

Drawing with AI -- Exploring Collaborative Inking Experiences Based on Mid-air Pointing and Reinforcement Learning

Authors: Franziska Geiger, Michelle Martin, Monika Pichlmair, Ilhan Aslan, Hannes Ritschel, Björn Bittner, Elisabeth André

Abstract: Digitalization is changing the nature of the tools and materials used in artistic practices in professional and non-professional settings. For example, today it is common that even children express their ideas and explore their creativity by drawing on tablets as digital canvases. While there are many software-based tools that resemble traditional tools, such as various forms of virtual brushes, erasers, etc., there is potential, in contrast to traditional materials, in augmenting software-based tools and digital canvases with artificial intelligence. Curious about how it would feel to interact with a digital canvas that, in contrast to a traditional canvas, would be dynamic, responsive, and potentially able to continuously adapt to its user's input, we developed a drawing application and conducted a qualitative study with 14 users. In this paper, we describe details of our design process, which led to using a k-armed bandit as a simple form of reinforcement learning and a LeapMotion sensor to allow people from all walks of life, old and young, to draw on pervasive displays, small and large, positioned near or far.

Submitted 10 October, 2020; originally announced October 2020.

Comments: 10 pages

ACM Class: H.5.m
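The abstract names a k-armed bandit as the reinforcement-learning component but not its action-selection policy. The sketch below assumes an epsilon-greedy policy with incremental mean-reward estimates; what the arms represent (for instance, alternative drawing behaviours of the canvas) and how rewards are derived from user input are hypothetical, as is the class name.

```python
import random

class EpsilonGreedyBandit:
    """Hypothetical k-armed bandit with epsilon-greedy action selection."""

    def __init__(self, k, epsilon=0.1, seed=None):
        self.k = k
        self.epsilon = epsilon
        self.counts = [0] * k    # number of times each arm was updated
        self.values = [0.0] * k  # running mean reward per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        """Explore with probability epsilon, otherwise exploit (ties broken randomly)."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.k)
        best = max(self.values)
        candidates = [arm for arm, value in enumerate(self.values) if value == best]
        return self.rng.choice(candidates)

    def update(self, arm, reward):
        """Incremental mean update: Q <- Q + (r - Q) / n."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

With epsilon = 0 the agent purely exploits its current estimates; a small positive epsilon keeps it occasionally trying other arms, which lets the canvas keep adapting if a user's preferences change during a drawing session.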

arXiv:2005.02304

Resonating Experiences of Self and Others enabled by a Tangible Somaesthetic Design

Authors: Ilhan Aslan, Andreas Seiderer, Chi Tai Dang, Simon Rädler, Elisabeth André

Abstract: Digitalization is penetrating every aspect of everyday life, including the beating of the human heart, which can easily be sensed by wearable sensors and displayed for others to see, feel, and potentially "bodily resonate" with. Previous work studying human interactions and interaction designs with physiological data, such as a heart's pulse rate, has argued that feeding it back to users may, for example, support users' mindfulness and self-awareness during various everyday activities and ultimately support their wellbeing. Inspired by Somaesthetics as a discipline, which focuses on an appreciation of the living body's role in all our experiences, we designed and explored mobile tangible heart beat displays, which enable rich forms of bodily experiencing oneself and others in social proximity. In this paper, we first report on the design process of tangible heart displays and then present results of a field study with 30 pairs of participants. Participants were asked to use the tangible heart displays while watching movies together and to report their experience in three different heart display conditions (i.e., displaying their own heart beat, displaying their partner's heart beat, and watching a movie without a heart display). We found, for example, that participants reported significant effects in experiencing sensory immersion when they felt their own heart beats compared to the condition without any heart beat display, and that feeling their partner's heart beats resulted in significant effects on social experience. We refer to resonance theory to discuss the results, highlighting the potential of how ubiquitous technology could utilize physiological data to provide resonance in a modern society facing social acceleration.

Submitted 5 May, 2020; originally announced May 2020.

Comments: 18 pages

Showing 1–12 of 12 results for author: Aslan, I