A compact neuromorphic system for ultra energy-efficient, on-device robot localization

Authors: Adam D. Hines, Michael Milford, Tobias Fischer

Abstract: Neuromorphic computing offers a transformative pathway to overcome the computational and energy challenges faced in deploying robotic localization and navigation systems at the edge. Visual place recognition, a critical component for navigation, is often hampered by the high resource demands of conventional systems, making them unsuitable for small-scale robotic platforms which still require to perform complex, long-range tasks. Although neuromorphic approaches offer potential for greater efficiency, real-time edge deployment remains constrained by the complexity and limited scalability of bio-realistic networks. Here, we demonstrate a neuromorphic localization system that performs accurate place recognition in up to 8km of traversal using models as small as 180 KB with 44k parameters, while consuming less than 1% of the energy required by conventional methods. Our Locational Encoding with Neuromorphic Systems (LENS) integrates spiking neural networks, an event-based dynamic vision sensor, and a neuromorphic processor within a single SPECK(TM) chip, enabling real-time, energy-efficient localization on a hexapod robot. LENS represents the first fully neuromorphic localization system capable of large-scale, on-device deployment, setting a new benchmark for energy efficient robotic place recognition.

Submitted 29 August, 2024; originally announced August 2024.

Comments: 28 pages, 4 main figures, 4 supplementary figures, 1 supplementary table, and 1 movie. Under review

arXiv:2403.15336 [pdf, other]

Dialogue Understandability: Why are we streaming movies with subtitles?

Authors: Helard Becerra Martinez, Alessandro Ragano, Diptasree Debnath, Asad Ullah, Crisron Rudolf Lucas, Martin Walsh, Andrew Hines

Abstract: Watching movies and TV shows with subtitles enabled is not simply down to audibility or speech intelligibility. A variety of evolving factors related to technological advances, cinema production and social behaviour challenge our perception and understanding. This study seeks to formalise and give context to these influential factors under a wider and novel term referred to as Dialogue Understandability. We propose a working definition for Dialogue Understandability being a listener's capacity to follow the story without undue cognitive effort or concentration being required that impacts their Quality of Experience (QoE). The paper identifies, describes and categorises the factors that influence Dialogue Understandability mapping them over the QoE framework, a media streaming lifecycle, and the stakeholders involved. We then explore available measurement tools in the literature and link them to the factors they could potentially be used for. The maturity and suitability of these tools is evaluated over a set of pilot experiments. Finally, we reflect on the gaps that still need to be filled, what we can measure and what not, future subjective experiments, and new research trends that could help us to fully characterise Dialogue Understandability.

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2309.16284 [pdf, other]

NOMAD: Unsupervised Learning of Perceptual Embeddings for Speech Enhancement and Non-matching Reference Audio Quality Assessment

Authors: Alessandro Ragano, Jan Skoglund, Andrew Hines

Abstract: This paper presents NOMAD (Non-Matching Audio Distance), a differentiable perceptual similarity metric that measures the distance of a degraded signal against non-matching references. The proposed method is based on learning deep feature embeddings via a triplet loss guided by the Neurogram Similarity Index Measure (NSIM) to capture degradation intensity. During inference, the similarity score between any two audio samples is computed through Euclidean distance of their embeddings. NOMAD is fully unsupervised and can be used in general perceptual audio tasks for audio analysis e.g. quality assessment and generative tasks such as speech enhancement and speech synthesis. The proposed method is evaluated with 3 tasks. Ranking degradation intensity, predicting speech quality, and as a loss function for speech enhancement. Results indicate NOMAD outperforms other non-matching reference approaches in both ranking degradation intensity and quality assessment, exhibiting competitive performance with full-reference audio metrics. NOMAD demonstrates a promising technique that mimics human capabilities in assessing audio quality with non-matching references to learn perceptual embeddings without the need for human-generated labels.

Submitted 19 January, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

Comments: Accepted for ICASSP 2024
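
The two ingredients named in the NOMAD abstract, a triplet loss during training and a Euclidean distance between embeddings at inference, can be sketched as follows. This is only an illustration under stated assumptions: the standard margin form of the triplet loss is assumed (the abstract does not give the exact form), the NSIM-guided choice of triplets is not shown, and the three-dimensional vectors are hypothetical stand-ins for embeddings that would come from a trained network.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard margin form (an assumption): max(0, d(a, p) - d(a, n) + margin)."""
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical embeddings, for illustration only.
clean = [0.0, 0.0, 0.0]
mild = [0.3, 0.4, 0.0]    # lightly degraded sample, close to the reference
severe = [3.0, 4.0, 0.0]  # heavily degraded sample, far from the reference

# Inference-style score: a larger distance suggests a stronger degradation.
score_mild = euclidean_distance(clean, mild)
score_severe = euclidean_distance(clean, severe)

# Training-style check: the loss is zero when the ordering is already respected.
loss_ok = triplet_margin_loss(clean, mild, severe)
loss_bad = triplet_margin_loss(clean, severe, mild)
```

With these toy vectors the mildly degraded sample sits closer to the reference than the severely degraded one, so the loss vanishes for the correctly ordered triplet and is positive when the roles are swapped.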

arXiv:2309.12763 [pdf, other]

Reduce, Reuse, Recycle: Is Perturbed Data better than Other Language augmentation for Low Resource Self-Supervised Speech Models

Authors: Asad Ullah, Alessandro Ragano, Andrew Hines

Abstract: Self-supervised representation learning (SSRL) has demonstrated superior performance than supervised models for tasks including phoneme recognition. Training SSRL models poses a challenge for low-resource languages where sufficient pre-training data may not be available. A common approach is cross-lingual pre-training. Instead, we propose to use audio augmentation techniques, namely: pitch variation, noise addition, accented target language and other language speech to pre-train SSRL models in a low resource condition and evaluate phoneme recognition. Our comparisons found that a combined synthetic augmentations (noise/pitch) strategy outperformed accent and language knowledge transfer. Furthermore, we examined the scaling factor of augmented data to achieve equivalent performance to model pre-trained with target domain speech. Our findings suggest that for resource-constrained languages, combined augmentations can be a viable option than other augmentations.

Submitted 28 June, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

Comments: Paper accepted in Interspeech2024

arXiv:2309.10225 [pdf, other]

VPRTempo: A Fast Temporally Encoded Spiking Neural Network for Visual Place Recognition

Authors: Adam D. Hines, Peter G. Stratton, Michael Milford, Tobias Fischer

Abstract: Spiking Neural Networks (SNNs) are at the forefront of neuromorphic computing thanks to their potential energy-efficiency, low latencies, and capacity for continual learning. While these capabilities are well suited for robotics tasks, SNNs have seen limited adaptation in this field thus far. This work introduces a SNN for Visual Place Recognition (VPR) that is both trainable within minutes and queryable in milliseconds, making it well suited for deployment on compute-constrained robotic systems. Our proposed system, VPRTempo, overcomes slow training and inference times using an abstracted SNN that trades biological realism for efficiency. VPRTempo employs a temporal code that determines the timing of a single spike based on a pixel's intensity, as opposed to prior SNNs relying on rate coding that determined the number of spikes; improving spike efficiency by over 100%. VPRTempo is trained using Spike-Timing Dependent Plasticity and a supervised delta learning rule enforcing that each output spiking neuron responds to just a single place. We evaluate our system on the Nordland and Oxford RobotCar benchmark localization datasets, which include up to 27k places. We found that VPRTempo's accuracy is comparable to prior SNNs and the popular NetVLAD place recognition algorithm, while being several orders of magnitude faster and suitable for real-time deployment -- with inference speeds over 50 Hz on CPU. VPRTempo could be integrated as a loop closure component for online SLAM on resource-constrained systems such as space and underwater robots.

Submitted 29 February, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: 8 pages, 3 figures, accepted to the IEEE International Conference on Robotics and Automation (ICRA) 2024
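
The contrast the VPRTempo abstract draws between a temporal code (one spike whose timing depends on pixel intensity) and a rate code (a number of spikes that depends on pixel intensity) can be sketched as below. This is a hedged illustration, not the paper's encoder: the linear mappings, the "brighter spikes earlier" convention, and the `t_max` and `max_spikes` parameters are all assumptions made for the example.

```python
def latency_code(intensity, t_max=1.0):
    """Temporal code: map an 8-bit intensity to the time of a single spike.

    Assumed convention: brighter pixels spike earlier within [0, t_max].
    """
    if not 0 <= intensity <= 255:
        raise ValueError("intensity must be in [0, 255]")
    return t_max * (1.0 - intensity / 255.0)


def rate_code(intensity, max_spikes=10):
    """Rate code: map an 8-bit intensity to a spike count in [0, max_spikes]."""
    if not 0 <= intensity <= 255:
        raise ValueError("intensity must be in [0, 255]")
    return round(max_spikes * intensity / 255.0)


# Hypothetical pixel values, for illustration only.
pixels = [0, 128, 255]
spike_times = [latency_code(p) for p in pixels]   # one spike per pixel
spike_counts = [rate_code(p) for p in pixels]     # possibly many spikes per pixel
```

Under these assumptions the temporal code always emits exactly one spike per pixel, while the rate code emits up to `max_spikes`, which is the intuition behind the spike-efficiency comparison in the abstract.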

arXiv:2306.08959 [pdf, other]

Statutory Professions in AI governance and their consequences for explainable AI

Authors: Labhaoise NiFhaolain, Andrew Hines, Vivek Nallur

Abstract: Intentional and accidental harms arising from the use of AI have impacted the health, safety and rights of individuals. While regulatory frameworks are being developed, there remains a lack of consensus on methods necessary to deliver safe AI. The potential for explainable AI (XAI) to contribute to the effectiveness of the regulation of AI is being increasingly examined. Regulation must include methods to ensure compliance on an ongoing basis, though there is an absence of practical proposals on how to achieve this. For XAI to be successfully incorporated into a regulatory system, the individuals who are engaged in interpreting/explaining the model to stakeholders should be sufficiently qualified for the role. Statutory professionals are prevalent in domains in which harm can be done to the health, safety and rights of individuals. The most obvious examples are doctors, engineers and lawyers. Those professionals are required to exercise skill and judgement and to defend their decision making process in the event of harm occurring. We propose that a statutory profession framework be introduced as a necessary part of the AI regulatory framework for compliance and monitoring purposes. We will refer to this new statutory professional as an AI Architect (AIA). This AIA would be responsible to ensure the risk of harm is minimised and accountable in the event that harms occur. The AIA would also be relied on to provide appropriate interpretations/explanations of XAI models to stakeholders. Further, in order to satisfy themselves that the models have been developed in a satisfactory manner, the AIA would require models to have appropriate transparency. Therefore it is likely that the introduction of an AIA system would lead to an increase in the use of XAI to enable AIA to discharge their professional obligations.

Submitted 15 June, 2023; originally announced June 2023.

Comments: Accepted for publication at xAI-2023 conference

arXiv:2211.07445 [pdf, other]

Exploring the Impact of Noise and Degradations on Heart Sound Classification Models

Authors: Davoud Shariat Panah, Andrew Hines, Susan McKeever

Abstract: The development of data-driven heart sound classification models has been an active area of research in recent years. To develop such data-driven models in the first place, heart sound signals need to be captured using a signal acquisition device. However, it is almost impossible to capture noise-free heart sound signals due to the presence of internal and external noises in most situations. Such noises and degradations in heart sound signals can potentially reduce the accuracy of data-driven classification models. Although different techniques have been proposed in the literature to address the noise issue, how and to what extent different noise and degradations in heart sound signals impact the accuracy of data-driven classification models remains unexplored. To answer this question, we produced a synthetic heart sound dataset including normal and abnormal heart sounds contaminated with a large variety of noise and degradations. We used this dataset to investigate the impact of noise and degradation in heart sound recordings on the performance of different classification models. The results show different noises and degradations affect the performance of heart sound classification models to a different extent; some are more problematic for classification models, and others are less destructive. Comparing the findings of this study with the results of a survey we previously carried out with a group of clinicians shows noise and degradations that are more detrimental to classification models are also more disruptive to accurate auscultation. The findings of this study can be leveraged to develop targeted heart sound quality enhancement approaches - which adapt the type and aggressiveness of quality enhancement based on the characteristics of noise and degradation in heart sound signals.

Submitted 14 November, 2022; originally announced November 2022.

Comments: Submitted to Computers in Biology and Medicine Journal

arXiv:2210.15310 [pdf, other]

Learning Music Representations with wav2vec 2.0

Authors: Alessandro Ragano, Emmanouil Benetos, Andrew Hines

Abstract: Learning music representations that are general-purpose offers the flexibility to finetune several downstream tasks using smaller datasets. The wav2vec 2.0 speech representation model showed promising results in many downstream speech tasks, but has been less effective when adapted to music. In this paper, we evaluate whether pre-training wav2vec 2.0 directly on music data can be a better solution instead of finetuning the speech model. We illustrate that when pre-training on music data, the discrete latent representations are able to encode the semantic meaning of musical concepts such as pitch and instrument. Our results show that finetuning wav2vec 2.0 pre-trained on music data allows us to achieve promising results on music classification tasks that are competitive with prior work on audio representations. In addition, the results are superior to the pre-trained model on speech embeddings, demonstrating that wav2vec 2.0 pre-trained on music data can be a promising music representation model.

Submitted 27 October, 2022; originally announced October 2022.

Comments: Submitted to ICASSP 2023

arXiv:2209.06358 [pdf, other]

Using Rater and System Metadata to Explain Variance in the VoiceMOS Challenge 2022 Dataset

Authors: Michael Chinen, Jan Skoglund, Chandan K A Reddy, Alessandro Ragano, Andrew Hines

Abstract: Non-reference speech quality models are important for a growing number of applications. The VoiceMOS 2022 challenge provided a dataset of synthetic voice conversion and text-to-speech samples with subjective labels. This study looks at the amount of variance that can be explained in subjective ratings of speech quality from metadata and the distribution imbalances of the dataset. Speech quality models were constructed using wav2vec 2.0 with additional metadata features that included rater groups and system identifiers and obtained competitive metrics including a Spearman rank correlation coefficient (SRCC) of 0.934 and MSE of 0.088 at the system-level, and 0.877 and 0.198 at the utterance-level. Using data and metadata that the test restricted or blinded further improved the metrics. A metadata analysis showed that the system-level metrics do not represent the model's system-level prediction as a result of the wide variation in the number of utterances used for each system on the validation and test datasets. We conclude that, in general, conditions should have enough utterances in the test set to bound the sample mean error, and be relatively balanced in utterance count between systems, otherwise the utterance-level metrics may be more reliable and interpretable.

Submitted 13 September, 2022; originally announced September 2022.

Comments: Preprint; accepted for Interspeech 2022
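
The abstract above reports SRCC and MSE at both the utterance level and the system level. A minimal sketch of how such metrics are commonly computed is given here, assuming that system-level scores are the per-system means of utterance scores; the helper names and the toy data are hypothetical and not taken from the challenge.

```python
from collections import defaultdict


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def srcc(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mse(x, y):
    """Mean squared error between two equal-length score lists."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def system_means(system_ids, scores):
    """Average utterance-level scores per system (assumed aggregation)."""
    groups = defaultdict(list)
    for system, score in zip(system_ids, scores):
        groups[system].append(score)
    return {system: sum(v) / len(v) for system, v in groups.items()}


# Hypothetical data: system "a" has two utterances, "b" and "c" only one each.
systems = ["a", "a", "b", "c"]
mos_true = [4.0, 2.0, 3.5, 1.5]
mos_pred = [3.5, 2.5, 3.0, 2.0]

utt_srcc = srcc(mos_true, mos_pred)
true_sys = system_means(systems, mos_true)
pred_sys = system_means(systems, mos_pred)
keys = sorted(true_sys)
sys_mse = mse([true_sys[k] for k in keys], [pred_sys[k] for k in keys])
```

In this toy case a system represented by a single utterance carries the same weight in the system-level metric as one with many, which mirrors the imbalance concern raised in the abstract.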

arXiv:2202.02454 [pdf, other]

doi 10.1109/MCOM.001.2100109

Supervised Learning based QoE Prediction of Video Streaming in Future Networks: A Tutorial with Comparative Study

Authors: Arslan Ahmad, Atif Bin Mansoor, Alcardo Alex Barakabitze, Andrew Hines, Luigi Atzori, Ray Walshe

Abstract: The Quality of Experience (QoE) based service management remains key for successful provisioning of multimedia services in next-generation networks such as 5G/6G, which requires proper tools for quality monitoring, prediction and resource management where machine learning (ML) can play a crucial role. In this paper, we provide a tutorial on the development and deployment of the QoE measurement and prediction solutions for video streaming services based on supervised learning ML models. Firstly, we provide a detailed pipeline for developing and deploying supervised learning-based video streaming QoE prediction models which covers several stages including data collection, feature engineering, model optimization and training, testing and prediction and evaluation. Secondly, we discuss the deployment of the ML model for the QoE prediction/measurement in the next generation networks (5G/6G) using network enabling technologies such as Software-Defined Networking (SDN), Network Function Virtualization (NFV) and Mobile Edge Computing (MEC) by proposing reference architecture. Thirdly, we present a comparative study of the state-of-the-art supervised learning ML models for QoE prediction of video streaming applications based on multiple performance metrics.

Submitted 3 January, 2022; originally announced February 2022.

Journal ref: IEEE Communications Magazine, vol. 59, no. 11, pp. 88-94, November 2021

arXiv:2110.13589 [pdf, other]

doi 10.1145/3524273.3532885

AQP: An Open Modular Python Platform for Objective Speech and Audio Quality Metrics

Authors: Jack Geraghty, Jiazheng Li, Alessandro Ragano, Andrew Hines

Abstract: Audio quality assessment has been widely researched in the signal processing area. Full-reference objective metrics (e.g., POLQA, ViSQOL) have been developed to estimate the audio quality relying only on human rating experiments. To evaluate the audio quality of novel audio processing techniques, researchers constantly need to compare objective quality metrics. Testing different implementations of the same metric and evaluating new datasets are fundamental and ongoing iterative activities. In this paper, we present AQP - an open-source, node-based, light-weight Python pipeline for audio quality assessment. AQP allows researchers to test and compare objective quality metrics helping to improve robustness, reproducibility and development speed. We introduce the platform, explain the motivations, and illustrate with examples how, using AQP, objective quality metrics can be (i) compared and benchmarked; (ii) prototyped and adapted in a modular fashion; (iii) visualised and checked for errors. The code has been shared on GitHub to encourage adoption and contributions from the community.

Submitted 30 June, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

Comments: 6 pages, 3 figures, accepted and presented at ACM MMSys22, June, 2022, Athlone, Ireland

ACM Class: H.5.5; D.2.11; D.2.13

arXiv:2108.08745 [pdf, other]

doi 10.1109/QoMEX51781.2021.9465410

More for Less: Non-Intrusive Speech Quality Assessment with Limited Annotations

Authors: Alessandro Ragano, Emmanouil Benetos, Andrew Hines

Abstract: Non-intrusive speech quality assessment is a crucial operation in multimedia applications. The scarcity of annotated data and the lack of a reference signal represent some of the main challenges for designing efficient quality assessment metrics. In this paper, we propose two multi-task models to tackle the problems above. In the first model, we first learn a feature representation with a degradation classifier on a large dataset. Then we perform MOS prediction and degradation classification simultaneously on a small dataset annotated with MOS. In the second approach, the initial stage consists of learning features with a deep clustering-based unsupervised feature representation on the large dataset. Next, we perform MOS prediction and cluster label classification simultaneously on a small dataset. The results show that the deep clustering-based model outperforms the degradation classifier-based model and the 3 baselines (autoencoder features, P.563, and SRMRnorm) on TCD-VoIP. This paper indicates that multi-task learning combined with feature representations from unlabelled data is a promising approach to deal with the lack of large MOS annotated datasets.

Submitted 19 August, 2021; originally announced August 2021.

Comments: Published in 2021 13th International Conference on Quality of Multimedia Experience (QoMEX)

arXiv:2007.07032 [pdf]

QUALINET White Paper on Definitions of Immersive Media Experience (IMEx)

Authors: Andrew Perkis, Christian Timmerer, Sabina Baraković, Jasmina Baraković Husić, Søren Bech, Sebastian Bosse, Jean Botev, Kjell Brunnström, Luis Cruz, Katrien De Moor, Andrea de Polo Saibanti, Wouter Durnez, Sebastian Egger-Lampl, Ulrich Engelke, Tiago H. Falk, Jesús Gutiérrez, Asim Hameed, Andrew Hines, Tanja Kojic, Dragan Kukolj, Eirini Liotou, Dragorad Milovanovic, Sebastian Möller, Niall Murray, Babak Naderi, et al. (19 additional authors not shown)

Abstract: With the coming of age of virtual/augmented reality and interactive media, numerous definitions, frameworks, and models of immersion have emerged across different fields ranging from computer graphics to literary works. Immersion is oftentimes used interchangeably with presence as both concepts are closely related. However, there are noticeable interdisciplinary differences regarding definitions, scope, and constituents that are required to be addressed so that a coherent understanding of the concepts can be achieved. Such consensus is vital for paving the directionality of the future of immersive media experiences (IMEx) and all related matters. The aim of this white paper is to provide a survey of definitions of immersion and presence which leads to a definition of immersive media experience (IMEx). The Quality of Experience (QoE) for immersive media is described by establishing a relationship between the concepts of QoE and IMEx followed by application areas of immersive media experience. Influencing factors on immersive media experience are elaborated as well as the assessment of immersive media experience. Finally, standardization activities related to IMEx are highlighted and the white paper is concluded with an outlook related to future developments.

Submitted 24 November, 2020; v1 submitted 10 June, 2020; originally announced July 2020.

arXiv:2006.14750 [pdf, other]

Could regulating the creators deliver trustworthy AI?

Authors: Labhaoise Ni Fhaolain, Andrew Hines

Abstract: Is a new regulated profession, such as Artificial Intelligence (AI) Architect who is responsible and accountable for AI outputs necessary to ensure trustworthy AI? AI is becoming all pervasive and is often deployed in everyday technologies, devices and services without our knowledge. There is heightened awareness of AI in recent years which has brought with it fear. This fear is compounded by the inability to point to a trustworthy source of AI, however even the term "trustworthy AI" itself is troublesome. Some consider trustworthy AI to be that which complies with relevant laws, while others point to the requirement to comply with ethics and standards (whether in addition to or in isolation of the law). This immediately raises questions of whose ethics and which standards should be applied and whether these are sufficient to produce trustworthy AI in any event.

Submitted 25 June, 2020; originally announced June 2020.

Comments: To be published in The Second Workshop on Implementing Machine Ethics, Dublin, Ireland, 30 June 2020

arXiv:2004.09584 [pdf, other]

ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric

Authors: Michael Chinen, Felicia S. C. Lim, Jan Skoglund, Nikita Gureev, Feargus O'Gorman, Andrew Hines

Abstract: Estimation of perceptual quality in audio and speech is possible using a variety of methods. The combined v3 release of ViSQOL and ViSQOLAudio (for speech and audio, respectively,) provides improvements upon previous versions, in terms of both design and usage. As an open source C++ library or binary with permissive licensing, ViSQOL can now be deployed beyond the research context into production usage. The feedback from internal production teams at Google has helped to improve this new release, and serves to show cases where it is most applicable, as well as to highlight limitations. The new model is benchmarked against real-world data for evaluation purposes. The trends and direction of future work is discussed.

Submitted 20 April, 2020; originally announced April 2020.

Comments: 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX)

arXiv:2004.04208 [pdf, other]

doi 10.1109/NetSoft48620.2020.9165497

How Crisp is the Crease? A Subjective Study on Web Browsing Perception of Above-The-Fold

Authors: Hamed Z. Jahromi, Declan T. Delaney, Andrew Hines

Abstract: Quality of Experience (QoE) for various types of websites has gained significant attention in recent years. In order to design and evaluate websites, a metric that can estimate a user's experienced quality robustly for diverse content is necessary. SpeedIndex (SI) has been widely adopted to estimate perceived web page loading progress. It measures the speed of rendering pixels for the webpage that is visible in the browser window. This is termed Above-The-Fold (ATF). The influence of animated content on the perception of ATF has been less comprehensively explored. In this paper, we present an experimental design and methodology to measure ATF perception for websites with and without animated elements for various page content categories. We found that pages with animated elements caused people to have more varied perceptions of ATF under different network conditions. Animated content also impacts the page load estimation accuracy of SI for websites. We discuss how the difference in the perception of ATF will impact the QoE management of web applications. We explain the necessity of revisiting the visual assessment of ATF to include the animated contents and improve the robustness of metrics like SI.

Submitted 8 April, 2020; originally announced April 2020.

arXiv:2003.11882 [pdf, other]

Speech Quality Factors for Traditional and Neural-Based Low Bit Rate Vocoders

Authors: Wissam A. Jassim, Jan Skoglund, Michael Chinen, Andrew Hines

Abstract: This study compares the performances of different algorithms for coding speech at low bit rates. In addition to widely deployed traditional vocoders, a selection of recently developed generative-model-based coders at different bit rates are contrasted. Performance analysis of the coded speech is evaluated for different quality aspects: accuracy of pitch periods estimation, the word error rates for automatic speech recognition, and the influence of speaker gender and coding delays. A number of performance metrics of speech samples taken from a publicly available database were compared with subjective scores. Results from subjective quality assessment do not correlate well with existing full reference speech quality metrics. The results provide valuable insights into aspects of the speech signal that will be used to develop a novel metric to accurately predict speech quality from generative-model-based coders.

Submitted 26 March, 2020; originally announced March 2020.

Comments: 6 pages, 11 figures, conference

arXiv:2003.11100 [pdf, other]

How deep is your encoder: an analysis of features descriptors for an autoencoder-based audio-visual quality metric

Authors: Helard Martinez, Andrew Hines, Mylene C. Q. Farias

Abstract: The development of audio-visual quality assessment models poses a number of challenges in order to obtain accurate predictions. One of these challenges is the modelling of the complex interaction that audio and visual stimuli have and how this interaction is interpreted by human users. The No-Reference Audio-Visual Quality Metric Based on a Deep Autoencoder (NAViDAd) deals with this problem from a machine learning perspective. The metric receives two sets of audio and video features descriptors and produces a low-dimensional set of features used to predict the audio-visual quality. A basic implementation of NAViDAd was able to produce accurate predictions tested with a range of different audio-visual databases. The current work performs an ablation study on the base architecture of the metric. Several modules are removed or re-trained using different configurations to have a better understanding of the metric functionality. The results presented in this study provided important feedback that allows us to understand the real capacity of the metric's architecture and eventually develop a much better audio-visual quality metric.

Submitted 24 March, 2020; originally announced March 2020.

arXiv:2003.10914 [pdf, other]

doi 10.1109/QoMEX48832.2020.9123117

You Drive Me Crazy! Interactive QoE Assessment for Telepresence Robot Control

Authors: Hamed Z. Jahromi, Ivan Bartolec, Edwin Gamboa, Andrew Hines, Raimund Schatz

Abstract: Telepresence robots (TPRs) are versatile, remotely controlled vehicles that enable physical presence and human-to-human interaction over a distance. Thanks to improving hardware and dropping price points, TPRs enjoy the growing interest in various industries and application domains. Still, a satisfying experience remains key for their acceptance and successful adoption, not only in terms of enabling remote communication with others, but also in terms of managing robot mobility by means of remote navigation. This paper focuses on the latter aspect of remote operation which has been hitherto neglected. We present the results of an extensive subjective study designed to systematically assess remote navigation Quality of Experience (QoE) in the context of using a TPR live over the Internet. Participants were 'beamed' into a remote office space and asked to perform characteristic TPR remote operation tasks (driving, turning, parking). Visual and control dimensions of their experience were systematically impaired by altering network characteristics (bandwidth, delay and packet loss rate) in a controlled fashion. Our results show that users can differentiate well between visual and navigation/control aspects of their experience. Furthermore, QoE impairment sensitivity varies with the actual task at hand.

Submitted 24 March, 2020; originally announced March 2020.

arXiv:2003.09889 [pdf, other]

doi 10.1109/QoMEX48832.2020.9123111

Audio Impairment Recognition Using a Correlation-Based Feature Representation

Authors: Alessandro Ragano, Emmanouil Benetos, Andrew Hines

Abstract: Audio impairment recognition is based on finding noise in audio files and categorising the impairment type. Recently, significant performance improvement has been obtained thanks to the usage of advanced deep learning models. However, feature robustness is still an unresolved issue and it is one of the main reasons why we need powerful deep learning architectures. In the presence of a variety of musical styles, hand-crafted features are less efficient in capturing audio degradation characteristics and they are prone to failure when recognising audio impairments and could mistakenly learn musical concepts rather than impairment types. In this paper, we propose a new representation of hand-crafted features that is based on the correlation of feature pairs. We experimentally compare the proposed correlation-based feature representation with a typical raw feature representation used in machine learning and we show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage whilst achieving comparable accuracy.

Submitted 24 March, 2020; v1 submitted 22 March, 2020; originally announced March 2020.

Comments: This publication has been accepted in 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX)

arXiv:2001.11406 [pdf, other]

doi 10.23919/EUSIPCO.2019.8902975

NAViDAd: A No-Reference Audio-Visual Quality Metric Based on a Deep Autoencoder

Authors: Helard Martinez, M. C. Farias, A. Hines

Abstract: The development of models for quality prediction of both audio and video signals is a fairly mature field. But, although several multimodal models have been proposed, the area of audio-visual quality prediction is still an emerging area. In fact, despite the reasonable performance obtained by combination and parametric metrics, currently there is no reliable pixel-based audio-visual quality metric. The approach presented in this work is based on the assumption that autoencoders, fed with descriptive audio and video features, might produce a set of features that is able to describe the complex audio and video interactions. Based on this hypothesis, we propose a No-Reference Audio-Visual Quality Metric Based on a Deep Autoencoder (NAViDAd). The model visual features are natural scene statistics (NSS) and spatial-temporal measures of the video component. Meanwhile, the audio features are obtained by computing the spectrogram representation of the audio component. The model is formed by a 2-layer framework that includes a deep autoencoder layer and a classification layer. These two layers are stacked and trained to build the deep neural network model. The model is trained and tested using a large set of stimuli, containing representative audio and video artifacts. The model performed well when tested against the UnB-AV and the LiveNetflix-II databases.

Submitted 4 February, 2020; v1 submitted 30 January, 2020; originally announced January 2020.

Comments: 5 pages

Journal ref: 2019 27th European Signal Processing Conference (EUSIPCO), IEEE, 2019, pp 1-5

arXiv:1912.02802 [pdf]

doi 10.1016/j.comnet.2019.106984

5G network slicing using SDN and NFV- A survey of taxonomy, architectures and future challenges

Authors: Alcardo Alex Barakabitze, Arslan Ahmad, Rashid Mijumbi, Andrew Hines

Abstract: In this paper, we provide a comprehensive review and updated solutions related to 5G network slicing using SDN and NFV. Firstly, we present 5G service quality and business requirements followed by a description of 5G network softwarization and slicing paradigms including essential concepts, history and different use cases. Secondly, we provide a tutorial of 5G network slicing technology enablers including SDN, NFV, MEC, cloud/Fog computing, network hypervisors, virtual machines & containers. Thirdly, we comprehensively survey different industrial initiatives and projects that are pushing forward the adoption of SDN and NFV in accelerating 5G network slicing. A comparison of various 5G architectural approaches in terms of practical implementations, technology adoptions and deployment strategies is presented. Moreover, we provide a discussion on various open source orchestrators and proof of concepts representing industrial contribution. The work also investigates the standardization efforts in 5G networks regarding network slicing and softwarization. Additionally, the article presents the management and orchestration of network slices in a single domain followed by a comprehensive survey of management and orchestration approaches in 5G network slicing across multiple domains while supporting multiple tenants. Furthermore, we highlight the future challenges and research directions regarding network softwarization and slicing using SDN and NFV in 5G networks.

Submitted 5 December, 2019; originally announced December 2019.

Comments: 40 Pages, 22 figures, published in computer networks (Open Access)

MSC Class: 68 (Computer Science)

Journal ref: 2019

Showing 1–22 of 22 results for author: Hines, A