Search | arXiv e-print repository

PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding

Authors: Trang Le, Daniel Lazar, Suyoun Kim, Shan Jiang, Duc Le, Adithya Sagar, Aleksandr Livshits, Ahmed Aly, Akshat Shrivastava

Abstract: Spoken Language Understanding (SLU) is a critical component of voice assistants; it consists of converting speech to semantic parses for task execution. Previous works have explored end-to-end models to improve the quality and robustness of SLU models with Deliberation; however, these models have remained autoregressive, resulting in higher latencies. In this work we introduce PRoDeliberation, a novel method leveraging a Connectionist Temporal Classification-based decoding strategy as well as a denoising objective to train robust non-autoregressive deliberation models. We show that PRoDeliberation achieves the latency reduction of parallel decoding (2-10x improvement over autoregressive models) while retaining the ability of autoregressive deliberation systems to correct Automatic Speech Recognition (ASR) mistranscriptions. We further show that the design of the denoising training allows PRoDeliberation to overcome the limitations of small ASR devices, and we provide an analysis of the necessity of each component of the system.
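The abstract attributes the latency gain to CTC-based parallel decoding. As a rough illustration of why CTC allows this, here is a minimal sketch of standard greedy CTC decoding. It assumes frame-level label scores and a blank token at index 0; the function name is hypothetical, and this is the generic CTC collapse rule, not PRoDeliberation's actual decoder, whose details the abstract does not give.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank token


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Greedy CTC decoding (hypothetical helper, generic technique).

    Each frame's best label is picked independently of every other frame,
    so this step can run in parallel across frames. The per-frame labels
    are then collapsed: consecutive repeats are merged and blanks dropped.
    """
    # Step 1: per-frame argmax; no dependence on previously emitted tokens.
    best = [max(range(len(scores)), key=scores.__getitem__)
            for scores in frame_scores]
    # Step 2: merge repeats, then remove blanks. Tracking `prev` through
    # blanks means a blank between two identical labels keeps both.
    output: List[int] = []
    prev: Optional[int] = None
    for label in best:
        if label != prev and label != blank:
            output.append(label)
        prev = label
    return output
```

For example, five frames whose argmax labels are `[1, 1, 0, 1, 3]` decode to `[1, 1, 3]`: the first repeat merges, and the blank separates the second `1`. An autoregressive decoder would instead emit one token per step, each conditioned on the tokens before it.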

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2209.02448 [pdf, ps, other]

Fast Adaptive Regression-based Model Predictive Control

Authors: Eslam Mostafa, Hussein A. Aly, Ahmed Elliethy

Abstract: Model predictive control (MPC) is an optimal control method that predicts the future states of the system being controlled and estimates the optimal control inputs that drive the predicted states to the required reference. The computations of the MPC are performed at predetermined sample instances over a finite time horizon. The number of sample instances and the horizon length determine both the performance of the MPC and its computational cost. A long horizon with a large sample count allows the MPC to better estimate the inputs when the states change rapidly over time, which yields better performance but at the expense of high computational cost. However, such a long horizon is not always necessary, especially for slowly varying states. In this case, a short horizon with fewer samples is preferable, as the same MPC performance can be obtained at a fraction of the computational cost. In this paper, we propose an adaptive regression-based MPC that predicts the best minimum horizon length and sample count from several features extracted from the time-varying changes of the states. The proposed technique builds a synthetic dataset using the system model and uses this dataset to train a support vector regressor that performs the prediction. The proposed technique is experimentally compared with several state-of-the-art techniques on both linear and nonlinear models.
The proposed technique reduces computational time by about 35-65% compared with the other techniques without introducing a noticeable loss in performance.
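To make the horizon/cost trade-off described above concrete, the following is a minimal sketch of a generic unconstrained linear MPC for a scalar system `x[k+1] = a*x[k] + b*u[k]` with a quadratic tracking cost. At every sample it solves the finite-horizon least-squares problem and applies only the first input. The plant, cost weights, and all function names are assumptions for illustration. This is not the paper's adaptive method: the horizon is simply a fixed argument here, and it is the quantity the paper's support vector regressor would choose.

```python
from typing import List


def solve_linear(A: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def mpc_step(x0: float, ref: float, a: float, b: float,
             horizon: int, lam: float = 0.1) -> float:
    """First input of an unconstrained finite-horizon MPC (hypothetical helper).

    Minimizes sum_{k=1..N} (x[k] - ref)^2 + lam * sum_{j=0..N-1} u[j]^2
    for the scalar plant x[k+1] = a*x[k] + b*u[k].
    """
    n = horizon
    # Predicted states: x[k] = a^k * x0 + sum_{j<k} a^(k-1-j) * b * u[j].
    free = [a ** k * x0 for k in range(1, n + 1)]  # zero-input response
    G = [[a ** (k - 1 - j) * b if j < k else 0.0 for j in range(n)]
         for k in range(1, n + 1)]  # input-to-state map
    # Normal equations: (G^T G + lam I) u = G^T (ref - free).
    H = [[sum(G[k][i] * G[k][j] for k in range(n)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    g = [sum(G[k][i] * (ref - free[k]) for k in range(n)) for i in range(n)]
    u = solve_linear(H, g)
    return u[0]  # receding horizon: apply only the first planned input


def simulate(x0: float, ref: float, a: float, b: float,
             horizon: int, steps: int, lam: float = 0.1) -> List[float]:
    """Closed-loop run that re-solves the MPC problem at every sample."""
    x = x0
    trajectory: List[float] = []
    for _ in range(steps):
        u = mpc_step(x, ref, a, b, horizon, lam)
        x = a * x + b * u
        trajectory.append(x)
    return trajectory
```

For instance, `simulate(0.0, 1.0, 0.9, 0.5, horizon=10, steps=50)` drives the state close to the reference 1.0. A small steady-state offset remains because the cost penalizes `u` itself rather than its increments. Building and solving the N-by-N system costs roughly O(N^3) per sample. That growth with horizon length is the computational cost that adaptive horizon selection aims to cut when the states vary slowly.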

Submitted 4 May, 2023; v1 submitted 6 September, 2022; originally announced September 2022.

Comments: Accepted for publication in Control Theory and Technology, May 2023

arXiv:2111.06331 [pdf, other]

Towards an Efficient Voice Identification Using Wav2Vec2.0 and HuBERT Based on the Quran Reciters Dataset

Authors: Aly Moustafa, Salah A. Aly

Abstract: Current authentication and trusted systems depend on classical and biometric methods to recognize or authorize users. Such methods include audio speech recognition, eye, and finger signatures. Recent tools use deep learning and transformers to achieve better results. In this paper, we develop a deep learning model for Arabic speaker identification using the Wav2Vec2.0 and HuBERT audio representation learning tools. The end-to-end Wav2Vec2.0 paradigm learns contextualized speech representations by randomly masking a set of feature vectors and then applying a transformer neural network. We employ an MLP classifier that is able to differentiate between invariant labeled classes. We present several experimental results that confirm the high accuracy of the proposed model. The experiments show that an arbitrary wave signal for a certain speaker can be identified with 98% and 97.1% accuracy in the cases of Wav2Vec2.0 and HuBERT, respectively.

Submitted 11 November, 2021; originally announced November 2021.

Comments: 5 pages, 9 figures, 2 tables

arXiv:2001.02048 [pdf, other]

doi 10.1007/s11042-021-10575-y

Flexible Architecture for Real-time Processing of Multiple Video Signals

Authors: Mohamed Awad, Islam T. Abougindia, Ahmed Elliethy, Hussein A. Aly

Abstract: Simultaneous processing of multiple video sources requires each pixel in a frame from one video source to be processed synchronously with the pixels at the same spatial positions in the corresponding frames from the other video sources. However, simultaneous processing is challenging, as corresponding frames from different video signals provided by multiple sources have a time-varying delay caused by electrical and mechanical restrictions inside the video-source hardware, which make the corresponding frame rates deviate. Researchers overcome these challenges either by using ready-made video processing systems or by designing and implementing a custom system tailored to their specific application. These video processing systems lack flexibility in handling different application requirements, such as the required number of video sources and outputs, video standards, or frame rates of the input/output videos. In this paper, we present a design for a flexible simultaneous video processing architecture that is suitable for various applications. The proposed architecture is upgradeable to handle multiple video standards, scalable to process/produce a variable number of input/output videos, and compatible with most video processors. Moreover, we present in detail the analog/digital mixed-signal and power distribution considerations used in designing the proposed architecture.
As a case study of the proposed flexible architecture, we used it to realize a simultaneous video processing system that performs real-time video fusion of visible and near-infrared video sources. We make the source files of the hardware design available, along with the bill of materials (BOM) of the case study, as a reference for researchers who intend to design and implement simultaneous multi-video processing systems.

Submitted 29 December, 2019; originally announced January 2020.

Comments: 13 pages, 16 figures, 3 tables

Journal ref: Springer Multimedia Tools and Applications (2021)

Showing 1–4 of 4 results for author: Aly, A