Search | arXiv e-print repository

Reproducing the Acoustic Velocity Vectors in a Circular Listening Area

Authors: Jiarui Wang, Thushara Abhayapala, Jihui Aimee Zhang, Prasanga Samarasinghe

Abstract: Acoustic velocity vectors are important for human sound localization at low frequencies. This paper proposes a sound field reproduction algorithm that matches the acoustic velocity vectors in a circular listening area. In previous work, acoustic velocity vectors are matched either at sweet spots or on the boundary of the listening area. Sweet spots restrict listeners' movement, whereas measuring the acoustic velocity vectors on the boundary requires a complicated measurement setup. This paper proposes the cylindrical harmonic coefficients of the acoustic velocity vectors in a circular area (CHV coefficients), which are calculated from the cylindrical harmonic coefficients of the global pressure (global CHP coefficients) by using the sound field translation formula. The global CHP coefficients can be measured by a circular microphone array, which is available off-the-shelf. By matching the CHV coefficients, the acoustic velocity vectors are reproduced throughout the listening area, so listeners are free to move. Simulations show that at low frequencies, where the acoustic velocity vectors are the dominant factor for localization, the proposed reproduction method based on the CHV coefficients reproduces the acoustic velocity vectors more accurately than the traditional method based on the global CHP coefficients.

Submitted 19 March, 2024; originally announced March 2024.

Comments: Submitted to EUSIPCO 2024
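To make the link between pressure coefficients and velocity concrete, here is a minimal Python sketch. It is an illustration only, not the paper's CHV derivation (which uses the sound field translation formula). It assumes an exp(-iωt) time convention, an interior 2D field p(r, φ) = Σ_m α_m J_m(kr) e^{imφ} described by global CHP coefficients α_m, and Euler's equation v = ∇p / (iωρ). The function names, air parameters, and dict layout of the coefficients are hypothetical; Bessel functions are evaluated from Bessel's integral so that only NumPy is needed.

```python
import numpy as np

RHO = 1.225  # air density in kg/m^3 (assumed)
C = 343.0    # speed of sound in m/s (assumed)


def bessel_j(m, x, n_quad=256):
    """Integer-order Bessel J_m(x) from Bessel's integral,
    J_m(x) = (1/2pi) * integral over one period of exp(i(x sin t - m t)) dt,
    evaluated with the (spectrally accurate) periodic trapezoidal rule."""
    t = 2.0 * np.pi * np.arange(n_quad) / n_quad
    return np.mean(np.exp(1j * (x * np.sin(t) - m * t))).real


def pressure_and_velocity(alpha, k, r, phi):
    """Evaluate p and Cartesian velocity (vx, vy) at the polar point (r, phi), r > 0.

    alpha: dict mapping order m -> CHP coefficient alpha_m (hypothetical layout).
    Assumes exp(-i omega t), so Euler's equation gives v = grad(p) / (i omega rho).
    """
    omega = k * C
    p = dp_dr = dp_dphi = 0.0 + 0.0j
    for m, a in alpha.items():
        e = np.exp(1j * m * phi)
        jm = bessel_j(m, k * r)
        # Derivative identity: J_m'(x) = (J_{m-1}(x) - J_{m+1}(x)) / 2
        djm = 0.5 * (bessel_j(m - 1, k * r) - bessel_j(m + 1, k * r))
        p += a * jm * e
        dp_dr += a * k * djm * e
        dp_dphi += a * 1j * m * jm * e
    v_r = dp_dr / (1j * omega * RHO)
    v_phi = dp_dphi / (r * 1j * omega * RHO)
    # Rotate the polar velocity components into Cartesian components
    vx = np.cos(phi) * v_r - np.sin(phi) * v_phi
    vy = np.sin(phi) * v_r + np.cos(phi) * v_phi
    return p, vx, vy


def plane_wave_coeffs(phi0, order):
    """Jacobi-Anger: exp(i k r cos(phi - phi0)) = sum_m i^m J_m(kr) exp(i m (phi - phi0))."""
    return {m: (1j ** m) * np.exp(-1j * m * phi0) for m in range(-order, order + 1)}
```

As a sanity check, a plane wave propagating in direction φ0 has α_m = i^m e^{-imφ0}, and the computed velocity should equal p/(ρc) times the unit vector (cos φ0, sin φ0) at every point well inside the truncation radius.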

arXiv:2309.13819

A Two-Step Approach for Narrowband Source Localization in Reverberant Rooms

Authors: Wei-Ting Lai, Lachlan Birnie, Thushara Abhayapala, Amy Bastine, Shaoheng Xu, Prasanga Samarasinghe

Abstract: This paper presents a two-step approach for narrowband source localization within reverberant rooms. The first step performs dereverberation by modeling the homogeneous component of the sound field as an equivalent plane-wave decomposition solved with Iteratively Reweighted Least Squares (IRLS), while the second step performs source localization by modeling the dereverberated component as a sparse representation of a point-source distribution solved with Orthogonal Matching Pursuit (OMP). The proposed method improves localization accuracy with fewer measurements, particularly in environments with strong reverberation. A numerical simulation of a conference room, using a uniform microphone array affixed to the wall, demonstrates real-world feasibility. Notably, the proposed method and microphone placement effectively localize sound sources within the 2D horizontal plane without requiring prior knowledge of the boundary conditions or room geometry, making the approach applicable to different room types.

Submitted 24 September, 2023; originally announced September 2023.
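The second (localization) step relies on Orthogonal Matching Pursuit. Below is a generic OMP sketch, assuming a dictionary matrix D whose columns would, in this setting, be the microphone responses of candidate point-source locations; the random dictionary in the check is purely illustrative, and the IRLS dereverberation step is not shown.

```python
import numpy as np


def omp(D, y, n_nonzero):
    """Orthogonal Matching Pursuit: approximate y ~ D @ x with at most n_nonzero nonzeros."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1], dtype=complex)
    coef = np.zeros(0, dtype=complex)
    # Column norms, so that correlations are comparable across atoms
    norms = np.linalg.norm(D, axis=0)
    for _ in range(n_nonzero):
        # Greedy step: pick the atom most correlated with the current residual
        corr = np.abs(D.conj().T @ residual) / norms
        corr[support] = 0.0  # never re-select an atom already in the support
        support.append(int(np.argmax(corr)))
        # Orthogonal step: least-squares re-fit of y on the whole support
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x, support
```

With a random 40 × 200 complex dictionary and a 2-sparse source vector, `omp(D, D @ x_true, 2)` should typically recover the support exactly. In practice the sparsity level (the number of sources) must be known or estimated.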

arXiv:2309.10605

An Active Noise Control System Based on Soundfield Interpolation Using a Physics-informed Neural Network

Authors: Yile Angela Zhang, Fei Ma, Thushara Abhayapala, Prasanga Samarasinghe, Amy Bastine

Abstract: Conventional multiple-point active noise control (ANC) systems require error microphones to be placed within the region of interest (ROI), which inconveniences users. This paper designs a feasible arrangement of monitoring microphones placed outside the ROI, giving the user more freedom of movement. The soundfield within the ROI is interpolated from the microphone signals using a physics-informed neural network (PINN). The PINN exploits the acoustic wave equation to assist soundfield interpolation with a limited number of monitoring microphones, and it interpolates better than the spherical harmonic method in simulations. An ANC system is designed to use the interpolated signal to reduce noise within the ROI. In simulations, the PINN-assisted ANC system reduces noise more than the multiple-point ANC system.

Submitted 19 September, 2023; originally announced September 2023.

arXiv:2309.08290

Head-Related Transfer Function Interpolation with a Spherical CNN

Authors: Xingyu Chen, Fei Ma, Yile Zhang, Amy Bastine, Prasanga N. Samarasinghe

Abstract: Head-related transfer functions (HRTFs) are crucial for spatial soundfield reproduction in virtual reality applications. However, obtaining personalized, high-resolution HRTFs is time-consuming and costly. Recently, deep learning-based methods have shown promise in interpolating high-resolution HRTFs from sparse measurements. Some of these methods treat HRTF interpolation as an image super-resolution task, which neglects spatial acoustic features. This paper proposes a spherical convolutional neural network method for HRTF interpolation. The proposed method realizes the convolution process by decomposing and reconstructing HRTFs through spherical harmonics (SHs). The SHs, an orthogonal function set defined on the sphere, allow the convolution layers to effectively capture the spatial features of HRTFs, which are sampled on a sphere. Simulation results demonstrate that the proposed method achieves accurate interpolation from sparse measurements, outperforming the SH method and learning-based methods.

Submitted 15 September, 2023; originally announced September 2023.

arXiv:2306.09135

Time-Domain Wideband Image Source Method for Spherical Microphone Arrays

Authors: Jiarui Wang, Prasanga Samarasinghe, Thushara Abhayapala, Jihui Aimee Zhang

Abstract: This paper presents the time-domain wideband spherical microphone array impulse response generator (TDW-SMIR generator), a time-domain wideband image source method (ISM) for generating the room impulse responses captured by an open spherical microphone array. To incorporate loudspeaker directivity, the TDW-SMIR generator considers a source that emits a sequence of spherical wave fronts whose amplitudes are related to the loudspeaker directional impulse responses measured in the far field. The TDW-SMIR generator uses geometric models to derive the time-domain signals recorded by the spherical microphone array. Comparisons are made with frequency-domain single-band ISMs, and simulation results show that the outputs of the TDW-SMIR generator are similar to those of frequency-domain single-band ISMs.

Submitted 9 August, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: Accepted for publication in the IEEE 25th International Workshop on Multimedia Signal Processing (IEEE MMSP 2023)
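The image source principle that the TDW-SMIR generator builds on can be shown in its simplest form. This is a hedged sketch assuming a rectangular (shoebox) room with one corner at the origin, an omnidirectional source, a frequency-independent wall reflection coefficient, and first-order reflections only. It computes image positions, propagation delays, and 1/(4πd) spherical-spreading amplitudes; it does not model loudspeaker directivity or the spherical array, which are the paper's contributions. The function name and the `beta` parameter are hypothetical.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)


def first_order_images(room, src, mic, beta=0.8):
    """Return sorted (delay_s, amplitude) pairs for the direct path and six first-order images.

    room: (Lx, Ly, Lz) dimensions in metres, with walls at 0 and L on each axis.
    beta: frequency-independent wall reflection coefficient (an assumption).
    """
    src = np.asarray(src, dtype=float)
    mic = np.asarray(mic, dtype=float)
    images = [(src, 1.0)]  # direct path, no reflection loss
    for axis in range(3):
        for wall in (0.0, room[axis]):
            img = src.copy()
            img[axis] = 2.0 * wall - src[axis]  # mirror the source across this wall
            images.append((img, beta))
    paths = []
    for pos, gain in images:
        d = np.linalg.norm(pos - mic)
        # Free-field Green's function magnitude 1/(4 pi d), scaled by the reflection loss
        paths.append((d / C, gain / (4.0 * np.pi * d)))
    return sorted(paths)  # earliest arrival (the direct path) first
```

Higher-order images follow by mirroring images again; a time-domain RIR is then obtained by placing each amplitude at its delay, typically with fractional-delay interpolation.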

arXiv:2302.00592

doi 10.1109/TENCON55691.2022.9977637

Comparative Study of Parameter Selection for Enhanced Edge Inference for a Multi-Output Regression model for Head Pose Estimation

Authors: Asiri Lindamulage, Nuwan Kodagoda, Shyam Reyal, Pradeepa Samarasinghe, Pratheepan Yogarajah

Abstract: Magnitude-based pruning is a technique used to optimise deep learning models for edge inference. We achieved a model size reduction of over 75% with higher accuracy than the original multi-output regression model for head-pose estimation.

Submitted 28 December, 2022; originally announced February 2023.

Comments: Conference: TENCON 2022 - 2022 IEEE Region 10 Conference (TENCON)

Journal ref: TENCON 2022 - 2022 IEEE Region 10 Conference (TENCON), Nov. 2022
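The idea behind magnitude-based pruning can be sketched in a few lines of NumPy. This is a minimal, assumed variant (unstructured pruning of a single weight array to a target sparsity); the paper's exact schedule, per-layer settings, and any fine-tuning are not described in the abstract and are not reproduced here. The function name is hypothetical.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the entries (unstructured).

    Returns the pruned copy and the boolean keep-mask that was applied.
    """
    w = np.asarray(weights, dtype=float)
    k = int(round(sparsity * w.size))  # number of weights to remove
    if k == 0:
        return w.copy(), np.ones_like(w, dtype=bool)
    # Flattened indices sorted by |w|; a stable sort breaks ties by position
    order = np.argsort(np.abs(w), axis=None, kind="stable")
    mask = np.ones(w.size, dtype=bool)
    mask[order[:k]] = False  # drop the k smallest-magnitude weights
    mask = mask.reshape(w.shape)
    return w * mask, mask
```

Setting `sparsity=0.75` zeroes three quarters of the weights; an actual reduction in stored model size additionally depends on a sparse or compressed storage format.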

arXiv:2212.09027

2D Pose Estimation based Child Action Recognition

Authors: Sanka Mohottala, Sandun Abeygunawardana, Pradeepa Samarasinghe, Dharshana Kasthurirathna, Charith Abhayaratne

Abstract: We present, for the first time on the child action recognition task, a graph convolutional network with 2D pose estimation, achieving results on par with an RGB-modality-based model on a novel benchmark dataset of videos recorded in unconstrained environments.

Submitted 18 December, 2022; originally announced December 2022.

Comments: Paper Accepted for the IEEE TENCON Conference (2022). 7 pages, 5 figures
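The core operation of a GCN on 2D pose keypoints is a graph convolution over the skeleton graph. The sketch below shows one spatial layer with the widely used symmetric normalization D^{-1/2}(A + I)D^{-1/2}; the five-joint skeleton and feature sizes used in the check are hypothetical, and spatio-temporal variants such as ST-GCN additionally convolve along time.

```python
import numpy as np


def normalized_adjacency(edges, n_nodes):
    """Symmetric GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a = np.eye(n_nodes)  # self-loops, so each joint keeps its own features
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0  # undirected skeleton bones
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]


def gcn_layer(a_hat, h, w):
    """One graph-convolution layer: ReLU(A_hat @ H @ W).

    h: (n_joints, in_features), for example (x, y, confidence) per joint.
    w: (in_features, out_features) learned projection.
    """
    return np.maximum(a_hat @ h @ w, 0.0)
```

Each joint's features are averaged with those of its neighbours (weighted by degree) before the learned projection W, so stacked layers aggregate information along the skeleton.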

arXiv:2212.09013

Graph Neural Network based Child Activity Recognition

Authors: Sanka Mohottala, Pradeepa Samarasinghe, Dharshana Kasthurirathna, Charith Abhayaratne

Abstract: This paper presents an implementation of child activity recognition (CAR) with a graph convolution network (GCN) based deep learning model, since prior implementations in this domain have been dominated by CNN, LSTM, and other methods despite the superior performance of GCNs. To the best of our knowledge, we are the first to use a GCN model in the child activity recognition domain. To overcome the challenge of small publicly available child action datasets, several learning methods such as feature extraction, fine-tuning, and curriculum learning were implemented to improve model performance. Motivated by contradictory claims about the use of transfer learning in CAR, we conducted a detailed implementation and analysis of transfer learning, together with a study of the negative transfer learning effect on CAR, which has not been addressed previously. As the principal contribution, we developed an ST-GCN based CAR model which, despite the small size of the dataset, obtained around 50% accuracy in vanilla implementations. With feature extraction and fine-tuning methods, accuracy was improved by 20%-30%, with the highest accuracy being 82.24%. Furthermore, the results on activity datasets empirically demonstrate that careful selection of pre-training datasets through methods such as curriculum learning can enhance accuracy. Finally, we provide preliminary evidence of a possible frame-rate effect on the accuracy of CAR models, a direction future research can explore.

Submitted 18 December, 2022; originally announced December 2022.

Comments: Accepted to 23rd IEEE ICIT Conference (2022), 8 pages, 4 figures

arXiv:2206.09298

GMM based multi-stage Wiener filtering for low SNR speech enhancement

Authors: Wageesha Manamperi, Prasanga N. Samarasinghe, Thushara D. Abhayapala, Jihui Zhang

Abstract: This paper proposes a single-channel speech enhancement method that reduces noise and enhances speech at low signal-to-noise ratio (SNR) levels and under non-stationary noise conditions. Specifically, we model the noise using a Gaussian mixture model (GMM) within a multi-stage process with a parametric Wiener filter. The proposed noise model estimates the noise power spectral density (PSD) more accurately and generalizes better across noise conditions than traditional Wiener filtering methods. Simulations show that the proposed approach can achieve better speech quality (PESQ) and intelligibility (STOI) at low SNR levels.

Submitted 14 July, 2022; v1 submitted 18 June, 2022; originally announced June 2022.

Comments: 5 pages, 3 figures, submitted to a conference
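The parametric Wiener filter stage can be illustrated once a noise PSD estimate is available. This is a minimal sketch of the common form G = (S / (S + αN))^β applied per frequency bin, with a crude power-subtraction speech-PSD estimate; the GMM-based multi-stage noise estimator (the paper's contribution) is not reproduced, and `noise_psd` is simply assumed given. The function name and default values are hypothetical.

```python
import numpy as np


def parametric_wiener_gain(noisy_psd, noise_psd, alpha=1.0, beta=1.0, floor=1e-3):
    """Per-bin gain G = (S / (S + alpha * N)) ** beta.

    S is estimated as max(Y - N, floor * Y) (power subtraction with a spectral floor).
    alpha over-weights the noise, beta shapes the gain curve; alpha = beta = 1 gives
    the classic Wiener gain. Assumes noisy_psd > 0 in every bin.
    """
    noisy_psd = np.asarray(noisy_psd, dtype=float)
    noise_psd = np.asarray(noise_psd, dtype=float)
    speech_psd = np.maximum(noisy_psd - noise_psd, floor * noisy_psd)
    return (speech_psd / (speech_psd + alpha * noise_psd)) ** beta
```

The enhanced spectrum is the noisy STFT multiplied by this gain bin by bin; a more accurate noise PSD directly yields a gain that suppresses noise without over-attenuating speech.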

arXiv:2008.03513

doi 10.1109/ICASSP40776.2020.9054728

A Novel Method for Obtaining Diffuse Field Measurements for Microphone Calibration

Authors: Noman Akbar, Glenn Dickins, Mark R. P. Thomas, Prasanga Samarasinghe, Thushara Abhayapala

Abstract: We propose a straightforward and cost-effective method to perform diffuse soundfield measurements for calibrating the magnitude response of a microphone array. Such calibration is typically performed in a diffuse soundfield created in a reverberation chamber, which is expensive and time-consuming. The proposed method obtains diffuse field measurements in untreated environments. First, a closed-form expression for the spatial correlation of a wideband signal in a diffuse field is derived. Next, we describe a practical procedure for obtaining the diffuse field response of a microphone array in the presence of a non-diffuse soundfield by introducing random perturbations in the microphone location. The experimentally obtained spatial correlation data are compared with the theoretical model, confirming that diffuse field measurements can be obtained in untreated environments with relatively few loudspeakers. A 30-second test signal played from 4-8 loudspeakers is shown to be sufficient to obtain a diffuse field measurement using the proposed method. An Eigenmike is then successfully calibrated at two different geographical locations.

Submitted 8 August, 2020; originally announced August 2020.

Comments: Accepted to appear in IEEE ICASSP 2020

Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020
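The theoretical spatial correlation in an ideal diffuse field can be checked numerically. The sketch below covers only the well-known single-frequency case, where the coherence between two points a distance d apart is sin(kd)/(kd); the paper derives its own closed-form wideband expression, which is not reproduced here. The Monte Carlo function averages exp(ik u·r) over plane-wave directions u drawn uniformly on the sphere; function names and sample sizes are hypothetical.

```python
import numpy as np


def diffuse_coherence_mc(k, d, n_waves=200_000, seed=0):
    """Monte Carlo coherence between two points d metres apart in an ideal 3D diffuse
    field: the average of exp(i k u.r) over directions u uniform on the unit sphere."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((n_waves, 3))
    u /= np.linalg.norm(u, axis=1, keepdims=True)  # normalized Gaussians are uniform on the sphere
    r = np.array([d, 0.0, 0.0])  # separation vector between the two microphones
    return np.mean(np.exp(1j * k * (u @ r)))


def diffuse_coherence_theory(k, d):
    """Single-frequency closed form sin(kd) / (kd).
    Note: np.sinc(x) is the normalized sinc sin(pi x) / (pi x), hence the division by pi."""
    return np.sinc(k * d / np.pi)
```

For k = 2π · 1000 / 343 rad/m and d = 5 cm, both functions give a value of about 0.87, and the simulated imaginary part vanishes up to sampling noise, as expected for an isotropic field.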

arXiv:2007.11795

Sound Field Translation and Mixed Source Model for Virtual Applications with Perceptual Validation

Authors: Lachlan Birnie, Thushara Abhayapala, Vladimir Tourbabin, Prasanga Samarasinghe

Abstract: Non-interactive and linear experiences such as cinema offer high-quality surround sound audio to enhance immersion; however, the listener's experience is usually fixed to a single acoustic perspective. With the rise of virtual reality, there is a demand for recording and recreating real-world experiences in a way that allows the user to interact and move within the reproduction. Conventional sound field translation techniques take a recording and expand it into an equivalent environment of virtual sources. However, the finite sampling of a commercial higher-order microphone produces an acoustic sweet spot in the virtual reproduction. As a result, these techniques still restrict the listener's navigable region. In this paper, we propose a method for listener translation in an acoustic reproduction that incorporates a mixture of near-field and far-field sources in a sparsely expanded virtual environment. We perceptually validate the method through a Multiple Stimulus with Hidden Reference and Anchor (MUSHRA) experiment. Compared to the plane-wave benchmark, the proposed method offers both improved source localizability and robustness to spectral distortions at translated positions. A cross-examination with numerical simulations demonstrated that the sparse expansion relaxes the inherent sweet-spot constraint, leading to improved localizability for sparse environments. Additionally, the proposed method better reproduces the intensity and binaural room impulse response spectra of near-field environments, further supporting the strong perceptual results.

Submitted 23 July, 2020; originally announced July 2020.

Comments: 12 pages, 11 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

arXiv:2003.08050

doi 10.1109/TASLP.2019.2960734

Multi-Source DOA Estimation through Pattern Recognition of the Modal Coherence of a Reverberant Soundfield

Authors: A. Fahim, P. N. Samarasinghe, T. D. Abhayapala

Abstract: We propose a novel multi-source direction of arrival (DOA) estimation technique using a convolutional neural network that learns the modal coherence patterns of an incident soundfield through measured spherical harmonic coefficients. We train our model for individual time-frequency bins in the short-time Fourier transform spectrum by analyzing the unique snapshot of modal coherence for each desired direction. The proposed method can estimate multiple simultaneously active sound sources in 3D space using a single-source training scheme. This single-source training scheme reduces the training time and resource requirements, and allows the same trained model to be reused for different multi-source combinations. The method is evaluated in various simulated and practical noisy and reverberant environments with varying acoustic criteria and is found to outperform the baseline methods in terms of DOA estimation accuracy. Furthermore, the proposed algorithm allows independent training of azimuth and elevation during full DOA estimation over 3D space, which significantly improves its training efficiency without affecting the overall estimation accuracy.

Submitted 18 March, 2020; originally announced March 2020.

Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2019) 605 - 618

arXiv:1805.06234

doi 10.1109/TASLP.2018.2835723

PSD Estimation and Source Separation in a Noisy Reverberant Environment using a Spherical Microphone Array

Authors: Abdullah Fahim, Prasanga N. Samarasinghe, Thushara D. Abhayapala

Abstract: In this paper, we propose an efficient technique for estimating individual power spectral density (PSD) components, i.e., the PSD of each desired sound source as well as of noise and reverberation, in a multi-source reverberant sound scene with coherent background noise. We formulate the problem in the spherical harmonics domain to take advantage of the inherent orthogonality of the spherical harmonic basis functions, and extract the PSD components from the cross-correlation between the different sound field modes. We also investigate an implementation issue that occurs at the nulls of the Bessel functions and offer an engineering solution. The performance evaluation takes place in a practical environment with a commercial microphone array to measure the robustness of the proposed algorithm against the deviations incurred in practice. We also demonstrate an application of the proposed PSD estimator through a source separation algorithm and compare its performance with a contemporary method in terms of different objective measures.

Submitted 16 May, 2018; originally announced May 2018.

arXiv:1709.01346

doi 10.1109/WASPAA.2017.8169998

PSD Estimation of Multiple Sound Sources in a Reverberant Room Using a Spherical Microphone Array

Authors: Abdullah Fahim, Prasanga N. Samarasinghe, Thushara D. Abhayapala

Abstract: We propose an efficient method to estimate source power spectral densities (PSDs) in a multi-source reverberant environment using a spherical microphone array. The proposed method utilizes the spatial correlation between the spherical harmonics (SH) coefficients of a sound field to estimate source PSDs. Using the spatial cross-correlation of the SH coefficients allows the method to be employed in environments with a higher number of sources than conventional methods can handle. Furthermore, the orthogonality of the SH basis functions removes the need to design the specific beampatterns required by a conventional beamformer-based method. We evaluate the performance of the algorithm with different numbers of sources in practical reverberant and non-reverberant rooms. We also demonstrate an application of the method by separating source signals using a conventional beamformer and a Wiener post-filter designed from the estimated PSDs.

Submitted 5 September, 2017; originally announced September 2017.

Comments: Accepted for WASPAA 2017

arXiv:1510.08950

Estimation of the direct-to-reverberant Energy Ratio using a spherical microphone array

Authors: Hanchi Chen, Prasanga N. Samarasinghe, Thushara D. Abhayapala, Wen Zhang

Abstract: This paper proposes a practical approach to estimating the direct-to-reverberant energy ratio (DRR) using a spherical microphone array without knowledge of the source signal. We base our estimation on a theoretical relationship between the DRR and the coherence estimation function between coincident pressure and particle velocity. We discuss the proposed method's ability to estimate the DRR for a wide variety of room sizes, reverberation times, and source-receiver distances with appropriate examples. Test results show that the method can estimate the room DRR for frequencies between 199 and 2511 Hz with ±3 dB accuracy.

Submitted 29 October, 2015; originally announced October 2015.

Comments: In Proceedings of the ACE Challenge Workshop - a satellite event of IEEE-WASPAA 2015 (arXiv:1510.00383)

Report number: ACEChallenge/2015/01
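For reference, the quantity being estimated can be computed directly when a room impulse response (RIR) is available, for example as ground truth for evaluating a blind estimator. This sketch is not the paper's coherence-based method. It assumes a common convention in which the direct part is a ±2.5 ms window around the strongest peak and everything after that window is reverberant; the window length and function name are assumptions.

```python
import numpy as np


def drr_db(rir, fs, direct_window_ms=2.5):
    """Direct-to-reverberant ratio in dB from a room impulse response.

    Direct energy: samples within +/- direct_window_ms of the strongest peak (assumed
    convention). Reverberant energy: all samples after that window. Samples before
    the window (pre-delay) are ignored.
    """
    rir = np.asarray(rir, dtype=float)
    peak = int(np.argmax(np.abs(rir)))
    half = int(round(direct_window_ms * 1e-3 * fs))  # half-window length in samples
    start, stop = max(peak - half, 0), peak + half + 1
    direct_energy = np.sum(rir[start:stop] ** 2)
    reverb_energy = np.sum(rir[stop:] ** 2)
    return 10.0 * np.log10(direct_energy / reverb_energy)
```

Because the DRR is sensitive to the choice of direct-path window, results from different studies are only comparable when the same convention is used.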

arXiv:1505.04385

An Efficient Parameterization of the Room Transfer Function

Authors: Prasanga Samarasinghe, Thushara Abhayapala, Mark Poletti, Terence Betlehem

Abstract: This paper proposes an efficient parameterization of the Room Transfer Function (RTF). The RTF typically varies rapidly with source and receiver positions and hence requires an impractical number of point-to-point measurements to characterize a given room. Therefore, we derive a novel RTF parameterization that is robust to both receiver and source variations, with the following salient features: (i) The parameterization is given in terms of a modal expansion of 3D basis functions. (ii) The modal expansion can be truncated at a finite number of modes, given that the source and receiver locations lie within two sizeable spatial regions, which are arbitrarily distributed. (iii) The parameter weights/coefficients are independent of the source/receiver positions. Therefore, a finite set of coefficients is shown to be capable of accurately calculating the RTF between any two arbitrary points, one in a predefined spatial region where the source(s) lie and the other in a predefined spatial region where the receiver(s) lie. A practical method to measure the RTF coefficients is also provided, which requires only a single microphone unit and a single loudspeaker unit, provided that the room characteristics remain stationary over time. The accuracy of the parameterization is verified using appropriate simulation examples.

Submitted 17 May, 2015; originally announced May 2015.

Comments: 11 pages, 6 figures

Showing 1–16 of 16 results for author: Samarasinghe, P