An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio

Authors: Siding Zeng, Jiangyan Yi, Jianhua Tao, Yujie Chen, Shan Liang, Yong Ren, Xiaohui Zhang

Abstract: When the task of locating manipulation regions in partially-fake audio (PFA) involves cross-domain datasets, the performance of deep learning models drops significantly due to the shift between the source and target domains. To address this issue, existing approaches often employ data augmentation before training. However, they overlook the characteristics in target domain that are absent in source domain. Inspired by the mixture-of-experts model, we propose an unsupervised method named Samples mining with Diversity and Entropy (SDE). Our method first learns from a collection of diverse experts that achieve great performance from different perspectives in the source domain, but with ambiguity on target samples. We leverage these diverse experts to select the most informative samples by calculating their entropy. Furthermore, we introduced a label generation method tailored for these selected samples that are incorporated in the training process in source domain integrating the target domain information. We applied our method to a cross-domain partially fake audio detection dataset, ADD2023Track2. By introducing 10% of unknown samples from the target domain, we achieved an F1 score of 43.84%, which represents a relative increase of 77.2% compared to the second-best method.

Submitted 11 July, 2024; originally announced July 2024.
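
The entropy-based selection step described in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact criterion, so everything here is an assumption made for illustration: the function names `entropy` and `select_informative` are hypothetical, and the score used is the Shannon entropy of the expert-averaged class distribution, which is only one plausible reading of "calculating their entropy".

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_informative(expert_probs, k):
    """Return indices of the k target samples with the highest entropy.

    expert_probs[e][i] is expert e's class distribution for sample i.
    Assumption: the score is the entropy of the expert-averaged
    distribution; the paper's actual criterion may differ.
    """
    n_experts = len(expert_probs)
    n_samples = len(expert_probs[0])
    scores = []
    for i in range(n_samples):
        n_classes = len(expert_probs[0][i])
        # Average the experts' predictions class by class.
        mean = [
            sum(expert_probs[e][i][c] for e in range(n_experts)) / n_experts
            for c in range(n_classes)
        ]
        scores.append(entropy(mean))
    # Rank samples from most to least ambiguous and keep the top k.
    ranked = sorted(range(n_samples), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

With two experts that agree confidently on one sample and disagree on another, the sample they disagree on is ranked first, which matches the intuition that ambiguous target samples carry the most information.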

arXiv:2404.15354 [pdf, other]

Elevating Spectral GNNs through Enhanced Band-pass Filter Approximation

Authors: Guoming Li, Jian Yang, Shangsong Liang, Dongsheng Luo

Abstract: Spectral Graph Neural Networks (GNNs) have attracted great attention due to their capacity to capture patterns in the frequency domains with essential graph filters. Polynomial-based ones (namely poly-GNNs), which approximately construct graph filters with conventional or rational polynomials, are routinely adopted in practice for their substantial performances on graph learning tasks. However, previous poly-GNNs aim at achieving overall lower approximation error on different types of filters, e.g., low-pass and high-pass, but ignore a key question: which type of filter warrants greater attention for poly-GNNs? In this paper, we first show that poly-GNN with a better approximation for band-pass graph filters performs better on graph learning tasks. This insight further sheds light on critical issues of existing poly-GNNs, i.e., those poly-GNNs achieve trivial performance in approximating band-pass graph filters, hindering the great potential of poly-GNNs. To tackle the issues, we propose a novel poly-GNN named TrigoNet. TrigoNet constructs different graph filters with novel trigonometric polynomial, and achieves leading performance in approximating band-pass graph filters against other polynomials. By applying Taylor expansion and deserting nonlinearity, TrigoNet achieves noticeable efficiency among baselines. Extensive experiments show the advantages of TrigoNet in both accuracy performances and efficiency.

Submitted 15 April, 2024; originally announced April 2024.

Comments: Preprint

arXiv:2404.11861 [pdf, other]

sEMG-based Fine-grained Gesture Recognition via Improved LightGBM Model

Authors: Xiupeng Qiao, Zekun Chen, Shili Liang

Abstract: Surface electromyogram (sEMG), as a bioelectrical signal reflecting the activity of human muscles, has a wide range of applications in the control of prosthetics, human-computer interaction and so on. However, existing recognition methods all handle discrete actions: after each action the hand must return to the resting state before the next one, so gestures in continuous actions cannot be recognized effectively. To solve this problem, this paper proposes an improved fine gesture recognition model based on the LightGBM algorithm. A sliding window sample segmentation scheme is adopted to replace active segment detection, and a series of innovative schemes such as an improved loss function, Optuna hyperparameter search and Bagging integration are adopted to optimize the LightGBM model and realize gesture recognition of continuous active segment signals. In order to verify the effectiveness of the proposed algorithm, we used the NinaproDB7 dataset to design the normal data recognition experiment and the disabled data transfer experiment. The results showed that the recognition rate of the proposed model was 89.72% higher than that of the optimal model Bi-ConvGRU for 18 gesture recognition tasks in the open data set, it reached 90.28%. Compared with the scheme directly trained on small sample data, the recognition rate of transfer learning was significantly improved from 60.35% to 78.54%, effectively solving the problem of insufficient data, and proving the applicability and advantages of transfer learning in fine gesture recognition tasks for disabled people.

Submitted 17 April, 2024; originally announced April 2024.
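
The sliding-window segmentation mentioned in the abstract above can be sketched as follows. This is a generic illustration rather than the authors' pipeline: the function name `sliding_windows` and the `window` and `step` parameters are hypothetical, and the abstract does not state the window length or overlap actually used.

```python
def sliding_windows(signal, window, step):
    """Split a 1-D sample sequence into fixed-length, possibly overlapping windows.

    Windows that would run past the end of the signal are dropped, so every
    returned window has exactly `window` samples.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        signal[start:start + window]
        for start in range(0, len(signal) - window + 1, step)
    ]
```

Because every window is labeled and classified on its own, no separate detector for the start and end of an active segment is needed, which is the property the abstract relies on for continuous gestures.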

arXiv:2404.11383 [pdf, other]

Lower Limb Movements Recognition Based on Feature Recursive Elimination and Backpropagation Neural Network

Authors: Yongkai Ma, Shili Liang, Zekun Chen

Abstract: Surface electromyographic (sEMG) signals serve as a signal source commonly used for lower limb movement recognition, reflecting the intent of human movement. However, it has been a challenge to improve the movement recognition rate while using fewer features in this research area. In this paper, a method for lower limb movements recognition based on recursive feature elimination and backpropagation neural network of support vector machine is proposed. First, the sEMG signal of five subjects performing eight different lower limb movements was recorded using a BIOPAC collector. The optimal feature subset consists of 25 feature vectors, determined using a Recursive Feature Elimination based on Support Vector Machine (SVM-RFE). Finally, this study used five supervised classification algorithms to recognize these eight different lower limb movements. The results of the experimental study show that the combination of the BPNN classifier and the SVM-RFE feature selection algorithm is able to achieve an excellent action recognition accuracy of 95%, which provides sufficient support for the feasibility of this approach.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.07444 [pdf, other]

Two-Way Aerial Secure Communications via Distributed Collaborative Beamforming under Eavesdropper Collusion

Authors: Jiahui Li, Geng Sun, Qingqing Wu, Shuang Liang, Pengfei Wang, Dusit Niyato

Abstract: Unmanned aerial vehicles (UAVs)-enabled aerial communication provides a flexible, reliable, and cost-effective solution for a range of wireless applications. However, due to the high line-of-sight (LoS) probability, aerial communications between UAVs are vulnerable to eavesdropping attacks, particularly when multiple eavesdroppers collude. In this work, we aim to introduce distributed collaborative beamforming (DCB) into UAV swarms and handle the eavesdropper collusion by controlling the corresponding signal distributions. Specifically, we consider a two-way DCB-enabled aerial communication between two UAV swarms and construct these swarms as two UAV virtual antenna arrays. Then, we minimize the two-way known secrecy capacity and the maximum sidelobe level to avoid information leakage from the known and unknown eavesdroppers, respectively. Simultaneously, we also minimize the energy consumption of UAVs for constructing virtual antenna arrays. Due to the conflicting relationships between secure performance and energy efficiency, we consider these objectives as a multi-objective optimization problem. Following this, we propose an enhanced multi-objective swarm intelligence algorithm via the characterized properties of the problem. Simulation results show that our proposed algorithm can obtain a set of informative solutions and outperform other state-of-the-art baseline algorithms. Experimental tests demonstrate that our method can be deployed in limited computing power platforms of UAVs and is beneficial for saving computational resources.

Submitted 10 April, 2024; originally announced April 2024.

Comments: This paper has been accepted by IEEE INFOCOM 2024

arXiv:2404.04597 [pdf, other]

A Two Time-Scale Joint Optimization Approach for UAV-assisted MEC

Authors: Zemin Sun, Geng Sun, Long He, Fang Mei, Shuang Liang, Yanheng Liu

Abstract: Unmanned aerial vehicles (UAV)-assisted mobile edge computing (MEC) is emerging as a promising paradigm to provide aerial-terrestrial computing services close to mobile devices (MDs). However, meeting the demands of computation-intensive and delay-sensitive tasks for MDs poses several challenges, including the demand-supply contradiction between MDs and MEC servers, the demand-supply heterogeneity between MDs and MEC servers, the trajectory control requirements on energy efficiency and timeliness, and the different time-scale dynamics of the network. To address these issues, we first present a hierarchical architecture by incorporating terrestrial-aerial computing capabilities and leveraging UAV flexibility. Furthermore, we formulate a joint computing resource allocation, computation offloading, and trajectory control problem to maximize the system utility. Since the problem is a non-convex mixed integer nonlinear programming (MINLP), we propose a two time-scale joint computing resource allocation, computation offloading, and trajectory control (TJCCT) approach. In the short time scale, we propose a price-incentive method for on-demand computing resource allocation and a matching mechanism-based method for computation offloading. In the long time scale, we propose a convex optimization-based method for UAV trajectory control. Besides, we prove the stability, optimality, and polynomial complexity of TJCCT. Simulation results demonstrate that TJCCT outperforms the comparative algorithms in terms of the utility of the system, the QoE of MDs, and the revenue of MEC servers.

Submitted 6 April, 2024; originally announced April 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2403.15828

arXiv:2404.04559 [pdf, ps, other]

Spectral GNN via Two-dimensional (2-D) Graph Convolution

Authors: Guoming Li, Jian Yang, Shangsong Liang, Dongsheng Luo

Abstract: Spectral Graph Neural Networks (GNNs) have achieved tremendous success in graph learning. As an essential part of spectral GNNs, spectral graph convolution extracts crucial frequency information in graph data, leading to superior performance of spectral GNNs in downstream tasks. However, in this paper, we show that existing spectral GNNs retain critical drawbacks in performing the spectral graph convolution. Specifically, considering the spectral graph convolution as a construction operation towards target output, we prove that existing popular convolution paradigms cannot construct the target output with mild conditions on input graph signals, causing spectral GNNs to fall into suboptimal solutions. To address the issues, we rethink the spectral graph convolution from a more general two-dimensional (2-D) signal convolution perspective and propose a new convolution paradigm, named 2-D graph convolution. We prove that 2-D graph convolution unifies existing graph convolution paradigms, and is capable of constructing arbitrary target output. Based on the proposed 2-D graph convolution, we further propose ChebNet2D, an efficient and effective GNN implementation of 2-D graph convolution through applying Chebyshev interpolation. Extensive experiments on benchmark datasets demonstrate both effectiveness and efficiency of the ChebNet2D.

Submitted 6 April, 2024; originally announced April 2024.

Comments: Preprint

arXiv:2403.15828 [pdf, other]

TJCCT: A Two-timescale Approach for UAV-assisted Mobile Edge Computing

Authors: Zemin Sun, Geng Sun, Qingqing Wu, Long He, Shuang Liang, Hongyang Pan, Dusit Niyato, Chau Yuen, Victor C. M. Leung

Abstract: Unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) is emerging as a promising paradigm to provide aerial-terrestrial computing services in close proximity to mobile devices (MDs). However, meeting the demands of computation-intensive and delay-sensitive tasks for MDs poses several challenges, including the demand-supply contradiction between MDs and MEC servers, the demand-supply heterogeneity between MDs and MEC servers, the trajectory control requirements on energy efficiency and timeliness, and the different time-scale dynamics of the network. To address these issues, we first present a hierarchical architecture by incorporating terrestrial-aerial computing capabilities and leveraging UAV flexibility. Furthermore, we formulate a joint computing resource allocation, computation offloading, and trajectory control problem to maximize the system utility. Since the problem is a non-convex and NP-hard mixed integer nonlinear programming (MINLP), we propose a two-timescale joint computing resource allocation, computation offloading, and trajectory control (TJCCT) approach for solving the problem. In the short timescale, we propose a price-incentive model for on-demand computing resource allocation and a matching mechanism-based method for computation offloading. In the long timescale, we propose a convex optimization-based method for UAV trajectory control. Besides, we theoretically prove the stability, optimality, and polynomial complexity of TJCCT. Extended simulation results demonstrate that the proposed TJCCT outperforms the comparative algorithms in terms of the system utility, average processing rate, average completion delay, and average completion ratio.

Submitted 23 March, 2024; originally announced March 2024.

arXiv:2403.05247 [pdf, other]

Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds

Authors: Tianrui Lou, Xiaojun Jia, Jindong Gu, Li Liu, Siyuan Liang, Bangyan He, Xiaochun Cao

Abstract: Adversarial attack methods based on point manipulation for 3D point cloud classification have revealed the fragility of 3D models, yet the adversarial examples they produce are easily perceived or defended against. The trade-off between the imperceptibility and adversarial strength leads most point attack methods to inevitably introduce easily detectable outlier points upon a successful attack. Another promising strategy, shape-based attack, can effectively eliminate outliers, but existing methods often suffer significant reductions in imperceptibility due to irrational deformations. We find that concealing deformation perturbations in areas insensitive to human eyes can achieve a better trade-off between imperceptibility and adversarial strength, specifically in parts of the object surface that are complex and exhibit drastic curvature changes. Therefore, we propose a novel shape-based adversarial attack method, HiT-ADV, which initially conducts a two-stage search for attack regions based on saliency and imperceptibility scores, and then adds deformation perturbations in each attack region using Gaussian kernel functions. Additionally, HiT-ADV is extendable to physical attack. We propose that by employing benign resampling and benign rigid transformations, we can further enhance physical adversarial strength with little sacrifice to imperceptibility. Extensive experiments have validated the superiority of our method in terms of adversarial and imperceptible properties in both digital and physical spaces. Our code is available at: https://github.com/TRLou/HiT-ADV.

Submitted 8 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR 2024

arXiv:2402.04097 [pdf, other]

Analysis of Deep Image Prior and Exploiting Self-Guidance for Image Reconstruction

Authors: Shijun Liang, Evan Bell, Qing Qu, Rongrong Wang, Saiprasad Ravishankar

Abstract: The ability of deep image prior (DIP) to recover high-quality images from incomplete or corrupted measurements has made it popular in inverse problems in image restoration and medical imaging including magnetic resonance imaging (MRI). However, conventional DIP suffers from severe overfitting and spectral bias effects. In this work, we first provide an analysis of how DIP recovers information from undersampled imaging measurements by analyzing the training dynamics of the underlying networks in the kernel regime for different architectures. This study sheds light on important underlying properties for DIP-based recovery. Current research suggests that incorporating a reference image as network input can enhance DIP's performance in image reconstruction compared to using random inputs. However, obtaining suitable reference images requires supervision, and raises practical difficulties. In an attempt to overcome this obstacle, we further introduce a self-driven reconstruction process that concurrently optimizes both the network weights and the input while eliminating the need for training data. Our method incorporates a novel denoiser regularization term which enables robust and stable joint estimation of both the network input and reconstructed image. We demonstrate that our self-guided method surpasses both the original DIP and modern supervised methods in terms of MR image reconstruction performance and outperforms previous DIP-based schemes for image inpainting.

Submitted 7 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2312.07784 [pdf, other]

Robust MRI Reconstruction by Smoothed Unrolling (SMUG)

Authors: Shijun Liang, Van Hoang Minh Nguyen, Jinghan Jia, Ismail Alkhouri, Sijia Liu, Saiprasad Ravishankar

Abstract: As the popularity of deep learning (DL) in the field of magnetic resonance imaging (MRI) continues to rise, recent research has indicated that DL-based MRI reconstruction models might be excessively sensitive to minor input disturbances, including worst-case additive perturbations. This sensitivity often leads to unstable, aliased images. This raises the question of how to devise DL techniques for MRI reconstruction that can be robust to train-test variations. To address this problem, we propose a novel image reconstruction framework, termed Smoothed Unrolling (SMUG), which advances a deep unrolling-based MRI reconstruction model using a randomized smoothing (RS)-based robust learning approach. RS, which improves the tolerance of a model against input noises, has been widely used in the design of adversarial defense approaches for image classification tasks. Yet, we find that the conventional design that applies RS to the entire DL-based MRI model is ineffective. In this paper, we show that SMUG and its variants address the above issue by customizing the RS process based on the unrolling architecture of a DL-based MRI reconstruction model. Compared to the vanilla RS approach, we show that SMUG improves the robustness of MRI reconstruction with respect to a diverse set of instability sources, including worst-case and random noise perturbations to input measurements, varying measurement sampling rates, and different numbers of unrolling steps. Furthermore, we theoretically analyze the robustness of our method in the presence of perturbations.

Submitted 19 August, 2024; v1 submitted 12 December, 2023; originally announced December 2023.
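
For readers unfamiliar with randomized smoothing, the vanilla form that the abstract above contrasts SMUG against can be sketched as a Monte Carlo average of a model's output over Gaussian-perturbed inputs. This is not the SMUG method, which according to the abstract customizes the smoothing to the unrolling architecture. The function name `smoothed`, its parameters, and the use of a plain Python list as the input are assumptions made for illustration.

```python
import random


def smoothed(f, x, sigma, n_samples, seed=0):
    """Monte Carlo estimate of E[f(x + noise)], noise ~ N(0, sigma^2) per entry.

    f maps a list of floats to a list of floats (a stand-in for a
    reconstruction model); x is the clean input.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    total = [0.0] * len(f(x))
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        total = [t + o for t, o in zip(total, f(noisy))]
    return [t / n_samples for t in total]
```

Averaging over noisy copies of the input makes the overall mapping less sensitive to any single small perturbation, at the cost of `n_samples` forward passes per input.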

arXiv:2310.00396 [pdf, other]

Joint Scheduling and Trajectory Optimization of Charging UAV in Wireless Rechargeable Sensor Networks

Authors: Yanheng Liu, Hongyang Pan, Geng Sun, Aimin Wang, Jiahui Li, Shuang Liang

Abstract: Wireless rechargeable sensor networks with a charging unmanned aerial vehicle (CUAV) have broad application prospects in the power supply of the rechargeable sensor nodes (SNs). However, how to schedule a CUAV and design the trajectory to improve the charging efficiency of the entire system is still a vital problem. In this paper, we formulate a joint-CUAV scheduling and trajectory optimization problem (JSTOP) to simultaneously minimize the hovering points of CUAV, the number of the repeatedly covered SNs and the flying distance of CUAV for charging all SNs. Due to the complexity of JSTOP, it is decomposed into two optimization subproblems: the CUAV scheduling optimization problem (CSOP) and the CUAV trajectory optimization problem (CTOP). CSOP is a hybrid optimization problem that consists of the continuous and discrete solution space, and the solution dimension in CSOP is not fixed since it should be changed with the number of hovering points of CUAV. Moreover, CTOP is a completely discrete optimization problem. Thus, we propose a particle swarm optimization (PSO) with a flexible dimension mechanism, a K-means operator and a punishment-compensation mechanism (PSOFKP) and a PSO with a discretization factor, a 2-opt operator and a path crossover reduction mechanism (PSOD2P) to solve the converted CSOP and CTOP, respectively. Simulation results evaluate the benefits of PSOFKP and PSOD2P under different scales and settings of the network, and the stability of the proposed algorithms is verified.

Submitted 30 September, 2023; originally announced October 2023.
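
The 2-opt operator named in the abstract above is a standard local search for shortening a tour by reversing segments. The sketch below shows plain 2-opt on 2-D points only; it is not PSOD2P, and how the authors embed the operator in their PSO variant is not described in the abstract. The function names `tour_length` and `two_opt` are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def tour_length(points, order):
    """Total length of the closed tour that visits points in the given order."""
    n = len(order)
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % n]])
        for i in range(n)
    )


def two_opt(points, order):
    """Reverse tour segments for as long as a reversal shortens the tour."""
    best = list(order)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                # Reversing best[i..j] replaces two edges of the tour.
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                # Small tolerance guards against floating-point ping-pong.
                if tour_length(points, candidate) < tour_length(points, best) - 1e-12:
                    best = candidate
                    improved = True
    return best
```

On the four corners of a unit square visited in a self-crossing order, a single reversal removes the crossing and yields the perimeter tour, which is the kind of path-crossing reduction a trajectory planner wants.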

arXiv:2310.00384 [pdf, ps, other]

Joint Power and 3D Trajectory Optimization for UAV-enabled Wireless Powered Communication Networks with Obstacles

Authors: Hongyang Pan, Yanheng Liu, Geng Sun, Junsong Fan, Shuang Liang, Chau Yuen

Abstract: Unmanned aerial vehicle (UAV)-enabled wireless powered communication networks (WPCNs) are promising technologies in 5G/6G wireless communications, while there are several challenges about UAV power allocation and scheduling to enhance the energy utilization efficiency, considering the existence of obstacles. In this work, we consider a UAV-enabled WPCN scenario in which a UAV needs to cover the ground wireless devices (WDs). During the coverage process, the UAV needs to collect data from the WDs and charge them simultaneously. To this end, we formulate a joint-UAV power and three-dimensional (3D) trajectory optimization problem (JUPTTOP) to simultaneously increase the total number of the covered WDs, increase the time efficiency, and reduce the total flying distance of UAV so as to improve the energy utilization efficiency in the network. Due to the difficulties and complexities, we decompose it into two sub optimization problems, which are the UAV power allocation optimization problem (UPAOP) and UAV 3D trajectory optimization problem (UTTOP), respectively. Then, we propose an improved non-dominated sorting genetic algorithm-II with K-means initialization operator and Variable dimension mechanism (NSGA-II-KV) for solving the UPAOP. For UTTOP, we first introduce a pretreatment method, and then use an improved particle swarm optimization with Normal distribution initialization, Genetic mechanism, Differential mechanism and Pursuit operator (PSO-NGDP) to deal with this sub optimization problem. Simulation results verify the effectiveness of the proposed strategies under different scales and settings of the networks.

Submitted 30 September, 2023; originally announced October 2023.

arXiv:2310.00288 [pdf]

doi 10.1038/s41928-023-00965-5

Parallel in-memory wireless computing

Authors: Cong Wang, Gong-Jie Ruan, Zai-Zheng Yang, Xing-Jian Yangdong, Yixiang Li, Liang Wu, Yingmeng Ge, Yichen Zhao, Chen Pan, Wei Wei, Li-Bo Wang, Bin Cheng, Zaichen Zhang, Chuan Zhang, Shi-Jun Liang, Feng Miao

Abstract: Parallel wireless digital communication with ultralow power consumption is critical for emerging edge technologies such as 5G and Internet of Things. However, the physical separation between digital computing units and analogue transmission units in traditional wireless technology leads to high power consumption. Here we report a parallel in-memory wireless computing scheme. The approach combines in-memory computing with wireless communication using memristive crossbar arrays. We show that the system can be used for the radio transmission of a binary stream of 480 bits with a bit error rate of 0. The in-memory wireless computing uses two orders of magnitude less power than conventional technology (based on digital-to-analogue and analogue-to-digital converters). We also show that the approach can be applied to acoustic and optical wireless communications.

Submitted 30 September, 2023; originally announced October 2023.

Journal ref: Nat Electron 6, 381-389 (2023)

arXiv:2309.16709 [pdf, other]

Joint Task Offloading and Resource Allocation in Aerial-Terrestrial UAV Networks with Edge and Fog Computing for Post-Disaster Rescue

Authors: Geng Sun, Long He, Zemin Sun, Qingqing Wu, Shuang Liang, Jiahui Li, Dusit Niyato, Victor C. M. Leung

Abstract: Unmanned aerial vehicles (UAVs) play an increasingly important role in assisting fast-response post-disaster rescue due to their fast deployment, flexible mobility, and low cost. However, UAVs face the challenges of limited battery capacity and computing resources, which could shorten the expected flight endurance of UAVs and increase the rescue response delay while performing mission-critical tasks. To address this challenge, we first present a three-layer post-disaster rescue computing architecture by leveraging the aerial-terrestrial edge capabilities of mobile edge computing (MEC) and vehicle fog computing (VFC), which consists of a vehicle fog layer, a UAV client layer, and a UAV edge layer. Moreover, we formulate a joint task offloading and resource allocation optimization problem (JTRAOP) with the aim of maximizing the time-average system utility. Since the formulated JTRAOP is proved to be NP-hard, we propose an MEC-VFC-aided task offloading and resource allocation (MVTORA) approach, which consists of a game theoretic algorithm for task offloading decision, a convex optimization-based algorithm for MEC resource allocation, and an evolutionary computation-based hybrid algorithm for VFC resource allocation. Simulation results validate that the proposed approach can achieve superior system performance compared to the other benchmark schemes, especially under heavy system workloads.

Submitted 6 October, 2023; v1 submitted 17 August, 2023; originally announced September 2023.

Comments: 18 pages, 6 figures

arXiv:2309.15977 [pdf, other]

Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields

Authors: Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

Abstract: Room impulse response (RIR), which measures the sound propagation within an environment, is critical for synthesizing high-fidelity audio for a given environment. Some prior work has proposed representing RIR as a neural field function of the sound emitter and receiver positions. However, these methods do not sufficiently consider the acoustic properties of an audio scene, leading to unsatisfactory performance. This letter proposes a novel Neural Acoustic Context Field approach, called NACF, to parameterize an audio scene by leveraging multiple acoustic contexts, such as geometry, material property, and spatial information. Driven by the unique properties of RIR, i.e., temporal un-smoothness and monotonic energy attenuation, we design a temporal correlation module and multi-scale energy decay criterion. Experimental results show that NACF outperforms existing field-based methods by a notable margin. Please visit our project page for more qualitative results.

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.05794 [pdf, other]

Robust Physics-based Deep MRI Reconstruction Via Diffusion Purification

Authors: Ismail Alkhouri, Shijun Liang, Rongrong Wang, Qing Qu, Saiprasad Ravishankar

Abstract: Deep learning (DL) techniques have been extensively employed in magnetic resonance imaging (MRI) reconstruction, delivering notable performance enhancements over traditional non-DL methods. Nonetheless, recent studies have identified vulnerabilities in these models during testing, namely, their susceptibility to (i) worst-case measurement perturbations and to (ii) variations in training/testing settings like acceleration factors and k-space sampling locations. This paper addresses the robustness challenges by leveraging diffusion models. In particular, we present a robustification strategy that improves the resilience of DL-based MRI reconstruction methods by utilizing pretrained diffusion models as noise purifiers. In contrast to conventional robustification methods for DL-based MRI reconstruction, such as adversarial training (AT), our proposed approach eliminates the need to tackle a minimax optimization problem. It only necessitates fine-tuning on purified examples. Our experimental results highlight the efficacy of our approach in mitigating the aforementioned instabilities when compared to leading robustification approaches for deep MRI reconstruction, including AT and randomized smoothing.

Submitted 24 October, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

arXiv:2308.00122 [pdf, other]

DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models

Authors: Chao Huang, Susan Liang, Yapeng Tian, Anurag Kumar, Chenliang Xu

Abstract: We propose DAVIS, a Diffusion model-based Audio-VIsual Separation framework that solves the audio-visual sound source separation task in a generative manner. While existing discriminative methods that perform mask regression have made remarkable progress in this field, they face limitations in capturing the complex data distribution required for high-quality separation of sounds from diverse categories. In contrast, DAVIS leverages a generative diffusion model and a Separation U-Net to synthesize separated magnitudes starting from Gaussian noise, conditioned on both the audio mixture and the visual footage. With its generative objective, DAVIS is better suited to achieving the goal of high-quality sound separation across diverse categories. We compare DAVIS to existing state-of-the-art discriminative audio-visual separation methods on the domain-specific MUSIC dataset and the open-domain AVE dataset, and results show that DAVIS outperforms other methods in separation quality, demonstrating the advantages of our framework for tackling the audio-visual source separation task.

Submitted 31 July, 2023; originally announced August 2023.

arXiv:2305.13774 [pdf, other]

ADD 2023: the Second Audio Deepfake Detection Challenge

Authors: Jiangyan Yi, Jianhua Tao, Ruibo Fu, Xinrui Yan, Chenglong Wang, Tao Wang, Chu Yuan Zhang, Xiaohui Zhang, Yan Zhao, Yong Ren, Le Xu, Junzuo Zhou, Hao Gu, Zhengqi Wen, Shan Liang, Zheng Lian, Shuai Nie, Haizhou Li

Abstract: Audio deepfake detection is an emerging topic in the artificial intelligence community. The second Audio Deepfake Detection Challenge (ADD 2023) aims to spur researchers around the world to build new innovative technologies that can further accelerate and foster research on detecting and analyzing deepfake speech utterances. Different from previous challenges (e.g. ADD 2022), ADD 2023 focuses on surpassing the constraints of binary real/fake classification, and actually localizing the manipulated intervals in partially fake speech as well as pinpointing the source responsible for generating any fake audio. Furthermore, ADD 2023 includes more rounds of evaluation for the fake audio game sub-challenge. The ADD 2023 challenge includes three subchallenges: audio fake game (FG), manipulation region location (RL) and deepfake algorithm recognition (AR). This paper describes the datasets, evaluation metrics, and protocols. Some findings are also reported in audio deepfake detection tasks.

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2304.08038 [pdf, other]

Orthogonal AMP for Problems with Multiple Measurement Vectors and/or Multiple Transforms

Authors: Yiyao Cheng, Lei Liu, Shansuo Liang, Jonathan H. Manton, Li Ping

Abstract: Approximate message passing (AMP) algorithms break a (high-dimensional) statistical problem into parts then repeatedly solve each part in turn, akin to alternating projections. A distinguishing feature is that their asymptotic behaviours can be accurately predicted via their associated state evolution equations. Orthogonal AMP (OAMP) was recently developed to avoid the need for computing the so-called Onsager term in traditional AMP algorithms, providing two clear benefits: the derivation of an OAMP algorithm is both straightforward and more broadly applicable. OAMP was originally demonstrated for statistical problems with a single measurement vector and single transform. This paper extends OAMP to statistical problems with multiple measurement vectors (MMVs) and multiple transforms (MTs). We name the resulting algorithms OAMP-MMV and OAMP-MT respectively, and their combination augmented OAMP (A-OAMP). Whereas the extension of traditional AMP algorithms to such problems would be challenging, the orthogonal principle underpinning OAMP makes these extensions straightforward. The MMV and MT models are widely applicable to signal processing and communications. We present an example of a MIMO relay system with correlated source data and signal clipping, which can be modelled as a joint MMV-MT system. While existing methods meet with difficulties in this example, OAMP offers an efficient solution with excellent performance.

Submitted 17 April, 2023; originally announced April 2023.

arXiv:2303.15299 [pdf, other]

Resilient Output Consensus Control of Heterogeneous Multi-agent Systems against Byzantine Attacks: A Twin Layer Approach

Authors: Xin Gong, Yiwen Liang, Yukang Cui, Shi Liang, Tingwen Huang

Abstract: This paper studies the problem of cooperative control of heterogeneous multi-agent systems (MASs) against Byzantine attacks. An agent affected by Byzantine attacks sends different wrong values to all neighbors while applying wrong input signals to itself, which is aggressive and difficult to defend against. Inspired by the concept of Digital Twin, a new hierarchical protocol equipped with a virtual twin layer (TL) is proposed, which decouples the above problems into the defense scheme against Byzantine edge attacks on the TL and the defense scheme against Byzantine node attacks on the cyber-physical layer (CPL). On the TL, we propose a resilient topology reconfiguration strategy by adding a minimum number of key edges to improve network resilience. It is strictly proved that the control strategy is sufficient to achieve asymptotic consensus in finite time with the topology on the TL satisfying strongly $(2f+1)$-robustness. On the CPL, decentralized chattering-free controllers are proposed to guarantee the resilient output consensus for the heterogeneous MASs against Byzantine node attacks. Moreover, the obtained controller shows exponential convergence. The effectiveness and practicality of the theoretical results are verified by numerical examples.

Submitted 22 March, 2023; originally announced March 2023.

arXiv:2303.12735 [pdf, other]

SMUG: Towards robust MRI reconstruction by smoothed unrolling

Authors: Hui Li, Jinghan Jia, Shijun Liang, Yuguang Yao, Saiprasad Ravishankar, Sijia Liu

Abstract: Although deep learning (DL) has gained much popularity for accelerated magnetic resonance imaging (MRI), recent studies have shown that DL-based MRI reconstruction models could be oversensitive to tiny input perturbations (that are called 'adversarial perturbations'), which cause unstable, low-quality reconstructed images. This raises the question of how to design robust DL methods for MRI reconstruction. To address this problem, we propose a novel image reconstruction framework, termed SMOOTHED UNROLLING (SMUG), which advances a deep unrolling-based MRI reconstruction model using a randomized smoothing (RS)-based robust learning operation. RS, which improves the tolerance of a model against input noises, has been widely used in the design of adversarial defense for image classification. Yet, we find that the conventional design that applies RS to the entire DL process is ineffective for MRI reconstruction. We show that SMUG addresses the above issue by customizing the RS operation based on the unrolling architecture of the DL-based MRI reconstruction model. Compared to the vanilla RS approach and several variants of SMUG, we show that SMUG improves the robustness of MRI reconstruction with respect to a diverse set of perturbation sources, including perturbations to the input measurements, different measurement sampling rates, and different unrolling steps. Code for SMUG will be available at https://github.com/LGM70/SMUG.

Submitted 13 March, 2023; originally announced March 2023.

Comments: Accepted by ICASSP 2023

arXiv:2302.02088 [pdf, other]

AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis

Authors: Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

Abstract: Can machines recording an audio-visual scene produce realistic, matching audio-visual experiences at novel positions and novel view directions? We answer it by studying a new task -- real-world audio-visual scene synthesis -- and a first-of-its-kind NeRF-based approach for multimodal learning. Concretely, given a video recording of an audio-visual scene, the task is to synthesize new videos with spatial audio along arbitrary novel camera trajectories in that scene. We propose an acoustic-aware audio generation module that integrates prior knowledge of audio propagation into NeRF, in which we implicitly associate audio generation with the 3D geometry and material properties of a visual environment. Furthermore, we present a coordinate transformation module that expresses a view direction relative to the sound source, enabling the model to learn sound source-centric acoustic fields. To facilitate the study of this new task, we collect a high-quality Real-World Audio-Visual Scene (RWAVS) dataset. We demonstrate the advantages of our method on this real-world dataset and the simulation-based SoundSpaces dataset.

Submitted 16 October, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

Comments: NeurIPS 2023

arXiv:2301.09321 [pdf, ps, other]

Optimal Inter-area Oscillation Damping Control: A Transfer Deep Reinforcement Learning Approach with Switching Control Strategy

Authors: Siyuan Liang, Long Huo, Xin Chen, Peiyuan Sun

Abstract: Wide-area damping control for inter-area oscillation (IAO) is critical to modern power systems. The recent breakthroughs in deep learning and the broad deployment of phasor measurement units (PMUs) promote the development of data-driven IAO damping controllers. In this paper, the damping control of IAOs is modeled as a Markov Decision Process (MDP) and solved by the proposed Deep Deterministic Policy Gradient (DDPG) based deep reinforcement learning (DRL) approach. The proposed approach optimizes the eigenvalue distribution of the system, which determines the IAO modes in nature. The eigenvalues are evaluated by the data-driven method called dynamic mode decomposition. For a given power system, only a subset of generators selected by participation factors needs to be controlled, alleviating the control and computing burdens. A Switching Control Strategy (SCS) is introduced to improve the transient response of IAOs. Numerical simulations of the IEEE-39 New England power grid model validate the effectiveness and advanced performance of the proposed approach as well as its robustness against communication delays. In addition, we demonstrate the transfer ability of the DRL model trained on the linearized power grid model to provide effective IAO damping control in the non-linear power grid model environment.

Submitted 23 January, 2023; originally announced January 2023.

arXiv:2211.03577 [pdf]

Regrowth-free AlGaInAs MQW polarization controller integrated with sidewall grating DFB laser

Authors: Xiao Sun, Song Liang, Weiqing Cheng, Shengwei Ye, Yiming Sun, Yongguang Huang, Ruikang Zhang, Jichuan Xiong, Xuefeng Liu, John H. Marsh, Lianping Hou

Abstract: We report an AlGaInAs multiple quantum well integrated source of polarization-controlled light consisting of a polarization mode converter (PMC), a differential phase shifter (DPS), and a sidewall grating distributed-feedback (DFB) laser. We demonstrate an asymmetrical stepped-height ridge waveguide PMC to realize TE to TM polarization conversion and a symmetrical straight waveguide DPS to enable polarization rotation from approximately counterclockwise circular polarization to linear polarization. Based on the identical epitaxial layer scheme, the PMC, DPS, and DFB laser can all be integrated monolithically using only a single step of metalorganic vapor phase epitaxy and two steps of III-V material dry etching. For the DFB-PMC device, a high TE to TM polarization conversion efficiency of 98% over a wide range of DFB injection currents is reported at 1555 nm wavelength. For the DFB-PMC-DPS device, a 60 degree rotation of the Stokes vector was obtained on the Poincaré sphere with a range of bias voltage from 0 V to -4.0 V at I_DFB = 170 mA.

Submitted 7 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2210.10519

arXiv:2206.14635 [pdf, ps, other]

Prespecified-time observer-based distributed control of battery energy storage systems

Authors: Wu Yang, Shu-Ming Liang, Yan-Wu Wang, Zhi-Wei Liu

Abstract: This paper studies the state-of-charge (SoC) balancing and the total charging/discharging power tracking issues for battery energy storage systems (BESSs) with multiple distributed heterogeneous battery units. Different from the traditional cooperative control strategies based on asymptotic or finite-time distributed observers, two distributed prespecified-time observers are proposed to estimate the average battery unit state and the average desired power, respectively, with a convergence time that can be determined in advance and is independent of initial states or control parameters. Finally, two simulation examples are given to verify the effectiveness and superiority of the proposed control strategy.

Submitted 29 June, 2022; originally announced June 2022.

arXiv:2206.11680 [pdf, other]

Capacity Optimality of OAMP in Coded Large Unitarily Invariant Systems

Authors: Lei Liu, Shansuo Liang, Li Ping

Abstract: This paper investigates a large unitarily invariant system (LUIS) involving a unitarily invariant sensing matrix, an arbitrary fixed signal distribution, and forward error control (FEC) coding. Several area properties are established based on the state evolution of orthogonal approximate message passing (OAMP) in an un-coded LUIS. Under the assumptions that the state evolution for joint OAMP and FEC decoding is correct and the replica method is reliable, we analyze the achievable rate of OAMP. We prove that OAMP reaches the constrained capacity predicted by the replica method of the LUIS with an arbitrary signal distribution based on matched FEC coding. Meanwhile, we elaborate a constrained capacity-achieving coding principle for LUIS, based on which irregular low-density parity-check (LDPC) codes are optimized for binary signaling in the simulation results. We show that OAMP with the optimized codes has significant performance improvement over the un-optimized ones and the well-known Turbo linear MMSE algorithm. For quadrature phase-shift keying (QPSK) modulation, constrained capacity-approaching bit error rate (BER) performances are observed under various channel conditions.

Submitted 23 June, 2022; originally announced June 2022.

Comments: Accepted by the 2022 IEEE International Symposium on Information Theory (ISIT). arXiv admin note: substantial text overlap with arXiv:2108.08503

arXiv:2206.10861 [pdf, other]

UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2022

Authors: Yuanhang Zhang, Susan Liang, Shuang Yang, Shiguang Shan

Abstract: This report presents a brief description of our winning solution to the AVA Active Speaker Detection (ASD) task at ActivityNet Challenge 2022. Our underlying model UniCon+ continues to build on our previous work, the Unified Context Network (UniCon) and Extended UniCon which are designed for robust scene-level ASD. We augment the architecture with a simple GRU-based module that allows information of recurring identities to flow across scenes through read and update operations. We report a best result of 94.47% mAP on the AVA-ActiveSpeaker test set, which continues to rank first on this year's challenge leaderboard and significantly pushes the state-of-the-art.

Submitted 22 June, 2022; originally announced June 2022.

Comments: 5 pages, 3 figures; technical report for AVA Challenge (see https://research.google.com/ava/challenge.html) at the International Challenge on Activity Recognition (ActivityNet), CVPR 2022

arXiv:2206.00775 [pdf, other]

Adaptive Local Neighborhood-based Neural Networks for MR Image Reconstruction from Undersampled Data

Authors: Shijun Liang, Anish Lahiri, Saiprasad Ravishankar

Abstract: Recent medical image reconstruction techniques focus on generating high-quality medical images suitable for clinical use at the lowest possible cost and with the fewest possible adverse effects on patients. Recent works have shown significant promise for reconstructing MR images from sparsely sampled k-space data using deep learning. In this work, we propose a technique that rapidly estimates deep neural networks directly at reconstruction time by fitting them on small adaptively estimated neighborhoods of a training set. In brief, our algorithm alternates between searching for neighbors in a data set that are similar to the test reconstruction, and training a local network on these neighbors followed by updating the test reconstruction. Because our reconstruction model is learned on a dataset that is in some sense similar to the image being reconstructed rather than being fit on a large, diverse training set, it is more adaptive to new scans. It can also handle changes in training sets and flexible scan settings, while being relatively fast. Our approach, dubbed LONDN-MRI, was validated on multiple data sets using deep unrolled reconstruction networks. Reconstructions were performed at fourfold and eightfold undersampling of k-space with 1D variable-density random phase-encode undersampling masks. Our results demonstrate that our proposed locally-trained method produces higher-quality reconstructions compared to models trained globally on larger datasets as well as other scan-adaptive methods.

Submitted 23 January, 2024; v1 submitted 1 June, 2022; originally announced June 2022.

arXiv:2205.14942 [pdf]

doi 10.1109/TITS.2022.3158253

Edge YOLO: Real-Time Intelligent Object Detection System Based on Edge-Cloud Cooperation in Autonomous Vehicles

Authors: Siyuan Liang, Hao Wu

Abstract: Driven by the ever-increasing requirements of autonomous vehicles, such as traffic monitoring and driving assistance, deep learning-based object detection (DL-OD) has become increasingly attractive in intelligent transportation systems. However, it is difficult for the existing DL-OD schemes to realize responsible, cost-saving, and energy-efficient autonomous vehicle systems due to their inherent defects of low timeliness and high energy consumption. In this paper, we propose an object detection (OD) system based on edge-cloud cooperation and reconstructive convolutional neural networks, which is called Edge YOLO. This system can effectively avoid the excessive dependence on computing power and uneven distribution of cloud computing resources. Specifically, it is a lightweight OD framework realized by combining a pruning feature extraction network and a compression feature fusion network to enhance the efficiency of multi-scale prediction to the largest extent. In addition, we developed an autonomous driving platform equipped with NVIDIA Jetson for system-level verification. We experimentally demonstrate the reliability and efficiency of Edge YOLO on the COCO2017 and KITTI data sets, respectively. On the COCO2017 standard dataset, at a speed of 26.6 frames per second (FPS), the results show that the parameters of the entire network occupy only 25.67 MB, while the accuracy (mAP) is up to 47.3%.

Submitted 30 May, 2022; originally announced May 2022.

arXiv:2205.07646 [pdf, other]

A Fast Attention Network for Joint Intent Detection and Slot Filling on Edge Devices

Authors: Liang Huang, Senjie Liang, Feiyang Ye, Nan Gao

Abstract: Intent detection and slot filling are two main tasks in natural language understanding and play an essential role in task-oriented dialogue systems. The joint learning of both tasks can improve inference accuracy and is popular in recent works. However, most joint models ignore the inference latency and cannot meet the need to deploy dialogue systems at the edge. In this paper, we propose a Fast Attention Network (FAN) for joint intent detection and slot filling tasks, guaranteeing both accuracy and latency. Specifically, we introduce a clean and parameter-refined attention module to enhance the information exchange between intent and slot, improving semantic accuracy by more than 2%. FAN can be implemented on different encoders and delivers more accurate models at every speed level. Our experiments on the Jetson Nano platform show that FAN processes fifteen utterances per second with a small accuracy drop, showing its effectiveness and efficiency on edge devices.

Submitted 16 May, 2022; originally announced May 2022.

Comments: 9 pages, 4 figures

arXiv:2204.01731 [pdf, ps, other]

Gan-Based Joint Activity Detection and Channel Estimation For Grant-free Random Access

Authors: Shuang Liang, Yinan Zou, Yong Zhou

Abstract: Joint activity detection and channel estimation (JADCE) for grant-free random access is a critical issue that needs to be addressed to support massive connectivity in IoT networks. However, the existing model-free learning method can only achieve either activity detection or channel estimation, but not both. In this paper, we propose a novel model-free learning method based on a generative adversarial network (GAN) to tackle the JADCE problem. We adopt the U-net architecture to build the generator rather than the standard GAN architecture, where a pre-estimated value that contains the activity information is adopted as input to the generator. By leveraging the properties of the pseudoinverse, the generator is refined by using an affine projection and a skip connection to ensure the output of the generator is consistent with the measurement. Moreover, we build a two-layer fully-connected neural network to design the pilot matrix for reducing the impact of receiver noise. Simulation results show that the proposed method outperforms the existing methods in high SNR regimes, as both data consistency projection and pilot matrix optimization improve the learning ability.

Submitted 4 April, 2022; originally announced April 2022.

Comments: 5 pages, 5 figures IEEE ICASSP2022

arXiv:2203.03836 [pdf, other]

An Efficient Two-Stage SPARC Decoder for Massive MIMO Unsourced Random Access

Authors: Juntao You, Wenjie Wang, Shansuo Liang, Wei Han, Bo Bai

Abstract: In this paper, we study a concatenated coding scheme based on sparse regression code (SPARC) and tree code for unsourced random access in massive multiple-input and multiple-output systems. Our focus is on efficient decoding for the inner SPARC with practical concerns. A two-stage method is proposed to achieve near-optimal performance while maintaining low computational complexity. Specifically, a one-step thresholding-based algorithm is first used for reducing large dimensions of the SPARC decoding, after which a relaxed maximum-likelihood estimator is employed for refinement. Adequate simulation results are provided to validate the near-optimal performance and the low computational complexity. Besides, for the covariance-based sparse recovery method, theoretical analyses are given to characterize the upper bound of the number of active users supported when convex relaxation is considered, and the probability of successful dimension reduction by the one-step thresholding-based algorithm.

Submitted 11 August, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

Comments: 30 pages, 4 figures

arXiv:2203.00224 [pdf, other]

On Orthogonal Approximate Message Passing

Authors: Lei Liu, Yiyao Cheng, Shansuo Liang, Jonathan H. Manton, Li Ping

Abstract: Approximate Message Passing (AMP) is an efficient iterative parameter-estimation technique for certain high-dimensional linear systems with non-Gaussian distributions, such as sparse systems. In AMP, a so-called Onsager term is added to keep estimation errors approximately Gaussian. Orthogonal AMP (OAMP) does not require this Onsager term, relying instead on an orthogonalization procedure to keep the current errors uncorrelated with (i.e., orthogonal to) past errors. In this paper, we show the generality and significance of the orthogonality in ensuring that errors are "asymptotically independently and identically distributed Gaussian" (AIIDG). This AIIDG property, which is essential for the attractive performance of OAMP, holds for separable functions. We present a simple and versatile procedure to establish the orthogonality through Gram-Schmidt (GS) orthogonalization, which is applicable to any prototype. We show that different AMP-type algorithms, such as expectation propagation (EP), turbo, AMP and OAMP, can be unified under the orthogonal principle. The simplicity and generality of OAMP provide efficient solutions for estimation problems beyond the classical linear models. As an example, we study the optimization of OAMP via the GS model and GS orthogonalization. More related applications will be discussed in a companion paper where new algorithms are developed for problems with multiple constraints and multiple measurement variables.

Submitted 13 January, 2023; v1 submitted 28 February, 2022; originally announced March 2022.

Comments: 15 pages, 2 figures

arXiv:2202.08433 [pdf, ps, other]

ADD 2022: the First Audio Deep Synthesis Detection Challenge

Authors: Jiangyan Yi, Ruibo Fu, Jianhua Tao, Shuai Nie, Haoxin Ma, Chenglong Wang, Tao Wang, Zhengkun Tian, Xiaohui Zhang, Ye Bai, Cunhang Fan, Shan Liang, Shiming Wang, Shuai Zhang, Xinrui Yan, Le Xu, Zhengqi Wen, Haizhou Li, Zheng Lian, Bin Liu

Abstract: Audio deepfake detection is an emerging topic, which was included in the ASVspoof 2021. However, the recent shared tasks have not covered many real-life and challenging scenarios. The first Audio Deep synthesis Detection challenge (ADD) was motivated to fill in the gap. The ADD 2022 includes three tracks: low-quality fake audio detection (LF), partially fake audio detection (PF) and audio fake game (FG). The LF track focuses on dealing with bona fide and fully fake utterances with various real-world noises etc. The PF track aims to distinguish the partially fake audio from the real. The FG track is a rivalry game, which includes two tasks: an audio generation task and an audio fake detection task. In this paper, we describe the datasets, evaluation metrics, and protocols. We also report major findings that reflect the recent advances in audio deepfake detection tasks.

Submitted 2 July, 2024; v1 submitted 16 February, 2022; originally announced February 2022.

Comments: Accepted by ICASSP 2022

arXiv:2201.09245 [pdf, other]

Fast Transient Stability Prediction Using Grid-informed Temporal and Topological Embedding Deep Neural Network

Authors: Peiyuan Sun, Long Huo, Siyuan Liang, Xin Chen

Abstract: Transient stability prediction is critically essential to the fast online assessment and maintaining the stable operation in power systems. The wide deployment of phasor measurement units (PMUs) promotes the development of data-driven approaches for transient stability assessment. This paper proposes the temporal and topological embedding deep neural network (TTEDNN) model to forecast transient stability with the early transient dynamics. The TTEDNN model can accurately and efficiently predict the transient stability by extracting the temporal and topological features from the time-series data of the early transient dynamics. The grid-informed adjacency matrix is used to incorporate the power grid structural and electrical parameter information. The transient dynamics simulation environments under the single-node and multiple-node perturbations are used to test the performance of the TTEDNN model for the IEEE 39-bus and IEEE 118-bus power systems. The results show that the TTEDNN model has the best and most robust prediction performance. Furthermore, the TTEDNN model also demonstrates the transfer capability to predict the transient stability in the more complicated transient dynamics simulation environments.

Submitted 23 January, 2022; originally announced January 2022.

arXiv:2112.14839 [pdf, ps, other]

An overview of the quantitative causality analysis and causal graph reconstruction based on a rigorous formalism of information flow

Authors: X. San Liang

Abstract: Inference of causal relations from data now has become an important field in artificial intelligence. During the past 16 years, causality analysis (in a quantitative sense) has been developed independently in physics from first principles. This short note is a brief summary of this line of work, including part of the theory and several representative applications.

Submitted 31 December, 2021; originally announced December 2021.

Comments: 7 pages, 1 figure. Presented at the First International AIxIA Workshop on Causality, Causal-ITALY, Italian Conference on Artificial Intelligence, November 30, 2021

arXiv:2112.02629 [pdf, other]

A Tensor-BTD-based Modulation for Massive Unsourced Random Access

Authors: Zhenting Luan, Yuchi Wu, Shansuo Liang, Liping Zhang, Wei Han, Bo Bai

Abstract: In this letter, we propose a novel tensor-based modulation scheme for massive unsourced random access. The proposed modulation can be deemed as a summation of third-order tensors, of which the factors are representatives of subspaces. A constellation design based on a high-dimensional Grassmann manifold is presented for information encoding. The uniqueness of tensor decomposition provides a theoretical guarantee for active user separation. Simulation results show that our proposed method outperforms the state-of-the-art tensor-based modulation.

Submitted 5 December, 2021; originally announced December 2021.

arXiv:2111.10006 [pdf]

Image enhancement in acoustic-resolution photoacoustic microscopy enabled by a novel directional algorithm

Authors: Fei Feng, Siqi Liang, Sung-Liang Chen

Abstract: Acoustic-resolution photoacoustic microscopy (AR-PAM) is a promising tool for microvascular imaging. In the focal region, resolution of AR-PAM is determined by the ultrasound transducer and ultimately limited by acoustic diffraction. In the out-of-focus region, resolution deteriorates with increasing distance from the focal plane, which restricts depth of focus (DOF). Besides, a trade-off exists between resolution and DOF. Previously, synthetic aperture focusing technique (SAFT) and/or deconvolution methods have been demonstrated to enhance AR-PAM images. However, they suffer from low resolution, low signal-to-noise ratio (SNR), and/or poor image fidelity. Here, we propose a novel algorithm for AR-PAM to enhance image resolution, SNR, and fidelity. The algorithm consists of a Fourier accumulation SAFT (FA-SAFT) and a directional model-based (D-MB) deconvolution method. Inspired by the Fourier denoising technique and directional SAFT, FA-SAFT mainly compensates for the defocusing effect. Besides, D-MB deconvolution enhances the resolution as well as preserves the image fidelity, especially for objects with line patterns such as microvasculature. Full width at half maximum of 26-31 μm over DOF of 1.8 mm and minimum resolvable distance of 46-49 μm are experimentally achieved by imaging a tungsten wire phantom. Moreover, imaging of a leaf skeleton phantom and in vivo imaging of mouse blood vessels also prove that our algorithm is capable of providing high-resolution, high-SNR, and good-fidelity results for complex structures and for in vivo applications.

Submitted 18 November, 2021; originally announced November 2021.

Comments: 34 pages (including 16 pages of supplementary materials)

arXiv:2108.09116 [pdf, other]

Sparse Signal Processing for Massive Connectivity via Mixed-Integer Programming

Authors: Shuang Liang, Yuanming Shi, Yong Zhou

Abstract: Massive connectivity is a critical challenge of Internet of Things (IoT) networks. In this paper, we consider the grant-free uplink transmission of an IoT network with a multi-antenna base station (BS) and a large number of single-antenna IoT devices. Due to the sporadic nature of IoT devices, we formulate the joint activity detection and channel estimation (JADCE) problem as a group-sparse matrix estimation problem. Although many algorithms have been proposed to solve the JADCE problem, most of them are developed based on the compressive sensing technique, yielding suboptimal solutions. In this paper, we first develop an efficient weighted $l_1$-norm minimization algorithm to better approximate the group sparsity than the existing mixed $l_1/l_2$-norm minimization. Although an enhanced estimation performance in terms of the mean squared error (MSE) can be achieved, the weighted $l_1$-norm minimization algorithm is still a convex relaxation of the original group-sparse matrix estimation problem, yielding a suboptimal solution. To this end, we further reformulate the JADCE problem as a mixed integer programming (MIP) problem, which can be solved by using the branch-and-bound method. As a result, we are able to obtain an optimal solution of the JADCE problem, which can be adopted as an upper bound to evaluate the effectiveness of the existing algorithms. Moreover, we also derive the minimum pilot sequence length required to fully recover the estimated matrix in the noiseless scenario. Simulation results show the performance gains of the proposed optimal algorithm over the proposed weighted $l_1$-norm algorithm and the conventional mixed $l_1/l_2$-norm algorithm. Results also show that the proposed algorithms require a shorter pilot sequence than the conventional algorithm to achieve the same estimation performance.

Submitted 20 August, 2021; originally announced August 2021.

Comments: IEEE/CIC ICCC 2021

arXiv:2108.08503 [pdf, other]

On Capacity Optimality of OAMP: Beyond IID Sensing Matrices and Gaussian Signaling

Authors: Lei Liu, Shansuo Liang, Li Ping

Abstract: This paper investigates a large unitarily invariant system (LUIS) involving a unitarily invariant sensing matrix, an arbitrarily fixed signal distribution, and forward error control (FEC) coding. A universal Gram-Schmidt orthogonalization is considered for constructing orthogonal approximate message passing (OAMP), enabling its applicability to a wide range of prototypes without the constraint of differentiability. We develop two single-input-single-output variational transfer functions for OAMP with Lipschitz continuous local estimators, facilitating an analysis of achievable rates. Furthermore, when the state evolution of OAMP has a unique fixed point, we reveal that OAMP can achieve the constrained capacity predicted by the replica method of LUIS based on matched FEC coding, regardless of the signal distribution. The replica method is rigorously validated for LUIS with Gaussian signaling and certain sub-classes of LUIS with arbitrary signal distributions. Several area properties are established based on the variational transfer functions of OAMP. Meanwhile, we present a replica constrained capacity-achieving coding principle for LUIS. This principle serves as the basis for optimizing irregular low-density parity-check (LDPC) codes specifically tailored for binary signaling in our simulation results. The performance of OAMP with these optimized codes exhibits a remarkable improvement over the unoptimized codes and even surpasses the well-known Turbo-LMMSE algorithm. For quadrature phase-shift keying (QPSK) modulation, we observe bit error rate (BER) performance near the replica constrained capacity across diverse channel conditions.

Submitted 9 November, 2023; v1 submitted 19 August, 2021; originally announced August 2021.

Comments: Double columns, 17 pages, 9 figures

arXiv:2108.02607 [pdf, other]

doi 10.1145/3474085.3475275

UniCon: Unified Context Network for Robust Active Speaker Detection

Authors: Yuanhang Zhang, Susan Liang, Shuang Yang, Xiao Liu, Zhongqin Wu, Shiguang Shan, Xilin Chen

Abstract: We introduce a new efficient framework, the Unified Context Network (UniCon), for robust active speaker detection (ASD). Traditional methods for ASD usually operate on each candidate's pre-cropped face track separately and do not sufficiently consider the relationships among the candidates. This potentially limits performance, especially in challenging scenarios with low-resolution faces, multiple candidates, etc. Our solution is a novel, unified framework that focuses on jointly modeling multiple types of contextual information: spatial context to indicate the position and scale of each candidate's face, relational context to capture the visual relationships among the candidates and contrast audio-visual affinities with each other, and temporal context to aggregate long-term information and smooth out local uncertainties. Based on such information, our model optimizes all candidates in a unified process for robust and reliable ASD. A thorough ablation study is performed on several challenging ASD benchmarks under different settings. In particular, our method outperforms the state-of-the-art by a large margin of about 15% mean Average Precision (mAP) absolute on two challenging subsets: one with three candidate speakers, and the other with faces smaller than 64 pixels. Together, our UniCon achieves 92.0% mAP on the AVA-ActiveSpeaker validation set, surpassing 90% for the first time on this challenging dataset at the time of submission. Project website: https://unicon-asd.github.io/.

Submitted 5 August, 2021; originally announced August 2021.

Comments: 10 pages, 6 figures; to appear at ACM Multimedia 2021

arXiv:2107.03904 [pdf, other]

A hybrid deep learning framework for Covid-19 detection via 3D Chest CT Images

Authors: Shuang Liang

Abstract: In this paper, we present a hybrid deep learning framework named CTNet which combines a convolutional neural network and a transformer for the detection of COVID-19 via 3D chest CT images. It consists of a CNN feature extractor module with SE attention to extract sufficient features from CT scans, together with a transformer model to model the discriminative features of the 3D CT scans. Compared to previous works, CTNet provides an effective and efficient method to perform COVID-19 diagnosis via 3D CT scans with a data resampling strategy. On a large public benchmark, the COV19-CT-DB database, the proposed CTNet achieves advanced results, surpassing the state-of-the-art baseline approach proposed together with the dataset.

Submitted 9 July, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

Comments: 5 pages, 1 figure, 2 tables

arXiv:2105.12718 [pdf]

Magnetic Particle Spectroscopy (MPS) with One-stage Lock-in Implementation for Magnetic Bioassays with Improved Sensitivities

Authors: Vinit Kumar Chugh, Kai Wu, Venkatramana D. Krishna, Arturo di Girolamo, Robert P. Bloom, Yongqiang Andrew Wang, Renata Saha, Shuang Liang, Maxim C-J Cheeran, Jian-Ping Wang

Abstract: In recent years, magnetic particle spectroscopy (MPS) has become a highly sensitive and versatile sensing technique for quantitative bioassays. It relies on the dynamic magnetic responses of magnetic nanoparticles (MNPs) for the detection of target analytes in liquid phase. There are many research studies reporting the application of MPS for detecting a variety of analytes including viruses, toxins, and nucleic acids, etc. Herein, we report a modified version of the MPS platform with the addition of a one-stage lock-in design to remove the feedthrough signals induced by external driving magnetic fields, thus capturing only MNP responses for improved system sensitivity. This one-stage lock-in MPS system is able to detect as low as 781 ng multi-core Nanomag50 iron oxide MNPs (micromod Partikeltechnologie GmbH) and 78 ng single-core SHB30 iron oxide MNPs (Ocean NanoTech). In addition, using a streptavidin-biotin binding system as a proof-of-concept, we show that these single-core SHB30 MNPs can be used for Brownian relaxation-based bioassays while the multi-core Nanomag50 cannot be used. The effects of MNP amount on the concentration-dependent response profiles for detecting streptavidin were also investigated. Results show that by using lower concentration/amount of MNPs, concentration-response curves shift to lower concentration/amount of target analytes. This lower concentration response indicates the possibility of improved bioassay sensitivities by using lower amounts of MNPs.

Submitted 26 May, 2021; originally announced May 2021.

Comments: 26 Pages, 11 Figures

arXiv:2104.10832 [pdf, other]

Building Bilingual and Code-Switched Voice Conversion with Limited Training Data Using Embedding Consistency Loss

Authors: Yaogen Yang, Haozhe Zhang, Xiaoyi Qin, Shanshan Liang, Huahua Cui, Mingyang Xu, Ming Li

Abstract: Building cross-lingual voice conversion (VC) systems for multiple speakers and multiple languages has been a challenging task for a long time. This paper describes a parallel non-autoregressive network to achieve bilingual and code-switched voice conversion for multiple speakers when there are only mono-lingual corpora for each language. We achieve cross-lingual VC between Mandarin speech with multiple speakers and English speech with multiple speakers by applying bilingual bottleneck features. To boost voice cloning performance, we use an adversarial speaker classifier with a gradient reversal layer to reduce the source speaker's information from the output of the encoder. Furthermore, in order to improve speaker similarity between reference speech and converted speech, we adopt an embedding consistency loss between the synthesized speech and its natural reference speech in our network. Experimental results show that our proposed method can achieve high quality converted speech with a mean opinion score (MOS) around 4. The conversion system performs well in terms of speaker similarity for both in-set speaker conversion and out-of-set one-shot conversion.

Submitted 21 April, 2021; originally announced April 2021.

Comments: Submitted to Interspeech 2021

arXiv:2012.03500 [pdf, other]

EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture

Authors: Chenfeng Miao, Shuang Liang, Zhencheng Liu, Minchuan Chen, Jun Ma, Shaojun Wang, Jing Xiao

Abstract: In this work, we address the Text-to-Speech (TTS) task by proposing a non-autoregressive architecture called EfficientTTS. Unlike the dominant non-autoregressive TTS models, which are trained with the need of external aligners, EfficientTTS optimizes all its parameters with a stable, end-to-end training procedure, while allowing for synthesizing high quality speech in a fast and efficient manner. EfficientTTS is motivated by a new monotonic alignment modeling approach (also introduced in this work), which specifies monotonic constraints to the sequence alignment with almost no increase of computation. By combining EfficientTTS with different feed-forward network structures, we develop a family of TTS models, including both text-to-melspectrogram and text-to-waveform networks. We experimentally show that the proposed models significantly outperform counterpart models such as Tacotron 2 and Glow-TTS in terms of speech quality, training efficiency and synthesis speed, while still producing speech of strong robustness and great diversity. In addition, we demonstrate that the proposed approach can be easily extended to autoregressive models such as Tacotron 2.

Submitted 7 December, 2020; originally announced December 2020.

Comments: 15 pages, 9 figures

arXiv:2010.14035 [pdf, other]

Two-Parametric Nyquist Pulses with Better Performance Based on Inverse Hyperbolic Functions

Authors: Songbing Liang, Stylianos D. Assimonis

Abstract: In this article, three new inter-symbol interference (ISI)-free pulses with enhanced performance compared to the state-of-the-art are proposed and studied in terms of frequency and time domain characteristics. They are based on inverse hyperbolic functions and on the concept of inner and outer functions, which was first introduced by the authors. New pulses are two-parametric, i.e., their design depends only on the roll-off factor and the timing jitter parameter, and they outperform most of the well-known pulses reported in the literature, since they present lower error probability, smaller maximum distortion and wider eye-diagram.

Submitted 26 October, 2020; originally announced October 2020.

Comments: 6 pages, 3 figures, submitted to IEEE Communications Letters

arXiv:2009.08605 [pdf, other]

Hardware Accelerator for Multi-Head Attention and Position-Wise Feed-Forward in the Transformer

Authors: Siyuan Lu, Meiqi Wang, Shuang Liang, Jun Lin, Zhongfeng Wang

Abstract: Designing hardware accelerators for deep neural networks (DNNs) has been much desired. Nonetheless, most of these existing accelerators are built for either convolutional neural networks (CNNs) or recurrent neural networks (RNNs). Recently, the Transformer model is replacing the RNN in the natural language processing (NLP) area. However, because of the intensive matrix computations and complicated data flow involved, the hardware design for the Transformer model has never been reported. In this paper, we propose the first hardware accelerator for two key components, i.e., the multi-head attention (MHA) ResBlock and the position-wise feed-forward network (FFN) ResBlock, which are the two most complex layers in the Transformer. Firstly, an efficient method is introduced to partition the huge matrices in the Transformer, allowing the two ResBlocks to share most of the hardware resources. Secondly, the computation flow is well designed to ensure the high hardware utilization of the systolic array, which is the biggest module in our design. Thirdly, complicated nonlinear functions are highly optimized to further reduce the hardware complexity and also the latency of the entire system. Our design is coded using hardware description language (HDL) and evaluated on a Xilinx FPGA. Compared with the implementation on GPU with the same setting, the proposed design demonstrates a speed-up of 14.6x in the MHA ResBlock and 3.4x in the FFN ResBlock. Therefore, this work lays a good foundation for building efficient hardware accelerators for multiple Transformer networks.

Submitted 17 September, 2020; originally announced September 2020.

Comments: 6 pages, 8 figures. This work has been accepted by IEEE SOCC (System-on-Chip Conference) 2020, and presented by Siyuan Lu at SOCC 2020. It also received the Best Paper Award in the Methodology Track at this conference

arXiv:2008.00217 [pdf, other]

Efficient Adversarial Attacks for Visual Object Tracking

Authors: Siyuan Liang, Xingxing Wei, Siyuan Yao, Xiaochun Cao

Abstract: Visual object tracking is an important task that requires the tracker to find the objects quickly and accurately. The existing state-of-the-art object trackers, i.e., Siamese based trackers, use DNNs to attain high accuracy. However, the robustness of visual tracking models is seldom explored. In this paper, we analyze the weakness of object trackers based on the Siamese network and then extend adversarial examples to visual object tracking. We present an end-to-end network FAN (Fast Attack Network) that uses a novel drift loss combined with the embedded feature loss to attack the Siamese network based trackers. Under a single GPU, FAN is efficient in the training speed and has a strong attack performance. FAN can generate an adversarial example in 10 ms and achieve an effective targeted attack (at least 40% drop rate on OTB) and untargeted attack (at least 70% drop rate on OTB).

Submitted 1 August, 2020; originally announced August 2020.

Journal ref: ECCV 2020

arXiv:1909.07820 [pdf, other]

Data Centers Job Scheduling with Deep Reinforcement Learning

Authors: Sisheng Liang, Zhou Yang, Fang Jin, Yong Chen

Abstract: Efficient job scheduling on data centers under heterogeneous complexity is crucial but challenging since it involves the allocation of multi-dimensional resources over time and space. To adapt to the complex computing environment in data centers, we proposed an innovative Advantage Actor-Critic (A2C) deep reinforcement learning based approach called A2cScheduler for job scheduling. A2cScheduler consists of two agents, one of which, dubbed the actor, is responsible for learning the scheduling policy automatically and the other one, the critic, reduces the estimation error. Unlike previous policy gradient approaches, A2cScheduler is designed to reduce the gradient estimation variance and to update parameters efficiently. We show that the A2cScheduler can achieve competitive scheduling performance using both simulated workloads and real data collected from an academic data center.

Submitted 1 March, 2020; v1 submitted 15 September, 2019; originally announced September 2019.

Comments: 13 pages

Showing 1–50 of 52 results for author: Liang, S