-
Global-Local Convolution with Spiking Neural Networks for Energy-efficient Keyword Spotting
Authors:
Shuai Wang,
Dehao Zhang,
Kexin Shi,
Yuchen Wang,
Wenjie Wei,
Jibin Wu,
Malu Zhang
Abstract:
Thanks to Deep Neural Networks (DNNs), the accuracy of Keyword Spotting (KWS) has made substantial progress. However, as KWS systems are usually deployed on edge devices, energy efficiency becomes a critical requirement in addition to performance. Here, we take advantage of the energy efficiency of spiking neural networks (SNNs) and propose an end-to-end lightweight KWS model. The model consists of two innovative modules: 1) a Global-Local Spiking Convolution (GLSC) module and 2) a Bottleneck-PLIF module. Compared with hand-crafted feature extraction methods, the GLSC module extracts speech features that are sparser and more energy-efficient and that yield better performance. The Bottleneck-PLIF module further processes the signals from GLSC with the aim of achieving higher accuracy with fewer parameters. Extensive experiments are conducted on the Google Speech Commands Dataset (V1 and V2). The results show that our method achieves competitive performance among SNN-based KWS models with fewer parameters.
Submitted 18 June, 2024;
originally announced June 2024.
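The Bottleneck-PLIF module is built on parametric leaky integrate-and-fire (PLIF) neurons. As a minimal sketch only, the NumPy code below simulates a layer of PLIF-style neurons whose membrane update rate is `sigmoid(a)`, so that `a` sets the time constant and would be learned in a real network. The update rule follows a commonly used PLIF formulation; the hard reset, the threshold of 1.0, and all numeric values are assumptions, not the configuration used in the paper.

```python
import numpy as np

def plif_forward(inputs, a=0.0, v_threshold=1.0, v_reset=0.0):
    """Simulate a layer of PLIF-style neurons over T time steps.

    inputs: (T, N) array of input currents, one row per time step.
    a: scalar parameter with sigmoid(a) = 1 / tau (learnable in a real
       network; fixed here for illustration).
    Returns a (T, N) array of 0/1 spikes.
    """
    inputs = np.asarray(inputs, dtype=float)
    k = 1.0 / (1.0 + np.exp(-a))               # sigmoid(a) = 1 / tau
    v = np.full(inputs.shape[1], v_reset, dtype=float)
    spikes = np.zeros_like(inputs)
    for t, x in enumerate(inputs):
        # Leaky integration: V <- V + k * (X - (V - V_reset))
        v = v + k * (x - (v - v_reset))
        fired = v >= v_threshold
        spikes[t] = fired
        v = np.where(fired, v_reset, v)        # hard reset after a spike
    return spikes

# Two neurons: one weakly driven (0.5), one strongly driven (3.0).
spikes = plif_forward(np.array([[0.5, 3.0]] * 4), a=0.0)
print(spikes.sum(axis=0))
```

With `a = 0` the update rate is 0.5, so the weakly driven neuron settles below threshold and never fires, while the strongly driven one fires at every step; the output is binary and mostly zeros, which is the kind of sparsity spiking models exploit.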
-
A Study of the Latest Updates of the Readout System for the Hybrid-Pixel Detector at HEPS
Authors:
Hangxu Li,
Jie Zhang,
Wei Wei,
Zhenjie Li,
Xiaolu Ji,
Yan Zhang,
Xuanzheng Yang,
Shuihan Zhang,
Xueke Ma,
Peng Liu,
Zheng Wang,
Yuanbai Chen
Abstract:
The High Energy Photon Source (HEPS) is a fourth-generation light source. This facility has made unprecedented advances in accelerator technology, necessitating the development of new detectors that satisfy physical requirements such as single-photon resolution, large dynamic range, and high frame rates. Since 2016, the Institute of High Energy Physics has progressed from its first user-experimental hybrid pixel detector to a fourth-generation million-pixel detector designed for challenging conditions, with the dual-threshold single-photon detector HEPS-Beijing PIXel (HEPS-BPIX) set as the next-generation target. HEPS-BPIX will employ the entirely new Application-Specific Integrated Circuit (ASIC) BP40 for pixel information readout. Data flow will be managed and controlled through readout electronics based on a two-tier Field-Programmable Gate Array (FPGA) system: the Front-End Electronics (FEE) and the Input-Output Board (IOB) handle the fan-out for 12 ASICs, while the u4FCP processes serial data on high-speed links, transferring pixel-level data to the back-end RTM and uTCA chassis or outputting it independently through a network port, which enables remote control of the entire detector. The HEPS-BPIX firmware has undergone a comprehensive redesign and update to match the electronic characteristics of the new chip and to improve the overall performance of the detector. We provide an overview of the core subunits of HEPS-BPIX, with emphasis on the readout system, evaluate the new hardware and firmware, and highlight some of its innovative features and characteristics.
Submitted 4 June, 2024;
originally announced June 2024.
-
A Novel Mutual Insurance Model for Hedging Against Cyber Risks in Power Systems Deploying Smart Technologies
Authors:
Pikkin Lau,
Lingfeng Wang,
Wei Wei,
Zhaoxi Liu,
Chee-Wooi Ten
Abstract:
In this paper, a novel cyber-insurance model is proposed based on system risk evaluation with smart technology applications. The cyber-insurance policy for power systems is tailored via cyber risk modeling, reliability impact analysis, and insurance premium calculation. A stochastic Epidemic Network Model is developed to evaluate cyber risk by propagating cyberattacks among graphical vulnerabilities. The smart technologies deployed in risk modeling include smart monitoring and job thread assignment. Smart monitoring improves substation availability against cyberattacks through preventive and corrective measures. The job thread assignment solution reduces execution failures by distributing control and monitoring tasks across multiple threads. Reliability assessment is used to estimate load losses, which are converted into monetary losses. These monetary losses are shared through a mutual insurance plan. To ensure a fair distribution of indemnity, a new Shapley mutual insurance principle is devised. The effectiveness of the proposed Shapley mutual insurance design is validated via case studies, in which the Shapley premium is compared with existing premium designs. The results show that the Shapley premium provides high indemnity levels, close to those of the Tail Conditional Expectation premium. Meanwhile, the Shapley premium is nearly as affordable as the coalitional premium and keeps a relatively low insolvency probability.
Submitted 16 March, 2024;
originally announced March 2024.
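The Shapley mutual insurance principle rests on the Shapley value from cooperative game theory, which gives each member the average marginal contribution it makes over all orders in which the pool could be formed. The sketch below computes exact Shapley values by enumerating coalitions for a small pool. The members, their expected losses, and the characteristic function `coalition_cost` (a made-up pooled cost with a diversification discount) are hypothetical and are not the premium design from the paper.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a number."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for coalitions S not containing i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

# Hypothetical expected monetary losses per pool member (illustrative only).
expected_loss = {"A": 10.0, "B": 20.0, "C": 30.0}

def coalition_cost(s):
    """Hypothetical pooled cost: 10% discount per additional member."""
    if not s:
        return 0.0
    base = sum(expected_loss[p] for p in s)
    return base * (1.0 - 0.1 * (len(s) - 1))

premiums = shapley_values(list(expected_loss), coalition_cost)
print(premiums)
```

By construction the Shapley shares add up to the cost of the full pool (here 48.0), which is the efficiency property that makes the value attractive as a fairness rule for splitting indemnity.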
-
Sample Robust Scheduling of Electricity-Gas Systems Under Wind Power Uncertainty
Authors:
Rong-Peng Liu,
Yunhe Hou,
Yujia Li,
Shunbo Lei,
Wei Wei,
Xiaozhe Wang
Abstract:
This paper adopts a two-stage sample robust optimization (SRO) model to address the unit commitment optimal energy flow (UC-OEF) problem with wind power penetration for integrated electricity-gas systems (IEGSs). The two-stage SRO model can be approximately transformed into a computationally efficient form. Specifically, we employ linear decision rules to simplify the proposed UC-OEF model. Moreover, we further enhance the tractability of the simplified model by exploring its structural features and develop a corresponding solution method.
Submitted 30 December, 2023;
originally announced January 2024.
-
FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimer's Speech Detection
Authors:
Wenqing Wei,
Zhengdong Yang,
Yuan Gao,
Jiyi Li,
Chenhui Chu,
Shogo Okada,
Sheng Li
Abstract:
Early-stage Alzheimer's disease (AD) detection is considered an important field of medical study. As with traditional machine learning methods, speech-based automatic detection suffers from data privacy risks because the data of specific patients are exclusive to each medical institution. A common practice is to use federated learning to protect patients' data privacy. However, its distributed learning process also degrades performance. To alleviate this problem while protecting user privacy, we propose federated contrastive pre-training (FedCPC), performed before federated training for AD speech detection, which can learn better representations from raw data and enables different clients to share data in the pre-training and training stages. Experimental results demonstrate that the proposed method achieves satisfactory performance while preserving data privacy.
Submitted 21 November, 2023;
originally announced November 2023.
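FedCPC is a contrastive pre-training scheme, and contrastive predictive coding is commonly trained with the InfoNCE loss. The NumPy sketch below computes an InfoNCE loss for a batch in which row i of `targets` is the positive for row i of `predictions` and every other row acts as a negative. The cosine-similarity scoring and the temperature of 0.1 are assumptions; how the loss is combined with federated aggregation in the paper is not shown.

```python
import numpy as np

def info_nce(predictions, targets, temperature=0.1):
    """InfoNCE loss where row i of `targets` is the positive for row i of
    `predictions` and every other row in the batch is a negative."""
    p = predictions / np.linalg.norm(predictions, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    logits = p @ t.T / temperature                        # (B, B) cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives lie on the diagonal; average their negative log-probability.
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))                       # hypothetical context vectors
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))
mismatched = info_nce(z, np.roll(z, 1, axis=0))    # every pair is wrong
print(aligned, mismatched)
```

The loss is small when each prediction is closest to its own target and large when the pairing is broken, which is what pushes the encoder toward informative representations without labels.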
-
MERTech: Instrument Playing Technique Detection Using Self-Supervised Pretrained Model With Multi-Task Finetuning
Authors:
Dichucheng Li,
Yinghao Ma,
Weixing Wei,
Qiuqiang Kong,
Yulun Wu,
Mingjin Che,
Fan Xia,
Emmanouil Benetos,
Wei Li
Abstract:
Instrument playing techniques (IPTs) constitute a pivotal component of musical expression. However, the development of automatic IPT detection methods suffers from limited labeled data and inherent class imbalance issues. In this paper, we propose to apply a self-supervised learning model pre-trained on large-scale unlabeled music data and finetune it on IPT detection tasks. This approach addresses data scarcity and class imbalance challenges. Recognizing the significance of pitch in capturing the nuances of IPTs and the importance of onset in locating IPT events, we investigate multi-task finetuning with pitch and onset detection as auxiliary tasks. Additionally, we apply a post-processing approach for event-level prediction, where an IPT activation initiates an event only if the onset output confirms an onset in that frame. Our method outperforms prior approaches in both frame-level and event-level metrics across multiple IPT benchmark datasets. Further experiments demonstrate the efficacy of multi-task finetuning on each IPT class.
Submitted 15 October, 2023;
originally announced October 2023.
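The event-level post-processing described in the abstract can be sketched as follows: an IPT event may start only in a frame where the onset output also exceeds its threshold, and it then continues while the IPT activation stays above threshold. The 0.5 thresholds, the per-class frame arrays, and the rule for ending an event are assumptions made for illustration.

```python
import numpy as np

def ipt_events(ipt_prob, onset_prob, ipt_thresh=0.5, onset_thresh=0.5):
    """Return (start_frame, end_frame_exclusive) pairs for one IPT class.

    An event starts only in a frame where the IPT activation AND the onset
    output both exceed their thresholds; it ends when the activation drops.
    """
    active = np.asarray(ipt_prob) >= ipt_thresh
    onset = np.asarray(onset_prob) >= onset_thresh
    events, start = [], None
    for frame, (is_active, is_onset) in enumerate(zip(active, onset)):
        if start is None:
            if is_active and is_onset:
                start = frame              # onset confirms the activation
        elif not is_active:
            events.append((start, frame))
            start = None
    if start is not None:                  # event still open at the last frame
        events.append((start, len(active)))
    return events

ipt = [0.1, 0.8, 0.9, 0.7, 0.2, 0.9, 0.9, 0.1]
onset = [0.0, 0.9, 0.1, 0.0, 0.0, 0.2, 0.1, 0.0]
print(ipt_events(ipt, onset))
```

In this toy input the second burst of activation (frames 5 and 6) is discarded because no frame in it is confirmed by the onset output, which is how such a rule suppresses spurious events.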
-
Parallel in-memory wireless computing
Authors:
Cong Wang,
Gong-Jie Ruan,
Zai-Zheng Yang,
Xing-Jian Yangdong,
Yixiang Li,
Liang Wu,
Yingmeng Ge,
Yichen Zhao,
Chen Pan,
Wei Wei,
Li-Bo Wang,
Bin Cheng,
Zaichen Zhang,
Chuan Zhang,
Shi-Jun Liang,
Feng Miao
Abstract:
Parallel wireless digital communication with ultralow power consumption is critical for emerging edge technologies such as 5G and the Internet of Things. However, the physical separation between digital computing units and analogue transmission units in traditional wireless technology leads to high power consumption. Here we report a parallel in-memory wireless computing scheme. The approach combines in-memory computing with wireless communication using memristive crossbar arrays. We show that the system can be used for the radio transmission of a binary stream of 480 bits with a bit error rate of 0. The in-memory wireless computing uses two orders of magnitude less power than conventional technology (based on digital-to-analogue and analogue-to-digital converters). We also show that the approach can be applied to acoustic and optical wireless communications.
Submitted 30 September, 2023;
originally announced October 2023.
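Memristive crossbars compute a matrix-vector product in a single analogue step: by Ohm's law each device passes a current equal to its conductance times the voltage on its row, and by Kirchhoff's current law those currents add along each column. The idealised NumPy sketch below models only that behaviour; device non-idealities and the wireless modulation scheme used in the paper are not modelled, and the conductance and voltage values are hypothetical.

```python
import numpy as np

def crossbar_mvm(conductance, voltages):
    """Ideal crossbar readout.

    conductance: (rows, cols) array of device conductances in siemens.
    voltages: (rows,) array of voltages applied to the rows, in volts.
    Returns the (cols,) column currents in amperes:
    I_j = sum_i G_ij * V_i (Ohm's law per device, Kirchhoff's law per column).
    """
    return np.asarray(voltages) @ np.asarray(conductance)

G = np.array([[1e-6, 2e-6],
              [3e-6, 4e-6]])   # hypothetical conductances (S)
V = np.array([0.1, 0.2])       # hypothetical read voltages (V)
print(crossbar_mvm(G, V))
```

Because the multiply-accumulate happens in the array itself, no data has to be shuttled between separate memory and compute units, which is the source of the power savings that in-memory schemes target.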
-
Invisible Watermarking for Audio Generation Diffusion Models
Authors:
Xirong Cao,
Xiang Li,
Divyesh Jadav,
Yanzhao Wu,
Zhehui Chen,
Chen Zeng,
Wenqi Wei
Abstract:
Diffusion models have gained prominence in the image domain for their capabilities in data generation and transformation, and achieve state-of-the-art performance on various tasks in both the image and audio domains. In the rapidly evolving field of audio-based machine learning, safeguarding model integrity and establishing data copyright are of paramount importance. This paper presents the first watermarking technique applied to audio diffusion models trained on mel-spectrograms, offering a novel approach to these challenges. Our model not only excels in benign audio generation but also incorporates an invisible watermarking trigger mechanism for model verification. This watermark trigger serves as a protective layer, enabling the identification of model ownership and ensuring model integrity. Through extensive experiments, we demonstrate that invisible watermark triggers can effectively protect against unauthorized modifications while maintaining high utility in benign audio generation tasks.
Submitted 31 October, 2023; v1 submitted 22 September, 2023;
originally announced September 2023.
-
MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2023
Authors:
Zhihang Xu,
Shaofei Zhang,
Xi Wang,
Jiajun Zhang,
Wenning Wei,
Lei He,
Sheng Zhao
Abstract:
In this paper, we present MuLanTTS, the Microsoft end-to-end neural text-to-speech (TTS) system designed for the Blizzard Challenge 2023. About 50 hours of French audiobook data for the hub task and another 2 hours of speaker adaptation data for the spoke task were released to build synthesized voices for different test purposes, including sentences, paragraphs, homographs, and lists. Building upon DelightfulTTS, we adopt contextual and emotion encoders to adapt the audiobook data beyond single sentences, enriching long-form prosody and dialogue expressiveness. To improve recording quality, we also apply denoising algorithms and long-audio processing to both corpora. For the hub task, only the 50-hour single-speaker data is used to build the TTS system, while for the spoke task, a multi-speaker source model is fine-tuned on the target speaker. MuLanTTS achieves mean quality scores of 4.3 and 4.5 in the respective tasks, statistically comparable with natural speech, while maintaining good similarity according to the similarity assessment. The excellent quality and similarity results in this year's new and dense statistical evaluation show the effectiveness of our proposed system in both tasks.
Submitted 11 September, 2023; v1 submitted 6 September, 2023;
originally announced September 2023.
-
Multi-View Attention Learning for Residual Disease Prediction of Ovarian Cancer
Authors:
Xiangneng Gao,
Shulan Ruan,
Jun Shi,
Guoqing Hu,
Wei Wei
Abstract:
In the treatment of ovarian cancer, precise residual disease prediction is significant for clinical and surgical decision-making. However, traditional methods are either invasive (e.g., laparoscopy) or time-consuming (e.g., manual analysis). Recently, many deep learning methods have been developed for the automatic analysis of medical images. Despite remarkable progress, most of them underestimate the importance of the 3D image information of the disease, which may limit their performance in residual disease prediction, especially on small-scale datasets. To this end, in this paper, we propose a novel Multi-View Attention Learning (MuVAL) method for residual disease prediction, which focuses on the comprehensive learning of 3D Computed Tomography (CT) images in a multi-view manner. Specifically, we first obtain multiple views of the 3D CT images from the transverse, coronal, and sagittal planes. To better represent the image features across views, we further leverage an attention mechanism to find the most relevant slices in each view. Extensive experiments on a dataset of 111 patients show that our method outperforms existing deep learning methods.
Submitted 26 June, 2023;
originally announced June 2023.
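The multi-view idea can be illustrated by cutting a 3D CT volume into 2D slices along each of its three axes and pooling every view with softmax attention weights over its slices. In the NumPy sketch below, the per-slice features (mean and standard deviation of intensity) and the fixed scoring vector `w` are hypothetical stand-ins for MuVAL's learned encoders and attention parameters, and the mapping of array axes to transverse, coronal, and sagittal views assumes a particular volume orientation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def slice_features(volume, axis):
    """Hypothetical per-slice features: mean and std of each 2D slice."""
    slices = np.moveaxis(volume, axis, 0)          # (num_slices, H, W)
    flat = slices.reshape(slices.shape[0], -1)
    return np.stack([flat.mean(axis=1), flat.std(axis=1)], axis=1)

def multi_view_attention(volume, w):
    """Attention-pool slice features in each view and concatenate the views."""
    pooled = []
    for axis in range(3):   # assumed: 0=transverse, 1=coronal, 2=sagittal
        feats = slice_features(volume, axis)       # (num_slices, 2)
        weights = softmax(feats @ w)               # one weight per slice
        pooled.append(weights @ feats)             # weighted sum -> (2,)
    return np.concatenate(pooled)                  # (6,)

rng = np.random.default_rng(0)
ct = rng.normal(size=(16, 32, 32))                 # synthetic volume
print(multi_view_attention(ct, w=np.array([1.0, 0.5])).shape)
```

The attention weights let the slices that score highest dominate each view's summary, rather than averaging every slice equally.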
-
Interpretable System Identification and Long-term Prediction on Time-Series Data
Authors:
Xiaoyi Liu,
Duxin Chen,
Wenjia Wei,
Xia Zhu,
Wenwu Yu
Abstract:
Time-series prediction has drawn considerable attention during the past decades, fueled by emerging advances in deep learning methods. However, most neural-network-based methods lack interpretability and fail to extract the hidden mechanism of the targeted physical system. To overcome these shortcomings, this study proposes an interpretable sparse system identification method that requires no prior knowledge. Instead of indiscriminately using polynomial functions, as most system identification methods do, this method adopts the Fourier transform to remove irrelevant terms from the dictionary matrix. This yields an interpretable system representation and greatly reduces computing cost. By adopting the $l_1$ norm to regularize the parameter matrix, a sparse description of the system model is achieved. Moreover, three data sets, comprising water conservancy data, global temperature data, and financial data, are used to test the performance of the proposed method. Although no prior knowledge of the physical background is used, experimental results show that our method achieves more accurate long-term prediction than widely used baseline data-driven methods, despite noise and incompleteness in the original data. This study may provide some insight into time-series prediction and suggests that a white-box system identification method may extract easily overlooked yet inherent periodic features and may beat neural-network-based black-box methods on long-term prediction tasks.
Submitted 2 March, 2023;
originally announced March 2023.
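The recipe in the abstract (use the Fourier transform to decide which terms enter the dictionary, then fit an $l_1$-regularised coefficient vector) can be sketched as below. This is an illustrative reconstruction, not the authors' code: the number of retained frequencies, the sine/cosine dictionary, the regularisation weight, the synthetic data, and the use of ISTA (iterative soft-thresholding) as the $l_1$ solver are all assumptions.

```python
import numpy as np

def dominant_frequencies(x, k):
    """Return the k strongest nonzero frequencies (cycles/sample) of x."""
    spectrum = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(len(x))
    top = np.argsort(spectrum[1:])[::-1][:k] + 1   # skip the DC bin
    return freqs[top]

def build_dictionary(t, freqs):
    """Columns: a constant term plus a sine and a cosine per frequency."""
    cols = [np.ones_like(t, dtype=float)]
    for f in freqs:
        cols += [np.sin(2 * np.pi * f * t), np.cos(2 * np.pi * f * t)]
    return np.stack(cols, axis=1)

def ista(A, y, lam, n_iter=2000):
    """Minimise 0.5*||A c - y||^2 + lam*||c||_1 by iterative soft-thresholding."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2          # 1 / Lipschitz constant
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = c - step * A.T @ (A @ c - y)
        c = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return c

# Synthetic periodic series with noise (illustrative data only).
rng = np.random.default_rng(0)
t = np.arange(400)
x = 2.0 + np.sin(2 * np.pi * t / 50) + 0.5 * np.cos(2 * np.pi * t / 20)
x_noisy = x + 0.1 * rng.normal(size=t.size)

train = 300
freqs = dominant_frequencies(x_noisy[:train], k=4)
A = build_dictionary(t, freqs)
coef = ista(A[:train], x_noisy[:train], lam=1.0)
forecast = A[train:] @ coef                          # long-horizon extrapolation
print(np.max(np.abs(forecast - x[train:])))
```

Because the periods 50 and 20 fit a whole number of times into the 300-sample training window, the FFT picks them out cleanly; the two extra frequencies chosen from noise should receive only small coefficients, and the fitted sinusoids extrapolate past the training window, which is the sense in which such a white-box model can predict far ahead.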
-
Sizing Grid-Connected Wind Power Generation and Energy Storage with Wake Effect and Endogenous Uncertainty: A Distributionally Robust Method
Authors:
Rui Xie,
Wei Wei,
Yue Chen
Abstract:
Wind power, as a green energy resource, is growing rapidly worldwide, along with energy storage systems (ESSs) that mitigate its volatility. Sizing wind power generation and ESSs has become an important problem. The wake effect in a wind farm can cause wind speed deficits and a drop in the power generation of downstream wind turbines, yet it has rarely been considered in sizing problems in power systems. In this paper, a bi-objective distributionally robust optimization (DRO) model is proposed to determine the capacities of wind power generation and ESSs while considering the wake effect. An ambiguity set based on the Wasserstein metric is established to characterize wind power and demand uncertainties. In particular, wind power uncertainty is affected by the wind power generation capacity, which is determined in the first stage. Thus, the proposed model is a DRO problem with endogenous (decision-dependent) uncertainty. To solve the proposed model, a stochastic programming approximation method based on minimum Lipschitz constants is developed to turn the DRO model into a linear program. Then, an iterative algorithm is built, embedded with methods for evaluating the minimum Lipschitz constants. Case studies demonstrate the necessity of considering the wake effect and the effectiveness of the proposed method.
Submitted 11 June, 2023; v1 submitted 30 December, 2022;
originally announced December 2022.
-
A Faithful Deep Sensitivity Estimation for Accelerated Magnetic Resonance Imaging
Authors:
Zi Wang,
Haoming Fang,
Chen Qian,
Boxuan Shi,
Lijun Bao,
Liuhong Zhu,
Jianjun Zhou,
Wenping Wei,
Jianzhong Lin,
Di Guo,
Xiaobo Qu
Abstract:
Magnetic resonance imaging (MRI) is an essential diagnostic tool that suffers from prolonged scan times. To alleviate this limitation, advanced fast MRI technology has attracted extensive research interest. Recently, deep learning has shown great potential in improving image quality and reconstruction speed. Faithful coil sensitivity estimation is vital for MRI reconstruction. However, most deep learning methods still rely on pre-estimated sensitivity maps and ignore their inaccuracy, resulting in significant quality degradation of the reconstructed images. In this work, we propose a Joint Deep Sensitivity estimation and Image reconstruction network, called JDSI. While removing image artifacts, it gradually provides more faithful sensitivity maps with high-frequency information, leading to improved image reconstructions. To understand the behavior of the network, the mutual promotion of sensitivity estimation and image reconstruction is revealed through visualization of the network's intermediate results. Results on in vivo datasets and a radiologist reader study demonstrate that, for both calibration-based and calibrationless reconstruction, the proposed JDSI achieves state-of-the-art performance visually and quantitatively, especially when the acceleration factor is high. Additionally, JDSI is robust to variations across patients and autocalibration signals.
Submitted 24 December, 2023; v1 submitted 23 October, 2022;
originally announced October 2022.
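Why faithful sensitivity maps matter can be seen from the standard multi-coil model, in which coil $c$ observes $y_c = S_c x$ in image space (before Fourier undersampling). Given the maps, a common fully sampled combination is the sensitivity-weighted estimate $\hat{x} = \sum_c \overline{S_c}\, y_c / \sum_c |S_c|^2$. The NumPy sketch below applies it with synthetic maps and shows that an error in the maps propagates directly into the combined image. It is not the JDSI network, it ignores k-space undersampling, and the smooth ramp maps are hypothetical.

```python
import numpy as np

def coil_images(x, sens):
    """Image-space forward model: coil c sees S_c * x."""
    return sens * x[None, ...]

def sensitivity_combine(y, sens, eps=1e-12):
    """Sensitivity-weighted combination: sum conj(S_c) y_c / sum |S_c|^2."""
    num = np.sum(np.conj(sens) * y, axis=0)
    den = np.sum(np.abs(sens) ** 2, axis=0) + eps
    return num / den

rng = np.random.default_rng(0)
ny, nx, ncoils = 32, 32, 4
x = rng.normal(size=(ny, nx)) + 1j * rng.normal(size=(ny, nx))
# Hypothetical smooth sensitivity maps: a ramp and a phase offset per coil.
yy, xx = np.mgrid[0:ny, 0:nx] / 32.0
sens = np.stack([np.exp(1j * c) * (0.5 + (yy if c % 2 else xx))
                 for c in range(ncoils)])

y = coil_images(x, sens)
err_exact = np.abs(sensitivity_combine(y, sens) - x).max()
err_wrong = np.abs(sensitivity_combine(y, sens * (1 + 0.2 * xx)) - x).max()
print(err_exact, err_wrong)
```

With exact maps the image is recovered to numerical precision, while a smooth 20% error in the maps leaves a spatially varying intensity bias, a simple instance of the degradation that motivates estimating the maps jointly with the image.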
-
Sample-efficient Model Predictive Control Design of Soft Robotics by Bayesian Optimization
Authors:
Anuj Pal,
Tianyi He,
Wenpeng Wei
Abstract:
This paper presents a sample-efficient, data-driven method to design model predictive control (MPC) for cable-actuated soft robots using Bayesian optimization. Instead of modeling the complex dynamics of the soft robots, the proposed approach uses Bayesian optimization to search for the best-guess low-dimensional prediction model and its associated controller that minimize an objective function of the closed-loop responses. In each iteration, Bayesian optimization updates the prediction model from the closed-loop input-output data. A linear MPC is then designed based on the updated prediction model and evaluated on the closed-loop responses. Unlike approaches that directly search controller parameters, this design allows closed-loop stability and input/output constraints to be handled easily within the MPC. After a few iterations, a convergent solution of a (sub-)optimal controller that minimizes the user-defined closed-loop performance index can be obtained. The proposed method is validated in a high-fidelity simulation of a cable-actuated soft robot. The simulation results demonstrate that the proposed approach can achieve the desired tracking control of the soft robot without a priori knowledge of its model.
Submitted 17 October, 2022;
originally announced October 2022.
-
HPPNet: Modeling the Harmonic Structure and Pitch Invariance in Piano Transcription
Authors:
Weixing Wei,
Peilin Li,
Yi Yu,
Wei Li
Abstract:
While neural network models are making significant progress in piano transcription, they are becoming more resource-consuming, requiring larger model sizes and more computing power. In this paper, we attempt to apply more prior knowledge about the piano to reduce model size and improve transcription performance. The sound of a piano note contains various overtones, and the pitch of a key does not change over time. To make full use of such latent information, we propose HPPNet, which uses Harmonic Dilated Convolution to capture the harmonic structures and a Frequency Grouped Recurrent Neural Network to model pitch invariance over time. Experimental results on the MAESTRO dataset show that our piano transcription system achieves state-of-the-art performance in both frame and note scores (frame F1 93.15%, note F1 97.18%). Moreover, the model is much smaller than previous state-of-the-art deep learning models.
Submitted 30 August, 2022; v1 submitted 30 August, 2022;
originally announced August 2022.
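On a log-frequency axis, the k-th harmonic of a note lies a fixed number of bins above its fundamental, namely $B \log_2 k$ for $B$ bins per octave, whatever the pitch. That is what makes a harmonic dilated convolution possible: one kernel whose taps sit at those offsets gathers a note's overtones at any pitch. The NumPy sketch below computes the offsets and sums a spectrum frame at them. The resolution of 48 bins per octave (4 per semitone), the use of four harmonics, and equal tap weights are illustrative assumptions, not HPPNet's settings.

```python
import numpy as np

def harmonic_offsets(bins_per_octave, n_harmonics):
    """Bin offsets of harmonics 1..n on a log-frequency axis (rounded)."""
    k = np.arange(1, n_harmonics + 1)
    return np.round(bins_per_octave * np.log2(k)).astype(int)

def harmonic_sum(frame, offsets):
    """For every bin f, sum frame[f + d] over the offsets d (zero-padded)."""
    out = np.zeros_like(frame, dtype=float)
    for d in offsets:
        out[: len(frame) - d] += frame[d:]
    return out

offsets = harmonic_offsets(bins_per_octave=48, n_harmonics=4)
print(offsets)

# Toy spectrum frame: a fundamental at bin 10 plus its first three overtones.
frame = np.zeros(200)
frame[10 + offsets] = 1.0
print(np.argmax(harmonic_sum(frame, offsets)))
```

The summed response peaks at the fundamental's bin, because only there do all four taps land on energy; a learned kernel would replace the equal weights with trained ones.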
-
Multi-core fiber enabled fading noise suppression in φ-OFDR based quantitative distributed vibration sensing
Authors:
Yuxiang Feng,
Weilin Xie,
Yinxia Meng,
Jiang Yang,
Qiang Yang,
Yan Ren,
Tianwai Bo,
Zhongwei Tan,
Wei Wei,
Yi Dong
Abstract:
Coherent fading has been regarded as a critical issue in distributed fiber-optic sensing based on phase-sensitive optical frequency domain reflectometry (φ-OFDR). Here, we report an approach for fading noise suppression in φ-OFDR using multi-core fiber. By exploiting the independent randomness of the refractive index distribution in each core, the drastic phase fluctuations caused by fading can be effectively alleviated by applying weighted vectorial averaging to the Rayleigh backscattering traces from the cores, which have distinct fading distributions. Because each core responds linearly and consistently to the external excitation of interest, a demonstration of the proposed φ-OFDR with a commercial seven-core fiber achieved highly sensitive quantitative distributed vibration sensing with about 2.2 nm length precision and 2 cm sensing resolution along a 500 m fiber, corresponding to a range resolution factor as high as about 4×10⁻⁵. Featuring long distance, high sensitivity, high resolution, and fading robustness, this approach shows promising potential for various sensing techniques in a wide range of practical scenarios.
Submitted 3 May, 2022;
originally announced May 2022.
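Vectorial averaging combines the complex backscattering values of all cores at each position before the phase is read out, so a core sitting in a fading point (low amplitude, unreliable phase) contributes little. The NumPy sketch below forms each core's differential phasor between a reference and a current measurement and sums the phasors across cores, which weights each core by its squared amplitude; optional extra per-core weights can be passed in. This is one simple weighting choice under stated assumptions: the paper's exact weights, any per-core calibration it performs, and all synthetic data here are hypothetical. The last line checks the quoted range resolution factor, 2 cm over 500 m.

```python
import numpy as np

def vector_averaged_phase(ref, cur, weights=None):
    """Phase change per position estimated from several cores.

    ref, cur: complex arrays (n_cores, n_positions) of Rayleigh backscattering
    before and after the perturbation. cur * conj(ref) cancels each core's
    random phase and is implicitly weighted by |ref|^2.
    """
    phasors = cur * np.conj(ref)
    if weights is not None:
        phasors = phasors * np.asarray(weights)[:, None]
    return np.angle(phasors.sum(axis=0))           # vector sum, then phase

rng = np.random.default_rng(1)
n_cores, n_pos = 7, 1000
amp = rng.rayleigh(size=(n_cores, n_pos))          # fading: random amplitudes
phase0 = rng.uniform(-np.pi, np.pi, size=(n_cores, n_pos))
ref = amp * np.exp(1j * phase0)
true_dphi = 0.3                                    # common vibration-induced phase
noise = 0.05 * (rng.normal(size=ref.shape) + 1j * rng.normal(size=ref.shape))
cur = ref * np.exp(1j * true_dphi) + noise

single = np.angle(cur[0] * np.conj(ref[0]))        # one core only
averaged = vector_averaged_phase(ref, cur)
print(np.abs(single - true_dphi).max(), np.abs(averaged - true_dphi).max())
print(0.02 / 500)                                  # range resolution factor
```

A single core shows large phase errors wherever its amplitude happens to fade, while the seven-core vector sum rarely fades everywhere at once, so its worst-case error is far smaller.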
-
HarmoF0: Logarithmic Scale Dilated Convolution For Pitch Estimation
Authors:
Weixing Wei,
Peilin Li,
Yi Yu,
Wei Li
Abstract:
Sounds, especially music, contain various harmonic components scattered along the frequency dimension. It is difficult for ordinary convolutional neural networks to capture these overtones. This paper introduces a multiple rates dilated causal convolution (MRDC-Conv) method to efficiently capture the harmonic structure in logarithmic-scale spectrograms. Harmonic structure is helpful for pitch estimation, which is important for many sound processing applications. We propose HarmoF0, a fully convolutional network, to evaluate MRDC-Conv and other dilated convolutions in pitch estimation. The results show that this model outperforms DeepF0, yields state-of-the-art performance on three datasets, and simultaneously reduces the number of parameters by more than 90%. We also find that it has stronger noise resistance and makes fewer octave errors. The code and pre-trained model are available at https://github.com/WX-Wei/HarmoF0.
Submitted 20 June, 2022; v1 submitted 2 May, 2022;
originally announced May 2022.
-
Robust Generation Dispatch with Strategic Renewable Power Curtailment and Decision-Dependent Uncertainty
Authors:
Yue Chen,
Wei Wei
Abstract:
As renewable energy sources replace traditional power sources (such as thermal generators), uncertainty grows while fewer controllable units remain. To reduce operational risks and avoid frequent real-time emergency controls, a preparatory schedule of renewable generation curtailment is required. This paper proposes a novel two-stage robust generation dispatch (RGD) model in which the preparatory curtailment schedule is optimized in the pre-dispatch stage. The curtailment schedule then influences the variation range of real-time renewable power output, resulting in a decision-dependent uncertainty (DDU) set. In the re-dispatch stage, the controllable units adjust their outputs within their reserve capacities to maintain power balance. To overcome the difficulty of solving the RGD with DDU, an adaptive column-and-constraint generation (AC&CG) algorithm is developed. We prove that the proposed algorithm generates the optimal solution in a finite number of iterations. Numerical examples show the advantages of the proposed model and algorithm and validate their practicability and scalability.
Submitted 1 July, 2022; v1 submitted 30 March, 2022;
originally announced March 2022.
-
Optimal configuration of cooperative stationary and mobile energy storage considering ambient temperature: A case for Winter Olympic Game
Authors:
He Meng,
Hongjie Jia,
Tao Xu,
Wei Wei,
Yuhan Wu,
Lemeng Liang,
Shuqi Cai,
Zuozheng Liu,
Rujing Wang
Abstract:
International mega-events such as the Winter Olympic Games are considered among the most carbon-intensive activities worldwide. The commitment to fully renewable energy accommodation and utilization while ensuring extremely high reliability brings significant challenges to system operation due to the stochastic nature of renewables. A battery energy storage system (BESS) composed of a stationary energy storage system (SESS) and a shared mobile energy storage system (MESS) can be utilized to meet the requirements of short-term load surges, renewable accommodation, and emergency power supply for important loads during the mega-event. After the event, the BESS can continue to serve the venues' electricity consumption to support carbon neutrality. On the other hand, the low ambient temperature of the Winter Olympic Games has a significant impact on BESS degradation and performance, which needs to be integrated into the charging and discharging model of the BESS. To this end, a joint two-stage optimal configuration method for the SESS and MESS that considers ambient temperature has been developed to support mega-event carbon neutrality, reduce redundant BESS capacity allocation, and improve the system life-cycle cost-benefit. Simulation results demonstrate the rationality and effectiveness of the collaborative operation of the SESS and the MESS under various scenarios.
Submitted 20 February, 2022;
originally announced February 2022.
-
Underwater Differential Game: Finite-Time Target Hunting Task with Communication Delay
Authors:
Wei Wei,
JingJing Wang,
Jun Du,
Zhengru Fang,
Chunxiao Jiang,
Yong Ren
Abstract:
This work considers designing an unmanned target hunting system for a swarm of unmanned underwater vehicles (UUVs) to hunt a highly maneuverable target. Differential game theory is used to analyze the combat policies of the UUVs and the target within finite time. The challenge is that the UUVs must design their control policies considering not only the consistency of the hunting team but also the escaping behavior of the target. To obtain stable feedback control policies satisfying Nash equilibrium, we construct the Hamiltonian function with Leibniz's formula. To further take underwater disturbances and communication delay into consideration, a modified deep reinforcement learning (DRL) method is provided to investigate the underwater target hunting task in an unknown dynamic environment. Simulations show that underwater disturbances have a large impact on the system when communication delay is considered. Moreover, consistency tests show that the UUVs maintain better consistency when the range of disturbances is relatively small.
Submitted 1 February, 2022;
originally announced February 2022.
-
Approaching the Transient Stability Boundary of a Power System: Theory and Applications
Authors:
Peng Yang,
Feng Liu,
Wei Wei,
Zhaojian Wang
Abstract:
Estimating the stability boundary is a fundamental and challenging problem in transient stability studies. It is known that a proper level set of a Lyapunov function or an energy function can provide an inner approximation of the stability boundary, and the estimate can be expanded by trajectory reversing methods. In this paper, we streamline the theoretical foundation of the expansion methodology and generalize it by relaxing the requirement that the initial guess be a subset of the stability region. We investigate topological characteristics of the expanded boundary, showing how an initial guess can approach the exact stability boundary locally or globally. We apply the theory to transient stability assessment and propose expansion algorithms to improve the well-known Potential Energy Boundary Surface (PEBS) and Boundary of stability region based Controlling Unstable equilibrium point (BCU) methods. Case studies on the IEEE 39-bus system verify our results and demonstrate that estimates of the stability boundary and the critical clearing time can be significantly improved at modest computational cost.
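Trajectory reversing can be illustrated on a one-dimensional toy system rather than a power-system model. For dx/dt = -x + x^3, the stable equilibrium is 0 and the unstable equilibria ±1 bound the stability region (-1, 1). The Lyapunov function V(x) = x^2/2 satisfies dV/dt = -x^2(1 - x^2) < 0 inside that region, so the level set V ≤ 1/8, i.e. [-0.5, 0.5], is a valid inner estimate. Integrating the time-reversed dynamics from the boundary of that estimate pushes it toward ±1 while staying inside the true stability region. The sketch below uses forward Euler with an assumed step size; it illustrates the idea only and is not the authors' expansion algorithm.

```python
def reversed_dynamics(x: float) -> float:
    """Time-reversed vector field -f(x) for the forward system f(x) = -x + x**3."""
    return x - x**3


def expand_boundary_point(x0: float, t_end: float, dt: float = 1e-3) -> float:
    """Integrate the reversed dynamics from x0 for t_end time units with forward Euler."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        x += dt * reversed_dynamics(x)
    return x


if __name__ == "__main__":
    # Initial inner estimate from the level set V(x) = x**2 / 2 <= 1/8.
    lower, upper = -0.5, 0.5
    for t in (1.0, 2.0, 5.0):
        # The expanded estimate [lo, hi] grows toward the exact region (-1, 1).
        print(t, expand_boundary_point(lower, t), expand_boundary_point(upper, t))
```

In higher dimensions the same step is applied to many points on the initial level set, and the union of the reversed trajectories forms the enlarged estimate.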
Submitted 30 September, 2021; v1 submitted 26 September, 2021;
originally announced September 2021.
-
Storage and Transmission Capacity Requirements of a Remote Solar Power Generation System
Authors:
Yue Chen,
Wei Wei,
Cheng Wang,
Miadreza Shafie-khah,
João P. S. Catalão
Abstract:
Large solar power stations are usually located in remote areas and connected to the main grid via a long transmission line. An energy storage unit is deployed locally with the solar plant to smooth its output. The capacities of the grid-connection transmission line and the energy storage unit significantly affect both the utilization rate of solar energy and the investment cost. This paper characterizes the feasible set of capacity parameters under a given solar spillage rate and a fixed investment budget. A linear-programming-based projection algorithm is proposed to obtain this feasible set, offering valuable references for system planning and policy making.
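To make the notion of a feasible capacity set concrete, here is a minimal Python sketch that replaces the paper's linear-programming projection with a brute-force grid search. It assumes a lossless storage unit with no power-rating limit, a greedy operating rule (export what the line allows, store the surplus, discharge through spare line capacity), and linear investment costs. The function names, cost coefficients, and grids are hypothetical.

```python
from typing import List, Sequence, Tuple


def spillage_rate(solar: Sequence[float], line_cap: float, storage_cap: float) -> float:
    """Fraction of solar energy curtailed under a greedy store-then-discharge rule."""
    stored = 0.0
    spilled = 0.0
    for p in solar:
        to_grid = min(p, line_cap)                    # export up to the line rating
        surplus = p - to_grid
        charge = min(surplus, storage_cap - stored)   # store whatever fits
        stored += charge
        spilled += surplus - charge                   # the rest is curtailed
        discharge = min(stored, line_cap - to_grid)   # use spare line capacity
        stored -= discharge
    total = sum(solar)
    return spilled / total if total > 0 else 0.0


def feasible_capacity_pairs(solar: Sequence[float], max_spill: float, budget: float,
                            line_cost: float, storage_cost: float,
                            line_grid: Sequence[float],
                            storage_grid: Sequence[float]) -> List[Tuple[float, float]]:
    """(line, storage) capacity pairs meeting both the budget and the spillage limit."""
    return [
        (c, e)
        for c in line_grid
        for e in storage_grid
        if line_cost * c + storage_cost * e <= budget
        and spillage_rate(solar, c, e) <= max_spill
    ]


if __name__ == "__main__":
    profile = [0, 4, 8, 4, 0]  # toy daily solar output, arbitrary units
    print(feasible_capacity_pairs(profile, max_spill=0.1, budget=10,
                                  line_cost=1, storage_cost=1,
                                  line_grid=[2, 4, 6, 8], storage_grid=[0, 2, 4]))
```

A grid search scales poorly with the number of parameters and time periods, which is why a projection-based method such as the one the abstract describes is attractive for realistic profiles.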
Submitted 13 September, 2021;
originally announced September 2021.
-
Flexibility Requirement when Tracking Renewable Power Fluctuation with Peer-to-Peer Energy Sharing
Authors:
Yue Chen,
Wei Wei,
Mingxuan Li,
Laijun Chen,
João P. S. Catalão
Abstract:
Flexible load on the demand side has been regarded as an effective measure to cope with volatile distributed renewable generation. To unlock demand-side flexibility, this paper proposes a peer-to-peer energy sharing mechanism that facilitates energy exchange among users while preserving privacy. We prove the existence and partial uniqueness of the energy sharing market equilibrium and provide a centralized optimization to obtain the equilibrium. The centralized optimization is further linearized by a convex combination approach, turning it into a multi-parametric linear program (MP-LP) with renewable output deviations as the parameters. The flexibility requirement of individual users is calculated based on this MP-LP. Specifically, an adaptive vertex generation algorithm is established to construct a piecewise linear estimator of the optimal total cost subject to a given error tolerance. Critical regions and optimal strategies are retrieved from the obtained approximate cost function to evaluate the flexibility requirement. The proposed algorithm does not rely on an exact characterization of optimal basis invariant sets and thus is not affected by model degeneracy, a common difficulty faced by existing approaches. Case studies validate the theoretical results and show that the proposed method is scalable.
Submitted 8 September, 2021;
originally announced September 2021.
-
An Improved Surrogate Method for Solving the Energy Storage Optimal Bidding Problem
Authors:
Yue Chen,
Wei Wei,
Tongxin Li,
Yunhe Hou,
Feng Liu,
João P. S. Catalão
Abstract:
Energy storage is expected to play an increasingly important role in mitigating the variations that accompany the growing penetration of renewable energy. In this paper, we study the optimal bidding of an energy storage unit in a semi-centralized market. The energy storage unit offers its available storage capacity and maximum charging/discharging rate to the operator; the operator then clears the real-time market by minimizing the total cost. The energy storage unit is paid or charged at the locational marginal price (LMP). The problem reduces to a bilevel optimization problem with a mixed-integer lower level. An improved surrogate-based method with a combined spatial-temporal entropy term is developed to solve this problem. Numerical examples demonstrate the scalability, efficiency, and accuracy of the proposed method.
Submitted 28 September, 2021; v1 submitted 29 July, 2021;
originally announced July 2021.
-
Speech2Video: Cross-Modal Distillation for Speech to Video Generation
Authors:
Shijing Si,
Jianzong Wang,
Xiaoyang Qu,
Ning Cheng,
Wenqi Wei,
Xinghua Zhu,
Jing Xiao
Abstract:
This paper investigates the novel task of generating talking-face videos solely from speech. Speech-to-video generation can spark interesting applications in the entertainment, customer service, and human-computer interaction industries. Indeed, the timbre, accent, and speed of speech may contain rich information about a speaker's appearance. The main challenge lies in disentangling the distinct visual attributes from audio signals. In this article, we propose a lightweight cross-modal distillation method to extract disentangled emotional and identity information from unlabelled video inputs. The extracted features are then integrated by a generative adversarial network into talking-face video clips. With carefully crafted discriminators, the proposed framework achieves realistic generation results. Experiments with observed individuals demonstrate that the proposed framework captures emotional expressions solely from speech and produces spontaneous facial motion in the video output. Compared to the baseline method, in which speech is combined with a static image of the speaker, the results of the proposed framework are almost indistinguishable. User studies also show that the proposed method outperforms existing algorithms in terms of emotion expression in the generated videos.
Submitted 10 July, 2021;
originally announced July 2021.
-
DARNet: Dual-Attention Residual Network for Automatic Diagnosis of COVID-19 via CT Images
Authors:
Jun Shi,
Huite Yi,
Shulan Ruan,
Zhaohui Wang,
Xiaoyu Hao,
Hong An,
Wei Wei
Abstract:
The ongoing global pandemic of Coronavirus Disease 2019 (COVID-19) poses a serious threat to public health and the economy. Rapid and accurate diagnosis of COVID-19 is crucial to prevent further spread of the disease and reduce its mortality. Chest computed tomography (CT) is an effective tool for the early diagnosis of lung diseases, including pneumonia. However, detecting COVID-19 from CT is demanding and prone to human error, as some early-stage patients may have negative findings on images. Recently, many deep learning methods have achieved impressive performance in this regard. Despite their effectiveness, most of these methods underestimate the rich spatial information preserved in the 3D structure or suffer from error propagation. To address this problem, we propose a Dual-Attention Residual Network (DARNet) to automatically distinguish COVID-19 from common pneumonia (CP) and healthy cases using 3D chest CT images. Specifically, we design a dual-attention module consisting of channel-wise and depth-wise attention mechanisms. The former is used to enhance channel independence, while the latter recalibrates depth-level features. We then integrate them in a unified manner to extract and refine features at different levels, further improving diagnostic performance. We evaluate DARNet on a large public CT dataset and obtain superior performance. Besides, the ablation study and visualization analysis demonstrate the effectiveness and interpretability of the proposed method.
Submitted 30 August, 2021; v1 submitted 14 May, 2021;
originally announced May 2021.
-
On Addressing Practical Challenges for RNN-Transducer
Authors:
Rui Zhao,
Jian Xue,
Jinyu Li,
Wenning Wei,
Lei He,
Yifan Gong
Abstract:
In this paper, several methods are proposed to address practical challenges in deploying RNN Transducer (RNN-T) based speech recognition systems. These challenges are adapting a well-trained RNN-T model to a new domain without collecting audio data, and obtaining word-level time stamps and confidence scores. The first challenge is solved with a splicing data method, which concatenates speech segments extracted from the source-domain data. To obtain time stamps, a phone prediction branch that shares the encoder is added to the RNN-T model for forced alignment. Finally, we obtain word-level confidence scores by utilizing several types of features calculated during decoding and from the confusion network. Evaluated on Microsoft production data, the splicing data adaptation method achieves relative word error rate reductions of 58.03% and 15.25% over the baseline and over adaptation with the text-to-speech method, respectively. The proposed time stamping method achieves an average word timing difference of less than 50 ms from the ground-truth alignment while maintaining the recognition accuracy of the RNN-T model. We also obtain high confidence annotation performance with limited computation cost.
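The splicing idea can be illustrated with a minimal Python sketch. It assumes the source-domain audio has already been cut into word-level segments (for example by forced alignment) and stored in a hypothetical `segment_bank` dictionary that maps each word to candidate sample sequences; the abstract does not state the segment granularity, so word level is an assumption here. A target-domain sentence is covered by concatenating one randomly chosen segment per word, and sentences containing uncovered words are skipped.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Samples = List[float]


def splice_utterance(text: str, segment_bank: Dict[str, List[Samples]],
                     rng: Optional[random.Random] = None) -> Optional[Samples]:
    """Concatenate one source-domain segment per word; None if a word is uncovered."""
    rng = rng or random.Random(0)
    audio: Samples = []
    for word in text.lower().split():
        candidates = segment_bank.get(word)
        if not candidates:
            return None  # this sentence cannot be synthesized from source-domain speech
        audio.extend(rng.choice(candidates))
    return audio


def build_spliced_corpus(sentences: Sequence[str], segment_bank: Dict[str, List[Samples]],
                         seed: int = 0) -> List[Tuple[Samples, str]]:
    """(audio, transcript) pairs for every target-domain sentence that can be spliced."""
    rng = random.Random(seed)
    corpus = []
    for sentence in sentences:
        audio = splice_utterance(sentence, segment_bank, rng)
        if audio is not None:
            corpus.append((audio, sentence))
    return corpus
```

In practice the spliced pairs would be mixed with source-domain data when fine-tuning; seam smoothing and speaker consistency are not modeled in this sketch.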
Submitted 18 July, 2021; v1 submitted 27 April, 2021;
originally announced May 2021.
-
Beyond Visual Attractiveness: Physically Plausible Single Image HDR Reconstruction for Spherical Panoramas
Authors:
Wei Wei,
Li Guan,
Yue Liu,
Hao Kang,
Haoxiang Li,
Ying Wu,
Gang Hua
Abstract:
HDR reconstruction is an important task in computer vision with many industrial applications. Traditional approaches merge multiple exposures to generate HDR images that correspond to the physical illuminance of the scene. However, the tedious capturing process makes such multi-shot approaches inconvenient in practice. In contrast, recent single-shot methods use deep learning to predict a visually appealing HDR image from a single LDR image. However, it is unclear whether the aforementioned physical properties still hold without training the network to model them explicitly. In this paper, we introduce physical illuminance constraints into our single-shot HDR reconstruction framework, with a focus on spherical panoramas. With the proposed physical regularization, our method can generate HDR images that are not only visually appealing but also physically plausible. For evaluation, we collect a large dataset of LDR and HDR images with ground-truth illuminance measurements. Extensive experiments show that our HDR images not only maintain high visual quality but also outperform all baseline methods in illuminance prediction accuracy.
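A standard way to relate an HDR panorama to an illuminance reading is to integrate cosine-weighted radiance over the upper hemisphere. The Python sketch below assumes an equirectangular panorama whose rows run from zenith to nadir and whose values are calibrated luminance (cd/m²), so the result is horizontal illuminance in lux at the camera position for an upward-facing sensor. It illustrates the physics only and is not the paper's regularization term.

```python
import math
from typing import List


def horizontal_illuminance(panorama: List[List[float]]) -> float:
    """Cosine-weighted integral of radiance over the upper hemisphere.

    panorama[r][c] is the radiance at the polar angle (measured from the zenith)
    of row r and the azimuth of column c, in an equirectangular layout.
    """
    height = len(panorama)
    width = len(panorama[0])
    d_theta = math.pi / height        # polar extent of one row
    d_phi = 2.0 * math.pi / width     # azimuthal extent of one column
    total = 0.0
    for r, row in enumerate(panorama):
        theta = (r + 0.5) * d_theta   # polar angle at the row center
        if theta >= math.pi / 2:
            break                     # below the horizon: cannot reach an upward-facing sensor
        # Pixel solid angle is sin(theta) dtheta dphi; cos(theta) projects it onto the sensor.
        weight = math.cos(theta) * math.sin(theta) * d_theta * d_phi
        total += weight * sum(row)
    return total
```

As a sanity check, a uniform sky of radiance L gives approximately πL, the analytic value for a diffuse hemisphere.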
Submitted 23 March, 2021;
originally announced March 2021.
-
Data-Driven Dispatchable Regions with Potentially Active Boundaries for Renewable Power Generation: Concept and Construction
Authors:
Yanqi Liu,
Zhigang Li,
Wei Wei,
Jiehui Zheng,
Hongcai Zhang
Abstract:
The dispatchable region of volatile renewable power generation (RPG) quantifies how much uncertainty the power system can handle at a given operating point. State-of-the-art dispatchable region (DR) research has studied how system operational constraints influence the DR but has seldom considered the effect of the uncertainty features of RPG outputs. The traditional DR is generally described by a large number of boundaries, and it is computationally intensive to construct. To bridge these gaps, a novel type of DR is defined, which is enclosed by potentially active boundaries (PABs) that consider the operational constraints and uncertainty features of RPG outputs. The proposed DR is easier to construct because the PABs are only a small part of the traditional DR boundaries. The procedure for constructing the proposed DR is described in terms of the progressive search for PABs, which is formulated as a mixed-integer linear program by incorporating the discrete observed data points of RPG outputs as an approximate distribution. A parallel solution paradigm is also developed to expedite the construction procedure when using a large observed dataset. Simulation tests on the IEEE 30-bus and 118-bus systems verify the effectiveness and scalability of the proposed DR and the efficiency of the proposed algorithm.
Submitted 23 April, 2022; v1 submitted 13 December, 2020;
originally announced December 2020.
-
Unsupervised Alternating Optimization for Blind Hyperspectral Imagery Super-resolution
Authors:
Jiangtao Nie,
Lei Zhang,
Wei Wei,
Zhiqiang Lang,
Yanning Zhang
Abstract:
Despite the great success of deep models in hyperspectral imagery (HSI) super-resolution (SR) on simulated data, most of them perform unsatisfactorily when applied to real data, especially unsupervised HSI SR methods. One of the main reasons is that the predefined degradation models (e.g., blur in the spatial domain) used by most HSI SR methods often differ greatly from the real ones, which causes these deep models to overfit and ultimately degrades their performance on real data. To mitigate this problem, we explore unsupervised blind HSI SR. Specifically, we investigate how to effectively obtain the degradation models in the spatial and spectral domains, respectively, and make them compatible with the fusion-based SR reconstruction model. To this end, we first propose an alternating-optimization-based deep framework to estimate the degradation models and reconstruct the latent image, so that degradation model estimation and HSI reconstruction mutually promote each other. Then, a meta-learning-based mechanism is proposed to pre-train the network, which effectively improves the speed and generalization ability in adapting to different complex degradations. Experiments on three benchmark HSI SR datasets demonstrate the superiority of the proposed method over competing methods on the blind HSI fusion problem.
Submitted 3 December, 2020;
originally announced December 2020.
-
The study of calibration for the hybrid pixel detector with single photon counting in HEPS-BPIX
Authors:
Ye Ding,
Zhenjie Li,
Wei Wei,
Jie Zhang,
Hangxu Li,
Yan Zhang,
Xiaolu Ji,
Peng Liu,
Yuanbai Chen,
Kejun Zhu
Abstract:
The calibration process for the hybrid array pixel detector designed for the High Energy Photon Source in China, called HEPS-BPIX, is presented in this paper. Based on threshold scanning, the relationship between energy and threshold is quantified for threshold calibration. For threshold trimming, a precise algorithm based on the LDAC characteristic and a fast algorithm based on LDAC scanning are proposed to study the performance of the threshold DACs to be applied to the pixels. The threshold dispersion is reduced from 46.28 mV without any trimming algorithm to 6.78 mV with the precise algorithm and 7.61 mV with the fast algorithm. For temperatures from 5 °C to 60 °C, the threshold dispersion of the precise algorithm varies within a range of about 5.69 mV, whereas the range is about 33.21 mV with the fast algorithm, which can be re-corrected to 1.49 mV. The measurement results show that the fast algorithm achieves an applicable threshold dispersion for a silicon pixel module in a shorter time, while the precise algorithm achieves a better threshold dispersion but is more time-consuming. The temperature dependence of the silicon pixel module noise is also studied to assess the detector's working status. The minimum detection energy can be reduced by about 0.83 keV at a temperature 20 °C lower.
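Per-pixel threshold trimming can be sketched as a scan over trim-DAC codes. The Python below is a hypothetical model, not the HEPS-BPIX firmware or either of the authors' algorithms. It assumes each pixel's threshold shifts linearly with its local trim code (a fixed step in mV around a mid-scale code), picks for every pixel the code whose corrected threshold is closest to the array mean, and reports threshold dispersion as the population standard deviation across pixels.

```python
import statistics
from typing import List, Sequence, Tuple


def trim_pixels(offsets_mv: Sequence[float], trim_step_mv: float,
                n_codes: int) -> Tuple[List[int], List[float]]:
    """Scan all trim codes per pixel and keep the one closest to the array mean.

    Assumed linear model: threshold(code) = offset + (code - mid) * trim_step_mv.
    Pixels whose offset exceeds the trim range end up at the nearest end code.
    """
    mid = (n_codes - 1) / 2
    target = statistics.mean(offsets_mv)
    codes: List[int] = []
    for off in offsets_mv:
        best = min(range(n_codes),
                   key=lambda c: abs(off + (c - mid) * trim_step_mv - target))
        codes.append(best)
    trimmed = [off + (c - mid) * trim_step_mv for off, c in zip(offsets_mv, codes)]
    return codes, trimmed


def dispersion(thresholds_mv: Sequence[float]) -> float:
    """Threshold dispersion as the population standard deviation (mV)."""
    return statistics.pstdev(thresholds_mv)


if __name__ == "__main__":
    raw = [-20.0, -10.0, 0.0, 10.0, 20.0]  # toy per-pixel threshold offsets in mV
    codes, trimmed = trim_pixels(raw, trim_step_mv=5.0, n_codes=17)
    print(codes, dispersion(raw), dispersion(trimmed))
```

With a finite step size the residual dispersion is bounded by roughly half a trim step per pixel, which is one reason real trimming results stay above zero.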
Submitted 30 October, 2020;
originally announced November 2020.
-
AIM 2020 Challenge on Efficient Super-Resolution: Methods and Results
Authors:
Kai Zhang,
Martin Danelljan,
Yawei Li,
Radu Timofte,
Jie Liu,
Jie Tang,
Gangshan Wu,
Yu Zhu,
Xiangyu He,
Wenjie Xu,
Chenghua Li,
Cong Leng,
Jian Cheng,
Guangyang Wu,
Wenyi Wang,
Xiaohong Liu,
Hengyuan Zhao,
Xiangtao Kong,
Jingwen He,
Yu Qiao,
Chao Dong,
Xiaotong Luo,
Liang Chen,
Jiangtao Zhang,
Maitreya Suin
, et al. (60 additional authors not shown)
Abstract:
This paper reviews the AIM 2020 challenge on efficient single image super-resolution, with a focus on the proposed solutions and results. The challenge task was to super-resolve an input image with a magnification factor of ×4 based on a set of prior examples of low-resolution and corresponding high-resolution images. The goal was to devise a network that reduces one or several aspects such as runtime, parameter count, FLOPs, activations, and memory consumption while at least maintaining the PSNR of MSRResNet. The track had 150 registered participants, and 25 teams submitted final results. Their solutions gauge the state of the art in efficient single image super-resolution.
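The challenge's quality constraint is stated in terms of PSNR relative to MSRResNet. For reference, here is a minimal Python sketch of PSNR on flattened 8-bit pixel values. The evaluation details used by the challenge (color channel, border cropping) are not given in this abstract and are not modeled here.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

An entry would then satisfy the constraint if its PSNR on the validation images is at least that of the MSRResNet baseline computed the same way.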
Submitted 15 September, 2020;
originally announced September 2020.
-
A Real-time Robot-based Auxiliary System for Risk Evaluation of COVID-19 Infection
Authors:
Wenqi Wei,
Jianzong Wang,
Jiteng Ma,
Ning Cheng,
Jing Xiao
Abstract:
In this paper, we propose a real-time robot-based auxiliary system for risk evaluation of COVID-19 infection. It combines real-time speech recognition, temperature measurement, keyword detection, cough detection, and other functions to convert live audio into actionable structured data for COVID-19 infection risk assessment. To better evaluate COVID-19 infection, we propose an end-to-end method for cough detection and classification in the proposed system. It is based on real human-robot conversation data and processes speech signals to detect coughs and classify them when detected. The model structure is kept concise so that it can be used in real-time applications. We further embed the entire auxiliary diagnostic system in a robot, which is deployed in communities, hospitals, and supermarkets to support COVID-19 testing. The system can be further leveraged within a business rules engine, thus serving as a foundation for real-time supervision and assistance applications. Our model utilizes a pretrained, robust training environment that allows for efficient creation and customization of customer-specific health states.
Submitted 17 August, 2020;
originally announced August 2020.
-
Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability
Authors:
Jinyu Li,
Rui Zhao,
Zhong Meng,
Yanqing Liu,
Wenning Wei,
Sarangarajan Parthasarathy,
Vadim Mazalov,
Zhenghao Wang,
Lei He,
Sheng Zhao,
Yifan Gong
Abstract:
Because of its streaming nature, the recurrent neural network transducer (RNN-T) is a very promising end-to-end (E2E) model that may replace the popular hybrid model for automatic speech recognition. In this paper, we describe our recent development of RNN-T models with reduced GPU memory consumption during training, a better initialization strategy, and advanced encoder modeling with future lookahead. When trained with Microsoft's 65 thousand hours of anonymized training data, the developed RNN-T model surpasses a very well-trained hybrid model with both better recognition accuracy and lower latency. We further study how to customize RNN-T models to a new domain, which is important for deploying E2E models in practical scenarios. By comparing several methods that leverage text-only data in the new domain, we found that updating RNN-T's prediction and joint networks using text-to-speech audio generated from domain-specific text is the most effective.
Submitted 29 July, 2020;
originally announced July 2020.
-
Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism
Authors:
Wang Dai,
Jinsong Zhang,
Yingming Gao,
Wei Wei,
Dengfeng Ke,
Binghuai Lin,
Yanlu Xie
Abstract:
Formant tracking is one of the most fundamental problems in speech processing. Traditionally, formants are estimated using signal processing methods. Recent studies showed that generic convolutional architectures can outperform recurrent networks on temporal tasks such as speech synthesis and machine translation. In this paper, we explored the use of a Temporal Convolutional Network (TCN) for formant tracking. In addition to the conventional implementation, we modified the architecture in three ways. First, we turned off the "causal" mode of dilated convolution, allowing the dilated convolution to see future speech frames. Second, each hidden layer reused the output information from all previous layers through dense connections. Third, we adopted a gating mechanism to alleviate the vanishing gradient problem by selectively forgetting unimportant information. The model was validated on the open-access formant database VTR. Experiments showed that our proposed model converged easily and achieved an overall mean absolute percentage error (MAPE) of 8.2% on speech-labeled frames, compared with 9.4% (LSTM), 9.1% (Bi-LSTM), and 8.9% (TCN) for three competitive baselines.
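The difference between causal and non-causal ("causal mode turned off") dilated convolution, together with the MAPE metric and the receptive field of a dilated stack, can be sketched in plain Python. Assumptions: a single channel, an odd kernel length so that non-causal taps can be centered on the current frame, zero padding at the edges, and MAPE averaged over all given frames. This is an illustration, not the authors' TCN code.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float],
                   dilation: int, causal: bool) -> List[float]:
    """Single-channel dilated convolution with zero padding and same-length output.

    causal=True: taps at t - span, ..., t (past frames only).
    causal=False: taps centered on t, so the output also sees future frames.
    """
    k = len(kernel)
    if not causal and k % 2 == 0:
        raise ValueError("non-causal centering assumes an odd kernel length")
    span = (k - 1) * dilation
    out = []
    for t in range(len(x)):
        start = t - span if causal else t - span // 2
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = start + i * dilation
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Frames seen by one output of a stack of dilated convolution layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


def mape(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Mean absolute percentage error (%) between reference and estimated formants."""
    return 100.0 * sum(abs(e - r) / abs(r) for r, e in zip(reference, estimate)) / len(reference)
```

Feeding an impulse shows the difference: the causal output responds only at and after the impulse frame, while the non-causal output starts responding one frame before it.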
Submitted 8 August, 2020; v1 submitted 21 May, 2020;
originally announced May 2020.
-
Line Art Correlation Matching Feature Transfer Network for Automatic Animation Colorization
Authors:
Zhang Qian,
Wang Bo,
Wen Wei,
Li Hai,
Liu Jun Hui
Abstract:
Automatic animation line art colorization is a challenging computer vision problem, since the information in line art is highly sparse and abstract, and there is a strict requirement for color and style consistency between frames. Recently, many Generative Adversarial Network (GAN) based image-to-image translation methods for single line art colorization have emerged. They can generate perceptually appealing results conditioned on line art images. However, these methods cannot be adopted for animation colorization because they do not consider consistency between in-between frames. Existing methods simply input the previous colored frame as a reference to color the next line art, which misleads the colorization because of the spatial misalignment between the previous colored frame and the next line art, especially at positions where apparent changes happen. To address these challenges, we design a correlation matching feature transfer model (called CMFT) that aligns the colored reference feature in a learnable way, and we integrate it into a U-Net based generator in a coarse-to-fine manner. This enables the generator to transfer the layer-wise synchronized features from the deep semantic code to the content progressively. Extensive evaluation shows that the CMFT model can effectively improve the in-between consistency and the quality of colored frames, especially when the motion is intense and diverse.
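The abstract does not spell out CMFT's internals. As a rough illustration of what "correlation matching" can mean, the sketch below uses a generic softmax correlation (attention-style) match, which is an assumption rather than the authors' formulation: each position of the new line art takes a weighted average of reference colors, weighted by how strongly its feature correlates with each reference feature. The function name and the hand-set 2-D features are hypothetical; CMFT operates on learned feature maps at several U-Net scales.

```python
import math


def correlation_transfer(query_feats, ref_feats, ref_colors, temperature=1.0):
    """Hypothetical correlation-matching step: for each query feature (new
    line art), softmax its dot-product correlation with every reference
    feature (previous colored frame) and average the reference colors."""
    aligned = []
    for q in query_feats:
        scores = [sum(a * b for a, b in zip(q, r)) / temperature for r in ref_feats]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        channels = len(ref_colors[0])
        aligned.append([
            sum(w * c[ch] for w, c in zip(weights, ref_colors)) / total
            for ch in range(channels)
        ])
    return aligned


# Reference frame: a red region with feature [1, 0] and a blue region with feature [0, 1].
ref_feats = [[1.0, 0.0], [0.0, 1.0]]
ref_colors = [[255.0, 0.0, 0.0], [0.0, 0.0, 255.0]]
# In the next frame the two regions have swapped positions.
query_feats = [[0.0, 1.0], [1.0, 0.0]]
for color in correlation_transfer(query_feats, ref_feats, ref_colors, temperature=0.01):
    print([round(c) for c in color])  # [0, 0, 255] then [255, 0, 0]
```

Because colors follow feature similarity rather than pixel position, a region that moved between frames still receives its own color, which is the misalignment problem the abstract describes.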
Submitted 10 November, 2020; v1 submitted 14 April, 2020;
originally announced April 2020.
-
Distributed Generalized Nash Equilibrium Seeking for Energy Sharing Games
Authors:
Zhaojian Wang,
Feng Liu,
Zhiyuan Ma,
Yue Chen,
Mengshuo Jia,
Wei Wei,
Qiuwei Wu
Abstract:
With the proliferation of distributed generators and energy storage systems, traditional passive consumers in power systems have been gradually evolving into so-called "prosumers", i.e., proactive consumers that can both produce and consume power. To encourage energy exchange among prosumers, energy sharing is increasingly adopted, and it is usually formulated as a generalized Nash game (GNG). In this paper, a distributed approach is proposed to seek the generalized Nash equilibrium (GNE) of the energy sharing game. To this end, we convert the GNG into an equivalent optimization problem. A Krasnosel'skiǐ-Mann iteration type algorithm is then devised to solve the problem and consequently find the GNE in a distributed manner. The convergence of the proposed algorithm is proved rigorously based on nonexpansive operator theory. The performance of the algorithm is validated by experiments with three prosumers, and its scalability is tested by simulations with 123 prosumers.
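The Krasnosel'skiǐ-Mann (KM) step averages the current iterate with its image under a nonexpansive operator T: x_{k+1} = (1 - α) x_k + α T(x_k), with α in (0, 1). The paper's operator comes from its reformulation of the GNG and is not given in the abstract, and the distributed communication pattern is not shown either. The sketch below instead uses a hypothetical toy operator, a 90° rotation (nonexpansive, with the origin as its only fixed point), to show why the averaging matters: plain fixed-point iteration (α = 1) circles forever, while the KM average converges.

```python
def km_iterate(T, x0, alpha=0.5, iters=100):
    """Krasnosel'skii-Mann iteration x_{k+1} = (1 - alpha) * x_k + alpha * T(x_k)."""
    x = list(x0)
    for _ in range(iters):
        tx = T(x)
        x = [(1.0 - alpha) * xi + alpha * ti for xi, ti in zip(x, tx)]
    return x


def rotate90(x):
    """Hypothetical toy operator: rotation by 90 degrees, nonexpansive, fixed point at the origin."""
    return [-x[1], x[0]]


print(km_iterate(rotate90, [1.0, 0.0], alpha=1.0, iters=101))  # plain iteration: still on the unit circle
print(km_iterate(rotate90, [1.0, 0.0], alpha=0.5, iters=100))  # KM averaging: close to [0, 0]
```

With α = 0.5 each step shrinks the distance to the fixed point by a factor of about 0.707 for this operator; for a general nonexpansive operator the KM theorem guarantees convergence but not a linear rate.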
Submitted 9 April, 2020; v1 submitted 7 November, 2019;
originally announced November 2019.
-
AIM 2019 Challenge on Constrained Super-Resolution: Methods and Results
Authors:
Kai Zhang,
Shuhang Gu,
Radu Timofte,
Zheng Hui,
Xiumei Wang,
Xinbo Gao,
Dongliang Xiong,
Shuai Liu,
Ruipeng Gang,
Nan Nan,
Chenghua Li,
Xueyi Zou,
Ning Kang,
Zhan Wang,
Hang Xu,
Chaofeng Wang,
Zheng Li,
Linlin Wang,
Jun Shi,
Wenyu Sun,
Zhiqiang Lang,
Jiangtao Nie,
Wei Wei,
Lei Zhang,
Yazhe Niu
, et al. (4 additional authors not shown)
Abstract:
This paper reviews the AIM 2019 challenge on constrained example-based single image super-resolution, with a focus on the proposed solutions and results. The challenge had three tracks. Taking the three main aspects of MSRResNet (i.e., number of parameters, inference/running time, and fidelity (PSNR)) as the baseline, Track 1 aims to reduce the number of parameters while maintaining or improving the running time and the PSNR; Tracks 2 and 3 aim to optimize the running time and the PSNR, respectively, under constraints on the other two aspects. Each track had an average of 64 registered participants, and 12 teams submitted final results. Together, these results gauge the state of the art in single image super-resolution.
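Fidelity in all three tracks is measured by PSNR, defined as 10·log10(MAX² / MSE) in dB. The minimal sketch below assumes images flattened to equal-length pixel sequences; evaluation details such as color channel or border cropping are not stated in the abstract and are omitted.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized pixel sequences."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


print(psnr([10, 20, 30, 40], [12, 18, 30, 40]))  # MSE = 2 on an 8-bit scale
```

Since PSNR is logarithmic in the MSE, halving the error energy gains about 3 dB, which is why improvements between challenge entries are often reported in hundredths of a dB.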
Submitted 4 November, 2019;
originally announced November 2019.
-
Deep neural networks for automated classification of colorectal polyps on histopathology slides: A multi-institutional evaluation
Authors:
Jason W. Wei,
Arief A. Suriawinata,
Louis J. Vaickus,
Bing Ren,
Xiaoying Liu,
Mikhail Lisovsky,
Naofumi Tomita,
Behnaz Abdollahi,
Adam S. Kim,
Dale C. Snover,
John A. Baron,
Elizabeth L. Barry,
Saeed Hassanpour
Abstract:
Histological classification of colorectal polyps plays a critical role in both screening for colorectal cancer and care of affected patients. An accurate and automated algorithm for the classification of colorectal polyps on digitized histopathology slides could benefit clinicians and patients. This study aimed to evaluate the performance and assess the generalizability of a deep neural network for colorectal polyp classification on histopathology slide images using a multi-institutional dataset. We developed a deep neural network for classification of four major colorectal polyp types (tubular adenoma, tubulovillous/villous adenoma, hyperplastic polyp, and sessile serrated adenoma) based on digitized histopathology slides from our institution, Dartmouth-Hitchcock Medical Center (DHMC), in New Hampshire. We evaluated the deep neural network on an internal dataset of 157 histopathology slide images from DHMC, as well as on an external dataset of 238 histopathology slide images from 24 different institutions spanning 13 states in the United States. We measured the accuracy, sensitivity, and specificity of our model in this evaluation and compared its performance with local pathologists' diagnoses at the point of care, retrieved from the corresponding pathology laboratories. For the internal evaluation, the deep neural network had a mean accuracy of 93.5% (95% CI 89.6%-97.4%), compared with local pathologists' accuracy of 91.4% (95% CI 87.0%-95.8%). On the external test set, the deep neural network achieved an accuracy of 87.0% (95% CI 82.7%-91.3%), comparable with local pathologists' accuracy of 86.6% (95% CI 82.3%-90.9%). If confirmed in clinical settings, our model could assist pathologists by improving the diagnostic efficiency, reproducibility, and accuracy of colorectal cancer screening.
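The reported metrics can be computed as in the sketch below. How sensitivity and specificity were aggregated over the four polyp types is not stated in the abstract; a one-vs-rest computation per class is assumed here, and the short class labels are hypothetical. The confidence interval uses the normal (Wald) approximation. With the test-set sizes from the abstract (157 internal, 238 external slides), it reproduces the reported bounds to one decimal place (e.g., 89.6%-97.4% around 93.5%), which suggests, though the abstract does not confirm, that this approximation was used.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of slides whose predicted class matches the reference diagnosis."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def wald_ci(p, n, z=1.96):
    """Normal-approximation (Wald) 95% confidence interval for a proportion p over n cases."""
    half = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


print(wald_ci(0.935, 157))  # internal test set
print(wald_ci(0.870, 238))  # external test set
```

The Wald interval is the simplest choice; score-based intervals (e.g. Wilson) behave better near 0% or 100% but give slightly different bounds.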
Submitted 23 November, 2019; v1 submitted 27 September, 2019;
originally announced September 2019.
-
Distributed Optimal Load Frequency Control Considering Nonsmooth Cost Functions
Authors:
Zhaojian Wang,
Feng Liua,
Changhong Zhao,
Zhiyuan Ma,
Wei Wei
Abstract:
This work addresses the distributed frequency control problem in power systems with controllable loads that have nonsmooth costs. Nonsmoothness is widespread in power systems, for example in tiered pricing, and it greatly challenges the design of distributed optimal controllers. To this end, we first formulate an optimization problem that minimizes the nonsmooth regulation cost, subject to both the capacity limits of controllable loads and tie-line flow limits. Then, a distributed controller is derived using the Clarke generalized gradient. We also prove the optimality of the equilibrium of the closed-loop system as well as its asymptotic stability. Simulations on the IEEE 68-bus system verify the effectiveness of the proposed method.
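Tiered pricing is the abstract's example of a nonsmooth cost. The sketch below, for a hypothetical three-tier price, computes the piecewise-linear cost and its Clarke generalized gradient: away from a tier boundary it is the single tier price, and at a kink it is the interval between the left and right prices. The function names and tier values are illustrative only; the distributed controller itself is not reproduced. When tier prices increase with quantity, the cost is convex and the Clarke gradient coincides with the ordinary subdifferential.

```python
import bisect
import math


def tiered_cost(x, breakpoints, slopes):
    """Piecewise-linear cost of regulating x >= 0 units: slopes[i] is the
    price on tier i, and the ascending breakpoints separate the tiers."""
    cost, lower = 0.0, 0.0
    for upper, price in zip(breakpoints + [math.inf], slopes):
        if x <= lower:
            break
        cost += price * (min(x, upper) - lower)
        lower = upper
    return cost


def clarke_gradient(x, breakpoints, slopes):
    """Clarke generalized gradient of tiered_cost at x > 0 as an interval (lo, hi):
    the convex hull of the one-sided slopes, a single point away from kinks."""
    j = bisect.bisect_left(breakpoints, x)
    if j < len(breakpoints) and breakpoints[j] == x:
        left, right = slopes[j], slopes[j + 1]
        return (min(left, right), max(left, right))
    return (slopes[j], slopes[j])


tiers, prices = [10.0, 20.0], [1.0, 2.0, 4.0]  # hypothetical tier boundaries and prices
print(tiered_cost(25.0, tiers, prices))      # 10*1 + 10*2 + 5*4
print(clarke_gradient(10.0, tiers, prices))  # kink: interval between tier prices 1 and 2
print(clarke_gradient(15.0, tiers, prices))  # smooth point: single price 2
```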
Submitted 5 June, 2019;
originally announced June 2019.
-
Risk Assessment of Multi-timescale Cascading Outages based on Markovian Tree Search
Authors:
Rui Yao,
Shaowei Huang,
Kai Sun,
Feng Liu,
Xuemin Zhang,
Shengwei Mei,
Wei Wei,
Lijie Ding
Abstract:
In the risk assessment of cascading outages, both the rationality of simulation and the efficiency of computation are of great significance. To overcome two shortcomings, namely that sampling-based methods require huge computational resources and that initial-contingency selection practices omit the dependencies in sequences of outages, this paper proposes a novel risk assessment approach based on searching a Markovian tree. The Markovian tree model is reformulated from a recently proposed quasi-dynamic multi-timescale simulation model to ensure reasonable modeling and simulation of cascading outages. A tree search scheme is then established to avoid duplicated simulations on the same cascade paths, significantly saving computation time. To accelerate the convergence of risk assessment, a risk estimation index is proposed to guide the search toward states with major contributions to the risk, and the risk assessment is realized using this index with a forward tree search and backward update algorithm. The effectiveness of the proposed method is illustrated on a 4-node power system, and its convergence profile and efficiency are demonstrated on the RTS-96 test system.
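The abstract describes a forward tree search guided by a risk estimation index but does not define the index or the backward update. The sketch below therefore substitutes a hypothetical priority, the probability of reaching a node, and omits the backward update. A cascade state is represented by its outage sequence, each node is evaluated exactly once (so no cascade path is simulated twice), and risk accumulates as the expected loss Σ P(path)·loss(path). `transitions` and `step_loss` are hypothetical callbacks standing in for the quasi-dynamic simulation. With nonnegative losses, stopping early yields a lower bound that rises toward the full risk as the node budget grows.

```python
import heapq


def tree_search_risk(transitions, step_loss, max_nodes):
    """Best-first expansion of a cascading-outage tree.

    transitions(path) -> list of (next_outage, conditional_probability)
    step_loss(path)   -> nonnegative loss incurred when the cascade reaches path
    Returns (risk_estimate, nodes_evaluated).
    """
    heap = [(-1.0, ())]  # (negated path probability, outage sequence); root is certain
    risk, evaluated = 0.0, 0
    while heap and evaluated < max_nodes:
        neg_p, path = heapq.heappop(heap)  # most probable unexplored state first
        p = -neg_p
        risk += p * step_loss(path)
        evaluated += 1
        for outage, q in transitions(path):
            heapq.heappush(heap, (-(p * q), path + (outage,)))
    return risk, evaluated


# Hypothetical toy cascade: outage A (p=0.5) may trigger C (p=0.4); outage B has p=0.2.
toy_tree = {(): [("A", 0.5), ("B", 0.2)], ("A",): [("C", 0.4)]}
toy_loss = {(): 0.0, ("A",): 10.0, ("B",): 5.0, ("A", "C"): 100.0}
for budget in (2, 100):
    print(budget, tree_search_risk(lambda s: toy_tree.get(s, []), toy_loss.__getitem__, budget))
```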
Submitted 11 October, 2016; v1 submitted 12 March, 2016;
originally announced March 2016.