Search | arXiv e-print repository

arXiv:2011.07217 [pdf]

Voltage-controlled magnetic anisotropy under the electronic structure modulation in quantum wells

Authors: Qingyi Xiang, Yoshio Miura, Muftah Al-Mahdawi, Thomas Scheike, Xiandong Xu, Yuya Sakuraba, Shinya Kasai, Zhenchao Wen, Hiroaki Sukegawa, Seiji Mitani, Kazuhiro Hono

Abstract: Voltage-controlled magnetic anisotropy (VCMA) offers an emerging approach to realize energy-efficient magnetization switching in spintronic devices such as magnetic random access memories (MRAMs). Here, we show that manipulating the condensed states, i.e., introducing quantum well (QW) can significantly influence the VCMA in a Cr/Fe-QW/MgAl2O4 based magnetic tunnel junction (MTJ). Only for the MTJ with an even number of Fe atomic layers, we observed a novel A-shaped VCMA curve for a particular QW state, where magnetic anisotropy energy (MAE) reaches a local maximum at zero bias and reduces when applying both positive and negative bias, i.e., a novel bi-polar VCMA effect. Our ab initio calculations demonstrate that the QW states give an additional contribution to perpendicular magnetic anisotropy (PMA), which can explain not only the A-shaped VCMA but also the Fe-layer-number parity dependence of VCMA. The present study suggests that the QW-modulated VCMA should open a new pathway to design VCMA-assisted MRAM.

Submitted 13 November, 2020; originally announced November 2020.

Comments: 26 pages, 8 figures

arXiv:2011.05591 [pdf, other]

Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning

Authors: Cunhang Fan, Bin Liu, Jianhua Tao, Jiangyan Yi, Zhengqi Wen, Leichao Song

Abstract: Recurrent neural networks (RNNs) have shown significant improvements in recent years for speech enhancement. However, the model complexity and inference time cost of RNNs are much higher than those of deep feed-forward neural networks (DNNs), which limits the applications of speech enhancement. This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning. The TDNN has excellent potential for capturing long range temporal contexts, which utilizes a modular and incremental design. Besides, the TDNN preserves the feed-forward structure so that its inference cost is comparable to that of a standard DNN. To make full use of the training data, we propose a full data learning method for speech enhancement. More specifically, we not only use the noisy-to-clean (input-to-target) data to train the enhancement model, but also the clean-to-clean and noise-to-silence data. Therefore, all of the training data can be used to train the enhancement model. Our experiments are conducted on the TIMIT dataset. Experimental results show that our proposed method achieves better performance than DNN and comparable or even better performance than BLSTM. Meanwhile, compared with the BLSTM, the proposed method drastically reduces the inference time.

Submitted 11 November, 2020; originally announced November 2020.

Comments: Accepted by ISCSLP 2021

arXiv:2011.05526 [pdf, ps, other]

doi 10.3847/1538-4357/abbfa3

The Mode Switching in Pulsar J1326$-$6700

Authors: Z. G. Wen, W. M. Yan, J. P. Yuan, H. G. Wang, J. L. Chen, M. Mijit, R. Yuen, N. Wang, Z. Y. Tu, S. J. Dang

Abstract: We report on a detailed study of the mode switching in pulsar J1326$-$6700 by analyzing the data acquired from the Parkes 64 m radio telescope at 1369 MHz. During the abnormal mode, the emission at the central and trailing components becomes extremely weak. Meanwhile, the leading emission shifts toward earlier longitude by almost 2°, and remains in this position for typically less than a minute. The mean flux density of the normal mode is almost five times that of the abnormal mode. Our data show that, for PSR J1326$-$6700, 85% of the time was spent in the normal mode and 15% was in the abnormal mode. The intrinsic distributions of mode timescales can be well described by Weibull distributions, which present a certain amount of memory in mode switching. Furthermore, a quasiperiodicity has been identified in the mode switching in pulsar J1326$-$6700. The estimated delay emission heights based on the kinematical effects indicate that the abnormal mode may have originated from higher altitude than the normal mode.

Submitted 10 November, 2020; originally announced November 2020.

Comments: 10 pages, 8 figures

arXiv:2011.04249 [pdf, other]

Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition

Authors: Cunhang Fan, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Bin Liu, Zhengqi Wen

Abstract: Joint training frameworks for speech enhancement and recognition have obtained quite good performance for robust end-to-end automatic speech recognition (ASR). However, these methods only utilize the enhanced feature as the input of the speech recognition component, which is affected by the speech distortion problem. In order to address this problem, this paper proposes a gated recurrent fusion (GRF) method with joint training framework for robust end-to-end ASR. The GRF algorithm is used to dynamically combine the noisy and enhanced features. Therefore, the GRF can not only remove the noise signals from the enhanced features, but also learn the raw fine structures from the noisy features so that it can alleviate the speech distortion. The proposed method consists of speech enhancement, GRF and speech recognition. Firstly, the mask based speech enhancement network is applied to enhance the input speech. Secondly, the GRF is applied to address the speech distortion problem. Thirdly, to improve the performance of ASR, the state-of-the-art speech transformer algorithm is used as the speech recognition component. Finally, the joint training framework is utilized to optimize these three components simultaneously. Our experiments are conducted on an open-source Mandarin speech corpus called AISHELL-1. Experimental results show that the proposed method achieves a relative character error rate (CER) reduction of 10.04% over the conventional joint enhancement and transformer method only using the enhanced features. Especially for the low signal-to-noise ratio (0 dB), our proposed method can achieve better performance with 12.67% CER reduction, which suggests the potential of our proposed method.

Submitted 9 November, 2020; originally announced November 2020.

Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing

arXiv:2011.02120 [pdf, other]

Learning Discriminative Representations for Fine-Grained Diabetic Retinopathy Grading

Authors: Li Tian, Liyan Ma, Zhijie Wen, Shaorong Xie, Yupeng Xu

Abstract: Diabetic retinopathy (DR) is one of the leading causes of blindness. However, no specific symptoms of early DR lead to a delayed diagnosis, which results in disease progression in patients. To determine the disease severity levels, ophthalmologists need to focus on the discriminative parts of the fundus images. In recent years, deep learning has achieved great success in medical image analysis. However, most works directly employ algorithms based on convolutional neural networks (CNNs), which ignore the fact that the difference among classes is subtle and gradual. Hence, we consider automatic image grading of DR as a fine-grained classification task, and construct a bilinear model to identify the pathologically discriminative areas. In order to leverage the ordinal information among classes, we use an ordinal regression method to obtain the soft labels. In addition, other than only using a categorical loss to train our network, we also introduce the metric loss to learn a more discriminative feature space. Experimental results demonstrate the superior performance of the proposed method on two public IDRiD and DeepDR datasets.

Submitted 3 November, 2020; originally announced November 2020.

Comments: 5 pages

arXiv:2011.00171 [pdf, other]

doi 10.1038/s41586-020-2827-2

Diverse polarization angle swings from a repeating fast radio burst source

Authors: R. Luo, B. J. Wang, Y. P. Men, C. F. Zhang, J. C. Jiang, H. Xu, W. Y. Wang, K. J. Lee, J. L. Han, B. Zhang, R. N. Caballero, M. Z. Chen, X. L. Chen, H. Q. Gan, Y. J. Guo, L. F. Hao, Y. X. Huang, P. Jiang, H. Li, J. Li, Z. X. Li, J. T. Luo, J. Pan, X. Pei, L. Qian, et al. (12 additional authors not shown)

Abstract: Fast radio bursts (FRBs) are millisecond-duration radio transients of unknown origin. Two possible mechanisms that could generate extremely coherent emission from FRBs invoke neutron star magnetospheres or relativistic shocks far from the central energy source. Detailed polarization observations may help us to understand the emission mechanism. However, the available FRB polarization data have been perplexing, because they show a host of polarimetric properties, including either a constant polarization angle during each burst for some repeaters or variable polarization angles in some other apparently one-off events. Here we report observations of 15 bursts from FRB 180301 and find various polarization angle swings in seven of them. The diversity of the polarization angle features of these bursts is consistent with a magnetospheric origin of the radio emission, and disfavours the radiation models invoking relativistic shocks.

Submitted 30 October, 2020; originally announced November 2020.

Comments: Published online in Nature on 29 Oct, 2020

Journal ref: Nature, Volume 586, Pages 693--696 (2020)

arXiv:2010.14818 [pdf]

doi 10.35848/1882-0786/abd37d

Self-sweeping ytterbium-doped fiber laser based on a fiber saturable absorber

Authors: Zengrun Wen, Kaile Wang, Baole Lu, Haowei Chen, Jintao Bai

Abstract: Generally speaking, the self-sweeping effect relies on the dynamical grating formed in a gain fiber. Here, the normal self-sweeping was generated in a pump-free ytterbium-doped fiber which serves as a fiber saturable absorber and is introduced to the laser cavity by a circulator in this experiment. The sweeping rate and the sweeping range alter as usual, both of which can be controlled by the pump power. Further, a new self-pulse signal is observed and discussed in this work, which shows the difference of the self-sweeping effects between active fiber and fiber saturable absorber.

Submitted 28 October, 2020; originally announced October 2020.

arXiv:2010.14798 [pdf, other]

Decoupling Pronunciation and Language for End-to-end Code-switching Automatic Speech Recognition

Authors: Shuai Zhang, Jiangyan Yi, Zhengkun Tian, Ye Bai, Jianhua Tao, Zhengqi Wen

Abstract: Despite the recent significant advances witnessed in end-to-end (E2E) ASR system for code-switching, hunger for audio-text paired data limits the further improvement of the models' performance. In this paper, we propose a decoupled transformer model to use monolingual paired data and unpaired text data to alleviate the problem of code-switching data shortage. The model is decoupled into two parts: audio-to-phoneme (A2P) network and phoneme-to-text (P2T) network. The A2P network can learn acoustic pattern scenarios using large-scale monolingual paired data. Meanwhile, it generates multiple phoneme sequence candidates for single audio data in real-time during the training process. Then the generated phoneme-text paired data is used to train the P2T network. This network can be pre-trained with large amounts of external unpaired text data. By using monolingual data and unpaired text data, the decoupled transformer model reduces the high dependency on code-switching paired training data of E2E model to a certain extent. Finally, the two networks are optimized jointly through attention fusion. We evaluate the proposed method on the public Mandarin-English code-switching dataset. Compared with our transformer baseline, the proposed method achieves 18.14% relative mix error rate reduction.

Submitted 28 October, 2020; originally announced October 2020.

Comments: 5 pages, 1 figure

arXiv:2010.14791 [pdf, other]

One In A Hundred: Select The Best Predicted Sequence from Numerous Candidates for Streaming Speech Recognition

Authors: Zhengkun Tian, Jiangyan Yi, Ye Bai, Jianhua Tao, Shuai Zhang, Zhengqi Wen

Abstract: The RNN-Transducers and improved attention-based encoder-decoder models are widely applied to streaming speech recognition. Compared with these two end-to-end models, the CTC model is more efficient in training and inference. However, it cannot capture the linguistic dependencies between the output tokens. Inspired by the success of two-pass end-to-end models, we introduce a transformer decoder and the two-stage inference method into the streaming CTC model. During inference, the CTC decoder first generates many candidates in a streaming fashion. Then the transformer decoder selects the best candidate based on the corresponding acoustic encoded states. The second-stage transformer decoder can be regarded as a conditional language model. We assume that a large enough number and enough diversity of candidates generated in the first stage can compensate the CTC model for the lack of language modeling ability. All the experiments are conducted on a Chinese Mandarin dataset AISHELL-1. The results show that our proposed model can implement streaming decoding in a fast and straightforward way. Our model can achieve up to a 20% reduction in the character error rate compared with the baseline CTC model. In addition, our model can also perform non-streaming inference with only a little performance degradation.

Submitted 3 April, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

arXiv:2010.12356 [pdf, ps, other]

Meromorphic functions of finite $\varphi$-order and linear $q$-difference equations

Authors: Janne Heittokangas, Jun Wang, Zhi-Tao Wen, Hui Yu

Abstract: The $\varphi$-order was introduced in 2009 for meromorphic functions in the unit disc, and was used as a growth indicator for solutions of linear differential equations. In this paper, the properties of meromorphic functions in the complex plane are investigated in terms of the $\varphi$-order, which measures the growth of functions between the classical order and the logarithmic order. Several results on value distribution of meromorphic functions are discussed by using the $\varphi$-order and the $\varphi$-exponent of convergence. Instead of linear differential equations, the applications in the complex plane lie in linear $q$-difference equations.

Submitted 23 October, 2020; originally announced October 2020.

Comments: 28 pages

MSC Class: 39A13

arXiv:2010.11551 [pdf, ps, other]

doi 10.1093/mnras/staa3308

Photometric redshifts for galaxies in the Subaru Hyper Suprime-Cam and unWISE and a catalogue of identified clusters of galaxies

Authors: Z. L. Wen, J. L. Han

Abstract: We first present a catalogue of photometric redshifts for 14.68 million galaxies derived from the 7-band photometric data of Hyper Suprime-Cam Subaru Strategic Program and the Wide-field Infrared Survey Explorer using the nearest-neighbour algorithm. The redshift uncertainty is about 0.024 for galaxies of z<0.7, and steadily increases with redshift to about 0.11 at z~2. From such a large data set, we identify 21,661 clusters of galaxies, among which 5537 clusters have redshifts z>1 and 642 clusters have z>1.5, significantly enlarging the high redshift sample of galaxy clusters. Cluster richness and mass are estimated, and these clusters have an equivalent mass of M_{500} > 0.7*10^{14} Msun. We find that the stellar mass of the brightest cluster galaxies (BCGs) in each richness bin does not significantly evolve with redshift. The fraction of star-forming BCGs increases with redshift, but does not depend on cluster mass.

Submitted 19 November, 2020; v1 submitted 22 October, 2020; originally announced October 2020.

Comments: 16 pages, 18 figures, 2 tables, updated version after proof checking, online data available

arXiv:2010.02330 [pdf, other]

A Benchmark and Baseline for Language-Driven Image Editing

Authors: Jing Shi, Ning Xu, Trung Bui, Franck Dernoncourt, Zheng Wen, Chenliang Xu

Abstract: Language-driven image editing can significantly save the laborious image editing work and be friendly to the photography novice. However, most similar work can only deal with a specific image domain or can only do global retouching. To solve this new task, we first present a new language-driven image editing dataset that supports both local and global editing with editing operation and mask annotations. Besides, we also propose a baseline method that fully utilizes the annotation to solve this problem. Our new method treats each editing operation as a sub-module and can automatically predict operation parameters. Not only performing well on challenging user data, but such an approach is also highly interpretable. We believe our work, including both the benchmark and the baseline, will advance the image editing area towards a more general and free-form level.

Submitted 5 October, 2020; originally announced October 2020.

Comments: Accepted by ACCV 2020

arXiv:2009.03140 [pdf, ps, other]

Edge Learning with Unmanned Ground Vehicle: Joint Path, Energy and Sample Size Planning

Authors: Dan Liu, Shuai Wang, Zhigang Wen, Lei Cheng, Miaowen Wen, Yik-Chung Wu

Abstract: Edge learning (EL), which uses edge computing as a platform to execute machine learning algorithms, is able to fully exploit the massive sensing data generated by Internet of Things (IoT). However, due to the limited transmit power at IoT devices, collecting the sensing data in EL systems is a challenging task. To address this challenge, this paper proposes to integrate unmanned ground vehicle (UGV) with EL. With such a scheme, the UGV could improve the communication quality by approaching various IoT devices. However, different devices may transmit different data for different machine learning jobs and a fundamental question is how to jointly plan the UGV path, the devices' energy consumption, and the number of samples for different jobs? This paper further proposes a graph-based path planning model, a network energy consumption model and a sample size planning model that characterizes F-measure as a function of the minority class sample size. With these models, the joint path, energy and sample size planning (JPESP) problem is formulated as a large-scale mixed integer nonlinear programming (MINLP) problem, which is nontrivial to solve due to the high-dimensional discontinuous variables related to UGV movement. To this end, it is proved that each IoT device should be served only once along the path, thus the problem dimension is significantly reduced. Furthermore, to handle the discontinuous variables, a tabu search (TS) based algorithm is derived, which converges in expectation to the optimal solution to the JPESP problem. Simulation results under different task scenarios show that our optimization schemes outperform the fixed EL and the full path EL schemes.

Submitted 7 September, 2020; originally announced September 2020.

Comments: 16 pages, 6 figures, to appear in IEEE Internet of Things Journal

arXiv:2008.09976 [pdf, other]

Emerging App Issue Identification via Online Joint Sentiment-Topic Tracing

Authors: Cuiyun Gao, Jichuan Zeng, Zhiyuan Wen, David Lo, Xin Xia, Irwin King, Michael R. Lyu

Abstract: Millions of mobile apps are available in app stores, such as Apple's App Store and Google Play. For a mobile app, it would be increasingly challenging to stand out from the enormous competitors and become prevalent among users. Good user experience and well-designed functionalities are the keys to a successful app. To achieve this, popular apps usually schedule their updates frequently. If we can capture the critical app issues faced by users in a timely and accurate manner, developers can make timely updates, and good user experience can be ensured. There exist prior studies on analyzing reviews for detecting emerging app issues. These studies are usually based on topic modeling or clustering techniques. However, the short-length characteristics and sentiment of user reviews have not been considered. In this paper, we propose a novel emerging issue detection approach named MERIT to take into consideration the two aforementioned characteristics. Specifically, we propose an Adaptive Online Biterm Sentiment-Topic (AOBST) model for jointly modeling topics and corresponding sentiments that takes into consideration app versions. Based on the AOBST model, we infer the topics negatively reflected in user reviews for one app version, and automatically interpret the meaning of the topics with most relevant phrases and sentences. Experiments on popular apps from Google Play and Apple's App Store demonstrate the effectiveness of MERIT in identifying emerging app issues, improving the state-of-the-art method by 22.3% in terms of F1-score. In terms of efficiency, MERIT can return results within acceptable time.

Submitted 23 August, 2020; originally announced August 2020.

arXiv:2008.07353 [pdf, ps, other]

On the Sample Complexity of Reinforcement Learning with Policy Space Generalization

Authors: Wenlong Mou, Zheng Wen, Xi Chen

Abstract: We study the optimal sample complexity in large-scale Reinforcement Learning (RL) problems with policy space generalization, i.e. the agent has a prior knowledge that the optimal policy lies in a known policy space. Existing results show that without a generalization model, the sample complexity of an RL algorithm will inevitably depend on the cardinalities of state space and action space, which are intractably large in many practical problems. To avoid such undesirable dependence on the state and action space sizes, this paper proposes a new notion of eluder dimension for the policy space, which characterizes the intrinsic complexity of policy learning in an arbitrary Markov Decision Process (MDP). Using a simulator oracle, we prove a near-optimal sample complexity upper bound that only depends linearly on the eluder dimension. We further prove a similar regret bound in deterministic systems without the simulator.

Submitted 17 August, 2020; originally announced August 2020.

arXiv:2008.03942 [pdf, other]

Joint Bandwidth Allocation and Path Selection in WANs with Path Cardinality Constraints

Authors: Jinxin Wang, Fan Zhang, Zhonglin Xie, Gong Zhang, Zaiwen Wen

Abstract: In this paper, we study a joint bandwidth allocation and path selection problem via solving a multi-objective minimization problem under the path cardinality constraints, namely MOPC. Our problem formulation captures various types of objectives including the proportional fairness, the total completion time, as well as the worst-case link utilization ratio. Such an optimization problem is very challenging since it is highly non-convex. Almost all existing works deal with such a problem using relaxation techniques to transform it to be a convex optimization problem. However, we provide a novel solution framework based on the classic alternating direction method of multipliers (ADMM) approach for solving this problem. Our proposed algorithm is simple and easy to be implemented. Each step of our algorithm consists of either finding the maximal root of a single-cubic equation which is guaranteed to have at least one positive solution or solving a one-dimensional convex subproblem in a fixed interval. Under some mild assumptions, we prove that any limiting point of the generated sequence under our proposed algorithm is a stationary point. Extensive numerical simulations are performed to demonstrate the advantages of our algorithm compared with various baselines.

Submitted 10 August, 2020; originally announced August 2020.

Comments: Submitted to IEEE TSP and being under review

arXiv:2007.15788 [pdf, other]

Stochastic Low-rank Tensor Bandits for Multi-dimensional Online Decision Making

Authors: Jie Zhou, Botao Hao, Zheng Wen, Jingfei Zhang, Will Wei Sun

Abstract: Multi-dimensional online decision making plays a crucial role in many real applications such as online recommendation and digital marketing. In these problems, a decision at each time is a combination of choices from different types of entities. To solve it, we introduce stochastic low-rank tensor bandits, a class of bandits whose mean rewards can be represented as a low-rank tensor. We consider two settings, tensor bandits without context and tensor bandits with context. In the first setting, the platform aims to find the optimal decision with the highest expected reward, a.k.a. the largest entry of the true reward tensor. In the second setting, some modes of the tensor are contexts and the remaining modes are decisions, and the goal is to find the optimal decision given the contextual information. We propose two learning algorithms, tensor elimination and tensor epoch-greedy, for tensor bandits without context, and derive finite-time regret bounds for them. Compared with existing competitive methods, tensor elimination has the best overall regret bound and tensor epoch-greedy has a sharper dependency on dimensions of the reward tensor. Furthermore, we develop a practically effective Bayesian algorithm called tensor ensemble sampling for tensor bandits with context. Extensive simulations and real analysis in online advertising data back up our theoretical findings and show that our algorithms outperform various state-of-the-art approaches that ignore the tensor low-rank structure.

Submitted 13 February, 2024; v1 submitted 30 July, 2020; originally announced July 2020.

Comments: Accepted by Journal of the American Statistical Association

arXiv:2007.10583 [pdf, other]

doi 10.1016/j.optlastec.2021.107045

Narrow bandwidth Q-switched Erbium-doped fiber laser based on dynamic saturable absorption filtering effect

Authors: Zengrun Wen, Kaile Wang, Shuangcheng Chen, Xinyuan Qi, Baole Lu, Jintao Bai

Abstract: We proposed a narrow spectral bandwidth Erbium-doped fiber (EDF) laser Q-switched by a homemade saturable dynamic induced grating (SDIG) which is introduced via reforming the structure of a fiber saturable absorbers FSA with a piece of EDF and a fiber Bragg grating. The SDIG integrates both saturable absorption and spectral filtering effect simultaneously, which was confirmed through theoretical analysis and experimental results for the first time, to the best of our knowledge. Further study verified that the spectral width of the Q-switched emissions is decided by the length of the SDIG and the input power of the pump source. The Q-switched pulse with the narrowest spectral width of about 29.1 pm achieved in this work is the narrowest bandwidth pulse in the domain of the FSA Q-switched fiber lasers when the length of SDIG and pump power are 20 cm and 250 mW, respectively. Our method provides a simple way to obtain the Q-switched pulses with narrow bandwidths, which have promising applications for nonlinear frequency conversion, Doppler LIDAR and coherent beam combinations.

Submitted 20 July, 2020; originally announced July 2020.

arXiv:2007.06202 [pdf, ps, other]

Structured Policy Iteration for Linear Quadratic Regulator

Authors: Youngsuk Park, Ryan A. Rossi, Zheng Wen, Gang Wu, Handong Zhao

Abstract: Linear quadratic regulator (LQR) is one of the most popular frameworks to tackle continuous Markov decision process tasks. With its fundamental theory and tractable optimal policy, LQR has been revisited and analyzed in recent years, in terms of reinforcement learning scenarios such as the model-free or model-based setting. In this paper, we introduce the Structured Policy Iteration (S-PI) for LQR, a method capable of deriving a structured linear policy. Such a structured policy with (block) sparsity or low-rank can have significant advantages over the standard LQR policy: more interpretable, memory-efficient, and well-suited for the distributed setting. In order to derive such a policy, we first cast a regularized LQR problem when the model is known. Then, our Structured Policy Iteration (S-PI) algorithm, which takes a policy evaluation step and a policy improvement step in an iterative manner, can solve this regularized LQR efficiently. We further extend the S-PI algorithm to the model-free setting where a smoothing procedure is adopted to estimate the gradient. In both the known-model and model-free setting, we prove convergence analysis under the proper choice of parameters. Finally, the experiments demonstrate the advantages of S-PI in terms of balancing the LQR performance and level of structure by varying the weight parameter.

Submitted 13 July, 2020; originally announced July 2020.

arXiv:2007.04915 [pdf, other]

Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems

Authors: Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel

Abstract: We propose a novel framework for structured bandits, which we call an influence diagram bandit. Our framework captures complex statistical dependencies between actions, latent variables, and observations; and thus unifies and extends many existing models, such as combinatorial semi-bandits, cascading bandits, and low-rank bandits. We develop novel online learning algorithms that learn to act efficiently in our models. The key idea is to track a structured posterior distribution of model parameters, either exactly or approximately. To act, we sample model parameters from their posterior and then use the structure of the influence diagram to find the most optimistic action under the sampled parameters. We empirically evaluate our algorithms in three structured bandit problems, and show that they perform as well as or better than problem-specific state-of-the-art baselines.

Submitted 9 July, 2020; originally announced July 2020.

arXiv:2007.03865 [pdf, ps, other]

doi 10.3847/1538-4357/aba2e8

Giant micropulse emission in the Vela pulsar at C band

Authors: J. L. Chen, Z. G. Wen, L. F. Hao, J. P. Yuan, J. Li, H. G. Wang, W. M. Yan, K. J. Lee, N. Wang, Y. H. Xu, Z. X. Li, Y. X. Huang, R. Yuen, M. Mijit

Abstract: We present here the analysis of giant micropulses from the Vela pulsar. A total of 4187 giant micropulses with peak flux density >2.5 Jy were detected during almost 4 hours of observations carried out with the Yunnan 40-m radio telescope at 6800 MHz. Nine of the giant micropulses arrived approximately 3 to 4 ms earlier than the peak of average pulse profile, longer than that at lower frequencies. The remaining giant micropulses were clustered into three distributions which correspond to three main emission regions, including four occurring on the trailing edge of averaged profile.

Submitted 7 July, 2020; originally announced July 2020.

Comments: 9 pages, 8 figures

arXiv:2007.03861 [pdf, other]

On the Analysis of Model-free Methods for the Linear Quadratic Regulator

Authors: Zeyu Jin, Johann Michael Schmitt, Zaiwen Wen

Abstract: Many reinforcement learning methods achieve great success in practice but lack theoretical foundation. In this paper, we study the convergence analysis on the problem of the Linear Quadratic Regulator (LQR). The global linear convergence properties and sample complexities are established for several popular algorithms such as the policy gradient algorithm, TD-learning and the actor-critic (AC) algorithm. Our results show that the actor-critic algorithm can reduce the sample complexity compared with the policy gradient algorithm. Although our analysis is still preliminary, it explains the benefit of AC algorithm in a certain sense.

Submitted 7 July, 2020; originally announced July 2020.

arXiv:2006.09606 [pdf, other]

Enhance Curvature Information by Structured Stochastic Quasi-Newton Methods

Authors: Minghan Yang, Dong Xu, Hongyu Chen, Zaiwen Wen, Mengyun Chen

Abstract: In this paper, we consider stochastic second-order methods for minimizing a finite summation of nonconvex functions. One important key is to find an ingenious but cheap scheme to incorporate local curvature information. Since the true Hessian matrix is often a combination of a cheap part and an expensive part, we propose a structured stochastic quasi-Newton method by using partial Hessian information as much as possible. By further exploiting either the low-rank structure or the kronecker-product properties of the quasi-Newton approximations, the computation of the quasi-Newton direction is affordable. Global convergence to stationary point and local superlinear convergence rate are established under some mild assumptions. Numerical results on logistic regression, deep autoencoder networks and deep convolutional neural networks show that our proposed method is quite competitive to the state-of-the-art methods.

Submitted 25 March, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

arXiv:2006.07464 [pdf, other]

Hypermodels for Exploration

Authors: Vikranth Dwaracherla, Xiuyuan Lu, Morteza Ibrahimi, Ian Osband, Zheng Wen, Benjamin Van Roy

Abstract: We study the use of hypermodels to represent epistemic uncertainty and guide exploration. This generalizes and extends the use of ensembles to approximate Thompson sampling. The computational cost of training an ensemble grows with its size, and as such, prior work has typically been limited to ensembles with tens of elements. We show that alternative hypermodels can enjoy dramatic efficiency gains, enabling behavior that would otherwise require hundreds or thousands of elements, and even succeed in situations where ensemble methods fail to learn regardless of size. This allows more accurate approximation of Thompson sampling as well as use of more sophisticated exploration schemes. In particular, we consider an approximate form of information-directed sampling and demonstrate performance gains relative to Thompson sampling. As alternatives to ensembles, we consider linear and neural network hypermodels, also known as hypernetworks. We prove that, with neural network base models, a linear hypermodel can represent essentially any distribution over functions, and as such, hypernetworks are no more expressive.

Submitted 12 June, 2020; originally announced June 2020.

Comments: Published as a conference paper at ICLR 2020

arXiv:2006.05924 [pdf, ps, other]

Sketchy Empirical Natural Gradient Methods for Deep Learning

Authors: Minghan Yang, Dong Xu, Zaiwen Wen, Mengyun Chen, Pengxiang Xu

Abstract: In this paper, we develop an efficient sketchy empirical natural gradient method (SENG) for large-scale deep learning problems. The empirical Fisher information matrix is usually low-rank since the sampling is only practical on a small amount of data at each iteration. Although the corresponding natural gradient direction lies in a small subspace, both the computational cost and memory requirement are still not tractable due to the high dimensionality. We design randomized techniques for different neural network structures to resolve these challenges. For layers with a reasonable dimension, sketching can be performed on a regularized least squares subproblem. Otherwise, since the gradient is a vectorization of the product between two matrices, we apply sketching on the low-rank approximations of these matrices to compute the most expensive parts. A distributed version of SENG is also developed for extremely large-scale applications. Global convergence to stationary points is established under some mild assumptions and a fast linear convergence is analyzed under the neural tangent kernel (NTK) case. Extensive experiments on convolutional neural networks show the competitiveness of SENG compared with the state-of-the-art methods. On the task ResNet50 with ImageNet-1k, SENG achieves 75.9% Top-1 testing accuracy within 41 epochs. Experiments on the distributed large-batch training show that the scaling efficiency is quite reasonable.

Submitted 25 March, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

arXiv:2006.03857 [pdf, other]

EPARS: Early Prediction of At-risk Students with Online and Offline Learning Behaviors

Authors: Yu Yang, Zhiyuan Wen, Jiannong Cao, Jiaxing Shen, Hongzhi Yin, Xiaofang Zhou

Abstract: Early prediction of students at risk (STAR) is an effective and significant means to provide timely intervention for dropout and suicide. Existing works mostly rely on either online or offline learning behaviors which are not comprehensive enough to capture the whole learning processes and lead to unsatisfying prediction performance. We propose a novel algorithm (EPARS) that could early predict STAR in a semester by modeling online and offline learning behaviors. The online behaviors come from the log of activities when students use the online learning management system. The offline behaviors derive from the check-in records of the library. Our main observations are two folds. Significantly different from good students, STAR barely have regular and clear study routines. We devised a multi-scale bag-of-regularity method to extract the regularity of learning behaviors that is robust to sparse data. Second, friends of STAR are more likely to be at risk. We constructed a co-occurrence network to approximate the underlying social network and encode the social homophily as features through network embedding. To validate the proposed algorithm, extensive experiments have been conducted among an Asian university with 15,503 undergraduate students. The results indicate EPARS outperforms baselines by 14.62% ~ 38.22% in predicting STAR.

Submitted 6 June, 2020; originally announced June 2020.

Comments: To be published in DASFAA 2020

arXiv:2005.08529 [pdf, other]

doi 10.1364/JOSAB.402915

Discretized optical dynamics in one-dimensionally synthetic photonic lattice

Authors: Zengrun Wen, Kaile Wang, Baole Lu, Xinyuan Qi, Haowei Chen, Jintao Bai

Abstract: Synthetic photonic lattice with temporally controlled potentials is a versatile platform for realizing wave dynamics associated with physical areas of optics and quantum physics. Here, discrete optics in one-dimensionally synthetic photonic lattice is investigated systematically, in which the light behavior is highly similar to those in evanescently coupled one-dimensional discrete waveguides. Such a synthetic dimension is constructed with position-dependent periodic effective gauge fields based on Aharonov-Bohm effect arising from the phase accumulations of the fiber loops. By tuning the phase accumulations and coupling coefficient of the coupler, the band translation and gap property can be modulated which further results in the impulse and tailored Gaussian wave packet responses as well as Talbot recurrences. In addition, Bloch oscillations and Anderson localization can also be obtained when the phase accumulations are linearly changed and weakly modulated in random, respectively. The periodic effective gauge fields configuration in our protocol enables SPL to be a research platform for one-dimensional dynamically modulated elements or even non-Hermitian waveguides.

Submitted 18 May, 2020; originally announced May 2020.

arXiv:2005.07903 [pdf, other]

Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition

Authors: Zhengkun Tian, Jiangyan Yi, Jianhua Tao, Ye Bai, Shuai Zhang, Zhengqi Wen

Abstract: Non-autoregressive transformer models have achieved extremely fast inference speed and comparable performance with autoregressive sequence-to-sequence models in neural machine translation. Most of the non-autoregressive transformers decode the target sequence from a predefined-length mask sequence. If the predefined length is too long, it will cause a lot of redundant calculations. If the predefined length is shorter than the length of the target sequence, it will hurt the performance of the model. To address this problem and improve the inference speed, we propose a spike-triggered non-autoregressive transformer model for end-to-end speech recognition, which introduces a CTC module to predict the length of the target sequence and accelerate the convergence. All the experiments are conducted on a public Chinese mandarin dataset AISHELL-1. The results show that the proposed model can accurately predict the length of the target sequence and achieve a competitive performance with the advanced transformers. What's more, the model even achieves a real-time factor of 0.0056, which exceeds all mainstream speech recognition models.

Submitted 16 May, 2020; originally announced May 2020.

Comments: 5 pages

arXiv:2005.07326 [pdf]

doi 10.1063/5.0015491

How does boiling occur in lattice Boltzmann simulations?

Authors: Qing Li, Y. Yu, Z. X. Wen

Abstract: In recent years, the lattice Boltzmann (LB) method has been widely employed to simulate boiling phenomena [A. Márkus and G. Házi, Phys. Rev. E 83, 046705 (2011); Biferale et al., Phys. Rev. Lett. 108, 104502 (2012); Li et al., Phys. Rev. E 96, 063303 (2017); Wu et al., Int. J. Heat Mass Transfer 126, 773 (2018)]. However, a very important issue still remains open, i.e., how does boiling occur in the LB simulations? For instance, the existing LB studies showed that the boiling on a hydrophobic surface begins at a lower wall superheat than that on a hydrophilic surface, which qualitatively agrees well with experimental studies, but no one has yet explained how this phenomenon appears in the LB simulations and what happened in the simulations after changing the wettability of the heating surface. In this paper, the LB boiling mechanism is revealed by analyzing boiling on a flat surface with mixed wettability and boiling on a structured surface with homogeneous wettability. Through a theoretical analysis, we demonstrate that, when the same wall superheat is applied, in the LB boiling simulations the fluid density near the heating surface decreases faster on a hydrophobic surface than that on a hydrophilic surface. Accordingly, a lower wall superheat can induce the phase transition from liquid to vapor on a hydrophobic surface than that on a hydrophilic surface.
Furthermore, a similar theoretical analysis shows that the fluid density decreases fastest at concave corners in the case of a structured surface with homogeneous wettability, which explains why vapor bubbles are nucleated at concave corners in the LB simulations of boiling on structured surfaces.

Submitted 20 May, 2020; v1 submitted 14 May, 2020; originally announced May 2020.

Comments: 10 figures

Journal ref: Physics of Fluids 32, 093306 (2020)

arXiv:2005.05602 [pdf, ps, other]

Synthetic topological insulator with periodically modulated effective gauge fields

Authors: Zengrun Wen, Baole Lu, Kaiwen Ji, Kaile Wang, Haowei Chen, Xinyuan Qi, Jintao Bai

Abstract: We study both theoretically and numerically the topological edge states in synthetic photonic lattice with finitely periodic gauge potentials. The effective gauge fields are implemented by tailoring the phase alternatively and periodically, which finally results in symmetric total reflection at two boundaries of the one-dimensional synthetic lattice. Further tuning the nearest-neighbor coupling anisotropically, topological edge states occur at the two boundaries. Our work provides a new way to study the topological physics of one-dimensional coupled waveguide arrays with synthetic photonic lattice.

Submitted 12 May, 2020; originally announced May 2020.

arXiv:2005.04862 [pdf, other]

Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition

Authors: Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang

Abstract: Although attention based end-to-end models have achieved promising performance in speech recognition, the multi-pass forward computation in beam-search increases inference time cost, which limits their practical applications. To address this issue, we propose a non-autoregressive end-to-end speech recognition system called LASO (listen attentively, and spell once). Because of the non-autoregressive property, LASO predicts a textual token in the sequence without the dependence on other tokens. Without beam-search, the one-pass propagation much reduces inference time cost of LASO. And because the model is based on the attention based feedforward structure, the computation can be implemented in parallel efficiently. We conduct experiments on publicly available Chinese dataset AISHELL-1. LASO achieves a character error rate of 6.4%, which outperforms the state-of-the-art autoregressive transformer model (6.7%). The average inference latency is 21 ms, which is 1/50 of the autoregressive transformer model.

Submitted 5 August, 2020; v1 submitted 11 May, 2020; originally announced May 2020.

Comments: accepted by INTERSPEECH2020

arXiv:2005.01279 [pdf, other]

Improving Adversarial Text Generation by Modeling the Distant Future

Authors: Ruiyi Zhang, Changyou Chen, Zhe Gan, Wenlin Wang, Dinghan Shen, Guoyin Wang, Zheng Wen, Lawrence Carin

Abstract: Auto-regressive text generation models usually focus on local fluency, and may cause inconsistent semantic meaning in long text generation. Further, automatically generating words with similar semantics is challenging, and hand-crafted linguistic rules are difficult to apply. We consider a text planning scheme and present a model-based imitation-learning approach to alleviate the aforementioned issues. Specifically, we propose a novel guider network to focus on the generative process over a longer horizon, which can assist next-word prediction and provide intermediate rewards for generator optimization. Extensive experiments demonstrate that the proposed method leads to improved performance.

Submitted 4 May, 2020; originally announced May 2020.

Comments: ACL 2020. arXiv admin note: substantial text overlap with arXiv:1811.00696

arXiv:2004.13826 [pdf, other]

Every Document Owns Its Structure: Inductive Text Classification via Graph Neural Networks

Authors: Yufeng Zhang, Xueli Yu, Zeyu Cui, Shu Wu, Zhongzhen Wen, Liang Wang

Abstract: Text classification is fundamental in natural language processing (NLP), and Graph Neural Networks (GNN) are recently applied in this task. However, the existing graph-based works can neither capture the contextual word relationships within each document nor fulfil the inductive learning of new words. In this work, to overcome such problems, we propose TextING for inductive text classification via GNN. We first build individual graphs for each document and then use GNN to learn the fine-grained word representations based on their local structures, which can also effectively produce embeddings for unseen words in the new document. Finally, the word nodes are aggregated as the document embedding. Extensive experiments on four benchmark datasets show that our method outperforms state-of-the-art text classification methods.

Submitted 12 May, 2020; v1 submitted 22 April, 2020; originally announced April 2020.

Comments: To appear at ACL 2020

arXiv:2004.02420 [pdf, other]

Simultaneous Denoising and Dereverberation Using Deep Embedding Features

Authors: Cunhang Fan, Jianhua Tao, Bin Liu, Jiangyan Yi, Zhengqi Wen

Abstract: Monaural speech dereverberation is a very challenging task because no spatial cues can be used. When the additive noises exist, this task becomes more challenging. In this paper, we propose a joint training method for simultaneous speech denoising and dereverberation using deep embedding features, which is based on the deep clustering (DC). DC is a state-of-the-art method for speech separation that includes embedding learning and K-means clustering. As for our proposed method, it contains two stages: denoising and dereverberation. At the denoising stage, the DC network is leveraged to extract noise-free deep embedding features. These embedding features are generated from the anechoic speech and residual reverberation signals. They can represent the inferred spectral masking patterns of the desired signals, which are discriminative features. At the dereverberation stage, instead of using the unsupervised K-means clustering algorithm, another supervised neural network is utilized to estimate the anechoic speech from these deep embedding features. Finally, the denoising stage and dereverberation stage are optimized by the joint training method. Experimental results show that the proposed method outperforms the WPE and BLSTM baselines, especially in the low SNR condition.

Submitted 6 April, 2020; originally announced April 2020.

arXiv:2004.02118 [pdf, other]

doi 10.1145/3318464.3386145

GIANT: Scalable Creation of a Web-scale Ontology

Authors: Bang Liu, Weidong Guo, Di Niu, Jinwen Luo, Chaoyue Wang, Zhen Wen, Yu Xu

Abstract: Understanding what online users may pay attention to is key to content recommendation and search services. These services will benefit from a highly structured and web-scale ontology of entities, concepts, events, topics and categories. While existing knowledge bases and taxonomies embody a large volume of entities and categories, we argue that they fail to discover properly grained concepts, events and topics in the language style of online population. Neither is a logically structured ontology maintained among these notions. In this paper, we present GIANT, a mechanism to construct a user-centered, web-scale, structured ontology, containing a large number of natural language phrases conforming to user attentions at various granularities, mined from a vast volume of web documents and search click graphs. Various types of edges are also constructed to maintain a hierarchy in the ontology. We present our graph-neural-network-based techniques used in GIANT, and evaluate the proposed methods as compared to a variety of baselines. GIANT has produced the Attention Ontology, which has been deployed in various Tencent applications involving over a billion users. Online A/B testing performed on Tencent QQ Browser shows that Attention Ontology can significantly improve click-through rates in news recommendation.

Submitted 5 April, 2020; originally announced April 2020.

Comments: Accepted as full paper by SIGMOD 2020

arXiv:2004.00791 [pdf, ps, other]

doi 10.3847/1538-4357/ab8db6

Discovery of delayed spin-up behavior following two large glitches in the Crab pulsar, and the statistics of such processes

Authors: M. Y. Ge, S. N. Zhang, F. J. Lu, T. P. Li, J. P. Yuan, X. P. Zheng, Y. Huang, S. J. Zheng, Y. P. Chen, Z. Chang, Y. L. Tuo, Q. Cheng, C. Güngör, L. M. Song, Y. P. Xu, X. L. Cao, Y. Chen, C. Z. Liu, S. Zhang, J. L. Qu, Q. C. Bu, C. Cai, G. Chen, L. Chen, M. Z. Chen, et al. (111 additional authors not shown)

Abstract: Glitches correspond to sudden jumps of rotation frequency ($ν$) and its derivative ($\dotν$) of pulsars, the origin of which remains not well understood yet, partly because the jump processes of most glitches are not well time-resolved. There are three large glitches of the Crab pulsar, detected in 1989, 1996 and 2017, which were found to have delayed spin-up processes before the normal recovery processes. Here we report two additional glitches of the Crab pulsar occurred in 2004 and 2011 for which we discovered delayed spin up processes, and present refined parameters of the largest glitch occurred in 2017. The initial rising time of the glitch is determined as $<0.48$ hour. We also carried out a statistical study of these five glitches with observed spin-up processes. The two glitches occurred in 2004 and 2011 have delayed spin-up time scales ($τ_{1}$) of $1.7\pm0.8$\,days and $1.6\pm0.4$\,days, respectively. We find that the $Δν$ vs. $|Δ{\dotν}|$ relation of these five glitches is similar to those with no detected delayed spin-up process, indicating that they are similar to the others in nature except that they have larger amplitudes. For these five glitches, the amplitudes of the delayed spin-up process ($|Δν_{\rm d1}|$) and recovery process ($Δν_{\rm d2}$), their time scales ($τ_{1}$, $τ_{2}$), and permanent changes in spin frequency ($Δν_{\rm p}$) and total frequency step ($Δν_{\rm g}$) have positive correlations. From these correlations, we suggest that the delayed spin-up processes are common for all glitches, but are too short and thus difficult to be detected for most glitches.

Submitted 1 April, 2020; originally announced April 2020.

Comments: 25 pages, 8 figures

arXiv:2003.10351 [pdf]

doi 10.1088/1361-6560/ab9c64

A modular phantom and software to characterize 3D geometric distortion in MRI

Authors: Jordan M. Slagowski, Yao Ding, Manik Aima, Zhifei Wen, Clifton D. Fuller, Caroline Chung, J. Matthew Debnam, Ken-Pin Hwang, Mo Kadbi, Janio Szklaruk, Jihong Wang

Abstract: MRI offers outstanding soft tissue contrast that may reduce uncertainties in target and organ-at-risk delineation and enable online adaptive image-guided treatment. Spatial distortions resulting from non-linearities in the gradient fields and non-uniformity in the main magnetic field must be accounted for across the imaging field-of-view to prevent systematic errors during treatment delivery. This work presents a modular phantom and software application to characterize geometric distortion (GD) within the large field-of-view MRI images required for radiation therapy simulation. The modular phantom is assembled from a series of rectangular foam blocks containing high-contrast fiducial markers in a known configuration. The modular phantom design facilitates transportation of the phantom between different MR scanners and MR-guided linear accelerators and allows the phantom to be adapted to fit different sized bores or coils. The phantom was evaluated using a 1.5T MR-guided linear accelerator (MR-Linac) and 1.5T and 3.0T diagnostic scanners. Performance was assessed by varying acquisition parameters to induce image distortions in a known manner. Imaging was performed using T1 and T2 weighted pulse sequences with 2D and 3D distortion correction algorithms and the receiver bandwidth (BW) varied as 250-815 Hz/pixel. Phantom set-up reproducibility was evaluated across independent set-ups. The software was validated by comparison with a non-modular phantom. Average geometric distortion was 0.94+/-0.58 mm for the MR-Linac, 0.90+/-0.53 mm for the 1.5 T scanner, and 1.15+/-0.62 mm for the 3.0T scanner, for a 400 mm diameter volume-of-interest. GD increased, as expected, with decreasing BW, and with the 2D versus 3D correction algorithm. Differences in GD attributed to phantom set-up were 0.13 mm or less. Differences in GD for the two software applications were less than 0.07 mm.

Submitted 6 April, 2020; v1 submitted 23 March, 2020; originally announced March 2020.

Comments: 25 pages

arXiv:2003.07544 [pdf, other]

Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method

Authors: Cunhang Fan, Jianhua Tao, Bin Liu, Jiangyan Yi, Zhengqi Wen, Xuefei Liu

Abstract: In this paper, we propose an end-to-end post-filter method with deep attention fusion features for monaural speaker-independent speech separation. At first, a time-frequency domain speech separation method is applied as the pre-separation stage. The aim of pre-separation stage is to separate the mixture preliminarily. Although this stage can separate the mixture, it still contains the residual interference. In order to enhance the pre-separated speech and improve the separation performance further, the end-to-end post-filter (E2EPF) with deep attention fusion features is proposed. The E2EPF can make full use of the prior knowledge of the pre-separated speech, which contributes to speech separation. It is a fully convolutional speech separation network and uses the waveform as the input features. Firstly, the 1-D convolutional layer is utilized to extract the deep representation features for the mixture and pre-separated signals in the time domain. Secondly, to pay more attention to the outputs of the pre-separation stage, an attention module is applied to acquire deep attention fusion features, which are extracted by computing the similarity between the mixture and the pre-separated speech. These deep attention fusion features are conducive to reduce the interference and enhance the pre-separated speech. Finally, these features are sent to the post-filter to estimate each target signals. Experimental results on the WSJ0-2mix dataset show that the proposed method outperforms the state-of-the-art speech separation method. Compared with the pre-separation method, our proposed method can acquire 64.1%, 60.2%, 25.6% and 7.5% relative improvements in scale-invariant source-to-noise ratio (SI-SNR), the signal-to-distortion ratio (SDR), the perceptual evaluation of speech quality (PESQ) and the short-time objective intelligibility (STOI) measures, respectively.

Submitted 17 March, 2020; originally announced March 2020.

Comments: ACCEPTED by IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

arXiv:2003.00739 [pdf, other]

Long Short-Term Sample Distillation

Authors: Liang Jiang, Zujie Wen, Zhongping Liang, Yafang Wang, Gerard de Melo, Zhe Li, Liangzhuang Ma, Jiaxing Zhang, Xiaolong Li, Yuan Qi

Abstract: In the past decade, there has been substantial progress at training increasingly deep neural networks. Recent advances within the teacher--student training paradigm have established that information about past training updates show promise as a source of guidance during subsequent training steps. Based on this notion, in this paper, we propose Long Short-Term Sample Distillation, a novel training policy that simultaneously leverages multiple phases of the previous training process to guide the later training updates to a neural network, while efficiently proceeding in just one single generation pass. With Long Short-Term Sample Distillation, the supervision signal for each sample is decomposed into two parts: a long-term signal and a short-term one. The long-term teacher draws on snapshots from several epochs ago in order to provide steadfast guidance and to guarantee teacher--student differences, while the short-term one yields more up-to-date cues with the goal of enabling higher-quality updates. Moreover, the teachers for each sample are unique, such that, overall, the model learns from a very diverse set of teachers. Comprehensive experimental results across a range of vision and NLP tasks demonstrate the effectiveness of this new training method.

Submitted 2 March, 2020; originally announced March 2020.

Comments: published as a conference paper at AAAI 2020

arXiv:2002.08513 [pdf, ps, other]

A Trust-Region Method For Nonsmooth Nonconvex Optimization

Authors: Ziang Chen, Andre Milzarek, Zaiwen Wen

Abstract: We propose a trust-region type method for a class of nonsmooth nonconvex optimization problems where the objective function is a summation of a (probably nonconvex) smooth function and a (probably nonsmooth) convex function. The model function of our trust-region subproblem is always quadratic and the linear term of the model is generated using abstract descent directions. Therefore, the trust-region subproblems can be easily constructed as well as efficiently solved by cheap and standard methods. When the accuracy of the model function at the solution of the subproblem is not sufficient, we add a safeguard on the stepsizes for improving the accuracy. For a class of functions that can be "truncated", an additional truncation step is defined and a stepsize modification strategy is designed. The overall scheme converges globally and we establish fast local convergence under suitable assumptions. In particular, using a connection with a smooth Riemannian trust-region method, we prove local quadratic convergence for partly smooth functions under a strict complementary condition. Preliminary numerical results on a family of $\ell_1$-optimization problems are reported and demonstrate the efficiency of our approach.

Submitted 23 October, 2021; v1 submitted 19 February, 2020; originally announced February 2020.

arXiv:2002.06979 [pdf, ps, other]

Convergence of End-to-End Training in Deep Unsupervised Contrastive Learning

Authors: Zixin Wen

Abstract: Unsupervised contrastive learning has gained increasing attention in the latest research and has proven to be a powerful method for learning representations from unlabeled data. However, little theoretical analysis was known for this framework. In this paper, we study the optimization of deep unsupervised contrastive learning. We prove that, by applying end-to-end training that simultaneously updates two deep over-parameterized neural networks, one can find an approximate stationary solution for the non-convex contrastive loss. This result is inherently different from the existing over-parameterized analysis in the supervised setting because, in contrast to learning a specific target function, unsupervised contrastive learning tries to encode the unlabeled data distribution into the neural networks, which generally has no optimal solution. Our analysis provides theoretical insights into the practical success of these unsupervised pretraining methods.

Submitted 30 May, 2021; v1 submitted 17 February, 2020; originally announced February 2020.

arXiv:2002.04398 [pdf, other]

PT-symmetric potentials having continuous spectra

Authors: Zichao Wen, Carl M. Bender

Abstract: One-dimensional PT-symmetric quantum-mechanical Hamiltonians having continuous spectra are studied. The Hamiltonians considered have the form $H=p^2+V(x)$, where $V(x)$ is odd in $x$, pure imaginary, and vanishes as $|x|\to\infty$. Five PT-symmetric potentials are studied: the Scarf-II potential $V_1(x)=iA_1\,{\rm sech}(x)\tanh(x)$, which decays exponentially for large $|x|$; the rational potentials $V_2(x)=iA_2\,x/(1+x^4)$ and $V_3(x)=iA_3\,x/(1+|x|^3)$, which decay algebraically for large $|x|$; the step-function potential $V_4(x)=iA_4\,{\rm sgn}(x)θ(2.5-|x|)$, which has compact support; the regulated Coulomb potential $V_5(x)=iA_5\,x/(1+x^2)$, which decays slowly as $|x|\to\infty$ and may be viewed as a long-range potential. The real parameters $A_n$ measure the strengths of these potentials. Numerical techniques for solving the time-independent Schrödinger eigenvalue problems associated with these potentials reveal that the spectra of the corresponding Hamiltonians exhibit universal properties. In general, the eigenvalues are partly real and partly complex. The real eigenvalues form the continuous part of the spectrum and the complex eigenvalues form the discrete part of the spectrum. The real eigenvalues range continuously in value from $0$ to $+\infty$. The complex eigenvalues occur in discrete complex-conjugate pairs and for $V_n(x)$ ($1\leq n\leq4$) the number of these pairs is finite and increases as the value of the strength parameter $A_n$ increases. However, for $V_5(x)$ there is an {\it infinite} sequence of discrete eigenvalues with a limit point at the origin. This sequence is complex, but it is similar to the Balmer series for the hydrogen atom because it has inverse-square convergence.

Submitted 4 February, 2020; originally announced February 2020.

Comments: 15 pages, 9 figures

arXiv:2002.01626 [pdf, other]

Spatial and spectral deep attention fusion for multi-channel speech separation using deep embedding features

Authors: Cunhang Fan, Bin Liu, Jianhua Tao, Jiangyan Yi, Zhengqi Wen

Abstract: Multi-channel deep clustering (MDC) has acquired a good performance for speech separation. However, MDC only applies the spatial features as the additional information. So it is difficult to learn mutual relationship between spatial and spectral features. Besides, the training objective of MDC is defined at embedding vectors, rather than real separated sources, which may damage the separation performance. In this work, we propose a deep attention fusion method to dynamically control the weights of the spectral and spatial features and combine them deeply. In addition, to solve the training objective problem of MDC, the real separated sources are used as the training objectives. Specifically, we apply the deep clustering network to extract deep embedding features. Instead of using the unsupervised K-means clustering to estimate binary masks, another supervised network is utilized to learn soft masks from these deep embedding features. Our experiments are conducted on a spatialized reverberant version of WSJ0-2mix dataset. Experimental results show that the proposed method outperforms MDC baseline and even better than the oracle ideal binary mask (IBM).

Submitted 4 February, 2020; originally announced February 2020.

arXiv:2001.09624 [pdf, other]

SecEL: Privacy-Preserving, Verifiable and Fault-Tolerant Edge Learning for Autonomous Vehicles

Authors: Jiasi Weng, Jian Weng, Yue Zhang, Ming Li, Zhaodi Wen

Abstract: Mobile edge computing (MEC) is an emerging technology to transform the cloud-based computing services into the edge-based ones. Autonomous vehicular network (AVNET), as one of the most promising applications of MEC, can feature edge learning and communication techniques, improving the safety for autonomous vehicles (AVs). This paper focuses on the edge learning in AVNET, where AVs at the edge of the network share model parameters instead of data in a distributed manner, and an aggregator (e.g., a base station) aggregates parameters from AVs and at the end obtains a trained model. Despite promising, security issues, such as data leakage, computing integrity invasion and fault connection in existing edge learning cases are not considered fully. To the best of our knowledge, there lacks an effective scheme simultaneously covering the foregoing security issues. Therefore, we propose \textit{SecEL}, a privacy-preserving, verifiable and fault-tolerant scheme for edge learning in AVNET. First, we leverage the primitive of bivariate polynomial-based secret sharing to encrypt model parameters by one-time padding. Second, we use homomorphic authenticator based on message authentication code to support verifiable computation. Third, we mitigate the computation failure problem caused by fault connection. Last, we simulate and evaluate SecEL in terms of time cost, throughput and classification accuracy. The experiment results demonstrate the effectiveness of SecEL.

Submitted 16 February, 2020; v1 submitted 27 January, 2020; originally announced January 2020.

arXiv:2001.06944 [pdf, other]

Nested-Wasserstein Self-Imitation Learning for Sequence Generation

Authors: Ruiyi Zhang, Changyou Chen, Zhe Gan, Zheng Wen, Wenlin Wang, Lawrence Carin

Abstract: Reinforcement learning (RL) has been widely studied for improving sequence-generation models. However, the conventional rewards used for RL training typically cannot capture sufficient semantic information and therefore render model bias. Further, the sparse and delayed rewards make RL exploration inefficient. To alleviate these issues, we propose the concept of nested-Wasserstein distance for distributional semantic matching. To further exploit it, a novel nested-Wasserstein self-imitation learning framework is developed, encouraging the model to exploit historical high-rewarded sequences for enhanced exploration and better semantic matching. Our solution can be understood as approximately executing proximal policy optimization with Wasserstein trust-regions. Experiments on a variety of unconditional and conditional sequence-generation tasks demonstrate the proposed approach consistently leads to improved performance.

Submitted 19 January, 2020; originally announced January 2020.

Comments: Accepted by AISTATS2020

arXiv:2001.01594 [pdf]

doi 10.1016/j.applthermaleng.2020.115849

Enhancement of nucleate boiling by combining the effects of surface structure and mixed wettability: A lattice Boltzmann study

Authors: W. X. Li, Q. Li, Y. Yu, Z. X. Wen

Abstract: The combination of microstructures and mixed wettability for enhancing nucleate boiling has attracted much attention in recent years. However, in the existing experimental and numerical studies, the tops of microstructures are entirely subjected to wettability modification, which makes the influences of mixed wettability dependant on the characteristic length of microstructures. In order to disclose the joint effects of surface structure and mixed wettability on nucleate boiling, in this work we propose an improved type of pillar-textured surface with mixed wettability, in which the tops of square pillars are partially subjected to wettability modification. Numerical investigation of the boiling heat transfer performance on the improved mixed-wettability surface is carried out using a three-dimensional thermal multiphase lattice Boltzmann model. The numerical results show that the width of the wettability-modified region plays an important role in the boiling performance of the improved mixed-wettability surface and the best boiling performance is achieved in the situation that the width of the wettability-modified region is sufficiently large but the bubble nucleated on the pillar top still does not interfere with the coalescence-departure mechanism of the bubbles nucleated around the pillar, which optimizes the joint effects of surface structure and mixed wettability for enhancing nucleate boiling. The influences of the shape of the wettability-modified region are also studied. Among the investigated shapes, the square is found to perform better than the other two shapes.

Submitted 3 May, 2020; v1 submitted 2 January, 2020; originally announced January 2020.

Comments: 15 figures

Journal ref: Applied Thermal Engineering 180 (2020) 115849

arXiv:1912.04843 [pdf]

A novel generative reverse net assisted evolution algorithm for expensive-computational optimizations

Authors: Yu Li, Hu Wang, Ziming Wen, Xin Wang

Abstract: Simulation-based optimization is a useful method for practical design problems. However, it is difficult for complicated problems due to expensive-computational costs. A popular way to overcome this issue is to use a surrogate model to save the cost. Nevertheless, limited design parameters those are input to traditional surrogate models can difficultly represent the whole design problem, which might result in unexpected errors. In this study, physical cloud images from simulations are employed and attempted to construct the surrogate model. Simultaneously, based on the strong pattern recognition and generation abilities of deep learning models, a novel Generative Reverse Net assisted Evolution Algorithm (GRN-EA) is proposed for expensive-design problems. In this study, a numerical example of a Variable-Stiffness (VS) composite hole-plate is employed to obtain the optimal distribution of the curved fiber. Moreover, to evaluate the performance of GRN-EA in practical engineering problems, a more complex sheet-forming case is optimized. According to the two problems, some expensive-simulations can be well handled in this study.

Submitted 8 December, 2019; originally announced December 2019.

arXiv:1912.02958 [pdf, other]

Synchronous Transformers for End-to-End Speech Recognition

Authors: Zhengkun Tian, Jiangyan Yi, Ye Bai, Jianhua Tao, Shuai Zhang, Zhengqi Wen

Abstract: For most of the attention-based sequence-to-sequence models, the decoder predicts the output sequence conditioned on the entire input sequence processed by the encoder. The asynchronous problem between the encoding and decoding makes these models difficult to be applied for online speech recognition. In this paper, we propose a model named synchronous transformer to address this problem, which can predict the output sequence chunk by chunk. Once a fixed-length chunk of the input sequence is processed by the encoder, the decoder begins to predict symbols immediately. During training, a forward-backward algorithm is introduced to optimize all the possible alignment paths. Our model is evaluated on a Mandarin dataset AISHELL-1. The experiments show that the synchronous transformer is able to perform encoding and decoding synchronously, and achieves a character error rate of 8.91% on the test set.

Submitted 23 February, 2020; v1 submitted 5 December, 2019; originally announced December 2019.

Comments: Accepted by ICASSP 2020

arXiv:1912.01777 [pdf, other]

Integrating Knowledge into End-to-End Speech Recognition from External Text-Only Data

Authors: Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Zhengkun Tian, Shuai Zhang

Abstract: Attention-based encoder-decoder (AED) models have achieved promising performance in speech recognition. However, because of the end-to-end training, an AED model is usually trained with speech-text paired data. It is challenging to incorporate external text-only data into AED models. Another issue of the AED model is that it does not use the right context of a text token while predicting the token. To alleviate the above two issues, we propose a unified method called LST (Learn Spelling from Teachers) to integrate knowledge into an AED model from the external text-only data and leverage the whole context in a sentence. The method is divided into two stages. First, in the representation stage, a language model is trained on the text. It can be seen as that the knowledge in the text is compressed into the LM. Then, at the transferring stage, the knowledge is transferred to the AED model via teacher-student learning. To further use the whole context of the text sentence, we propose an LM called causal cloze completer (COR), which estimates the probability of a token, given both the left context and the right context of it. Therefore, with LST training, the AED model can leverage the whole context in the sentence. Different from fusion based methods, which use LM during decoding, the proposed method does not increase any extra complexity at the inference stage. We conduct experiments on two scales of public Chinese datasets AISHELL-1 and AISHELL-2. The experimental results demonstrate the effectiveness of leveraging external text-only data and the whole context in a sentence with our proposed method, compared with baseline hybrid systems and AED model based systems.

Submitted 15 March, 2021; v1 submitted 3 December, 2019; originally announced December 2019.

Comments: Submitted TASLP

arXiv:1912.01165 [pdf, ps, other]

doi 10.1093/mnras/stz3399

Periodic mode changing in PSR J1048-5832

Authors: W. M. Yan, R. N. Manchester, N. Wang, Z. G. Wen, J. P. Yuan, K. J. Lee, J. L. Chen

Abstract: By analysing the data acquired from the Parkes 64-m radio telescope at 1369 MHz, we report on the phase-stationary non-drift amplitude modulation observed in PSR J1048-5832. The high-sensitivity observations revealed that the central and trailing components of the pulse profile of this pulsar switch between a strong mode and a weak mode periodically. However, the leading component remains unchanged. Polarization properties of the strong and weak modes are investigated. Considering the similarity to mode changing, we argue that the periodic amplitude modulation in PSR J1048$-$5832 is periodic mode changing. The fluctuation spectral analysis showed that the modulation period is very short (~2.1 s or 17 P1), where P1 is the rotation period of the pulsar. We find that this periodic amplitude modulation is hard to explain by existing models that account for the periodic phenomena in pulsars like subpulse drifting.

Submitted 2 December, 2019; originally announced December 2019.

Comments: 8 pages, 8 figures, 3 tables, accepted in MNRAS for publication

Showing 251–300 of 508 results for author: Wen, Z