Search | arXiv e-print repository

Fully epitaxial fcc(111) magnetic tunnel junctions with a Co90Fe10/MgAlO/Co90Fe10 structure

Authors: Jieyuan Song, Thomas Scheike, Cong He, Zhenchao Wen, Tadakatsu Ohkubo, Kazuhiro Hono, Hiroaki Sukegawa, Seiji Mitani

Abstract: Magnetic tunnel junctions (MTJs) with bcc(001)-type structures, such as Fe(001)/MgO(001)/Fe(001), have been widely used as the core of various spintronic devices such as magnetoresistive memories; however, the limited material selection of (001)-type MTJs hinders the further development of spintronic devices. Here, as an alternative to the (001)-type MTJs, an fcc(111)-type MTJ using a fully epitaxial CoFe/rock-salt MgAlO (MAO)/CoFe is explored to introduce close-packed lattice systems into MTJs. Using an atomically flat Ru(0001) epitaxial buffer layer, fcc(111) epitaxial growth of the CoFe/MAO/CoFe trilayer is achieved. Sharp CoFe(111)/MAO(111) interfaces are confirmed due to the introduction of periodic dislocations by forming a 5:6 in-plane lattice matching structure. The fabricated (111) MTJ exhibits a tunnel magnetoresistance ratio of 37% at room temperature (47% at 10 K). Symmetric differential conductance curves with respect to bias polarity are observed, indicating the achievement of nearly identical upper and lower MAO interface qualities. Despite the charge-uncompensated (111) orientation for a rock-salt-like MAO barrier, the achievement of flat, stable, and spin-polarized barrier interfaces opens a promising avenue for expanding the design of MTJ structures.

Submitted 8 August, 2023; originally announced August 2023.

Comments: 18 pages, 5 figures

arXiv:2307.14024 [pdf, other]

Multi-view Hypergraph Contrastive Policy Learning for Conversational Recommendation

Authors: Sen Zhao, Wei Wei, Xian-Ling Mao, Shuai Zhu, Minghui Yang, Zujie Wen, Dangyang Chen, Feida Zhu

Abstract: Conversational recommendation systems (CRS) aim to interactively acquire user preferences and accordingly recommend items to users. Accurately learning the dynamic user preferences is of crucial importance for CRS. Previous works learn the user preferences with pairwise relations from the interactive conversation and item knowledge, while largely ignoring the fact that factors for a relationship in CRS are multiplex. Specifically, the user likes/dislikes the items that satisfy some attributes (Like/Dislike view). Moreover, social influence is another important factor that affects user preference towards the item (Social view), but it is largely ignored by previous works in CRS. The user preferences from these three views are inherently different but also correlated as a whole. The user preferences from the same view should be more similar than those from different views. The user preferences from the Like view should be similar to those from the Social view while different from those from the Dislike view. To this end, we propose a novel model, namely Multi-view Hypergraph Contrastive Policy Learning (MHCPL). Specifically, MHCPL chooses useful social information in a timely manner according to the interactive history and builds a dynamic hypergraph with three types of multiplex relations from different views. The multiplex relations in each view are successively connected according to their generation order.

Submitted 26 July, 2023; originally announced July 2023.

arXiv:2307.10230 [pdf, other]

Prompt Tuning on Graph-augmented Low-resource Text Classification

Authors: Zhihao Wen, Yuan Fang

Abstract: Text classification is a fundamental problem in information retrieval with many real-world applications, such as predicting the topics of online articles and the categories of e-commerce product descriptions. However, low-resource text classification, with no or few labeled samples, presents a serious concern for supervised learning. Meanwhile, many text data are inherently grounded on a network structure, such as a hyperlink/citation network for online articles, and a user-item purchase network for e-commerce products. These graph structures capture rich semantic relationships, which can potentially augment low-resource text classification. In this paper, we propose a novel model called Graph-Grounded Pre-training and Prompting (G2P2) to address low-resource text classification in a two-pronged approach. During pre-training, we propose three graph interaction-based contrastive strategies to jointly pre-train a graph-text model; during downstream classification, we explore handcrafted discrete prompts and continuous prompt tuning for the jointly pre-trained model to achieve zero- and few-shot classification, respectively. Moreover, we explore the possibility of employing continuous prompt tuning for zero-shot inference. Specifically, we aim to generalize continuous prompts to unseen classes while leveraging a set of base classes. To this end, we extend G2P2 into G2P2$^*$, hinging on a new architecture of conditional prompt tuning. Extensive experiments on four real-world datasets demonstrate the strength of G2P2 in zero- and few-shot low-resource text classification tasks, and illustrate the advantage of G2P2$^*$ in dealing with unseen classes.

Submitted 19 August, 2024; v1 submitted 15 July, 2023; originally announced July 2023.

Comments: 15 pages, accepted by TKDE (IEEE Transactions on Knowledge and Data Engineering). arXiv admin note: substantial text overlap with arXiv:2305.03324

arXiv:2307.08969 [pdf, other]

doi 10.1109/TVCG.2023.3327148

Quantivine: A Visualization Approach for Large-scale Quantum Circuit Representation and Analysis

Authors: Zhen Wen, Yihan Liu, Siwei Tan, Jieyi Chen, Minfeng Zhu, Dongming Han, Jianwei Yin, Mingliang Xu, Wei Chen

Abstract: Quantum computing is a rapidly evolving field that enables exponential speed-up over classical algorithms. At the heart of this revolutionary technology are quantum circuits, which serve as vital tools for implementing, analyzing, and optimizing quantum algorithms. Recent advancements in quantum computing and the increasing capability of quantum devices have led to the development of more complex quantum circuits. However, traditional quantum circuit diagrams suffer from scalability and readability issues, which limit the efficiency of analysis and optimization processes. In this research, we propose a novel visualization approach for large-scale quantum circuits by adopting semantic analysis to facilitate the comprehension of quantum circuits. We first exploit meta-data and semantic information extracted from the underlying code of quantum circuits to create component segmentations and pattern abstractions, allowing for easier wrangling of massive circuit diagrams. We then develop Quantivine, an interactive system for exploring and understanding quantum circuits. A series of novel circuit visualizations are designed to uncover contextual details such as qubit provenance, parallelism, and entanglement. The effectiveness of Quantivine is demonstrated through two usage scenarios of quantum circuits with up to 100 qubits and a formal user evaluation with quantum experts. A free copy of this paper and all supplemental materials are available at https://osf.io/2m9yh/?view_only=0aa1618c97244f5093cd7ce15f1431f9.

Submitted 18 July, 2023; originally announced July 2023.

Comments: Accepted by IEEE VIS 2023

Journal ref: IEEE Transactions on Visualization and Computer Graphics, 2023

arXiv:2307.08929 [pdf, other]

Active learning of effective Hamiltonian for super-large-scale atomic structures

Authors: Xingyue Ma, Hongying Chen, Ri He, Zhanbo Yu, Sergei Prokhorenko, Zheng Wen, Zhicheng Zhong, Jorge Iñiguez, L. Bellaiche, Di Wu, Yurong Yang

Abstract: The first-principles-based effective Hamiltonian scheme provides one of the most accurate modeling techniques for large-scale structures, especially for ferroelectrics. However, the parameterization of the effective Hamiltonian is complicated and can be difficult for some complex systems such as high-entropy perovskites. Here, we propose a general form of effective Hamiltonian and develop an active machine learning approach to parameterize the effective Hamiltonian based on Bayesian linear regression. The parameterization is employed in molecular dynamics simulations with the prediction of energy, forces, stress and their uncertainties at each step, which decides whether first-principles calculations are executed to retrain the parameters. Structures of BaTiO$_3$, Pb(Zr$_{0.75}$Ti$_{0.25}$)O$_3$ and the (Pb,Sr)TiO$_3$ system are taken as examples to show the accuracy of this approach, as compared with the conventional parametrization method and experiments. This machine learning approach provides a universal and automatic way to compute the effective Hamiltonian parameters for any complex system considered with super-large-scale (more than $10^7$ atoms) atomic structures.

Submitted 14 May, 2024; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: 11 pages, 4 figures

arXiv:2307.08699 [pdf, other]

Pair then Relation: Pair-Net for Panoptic Scene Graph Generation

Authors: Jinghao Wang, Zhengyu Wen, Xiangtai Li, Zujin Guo, Jingkang Yang, Ziwei Liu

Abstract: Panoptic Scene Graph (PSG) is a challenging task in Scene Graph Generation (SGG) that aims to create a more comprehensive scene graph representation using panoptic segmentation instead of boxes. Compared to SGG, PSG has several challenging problems: pixel-level segment outputs and full relationship exploration (it also considers thing and stuff relations). Thus, current PSG methods have limited performance, which hinders downstream tasks or applications. This work aims to design a novel and strong baseline for PSG. To achieve that, we first conduct an in-depth analysis to identify the bottleneck of the current PSG models, finding that inter-object pair-wise recall is a crucial factor that was ignored by previous PSG methods. Based on this and the recent query-based frameworks, we present a novel framework: Pair then Relation (Pair-Net), which uses a Pair Proposal Network (PPN) to learn and filter sparse pair-wise relationships between subjects and objects. Moreover, we also observed the sparse nature of object pairs for both. Motivated by this, we design a lightweight Matrix Learner within the PPN, which directly learns pair-wise relationships for pair proposal generation. Through extensive ablation and analysis, our approach significantly improves upon leveraging the segmenter solid baseline. Notably, our method achieves over 10% absolute gains compared to our baseline, PSGFormer. The code of this paper is publicly available at https://github.com/king159/Pair-Net.

Submitted 9 August, 2024; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: IEEE TPAMI 2024. 13 pages. Project Page: https://github.com/king159/Pair-Net

arXiv:2307.05074 [pdf, other]

Retrieval-augmented GPT-3.5-based Text-to-SQL Framework with Sample-aware Prompting and Dynamic Revision Chain

Authors: Chunxi Guo, Zhiliang Tian, Jintao Tang, Shasha Li, Zhihua Wen, Kaixuan Wang, Ting Wang

Abstract: Text-to-SQL aims at generating SQL queries for the given natural language questions and thus helping users to query databases. Prompt learning with large language models (LLMs) has emerged as a recent approach, which designs prompts to lead LLMs to understand the input question and generate the corresponding SQL. However, it faces challenges with strict SQL syntax requirements. Existing work prompts the LLMs with a list of demonstration examples (i.e., question-SQL pairs) to generate SQL, but the fixed prompts can hardly handle the scenario where the semantic gap between the retrieved demonstration and the input question is large. In this paper, we propose a retrieval-augmented prompting method for an LLM-based Text-to-SQL framework, involving sample-aware prompting and a dynamic revision chain. Our approach incorporates sample-aware demonstrations, which include the composition of SQL operators and fine-grained information related to the given question. To retrieve questions sharing similar intents with input questions, we propose two strategies for assisting retrieval. Firstly, we leverage LLMs to simplify the original questions, unifying the syntax and thereby clarifying the users' intentions. To generate executable and accurate SQLs without human intervention, we design a dynamic revision chain which iteratively adapts fine-grained feedback from the previously generated SQL. Experimental results on three Text-to-SQL benchmarks demonstrate the superiority of our method over strong baseline models.

Submitted 4 September, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

arXiv:2307.02046 [pdf, other]

Recommender Systems in the Era of Large Language Models (LLMs)

Authors: Zihuai Zhao, Wenqi Fan, Jiatong Li, Yunqing Liu, Xiaowei Mei, Yiqi Wang, Zhen Wen, Fei Wang, Xiangyu Zhao, Jiliang Tang, Qing Li

Abstract: With the prosperity of e-commerce and web applications, Recommender Systems (RecSys) have become an important component of our daily life, providing personalized suggestions that cater to user preferences. While Deep Neural Networks (DNNs) have made significant advancements in enhancing recommender systems by modeling user-item interactions and incorporating textual side information, DNN-based methods still face limitations, such as difficulties in understanding users' interests and capturing textual side information, inabilities in generalizing to various recommendation scenarios and reasoning on their predictions, etc. Meanwhile, the emergence of Large Language Models (LLMs), such as ChatGPT and GPT4, has revolutionized the fields of Natural Language Processing (NLP) and Artificial Intelligence (AI), due to their remarkable abilities in fundamental responsibilities of language understanding and generation, as well as impressive generalization and reasoning capabilities. As a result, recent studies have attempted to harness the power of LLMs to enhance recommender systems. Given the rapid evolution of this research direction in recommender systems, there is a pressing need for a systematic overview that summarizes existing LLM-empowered recommender systems, to provide researchers in relevant fields with an in-depth understanding. Therefore, in this paper, we conduct a comprehensive review of LLM-empowered recommender systems from various aspects including Pre-training, Fine-tuning, and Prompting. More specifically, we first introduce representative methods to harness the power of LLMs (as a feature encoder) for learning representations of users and items. Then, we review recent techniques of LLMs for enhancing recommender systems from three paradigms, namely pre-training, fine-tuning, and prompting. Finally, we comprehensively discuss future directions in this emerging field.

Submitted 29 April, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

Comments: Accepted by IEEE TKDE

arXiv:2307.00783 [pdf, other]

Monte Carlo Policy Gradient Method for Binary Optimization

Authors: Cheng Chen, Ruitao Chen, Tianyou Li, Ruichen Ao, Zaiwen Wen

Abstract: Binary optimization has a wide range of applications in combinatorial optimization problems such as MaxCut, MIMO detection, and MaxSAT. However, these problems are typically NP-hard due to the binary constraints. We develop a novel probabilistic model to sample the binary solution according to a parameterized policy distribution. Specifically, minimizing the KL divergence between the parameterized policy distribution and the Gibbs distributions of the function value leads to a stochastic optimization problem whose policy gradient can be derived explicitly similar to reinforcement learning. For coherent exploration in discrete spaces, parallel Markov Chain Monte Carlo (MCMC) methods are employed to sample from the policy distribution with diversity and approximate the gradient efficiently. We further develop a filter scheme to replace the original objective function by the one with the local search technique to broaden the horizon of the function landscape. Convergence to stationary points in expectation of the policy gradient method is established based on the concentration inequality for MCMC. Numerical results show that this framework is very promising to provide near-optimal solutions for quite a few binary optimization problems.
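As a rough illustration of the policy-gradient idea in this abstract, the sketch below applies a plain score-function (REINFORCE-style) estimator to MaxCut on a 4-node cycle. It is not the authors' method: it assumes a factorized Bernoulli policy sampled directly (no MCMC, no Gibbs-distribution objective, no filter scheme), and every name in it (`policy_gradient_maxcut`, `grad_log_prob`, the learning rate, the batch size) is a hypothetical choice made for this example.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def cut_value(x, edges):
    # Number of edges whose endpoints carry different labels.
    return sum(1 for i, j in edges if x[i] != x[j])


def sample(theta, rng):
    # Assumed policy: independent coordinates, P(x_i = +1) = sigmoid(theta_i).
    return [1 if rng.random() < sigmoid(t) else -1 for t in theta]


def grad_log_prob(x, theta):
    # d/d theta_i of log P(x) for the factorized Bernoulli policy above.
    return [(1.0 if xi == 1 else 0.0) - sigmoid(t) for xi, t in zip(x, theta)]


def policy_gradient_maxcut(edges, n, iters=50, batch=32, lr=0.5, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * n
    best_x, best_val = None, -1
    for _ in range(iters):
        xs = [sample(theta, rng) for _ in range(batch)]
        vals = [cut_value(x, edges) for x in xs]
        baseline = sum(vals) / batch  # batch mean as a variance-reducing baseline
        grad = [0.0] * n
        for x, v in zip(xs, vals):
            g = grad_log_prob(x, theta)
            for i in range(n):
                grad[i] += (v - baseline) * g[i] / batch
            if v > best_val:
                best_x, best_val = x, v
        # Gradient ascent, since the cut value is being maximized.
        theta = [t + lr * gi for t, gi in zip(theta, grad)]
    return best_x, best_val


edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # 4-node cycle
best_x, best_val = policy_gradient_maxcut(edges, n=4)
print(best_x, best_val)
```

The function returns the best sampled assignment rather than the final policy, which keeps the example independent of whether the policy itself has converged.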

Submitted 3 July, 2023; originally announced July 2023.

MSC Class: 90C09; 90C27; 90C59; 60J45; 60J20

arXiv:2307.00731 [pdf, other]

doi 10.1088/1674-4527/ace179

Reciprocating Magnetic Fields in the Pulsar Wind Observed from the Black Widow Pulsar J1720-0534

Authors: Chen-Chen Miao, Victoria Blackmon, Wei-Wei Zhu, Dong-Zi Li, Mingyu Ge, Xiao-Peng You, Maura McLaughlin, Di Li, Na Wang, Pei Wang, Jia-Rui Niu, M. Cruces, Jian-Ping Yuan, Jun-Tao Bai, D. J. Champion, Yu-Tong Chen, Ming-Min Chi, P. C. C. Freire, Yi Feng, Zhen-Ye Gan, M. Kramer, Fei-Fei Kou, Yu-Xi Li, Xue-Li Miao, Ling-Qi Meng, et al. (19 additional authors not shown)

Abstract: We report the radio observations of the eclipsing black widow pulsar J1720-0534, a 3.26 ms pulsar in orbit with a low mass companion of mass 0.029 to 0.034 M$_{\odot}$. We obtain the phase-connected timing ephemeris and polarization profile of this millisecond pulsar (MSP) using the Five-hundred-meter Aperture Spherical Radio Telescope (FAST), the Green Bank Telescope (GBT), and the Parkes Telescope. For the first time from such a system, an oscillatory polarisation angle change was observed from a particular eclipse egress with partial depolarization, indicating 10-milliGauss-level reciprocating magnetic fields oscillating in a length scale of 5000 km (assuming an orbital inclination angle of 90 degrees) outside the companion's magnetosphere. The dispersion measure variation observed during the ingresses and egresses shows the rapid rise of the electron density in the shock boundary between the companion's magnetosphere and the surrounding pulsar wind. We suggest that the observed oscillatory magnetic fields originate from the pulsar wind outside the companion's magnetosphere.

Submitted 28 August, 2023; v1 submitted 2 July, 2023; originally announced July 2023.

Comments: 15 pages, 8 figures, 1 table, accepted by RAA

arXiv:2307.00358 [pdf, ps, other]

The Error in Multivariate Linear Extrapolation with Applications to Derivative-Free Optimization

Authors: Liyuan Cao, Zaiwen Wen, Ya-xiang Yuan

Abstract: We study in this paper the function approximation error of multivariate linear extrapolation. While the sharp error bound of linear interpolation already exists in the literature, linear extrapolation is used far more often in applications such as derivative-free optimization, and its error is not well-studied. A method to numerically compute the sharp error bound is introduced, and several analytical bounds are presented along with the conditions under which they are sharp. The approximation error achievable by quadratic functions and the error bound for the bivariate case are analyzed in depth. Additionally, we provide the convergence theories regarding the simplex derivative-free optimization method as a demonstration of the utility of the derived bounds. All results are under the assumptions that the function being interpolated has Lipschitz continuous gradient and is interpolated on an affinely independent sample set.

Submitted 5 July, 2024; v1 submitted 1 July, 2023; originally announced July 2023.

Comments: 28 pages, 5 figures. arXiv admin note: text overlap with arXiv:2209.12606

arXiv:2306.15401 [pdf, other]

Explainable Multimodal Emotion Recognition

Authors: Zheng Lian, Haiyang Sun, Licai Sun, Hao Gu, Zhuofan Wen, Siyuan Zhang, Shun Chen, Mingyu Xu, Ke Xu, Kang Chen, Lan Chen, Shan Liang, Ya Li, Jiangyan Yi, Bin Liu, Jianhua Tao

Abstract: Multimodal emotion recognition is an important research topic in artificial intelligence, whose main goal is to integrate multimodal clues to identify human emotional states. Current works generally assume accurate labels for benchmark datasets and focus on developing more effective architectures. However, emotion annotation relies on subjective judgment. To obtain more reliable labels, existing datasets usually restrict the label space to some basic categories, then hire plenty of annotators and use majority voting to select the most likely label. However, this process may result in some correct but non-candidate or non-majority labels being ignored. To ensure reliability without ignoring subtle emotions, we propose a new task called "Explainable Multimodal Emotion Recognition (EMER)". Unlike traditional emotion recognition, EMER takes a step further by providing explanations for these predictions. Through this task, we can extract relatively reliable labels since each label has a certain basis. Meanwhile, we borrow large language models (LLMs) to disambiguate unimodal clues and generate more complete multimodal explanations. From them, we can extract richer emotions in an open-vocabulary manner. This paper presents our initial attempt at this task, including introducing a new dataset, establishing baselines, and defining evaluation metrics. In addition, EMER can serve as a benchmark task to evaluate the audio-video-text understanding performance of multimodal LLMs.
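The majority-voting step criticized in this abstract can be sketched in a few lines. The example only illustrates how minority labels get dropped; the annotator votes and the helper name `majority_label` are made up for this sketch and do not come from the paper or its dataset.

```python
from collections import Counter


def majority_label(annotations):
    """Return the most frequent label and the labels that voting discards."""
    counts = Counter(annotations)
    label, _ = counts.most_common(1)[0]
    ignored = sorted(set(annotations) - {label})
    return label, ignored


# Hypothetical votes from five annotators for one clip.
votes = ["happy", "happy", "surprised", "happy", "nervous"]
label, ignored = majority_label(votes)
print(label)    # happy
print(ignored)  # ['nervous', 'surprised']
```

Both discarded labels may describe the clip correctly, which is the loss of subtle emotions that the abstract points to.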

Submitted 23 May, 2024; v1 submitted 27 June, 2023; originally announced June 2023.

arXiv:2306.14112 [pdf, other]

Enhancing Dynamic Image Advertising with Vision-Language Pre-training

Authors: Zhoufutu Wen, Xinyu Zhao, Zhipeng Jin, Yi Yang, Wei Jia, Xiaodong Chen, Shuanglong Li, Lin Liu

Abstract: In the multimedia era, image is an effective medium in search advertising. Dynamic Image Advertising (DIA), a system that matches queries with ad images and generates multimodal ads, is introduced to improve user experience and ad revenue. The core of DIA is a query-image matching module performing ad image retrieval and relevance modeling. Current query-image matching suffers from limited and inconsistent data, and insufficient cross-modal interaction. Also, the separate optimization of retrieval and relevance models affects overall performance. To address this issue, we propose a vision-language framework consisting of two parts. First, we train a base model on large-scale image-text pairs to learn general multimodal representation. Then, we fine-tune the base model on advertising business data, unifying relevance modeling and retrieval through multi-objective learning. Our framework has been implemented in the Baidu search advertising system "Phoenix Nest". Online evaluation shows that it improves cost per mille (CPM) and click-through rate (CTR) by 1.04% and 1.865%, respectively.

Submitted 24 June, 2023; originally announced June 2023.

Comments: 6 pages, 3 figures, accepted to SIRIP 2023

arXiv:2306.10508 [pdf, other]

QCNeXt: A Next-Generation Framework For Joint Multi-Agent Trajectory Prediction

Authors: Zikang Zhou, Zihao Wen, Jianping Wang, Yung-Hui Li, Yu-Kai Huang

Abstract: Estimating the joint distribution of on-road agents' future trajectories is essential for autonomous driving. In this technical report, we propose a next-generation framework for joint multi-agent trajectory prediction called QCNeXt. First, we adopt the query-centric encoding paradigm for the task of joint multi-agent trajectory prediction. Powered by this encoding scheme, our scene encoder is equipped with permutation equivariance on the set elements, roto-translation invariance in the space dimension, and translation invariance in the time dimension. These invariance properties not only enable accurate multi-agent forecasting fundamentally but also empower the encoder with the capability of streaming processing. Second, we propose a multi-agent DETR-like decoder, which facilitates joint multi-agent trajectory prediction by modeling agents' interactions at future time steps. For the first time, we show that a joint prediction model can outperform marginal prediction models even on the marginal metrics, which opens up new research opportunities in trajectory prediction. Our approach ranks 1st on the Argoverse 2 multi-agent motion forecasting benchmark, winning the championship of the Argoverse Challenge at the CVPR 2023 Workshop on Autonomous Driving.

Submitted 18 June, 2023; originally announced June 2023.

Comments: Technical report for the 1st place solution of the Argoverse 2 Multi-Agent Motion Forecasting Competition at the CVPR 2023 Workshop on Autonomous Driving

arXiv:2306.05118 [pdf, other]

doi 10.1145/3580305.3599796

Controllable Multi-Objective Re-ranking with Policy Hypernetworks

Authors: Sirui Chen, Yuan Wang, Zijing Wen, Zhiyu Li, Changshuo Zhang, Xiao Zhang, Quan Lin, Cheng Zhu, Jun Xu

Abstract: Multi-stage ranking pipelines have become widely used strategies in modern recommender systems, where the final stage aims to return a ranked list of items that balances a number of requirements such as user preference, diversity, novelty, etc. Linear scalarization is arguably the most widely used technique to merge multiple requirements into one optimization objective, by summing up the requirements with certain preference weights. Existing final-stage ranking methods often adopt a static model where the preference weights are determined during offline training and kept unchanged during online serving. Whenever a modification of the preference weights is needed, the model has to be re-trained, which is inefficient in both time and resources. Meanwhile, the most appropriate weights may vary greatly for different groups of target users or at different time periods (e.g., during holiday promotions). In this paper, we propose a framework called controllable multi-objective re-ranking (CMR) which incorporates a hypernetwork to generate parameters for a re-ranking model according to different preference weights. In this way, CMR is able to adapt the preference weights to environment changes in an online manner, without retraining the models. Moreover, we classify practical business-oriented tasks into four main categories and seamlessly incorporate them in a newly proposed re-ranking model based on an Actor-Evaluator framework, which serves as a reliable real-world testbed for CMR. Offline experiments based on the dataset collected from Taobao App showed that CMR improved several popular re-ranking models by using them as underlying models. Online A/B tests also demonstrated the effectiveness and trustworthiness of CMR.

Submitted 17 July, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

arXiv:2306.04187 [pdf, other]

doi 10.18653/v1/2023.findings-acl.671

Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals

Authors: Hongru Liang, Jia Liu, Weihong Du, Dingnan Jin, Wenqiang Lei, Zujie Wen, Jiancheng Lv

Abstract: The machine reading comprehension (MRC) of user manuals has huge potential in customer service. However, current methods have trouble answering complex questions. Therefore, we introduce the Knowing-how & Knowing-that task that requires the model to answer factoid-style, procedure-style, and inconsistent questions about user manuals. We resolve this task by jointly representing the steps and facts in a graph TARA, which supports a unified inference of various questions. Towards a systematical benchmarking study, we design a heuristic method to automatically parse user manuals into TARAs and build an annotated dataset to test the model's ability in answering real-world questions. Empirical results demonstrate that representing user manuals as TARAs is a desired solution for the MRC of user manuals. An in-depth investigation of TARA further sheds light on the issues and broader impacts of future representations of user manuals. We hope our work can move the MRC of user manuals to a more complex and realistic stage.

Submitted 8 August, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Journal ref: Findings of the Association for Computational Linguistics: ACL 2023. (2023)

arXiv:2306.04099 [pdf, other]

NTKCPL: Active Learning on Top of Self-Supervised Model by Estimating True Coverage

Authors: Ziting Wen, Oscar Pizarro, Stefan Williams

Abstract: High annotation cost for training machine learning classifiers has driven extensive research in active learning and self-supervised learning. Recent research has shown that in the context of supervised learning different active learning strategies need to be applied at various stages of the training process to ensure improved performance over the random baseline. We refer to the point where the number of available annotations changes the suitable active learning strategy as the phase transition point. In this paper, we establish that when combining active learning with self-supervised models to achieve improved performance, the phase transition point occurs earlier. It becomes challenging to determine which strategy should be used for previously unseen datasets. We argue that existing active learning algorithms are heavily influenced by the phase transition because the empirical risk over the entire active learning pool estimated by these algorithms is inaccurate and influenced by the number of labeled samples. To address this issue, we propose a novel active learning strategy, neural tangent kernel clustering-pseudo-labels (NTKCPL). It estimates empirical risk based on pseudo-labels and the model prediction with NTK approximation. We analyze the factors affecting this approximation error and design a pseudo-label clustering generation method to reduce the approximation error. We validate our method on five datasets, empirically demonstrating that it outperforms the baseline methods in most cases and is valid over a wider range of training budgets.

Submitted 6 June, 2023; originally announced June 2023.

arXiv:2305.20068 [pdf, other]

TOFG: A Unified and Fine-Grained Environment Representation in Autonomous Driving

Authors: Zihao Wen, Yifan Zhang, Xinhong Chen, Jianping Wang

Abstract: In autonomous driving, an accurate understanding of environment, e.g., the vehicle-to-vehicle and vehicle-to-lane interactions, plays a critical role in many driving tasks such as trajectory prediction and motion planning. Environment information comes from high-definition (HD) map and historical trajectories of vehicles. Due to the heterogeneity of the map data and trajectory data, many data-driven models for trajectory prediction and motion planning extract vehicle-to-vehicle and vehicle-to-lane interactions in a separate and sequential manner. However, such a manner may capture biased interpretation of interactions, causing lower prediction and planning accuracy. Moreover, separate extraction leads to a complicated model structure and hence the overall efficiency and scalability are sacrificed. To address the above issues, we propose an environment representation, Temporal Occupancy Flow Graph (TOFG). Specifically, the occupancy flow-based representation unifies the map information and vehicle trajectories into a homogeneous data format and enables a consistent prediction. The temporal dependencies among vehicles can help capture the change of occupancy flow timely to further promote model performance. To demonstrate that TOFG is capable of simplifying the model architecture, we incorporate TOFG with a simple graph attention (GAT) based neural network and propose TOFG-GAT, which can be used for both trajectory prediction and motion planning. Experiment results show that TOFG-GAT achieves better or competitive performance than all the SOTA baselines with less training time.

Submitted 31 May, 2023; originally announced May 2023.

Comments: Accepted by ICRA 2023

arXiv:2305.13774 [pdf, other]

ADD 2023: the Second Audio Deepfake Detection Challenge

Authors: Jiangyan Yi, Jianhua Tao, Ruibo Fu, Xinrui Yan, Chenglong Wang, Tao Wang, Chu Yuan Zhang, Xiaohui Zhang, Yan Zhao, Yong Ren, Le Xu, Junzuo Zhou, Hao Gu, Zhengqi Wen, Shan Liang, Zheng Lian, Shuai Nie, Haizhou Li

Abstract: Audio deepfake detection is an emerging topic in the artificial intelligence community. The second Audio Deepfake Detection Challenge (ADD 2023) aims to spur researchers around the world to build new innovative technologies that can further accelerate and foster research on detecting and analyzing deepfake speech utterances. Different from previous challenges (e.g. ADD 2022), ADD 2023 focuses on surpassing the constraints of binary real/fake classification, and actually localizing the manipulated intervals in a partially fake speech as well as pinpointing the source responsible for generating any fake audio. Furthermore, ADD 2023 includes more rounds of evaluation for the fake audio game sub-challenge. The ADD 2023 challenge includes three subchallenges: audio fake game (FG), manipulation region location (RL) and deepfake algorithm recognition (AR). This paper describes the datasets, evaluation metrics, and protocols. Some findings are also reported in audio deepfake detection tasks.

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.10011 [pdf]

Super-Resolution Imaging via Angular Magnification

Authors: Yi Zhou, Dingpeng Liao, Kun Zhang, Zijie Ma, Shikai Wu, Jun Ma, Xuemei Dai, Zhengguo Shang, Zhongquan Wen, Gang Chen

Abstract: The far-field resolution of optical imaging systems is restricted by the Abbe diffraction limit, a direct result of the wave nature of light. One successful technological approach to circumventing this limit is to reduce the effective size of a point-spread-function. In the past decades, great endeavors have been made to engineer an effective point-spread-function by exploiting different mechanisms, including optical nonlinearities and structured light illumination. However, these methods are hard to apply to distant objects. Here, we propose a new way to achieve super-resolution in the far field by utilizing angular magnification. We present the first proof-of-concept demonstration of such an idea and demonstrate a new class of lenses with angular magnification for far-field super-resolution imaging. Both theoretical and experimental results demonstrate a more than two-fold enhancement beyond the angular-resolution limit in far-field imaging. The proposed approach can be applied to super-resolution imaging of distant objects. It has promising potential applications in super-resolution telescopes and remote sensing.

Submitted 17 May, 2023; originally announced May 2023.

arXiv:2305.05250 [pdf, ps, other]

doi 10.1093/mnras/stad1426

More relaxed intracluster gas than galaxies in clusters in quasi-equilibrium

Authors: Z. S. Yuan, J. L. Han, H. Böhringer, Z. L. Wen, G. Chon

Abstract: During cluster mergers, the intracluster gas and member galaxies undergo dynamic evolution, but at different timescales and reach different states. We collect 24 galaxy clusters in quasi-equilibrium state as indicated by the X-ray image, and calculate the cluster orientations and three kinds of dynamical parameters, i.e., the normalized centroid offset, the sphere index and the ellipticity, for these clusters from the distributions of member galaxies and also the intracluster gas. We find consistent alignments for the orientations estimated from the two components. However, the three kinds of dynamical parameters indicated by member galaxies are systematically larger than those derived from the gas component, suggesting that the gas component is more relaxed than member galaxies. Differences of dynamical features between the intracluster gas and member galaxies are independent of cluster mass and concentration. We conclude that the intracluster gas reaches the dynamic equilibrium state earlier than the almost collisionless member galaxies.

Submitted 9 May, 2023; originally announced May 2023.

Comments: 9 pages, 5 figures, 1 table, accepted for publication in MNRAS

arXiv:2305.03324 [pdf, other]

doi 10.1145/3539618.3591641

Augmenting Low-Resource Text Classification with Graph-Grounded Pre-training and Prompting

Authors: Zhihao Wen, Yuan Fang

Abstract: Text classification is a fundamental problem in information retrieval with many real-world applications, such as predicting the topics of online articles and the categories of e-commerce product descriptions. However, low-resource text classification, with few or no labeled samples, poses a serious concern for supervised learning. Meanwhile, many text data are inherently grounded on a network structure, such as a hyperlink/citation network for online articles, and a user-item purchase network for e-commerce products. These graph structures capture rich semantic relationships, which can potentially augment low-resource text classification. In this paper, we propose a novel model called Graph-Grounded Pre-training and Prompting (G2P2) to address low-resource text classification in a two-pronged approach. During pre-training, we propose three graph interaction-based contrastive strategies to jointly pre-train a graph-text model; during downstream classification, we explore prompting for the jointly pre-trained model to achieve low-resource classification. Extensive experiments on four real-world datasets demonstrate the strength of G2P2 in zero- and few-shot low-resource text classification tasks.

Submitted 5 May, 2023; originally announced May 2023.

Comments: 11 pages, accepted by SIGIR'23

arXiv:2305.02774 [pdf, other]

Spatial and Modal Optimal Transport for Fast Cross-Modal MRI Reconstruction

Authors: Qi Wang, Zhijie Wen, Jun Shi, Qian Wang, Dinggang Shen, Shihui Ying

Abstract: Multi-modal magnetic resonance imaging (MRI) plays a crucial role in comprehensive disease diagnosis in clinical medicine. However, acquiring certain modalities, such as T2-weighted images (T2WIs), is time-consuming and prone to motion artifacts, which negatively impacts subsequent multi-modal image analysis. To address this issue, we propose an end-to-end deep learning framework that utilizes T1-weighted images (T1WIs) as auxiliary modalities to expedite T2WIs' acquisitions. While image pre-processing is capable of mitigating misalignment, improper parameter selection leads to adverse pre-processing effects, requiring iterative experimentation and adjustment. To overcome this shortcoming, we employ Optimal Transport (OT) to synthesize T2WIs by aligning T1WIs and performing cross-modal synthesis, effectively mitigating spatial misalignment effects. Furthermore, we adopt an alternating iteration framework between the reconstruction task and the cross-modal synthesis task to optimize the final results. Then, we prove that the reconstructed T2WIs and the synthetic T2WIs become closer on the T2 image manifold with iterations increasing, and further illustrate that the improved reconstruction result enhances the synthesis process, whereas the enhanced synthesis result improves the reconstruction process. Finally, experimental results from FastMRI and internal datasets confirm the effectiveness of our method, demonstrating significant improvements in image reconstruction quality even at low sampling rates.

Submitted 21 May, 2024; v1 submitted 4 May, 2023; originally announced May 2023.

arXiv:2305.02575 [pdf, other]

Towards Hierarchical Policy Learning for Conversational Recommendation with Hypergraph-based Reinforcement Learning

Authors: Sen Zhao, Wei Wei, Yifan Liu, Ziyang Wang, Wendi Li, Xian-Ling Mao, Shuai Zhu, Minghui Yang, Zujie Wen

Abstract: Conversational recommendation systems (CRS) aim to timely and proactively acquire user dynamic preferred attributes through conversations for item recommendation. In each turn of CRS, there are naturally two decision-making processes with different roles that influence each other: 1) director, which is to select the follow-up option (i.e., ask or recommend) that is more effective for reducing the action space and acquiring user preferences; and 2) actor, which is to accordingly choose primitive actions (i.e., asked attribute or recommended item) that satisfy user preferences and give feedback to estimate the effectiveness of the director's option. However, existing methods heavily rely on a unified decision-making module or heuristic rules, while neglecting to distinguish the roles of different decision procedures, as well as the mutual influences between them. To address this, we propose a novel Director-Actor Hierarchical Conversational Recommender (DAHCR), where the director selects the most effective option, followed by the actor accordingly choosing primitive actions that satisfy user preferences. Specifically, we develop a dynamic hypergraph to model user preferences and introduce an intrinsic motivation to train from weak supervision over the director. Finally, to alleviate the bad effect of model bias on the mutual influence between the director and actor, we model the director's option by sampling from a categorical distribution. Extensive experiments demonstrate that DAHCR outperforms state-of-the-art methods.

Submitted 26 July, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

Journal ref: THE 32nd INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI2023)

arXiv:2304.13400 [pdf]

doi 10.1021/acs.nanolett.3c03085

Observation of Fluctuation Spin Hall Effect in Antiferromagnet

Authors: Chi Fang, Caihua Wan, Xiaoyue Zhang, Satoshi Okamoto, Tianyi Ma, Jianying Qin, Xiao Wang, Chenyang Guo, Jing Dong, Guoqiang Yu, Zhenchao Wen, Ning Tang, Stuart S. P. Parkin, Naoto Nagaosa, Yuan Lu, Xiufeng Han

Abstract: The spin Hall effect (SHE) can generate a pure spin current by an electric current, which is promisingly used to electrically control magnetization. To reduce power consumption of this control, a giant spin Hall angle (SHA) in the SHE is desired in low-resistivity systems for practical applications. Here, critical spin fluctuation near the antiferromagnetic (AFM) phase-transition is proved as an effective mechanism to create an additional part of SHE, named as fluctuation spin Hall effect (FSHE). This FSHE enhances the SHA due to the AFM spin fluctuation between conduction electrons and local spins. We detect the FSHE with the inverse and direct spin Hall effect (ISHE and DSHE) set-up and their temperature (T) dependences in the Cr/MgO/Fe magnetic tunnel junctions (MTJs). The SHA is significantly enhanced when temperature is approached to the Néel temperature (T_N) and has a peak value of -0.34 at 200 K near T_N. This value is higher than the room-temperature value by 240% and comparable to that of heavy metals Ta and W. Furthermore, the spin Hall resistivity of Cr well fits the modeled T-dependence when T approaches T_N from low temperatures, implying the AFM spin fluctuation nature of strong SHA enhancement. Thus, this study demonstrates the critical spin fluctuation as a prospective way of increasing SHA and enriches the AFM material candidates for spin-orbitronic devices.

Submitted 26 April, 2023; originally announced April 2023.

Comments: 27 pages, 9 figures

arXiv:2304.13301 [pdf, other]

Prompting GPT-3.5 for Text-to-SQL with De-semanticization and Skeleton Retrieval

Authors: Chunxi Guo, Zhiliang Tian, Jintao Tang, Pancheng Wang, Zhihua Wen, Kang Yang, Ting Wang

Abstract: Text-to-SQL is a task that converts a natural language question into a structured query language (SQL) to retrieve information from a database. Large language models (LLMs) work well in natural language generation tasks, but they are not specifically pre-trained to understand the syntax and semantics of SQL commands. In this paper, we propose an LLM-based framework for Text-to-SQL which retrieves helpful demonstration examples to prompt LLMs. However, questions with different database schemas can vary widely, even if the intentions behind them are similar and the corresponding SQL queries exhibit similarities. Consequently, it becomes crucial to identify the appropriate SQL demonstrations that align with our requirements. We design a de-semanticization mechanism that extracts question skeletons, allowing us to retrieve similar examples based on their structural similarity. We also model the relationships between question tokens and database schema items (i.e., tables and columns) to filter out schema-related information. Our framework adapts the range of the database schema in prompts to balance length and valuable information. A fallback mechanism allows for a more detailed schema to be provided if the generated SQL query fails. Our framework outperforms state-of-the-art models and demonstrates strong generalization ability on three cross-domain Text-to-SQL benchmarks.

Submitted 31 August, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

arXiv:2304.08005 [pdf]

doi 10.1021/acsphotonics.3c00755

Ultra-high-speed coherent anti-Stokes Raman spectroscopy with a hybrid dual-comb source

Authors: Tianjian Lv, Bing Han, Ming Yan, Zhaoyang Wen, Kun Huang, Kangwen Yang, Heping Zeng

Abstract: Coherent anti-Stokes Raman scattering (CARS) spectroscopy with time-delayed ultrashort pulses and a single-pixel photodetector has shown great potential for spectroscopic imaging and transient studies in chemistry and biological research. However, those systems rely on mechanical delay lines or two asynchronous optical combs with inflexible repetition frequencies, technically limiting their acquisition speeds. Here, we demonstrate a hybrid dual-comb CARS system involving a broadband fiber laser and a highly-flexible, frequency-modulated electro-optic comb. We achieve multiplex CARS spectra (2800-3200 cm-1), with a moderate resolution (22 cm-1), at a maximum refresh rate of 1 MHz, limited by the radio-frequency synthesizer we use. Fast spectroscopic CARS imaging is demonstrated for liquid mixtures. Our system enables spectral measurements in the high-wavenumber C-H stretching region at a record speed that is an order of magnitude higher than state-of-the-art systems, which may open up new opportunities for fast chemical sensing and imaging.

Submitted 3 December, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

Comments: 22 pages, 6 figures

arXiv:2304.05007 [pdf, other]

Privacy Amplification via Shuffling: Unified, Simplified, and Tightened

Authors: Shaowei Wang, Yun Peng, Jin Li, Zikai Wen, Zhipeng Li, Shiyu Yu, Di Wang, Wei Yang

Abstract: The shuffle model of differential privacy provides promising privacy-utility balances in decentralized, privacy-preserving data analysis. However, the current analyses of privacy amplification via shuffling lack both tightness and generality. To address this issue, we propose the \emph{variation-ratio reduction} as a comprehensive framework for privacy amplification in both single-message and multi-message shuffle protocols. It leverages two new parameterizations: the total variation bounds of local messages and the probability ratio bounds of blanket messages, to determine indistinguishability levels. Our theoretical results demonstrate that our framework provides tighter bounds, especially for local randomizers with extremal probability design, where our bounds are exactly tight. Additionally, variation-ratio reduction complements parallel composition in the shuffle model, yielding enhanced privacy accounting for popular sampling-based randomizers employed in statistical queries (e.g., range queries, marginal queries, and frequent itemset mining). Empirical findings demonstrate that our numerical amplification bounds surpass existing ones, conserving up to $30\%$ of the budget for single-message protocols, $75\%$ for multi-message ones, and a striking $75\%$-$95\%$ for parallel composition. Our bounds also result in a remarkably efficient $\tilde{O}(n)$ algorithm that numerically amplifies privacy in less than $10$ seconds for $n=10^8$ users.

Submitted 28 July, 2024; v1 submitted 11 April, 2023; originally announced April 2023.

Comments: To appear in VLDB 2024. Code available at https://github.com/wangsw/PrivacyAmplification

arXiv:2303.11369 [pdf, other]

Bridging Imitation and Online Reinforcement Learning: An Optimistic Tale

Authors: Botao Hao, Rahul Jain, Dengwang Tang, Zheng Wen

Abstract: In this paper, we address the following problem: Given an offline demonstration dataset from an imperfect expert, what is the best way to leverage it to bootstrap online learning performance in MDPs. We first propose an Informed Posterior Sampling-based RL (iPSRL) algorithm that uses the offline dataset, and information about the expert's behavioral policy used to generate the offline dataset. Its cumulative Bayesian regret goes down to zero exponentially fast in N, the offline dataset size if the expert is competent enough. Since this algorithm is computationally impractical, we then propose the iRLSVI algorithm that can be seen as a combination of the RLSVI algorithm for online RL, and imitation learning. Our empirical results show that the proposed iRLSVI algorithm is able to achieve significant reduction in regret as compared to two baselines: no offline data, and offline dataset but used without information about the generative policy. Our algorithm bridges online RL and imitation learning for the first time.

Submitted 16 July, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

Comments: Alphabetical order. Corresponding to Rahul Jain

arXiv:2303.10599 [pdf, ps, other]

Convergence Analysis of Stochastic Gradient Descent with MCMC Estimators

Authors: Tianyou Li, Fan Chen, Huajie Chen, Zaiwen Wen

Abstract: Understanding stochastic gradient descent (SGD) and its variants is essential for machine learning. However, most of the preceding analyses are conducted under amenable conditions such as unbiased gradient estimator and bounded objective functions, which do not encompass many sophisticated applications, such as variational Monte Carlo, entropy-regularized reinforcement learning and variational inference. In this paper, we consider the SGD algorithm that employs the Markov Chain Monte Carlo (MCMC) estimator to compute the gradient, called MCMC-SGD. Since MCMC reduces the sampling complexity significantly, it is an asymptotically convergent biased estimator in practice. Moreover, by incorporating a general class of unbounded functions, it is much more difficult to analyze the MCMC sampling error. Therefore, we assume that the function is sub-exponential and use the Bernstein inequality for non-stationary Markov chains to derive error bounds of the MCMC estimator. Consequently, MCMC-SGD is proven to have a first order convergence rate $O(\log K/\sqrt{n K})$ with $K$ iterations and a sample size $n$. It partially explains how MCMC influences the behavior of SGD. Furthermore, we verify the correlated negative curvature condition under reasonable assumptions. It is shown that MCMC-SGD escapes from saddle points and reaches $(ε,ε^{1/4})$ approximate second order stationary points or $ε^{1/2}$-variance points at least $O(ε^{-11/2}\log^{2}(1/ε) )$ steps with high probability. Our analysis unveils the convergence pattern of MCMC-SGD across a broad class of stochastic optimization problems, and interprets the convergence phenomena observed in practical applications.

Submitted 23 March, 2024; v1 submitted 19 March, 2023; originally announced March 2023.

arXiv:2303.10320 [pdf, other]

Topology automaton and conformal dimension of post-critical-finite self-similar sets

Authors: Hui Rao, Zhi-Ying Wen, Qihan Yuan, Yuan Zhang

Abstract: In this paper, we use a class of finite state automata, called topology automaton, to study the metric classification of a special class of post-critically finite self-similar sets. As an application, we prove that the conformal dimension of post-critically finite self-similar dendrites and fractal gasket with connected component is 1.

Submitted 17 March, 2023; originally announced March 2023.

Comments: 38 pages, 11 figures, 27 references

arXiv:2303.01211 [pdf, other]

Learning From Yourself: A Self-Distillation Method for Fake Speech Detection

Authors: Jun Xue, Cunhang Fan, Jiangyan Yi, Chenglong Wang, Zhengqi Wen, Dan Zhang, Zhao Lv

Abstract: In this paper, we propose a novel self-distillation method for fake speech detection (FSD), which can significantly improve the performance of FSD without increasing the model complexity. For FSD, some fine-grained information is very important, such as spectrogram defects, mute segments, and so on, which are often perceived by shallow networks. However, shallow networks have much noise, and so cannot capture this very well. To address this problem, we propose using the deepest network to instruct the shallow networks in order to enhance them. Specifically, the networks of FSD are divided into several segments, the deepest network being used as the teacher model, and all shallow networks become multiple student models by adding classifiers. Meanwhile, the distillation path between the deepest network feature and shallow network features is used to reduce the feature difference. A series of experimental results on the ASVspoof 2019 LA and PA datasets show the effectiveness of the proposed method, with significant improvements compared to the baseline.

Submitted 2 March, 2023; originally announced March 2023.

Comments: Accepted by ICASSP 2023

arXiv:2302.13524 [pdf, other]

doi 10.1117/1.APN.2.5.056007

Efficient reference-less transmission matrix retrieval for a multimode fiber using fast Fourier transform

Authors: Jingshan Zhong, Zhong Wen, Quanzhi Li, Qilin Deng, Qing Yang

Abstract: Transmission matrix (TM) linearly maps the incident and transmitted complex fields, and has been used widely due to its ability to characterize scattering media. It is computationally demanding to reconstruct the TM from intensity images measured by a reference-less experimental setup. Removing the reference beam for interference gains the advantage of a simple experimental setup. However, the long computational time still limits its practical application. We propose an efficient reference-less TM retrieval method for multimode fiber (MMF). Our method adopts a data acquisition scheme which employs a Fourier transform matrix in the design of the incident fields. We develop a nonlinear optimization algorithm to solve the TM retrieval problem in a parallel manner. The data acquisition scheme allows the algorithm to be implemented with fast Fourier transform (FFT), and hence achieves great efficiency improvement. Further, our method acquires intensity images at a defocus plane and corrects the error of relative phase offset of the TM recovered from the intensity images measured at one fixed plane. We validate the proposed TM retrieval method with both simulations and experiments. By using FFT, our TM retrieval algorithm achieves 1200x speed-up in computational time, and recovers a $2286 \times 8192$ TM of a 0.22 NA and $50 \ μm$ diameter MMF in 124.9 seconds on a computer with 32 CPU cores. With the advantages of efficiency and the correction of phase offset, our method paves the way for the application of reference-less TM retrieval in real practice.

Submitted 1 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

arXiv:2302.13087 [pdf, other]

Gauss-Newton Temporal Difference Learning with Nonlinear Function Approximation

Authors: Zhifa Ke, Junyu Zhang, Zaiwen Wen

Abstract: In this paper, a Gauss-Newton Temporal Difference (GNTD) learning method is proposed to solve the Q-learning problem with nonlinear function approximation. In each iteration, our method takes one Gauss-Newton (GN) step to optimize a variant of Mean-Squared Bellman Error (MSBE), where target networks are adopted to avoid double sampling. Inexact GN steps are analyzed so that one can safely and efficiently compute the GN updates by cheap matrix iterations. Under mild conditions, non-asymptotic finite-sample convergence to the globally optimal Q function is derived for various nonlinear function approximations. In particular, for neural network parameterization with ReLU activation, GNTD achieves an improved sample complexity of $\tilde{\mathcal{O}}(\varepsilon^{-1})$, as opposed to the $\mathcal{O}(\varepsilon^{-2})$ sample complexity of the existing neural TD methods. An $\tilde{\mathcal{O}}(\varepsilon^{-1.5})$ sample complexity of GNTD is also established for general smooth function approximations. We validate our method via extensive experiments in several RL benchmarks, where GNTD exhibits both higher rewards and faster convergence than TD-type methods.

Submitted 31 March, 2024; v1 submitted 25 February, 2023; originally announced February 2023.

arXiv:2302.12400 [pdf, other]

Towards Stable Test-Time Adaptation in Dynamic Wild World

Authors: Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Zhiquan Wen, Yaofo Chen, Peilin Zhao, Mingkui Tan

Abstract: Test-time adaptation (TTA) has been shown to be effective at tackling distribution shifts between training and testing data by adapting a given model on test samples. However, the online model updating of TTA may be unstable and this is often a key obstacle preventing existing TTA methods from being deployed in the real world. Specifically, TTA may fail to improve or even harm the model performance when test data have: 1) mixed distribution shifts, 2) small batch sizes, and 3) online imbalanced label distribution shifts, which are quite common in practice. In this paper, we investigate the reasons for this instability and find that the batch norm layer is a crucial factor hindering TTA stability. Conversely, TTA can perform more stably with batch-agnostic norm layers, i.e., group or layer norm. However, we observe that TTA with group and layer norms does not always succeed and still suffers many failure cases. By digging into the failure cases, we find that certain noisy test samples with large gradients may disturb the model adaptation and result in collapsed trivial solutions, i.e., assigning the same class label for all samples. To address the above collapse issue, we propose a sharpness-aware and reliable entropy minimization method, called SAR, for further stabilizing TTA from two aspects: 1) remove partial noisy samples with large gradients, 2) encourage model weights to go to a flat minimum so that the model is robust to the remaining noisy samples. Promising results demonstrate that SAR performs more stably than prior methods and is computationally efficient under the above wild test scenarios.

Submitted 23 February, 2023; originally announced February 2023.

Comments: accepted by International Conference on Learning Representations (ICLR) 2023 as Notable-Top-5%; 27 pages, 10 figures, 18 tables

arXiv:2302.12375 [pdf, other]

doi 10.1016/j.cma.2023.115965

Isogeometric analysis using G-spline surfaces with arbitrary unstructured quadrilateral layout

Authors: Zuowei Wen, Md. Sadman Faruque, Xin Li, Xiaodong Wei, Hugo Casquero

Abstract: G-splines are a generalization of B-splines that deals with extraordinary points by imposing G^1 constraints across their spoke edges, thus obtaining a continuous tangent plane throughout the surface. Using the isoparametric concept and the Bubnov-Galerkin method to solve partial differential equations with G-splines results in discretizations with global C^1 continuity in physical space. Extraordinary points (EPs) are required to represent manifold surfaces with arbitrary topological genus. In this work, we allow both interior and boundary EPs and there are no limitations regarding how close EPs can be from each other. Reaching this level of flexibility is necessary so that splines with EPs can become mainstream in the design-through-analysis cycle of complex thin-walled structures. To the authors' knowledge, the two EP constructions based on imposing G^1 constraints proposed in this work are the first two EP constructions used in isogeometric analysis (IGA) that combine the following distinctive characteristics: (1) Only vertex-based control points are used and they behave as geometric shape handles, (2) any control point of the control net can potentially be an EP, (3) global C^1 continuity in physical space is obtained without introducing singularities, (4) faces around EPs are not split into multiple elements, and (5) good surface quality is attained. The studies of convergence and surface quality performed in this paper suggest that G-splines are more suitable for IGA than EP constructions based on the D-patch framework. Finally, we have represented the stiffener, the inner part, and the outer part of a B-pillar with G-spline surfaces and solved eigenvalue problems using both Kirchhoff-Love and Reissner-Mindlin shell theories. The results are compared with bilinear quadrilateral meshes and excellent agreement is found between G-splines and conventional finite elements.

Submitted 23 February, 2023; originally announced February 2023.

arXiv:2302.12046 [pdf, ps, other]

Observation of Q-switched and continuous wave regimes with mode-hopping in Er-doped fiber lasers incorporating a dynamic population grating

Authors: Zengrun Wen, Xiulin Fan, Kaile Wang, Weiming Wang, Song Gao, Wenjing Hao, Yuanmei Gao, Yangjian Cai, Liren Zheng

Abstract: Dynamic population gratings (DPGs) in rare-earth doped fibers are prevalent devices in fiber lasers for the production of single-longitudinal-mode emission, Q-switched pulses, and wavelength self-sweeping regimes. This study presents a transition from Q-switched state to continuous wave (CW) state, accompanying irregular mode-hopping, in an erbium-doped fiber laser with a heavily-doped DPG centered at 1549.95 nm. Our results demonstrate that the transition between these two states can be achieved by adjusting the pump power. The repetition frequency of the Q-switched pulse increases monotonically with the increasing pump power, while the pulse duration initially narrows and then expands because the reduced peak intensity weakens the nonlinear effect. Additionally, modulation peaks are evident on both the Q-switched pulse train and the CW background, which are induced by the irregular mode-hopping caused by the DPG. Furthermore, we observe that the central wavelength fluctuates within a range of 0.05 nm. These results provide valuable insight into the DPG effect in heavily-doped fibers.

Submitted 23 February, 2023; originally announced February 2023.

arXiv:2302.09205 [pdf, other]

Approximate Thompson Sampling via Epistemic Neural Networks

Authors: Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy

Abstract: Thompson sampling (TS) is a popular heuristic for action selection, but it requires sampling from a posterior distribution. Unfortunately, this can become computationally intractable in complex environments, such as those modeled using neural networks. Approximate posterior samples can produce effective actions, but only if they reasonably approximate joint predictive distributions of outputs across inputs. Notably, accuracy of marginal predictive distributions does not suffice. Epistemic neural networks (ENNs) are designed to produce accurate joint predictive distributions. We compare a range of ENNs through computational experiments that assess their performance in approximating TS across bandit and reinforcement learning environments. The results indicate that ENNs serve this purpose well and illustrate how the quality of joint predictive distributions drives performance. Further, we demonstrate that the epinet -- a small additive network that estimates uncertainty -- matches the performance of large ensembles at orders of magnitude lower computational cost. This enables effective application of TS with computation that scales gracefully to complex environments.
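For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of exact Thompson sampling in the simplest setting where the posterior is tractable: a Bernoulli bandit with independent Beta(1, 1) priors. It is not the epinet or any ENN method from the paper; the function name `thompson_sampling`, the arm means, and the seed are illustrative assumptions.

```python
import random


def thompson_sampling(true_means, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling (illustrative sketch only).

    true_means: hypothetical success probability of each arm, hidden from the agent.
    Returns (pulls per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms  # count of reward-1 observations per arm
    failures = [0] * n_arms   # count of reward-0 observations per arm
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one plausible mean per arm from its Beta(1 + s, 1 + f) posterior.
        samples = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled means.
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    pulls = [successes[a] + failures[a] for a in range(n_arms)]
    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
    print("pulls per arm:", pulls, "total reward:", total)
```

Here the posterior draw is a single call to `betavariate`; the difficulty the abstract describes arises when the reward model is a neural network and no such closed-form posterior exists.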

Submitted 17 February, 2023; originally announced February 2023.

arXiv:2302.05010 [pdf, other]

doi 10.1088/1674-4527/acc155

Constraints on dark energy from the CSST galaxy clusters

Authors: Yufei Zhang, Mingjing Chen, Zhonglue Wen, Wenjuan Fang

Abstract: We study the potential of the galaxy cluster sample expected from the China Space Station Telescope (CSST) survey to constrain dark energy properties. By modelling the distribution of observed cluster mass for a given true mass to be log-normal and adopting a selection threshold in the observed mass $M_{200m} \geq 0.836 \times 10^{14} h^{-1}M_{\odot}$, we find about $4.1 \times 10^{5}$ clusters in the redshift range $0 \leq z \leq 1.5$ can be detected by the CSST. We construct the Fisher matrix for the cluster number counts from CSST, and forecast constraints on dark energy parameters for models with constant ($w_0$CDM) and time dependent ($w_0w_a$CDM) equation of state. In the self-calibration scheme, the dark energy equation of state parameter $w_0$ of the $w_0$CDM model can be constrained to $Δw_0 = 0.036$. If $w_a$ is added as a free parameter, we obtain $Δw_0 = 0.077$ and $Δw_a = 0.39$ for the $w_0w_a$CDM model, with a Figure of Merit for ($w_0,w_a$) of 68.99. If we had perfect knowledge of the observable-mass scaling relation ("known SR" scheme), we would obtain $Δw_0 = 0.012$ for the $w_0$CDM model, and $Δw_0 = 0.062$ and $Δw_a = 0.24$ for the $w_0w_a$CDM model. The dark energy Figure of Merit of ($w_0,w_a$) increases to 343.25. By extending the maximum redshift of the clusters from $z_{max} \sim 1.5$ to $z_{max} \sim 2$, the dark energy Figure of Merit for ($w_0,w_a$) increases to 89.72 (self-calibration scheme) and 610.97 ("known SR" scheme), improved by a factor of $\sim 1.30$ and $\sim 1.78$, respectively. We find that the impact of clusters' redshift uncertainty on the dark energy constraints is negligible as long as the redshift error of clusters is smaller than 0.01, achievable by CSST. We also find that the bias in logarithm mass must be calibrated to be $0.30$ or better to avoid significant dark energy parameter bias.

Submitted 9 February, 2023; originally announced February 2023.

Comments: 19 pages, 5 figures, 4 tables. Accepted for publication in Research in Astronomy and Astrophysics

Journal ref: Res. Astron. Astrophys. 23 045011 (2023)

arXiv:2302.04580 [pdf, other]

doi 10.24963/ijcai.2022/591

Generating a Structured Summary of Numerous Academic Papers: Dataset and Method

Authors: Shuaiqi Liu, Jiannong Cao, Ruosong Yang, Zhiyuan Wen

Abstract: Writing a survey paper on one research topic usually needs to cover the salient content from numerous related papers, which can be modeled as a multi-document summarization (MDS) task. Existing MDS datasets usually focus on producing the structureless summary covering a few input documents. Meanwhile, previous structured summary generation works focus on summarizing a single document into a multi-section summary. These existing datasets and methods cannot meet the requirements of summarizing numerous academic papers into a structured summary. To deal with the scarcity of available data, we propose BigSurvey, the first large-scale dataset for generating comprehensive summaries of numerous academic papers on each topic. We collect target summaries from more than seven thousand survey papers and utilize their 430 thousand reference papers' abstracts as input documents. To organize the diverse content from dozens of input documents and ensure the efficiency of processing long text sequences, we propose a summarization method named category-based alignment and sparse transformer (CAST). The experimental results show that our CAST method outperforms various advanced summarization methods.

Submitted 9 February, 2023; originally announced February 2023.

Comments: IJCAI 2022

ACM Class: I.2.7; I.7

arXiv:2302.03815 [pdf, other]

Long Text and Multi-Table Summarization: Dataset and Method

Authors: Shuaiqi Liu, Jiannong Cao, Ruosong Yang, Zhiyuan Wen

Abstract: Automatic document summarization aims to produce a concise summary covering the input document's salient information. Within a report document, the salient information can be scattered in the textual and non-textual content. However, existing document summarization datasets and methods usually focus on the text and filter out the non-textual content. Missing tabular data can limit produced summaries' informativeness, especially when summaries require covering quantitative descriptions of critical metrics in tables. Existing datasets and methods cannot meet the requirements of summarizing long text and multiple tables in each report. To deal with the scarcity of available data, we propose FINDSum, the first large-scale dataset for long text and multi-table summarization. Built on 21,125 annual reports from 3,794 companies, it has two subsets for summarizing each company's results of operations and liquidity. To summarize the long text and dozens of tables in each report, we present three types of summarization methods. Besides, we propose a set of evaluation metrics to assess the usage of numerical information in produced summaries. Dataset analyses and experimental results indicate the importance of jointly considering input textual and tabular data when summarizing report documents.

Submitted 7 February, 2023; originally announced February 2023.

Comments: EMNLP 2022 Findings

ACM Class: I.2.7; I.7

arXiv:2302.03773 [pdf, other]

What Matters In The Structured Pruning of Generative Language Models?

Authors: Michael Santacroce, Zixin Wen, Yelong Shen, Yuanzhi Li

Abstract: Auto-regressive large language models such as GPT-3 require enormous computational resources to use. Traditionally, structured pruning methods are employed to reduce resource usage. However, their application to and efficacy for generative language models is heavily under-explored. In this paper we conduct a comprehensive evaluation of common structured pruning methods, including magnitude, random, and movement pruning on the feed-forward layers in GPT-type models. Unexpectedly, random pruning results in performance that is comparable to the best established methods, across multiple natural language generation tasks. To understand these results, we provide a framework for measuring neuron-level redundancy of models pruned by different methods, and discover that established structured pruning methods do not take into account the distinctiveness of neurons, leaving behind excess redundancies. In view of this, we introduce Globally Unique Movement (GUM) to improve the uniqueness of neurons in pruned models. We then discuss the effects of our techniques on different redundancy metrics to explain the improved performance.

Submitted 7 February, 2023; originally announced February 2023.

arXiv:2302.03319 [pdf, ps, other]

Leveraging Demonstrations to Improve Online Learning: Quality Matters

Authors: Botao Hao, Rahul Jain, Tor Lattimore, Benjamin Van Roy, Zheng Wen

Abstract: We investigate the extent to which offline demonstration data can improve online learning. It is natural to expect some improvement, but the question is how, and by how much? We show that the degree of improvement must depend on the quality of the demonstration data. To generate portable insights, we focus on Thompson sampling (TS) applied to a multi-armed bandit as a prototypical online learning algorithm and model. The demonstration data is generated by an expert with a given competence level, a notion we introduce. We propose an informed TS algorithm that utilizes the demonstration data in a coherent way through Bayes' rule and derive a prior-dependent Bayesian regret bound. This offers insight into how pretraining can greatly improve online performance and how the degree of improvement increases with the expert's competence level. We also develop a practical, approximate informed TS algorithm through Bayesian bootstrapping and show substantial empirical regret reduction through experiments.

Submitted 17 May, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

Comments: Accepted at ICML 2023

arXiv:2301.10013 [pdf]

doi 10.1016/j.mtphys.2024.101450

Ultra-soft Thermal Diodes Enabled by Dual-Alkane-Based Phase Change Composites

Authors: Yunsong Pang, Junhong Li, Zhibin Wen, Ting Liang, Shan Gao, Dezhao Huang, Rong Sun, Jianbin Xu, Tengfei Luo, Xiaoliang Zeng

Abstract: A thermal diode, a type of device that allows heat to flow in one direction preferentially, can be employed in many thermal applications. However, if the mechanical compliance of the thermal diode is poor, which prevents its intimate contact with heat source or sink surfaces, the thermal rectification performance cannot be used to its full extent. In this work, we introduce a heterojunction thermal diode made of a phase change material (PCM) consisting of dual alkanes (hexadecane and paraffin wax) and polyurethane. The fabricated thermal diode exhibits an ultra-soft mechanical feature, with a low elastic modulus of 0.4 kPa and larger than 300% elongation until failure: the best values reported to date for thermal diodes. The measured thermal rectification factor is as high as 1.42, in line with the theoretical model prediction. Molecular dynamics simulations reveal that the thermal rectification mechanism of the PCM-based thermal diode originates from the crystal-amorphous phase transition of the hexadecane terminal as the temperature bias flips. Therefore, the heat flow in the forward direction is greater than the flux in the reverse direction. A series of experiments and finite element analyses are employed to verify the feasibility of thermal diodes for applications. Our results demonstrate that the fabricated thermal diode can potentially be used in building envelopes to help with temperature regulation and thus reduce energy consumption for space cooling or heating.

Submitted 11 January, 2023; originally announced January 2023.

Journal ref: Materials Today Physics (2024): 101450

arXiv:2301.06290 [pdf, ps, other]

All possible orders less than 1 of transcendental entire solutions of linear difference equations with polynomial coefficients

Authors: Katsuya Ishizaki, Zhi-Tao Wen

Abstract: In this paper, we study all possible orders which are less than 1 of transcendental entire solutions of linear difference equations \begin{equation} P_m(z)Δ^mf(z)+\cdots+P_1(z)Δf(z)+P_0(z)f(z)=0,\tag{+} \end{equation} where $P_j(z)$ are polynomials for $j=0,\ldots,m$. Firstly, we give the condition on existence of transcendental entire solutions of order less than 1 of difference equations (+). Secondly, we give a list of all possible orders which are less than 1 of transcendental entire solutions of difference equations (+). Moreover, the maximum number of distinct orders which are less than 1 of transcendental entire solutions of difference equations (+) is shown. In addition, for any given rational number $0<ρ<1$, we can construct a linear difference equation with polynomial coefficients which has a transcendental entire solution of order $ρ$. Finally, some examples are given to illustrate our main theorems.

Submitted 16 January, 2023; originally announced January 2023.

arXiv:2301.03801 [pdf, other]

UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion

Authors: Haogeng Liu, Tao Wang, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, Jianhua Tao

Abstract: Text-to-speech (TTS) and voice conversion (VC) are two different tasks, both aiming at generating high quality speaking voice according to different input modalities. Due to their similarity, this paper proposes UnifySpeech, which brings TTS and VC into a unified framework for the first time. The model is based on the assumption that speech can be decoupled into three independent components: content information, speaker information, and prosody information. Both TTS and VC can be regarded as mining these three parts of information from the input and completing the reconstruction of speech. For TTS, the speech content information is derived from the text, while in VC it is derived from the source speech, so all the remaining units are shared except for the speech content extraction module in the two tasks. We applied vector quantization and domain constraints to bridge the gap between the content domains of TTS and VC. Objective and subjective evaluation shows that by combining the two tasks, TTS obtains better speaker modeling ability while VC gets hold of impressive speech content decoupling capability.

Submitted 10 January, 2023; originally announced January 2023.

arXiv:2301.01548 [pdf, ps, other]

doi 10.1088/1674-4527/acb251

Three New Spiral Galaxies with Active Nuclei Producing Double Radio Lobes

Authors: Xuyang Gao, Zhongsheng Yuan, Jinlin Han, Zhonglue Wen, Susu Shan

Abstract: Double radio lobes are generally believed to be produced by active nuclei of elliptical galaxies. However, several double-lobed radio sources have been solidly found to be associated with spiral galaxies. By cross-matching $\sim9\times10^5$ spiral galaxies selected from the Sloan Digital Sky Survey DR8 data with the full 1.4-GHz radio source catalogs of NRAO VLA Sky Survey and Faint Images of Radio Sky at Twenty-centimeters, we identify three new spiral galaxies: J0326$-$0623, J1110+0321 and J1134+3046 that produce double radio lobes, and five double-lobed spirals previously known. By combining the newly discovered and all the other known cases in literature, we confirm the relation that more massive spirals could produce more powerful large-scale radio jets. We find that most of these spiral galaxies are located in a galaxy group or a poor cluster, in which the environment is denser than in the field, and about half of them are the central brightest galaxies in their parent system. We therefore suggest that the environment is one of the key factors for a spiral to produce double radio lobes.

Submitted 16 February, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

Comments: typos corrected, accepted for publication in RAA

arXiv:2212.10191 [pdf, other]

Emotion Selectable End-to-End Text-based Speech Editing

Authors: Tao Wang, Jiangyan Yi, Ruibo Fu, Jianhua Tao, Zhengqi Wen, Chu Yuan Zhang

Abstract: Text-based speech editing allows users to edit speech by intuitively cutting, copying, and pasting text to speed up the process of editing speech. In the previous work, CampNet (context-aware mask prediction network) is proposed to realize text-based speech editing, significantly improving the quality of edited speech. This paper aims at a new task: adding emotional effect to the editing speech during the text-based speech editing to make the generated speech more expressive. To achieve this task, we propose Emo-CampNet (emotion CampNet), which can provide the option of emotional attributes for the generated speech in text-based speech editing and has the one-shot ability to edit unseen speakers' speech. Firstly, we propose an end-to-end emotion-selectable text-based speech editing model. The key idea of the model is to control the emotion of generated speech by introducing additional emotion attributes based on the context-aware mask prediction network. Secondly, to prevent the emotion of the generated speech from being interfered by the emotional components in the original speech, a neutral content generator is proposed to remove the emotion from the original speech, which is optimized by the generative adversarial framework. Thirdly, two data augmentation methods are proposed to enrich the emotional and pronunciation information in the training set, which can enable the model to edit the unseen speaker's speech. The experimental results show that 1) Emo-CampNet can effectively control the emotion of the generated speech in the process of text-based speech editing and can edit unseen speakers' speech; 2) detailed ablation experiments further prove the effectiveness of emotional selectivity and data augmentation methods. The demo page is available at https://hairuo55.github.io/Emo-CampNet/

Submitted 20 December, 2022; originally announced December 2022.

Comments: Under review, 12 pages, 11 figures, demo page is available at https://hairuo55.github.io/Emo-CampNet/

arXiv:2212.09970 [pdf, other]

Data Augmentation on Graphs: A Technical Survey

Authors: Jiajun Zhou, Chenxuan Xie, Shengbo Gong, Zhenyu Wen, Xiangyu Zhao, Qi Xuan, Xiaoniu Yang

Abstract: In recent years, graph representation learning has achieved remarkable success while suffering from low-quality data problems. As a mature technology to improve data quality in computer vision, data augmentation has also attracted increasing attention in graph domain. To advance research in this emerging direction, this survey provides a comprehensive review and summary of existing graph data augmentation (GDAug) techniques. Specifically, this survey first provides an overview of various feasible taxonomies and categorizes existing GDAug studies based on multi-scale graph elements. Subsequently, for each type of GDAug technique, this survey formalizes standardized technical definition, discuss the technical details, and provide schematic illustration. The survey also reviews domain-specific graph data augmentation techniques, including those for heterogeneous graphs, temporal graphs, spatio-temporal graphs, and hypergraphs. In addition, this survey provides a summary of available evaluation metrics and design guidelines for graph data augmentation. Lastly, it outlines the applications of GDAug at both the data and model levels, discusses open issues in the field, and looks forward to future directions. The latest advances in GDAug are summarized in GitHub.

Submitted 21 June, 2024; v1 submitted 19 December, 2022; originally announced December 2022.

Comments: Version 2. Under review

arXiv:2212.09450 [pdf, other]

doi 10.1145/3580305.3599249

Accelerating Antimicrobial Peptide Discovery with Latent Structure

Authors: Danqing Wang, Zeyu Wen, Fei Ye, Lei Li, Hao Zhou

Abstract: Antimicrobial peptides (AMPs) are promising therapeutic approaches against drug-resistant pathogens. Recently, deep generative models are used to discover new AMPs. However, previous studies mainly focus on peptide sequence attributes and do not consider crucial structure information. In this paper, we propose a latent sequence-structure model for designing AMPs (LSSAMP). LSSAMP exploits multi-scale vector quantization in the latent space to represent secondary structures (e.g. alpha helix and beta sheet). By sampling in the latent space, LSSAMP can simultaneously generate peptides with ideal sequence attributes and secondary structures. Experimental results show that the peptides generated by LSSAMP have a high probability of antimicrobial activity. Our wet laboratory experiments verified that two of the 21 candidates exhibit strong antimicrobial activity. The code is released at https://github.com/dqwang122/LSSAMP.

Submitted 20 August, 2023; v1 submitted 28 November, 2022; originally announced December 2022.

Comments: KDD 2023

Showing 101–150 of 508 results for author: Wen, Z