Search | arXiv e-print repository

arXiv:2408.11313 [pdf, other]

Unlocking Adversarial Suffix Optimization Without Affirmative Phrases: Efficient Black-box Jailbreaking via LLM as Optimizer

Authors: Weipeng Jiang, Zhenting Wang, Juan Zhai, Shiqing Ma, Zhengyu Zhao, Chao Shen

Abstract: Despite prior safety alignment efforts, mainstream LLMs can still generate harmful and unethical content when subjected to jailbreaking attacks. Existing jailbreaking methods fall into two main categories: template-based and optimization-based methods. The former requires significant manual effort and domain knowledge, while the latter, exemplified by Greedy Coordinate Gradient (GCG), which seeks to maximize the likelihood of harmful LLM outputs through token-level optimization, also encounters several limitations: requiring white-box access, necessitating pre-constructed affirmative phrases, and suffering from low efficiency. In this paper, we present ECLIPSE, a novel and efficient black-box jailbreaking method utilizing optimizable suffixes. Drawing inspiration from LLMs' powerful generation and optimization capabilities, we employ task prompts to translate jailbreaking goals into natural language instructions. This guides the LLM to generate adversarial suffixes for malicious queries. In particular, a harmfulness scorer provides continuous feedback, enabling LLM self-reflection and iterative optimization to autonomously and efficiently produce effective suffixes. Experimental results demonstrate that ECLIPSE achieves an average attack success rate (ASR) of 0.92 across three open-source LLMs and GPT-3.5-Turbo, significantly surpassing GCG by 2.4 times. Moreover, ECLIPSE is on par with template-based methods in ASR while offering superior attack efficiency, reducing the average attack overhead by 83%.

Submitted 20 August, 2024; originally announced August 2024.

arXiv:2408.01305 [pdf, ps, other]

Ergodicity of Stochastic two-phase Stefan problem driven by pure jump Lévy noise

Authors: Xiaotian Ge, Shijie Shang, Jianliang Zhai, Tusheng Zhang

Abstract: In this paper, we consider the stochastic two-phase Stefan problem driven by general jump Lévy noise. We first obtain the existence and uniqueness of the strong solution and then establish the ergodicity of the stochastic Stefan problem. Moreover, we give a precise characterization of the support of the invariant measures, which provides the regularities of the stationary solutions of the stochastic free boundary problems.

Submitted 2 August, 2024; originally announced August 2024.

arXiv:2407.15462 [pdf, other]

Efficient Retrieval with Learned Similarities

Authors: Bailu Ding, Jiaqi Zhai

Abstract: Retrieval plays a fundamental role in recommendation systems, search, and natural language processing by efficiently finding relevant items from a large corpus given a query. Dot products have been widely used as the similarity function in such retrieval tasks, thanks to Maximum Inner Product Search (MIPS), which enabled efficient retrieval based on dot products. However, state-of-the-art retrieval algorithms have migrated to learned similarities. Such algorithms vary in form: the queries can be represented with multiple embeddings, complex neural networks can be deployed, the item ids can be decoded directly from queries using beam search, and multiple approaches can be combined in hybrid solutions. Unfortunately, we lack efficient solutions for retrieval in these state-of-the-art setups. Our work investigates techniques for approximate nearest neighbor search with learned similarity functions. We first prove that Mixture-of-Logits (MoL) is a universal approximator, and can express all learned similarity functions. We next propose techniques to retrieve the approximate top K results using MoL with a tight bound. We finally compare our techniques with existing approaches, showing that MoL sets new state-of-the-art results on recommendation retrieval tasks, and our approximate top-k retrieval with learned similarities outperforms baselines by up to two orders of magnitude in latency, while achieving > 0.99 recall rate of exact algorithms.

Submitted 13 August, 2024; v1 submitted 22 July, 2024; originally announced July 2024.
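
Illustration for the entry above: a minimal sketch of the kind of learned similarity the abstract refers to, assuming the commonly described Mixture-of-Logits form, i.e. a gated, weighted sum of dot products between several query-side and item-side embeddings. The function names, the per-item gate_logits argument (a real model would compute the gating weights from the query and the item), and the brute-force top_k are illustrative assumptions; this is not the authors' implementation, nor their approximate retrieval algorithm.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mol_similarity(query_embs, item_embs, gate_logits):
    """Gated mixture of all pairwise query/item dot products.

    gate_logits holds one value per (query embedding, item embedding) pair,
    in the same order as the pairs are enumerated below. Passing them in
    directly is an assumption made to keep the sketch small.
    """
    logits = [dot(q, x) for q in query_embs for x in item_embs]
    weights = softmax(gate_logits)
    return sum(w * l for w, l in zip(weights, logits))


def top_k(query_embs, items, k):
    """Exact (brute-force) top-k by MoL score; items are (id, embs, gate_logits)."""
    scored = [
        (mol_similarity(query_embs, embs, gates), item_id)
        for item_id, embs, gates in items
    ]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:k]]


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    catalog = [
        ("a", [[1.0, 0.0]], [0.0, 0.0]),
        ("b", [[0.0, 0.0]], [0.0, 0.0]),
        ("c", [[2.0, 0.0]], [0.0, 0.0]),
    ]
    print(top_k(query, catalog, 2))
```

The exhaustive scan here is only a reference point: the abstract's contribution is retrieving an approximate top K without scoring every item.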

arXiv:2407.14118 [pdf, other]

Beyond Code Generation: Assessing Code LLM Maturity with Postconditions

Authors: Fusen He, Juan Zhai, Minxue Pan

Abstract: Most existing code Large Language Model (LLM) benchmarks, e.g., EvalPlus, focus on the code generation tasks. Namely, they contain a natural language description of a problem and ask the LLM to write code to solve the problem. We argue that they do not capture all capabilities needed to assess the quality of a code LLM. In this paper, we propose a code LLM maturity model, based on the postcondition generation problem, to assess a more complete set of code LLM capabilities. We choose the postcondition generation problem as it requires the code LLM to understand the code, including semantics and natural language, and also to have the capability to generate unambiguous postconditions in programming languages (i.e., the generation capability). Moreover, postconditions have various types, requiring different levels of these capabilities, making the problem suitable for evaluating the maturity of the code LLM. Based on our designed maturity model, we augment the EvalPlus dataset to a postcondition testing benchmark, and evaluate several open-sourced models. Our results highlight the improvements needed for better LLMs for code. Code: https://github.com/MatureModel/PostcondGen

Submitted 19 July, 2024; originally announced July 2024.

arXiv:2407.12319 [pdf, other]

Serialized Point Mamba: A Serialized Point Cloud Mamba Segmentation Model

Authors: Tao Wang, Wei Wen, Jingzhi Zhai, Kang Xu, Haoming Luo

Abstract: Point cloud segmentation is crucial for robotic visual perception and environmental understanding, enabling applications such as robotic navigation and 3D reconstruction. However, handling the sparse and unordered nature of point cloud data presents challenges for efficient and accurate segmentation. Inspired by the Mamba model's success in natural language processing, we propose the Serialized Point Cloud Mamba Segmentation Model (Serialized Point Mamba), which leverages a state-space model to dynamically compress sequences, reduce memory usage, and enhance computational efficiency. Serialized Point Mamba integrates local-global modeling capabilities with linear complexity, achieving state-of-the-art performance on both indoor and outdoor datasets. This approach includes novel techniques such as staged point cloud sequence learning, grid pooling, and Conditional Positional Encoding, facilitating effective segmentation across diverse point cloud tasks. Our method achieved 76.8 mIoU on ScanNet and 70.3 mIoU on S3DIS. In ScanNetv2 instance segmentation, it recorded 40.0 mAP. It also had the lowest latency and reasonable memory use, making it the SOTA among point semantic segmentation models based on Mamba.

Submitted 17 July, 2024; originally announced July 2024.

arXiv:2407.02805 [pdf, other]

Efficient DNN-Powered Software with Fair Sparse Models

Authors: Xuanqi Gao, Weipeng Jiang, Juan Zhai, Shiqing Ma, Xiaoyu Zhang, Chao Shen

Abstract: With the emergence of the Software 3.0 era, there is a growing trend of compressing and integrating large models into software systems, with significant societal implications. Regrettably, in numerous instances, model compression techniques impact the fairness performance of these models and thus the ethical behavior of DNN-powered software. One of the most notable examples is the Lottery Ticket Hypothesis (LTH), a prevailing model pruning approach. This paper demonstrates that the fairness issue of LTH-based pruning arises from both its subnetwork selection and training procedures, highlighting the inadequacy of existing remedies. To address this, we propose a novel pruning framework, Ballot, which employs a novel conflict-detection-based subnetwork selection to find accurate and fair subnetworks, coupled with a refined training process to attain a high-performance model, thereby improving the fairness of DNN-powered software. By means of this procedure, Ballot improves the fairness of pruning by 38.00%, 33.91%, 17.96%, and 35.82% compared to state-of-the-art baselines, namely Magnitude Pruning, Standard LTH, SafeCompress, and FairScratch respectively, based on our evaluation of five popular datasets and three widely used models. Our code is available at https://anonymous.4open.science/r/Ballot-506E.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.00560 [pdf, other]

DCI: An Accurate Quality Assessment Criteria for Protein Complex Structure Models

Authors: Wenda Wang, Jiaqi Zhai, He Huang, Xinqi Gong

Abstract: The structure of proteins is the basis for studying protein function and drug design. The emergence of AlphaFold 2 has greatly promoted the prediction of protein 3D structures, and it is of great significance to give an overall and accurate evaluation of the predicted models, especially the complex models. Among the existing methods for evaluating multimer structures, DockQ is the most commonly used. However, as a more suitable metric for complex docking, DockQ cannot provide a unique and accurate evaluation in the non-docking situation. Therefore, it is necessary to propose an evaluation strategy that can directly evaluate the whole complex without limitation and achieve good results. In this work, we propose the DCI score, a new evaluation strategy for protein complex structure models that is based only on the distance map and the CI (contact-interface) map. DCI focuses on the prediction accuracy of the contact interface based on the overall evaluation of the complex structure, is not inferior to DockQ in evaluation accuracy according to the CAPRI classification, and handles the non-docking situation better than DockQ. In addition, we calculated the DCI score on CASP datasets and compared it with the official CASP assessment, obtaining good results. We also found that DCI can better evaluate the overall structure deviation caused by interface prediction errors in the case of multi-chains. Our DCI is available at https://gitee.com/WendaWang/DCI-score.git, and the online server is available at http://mialab.ruc.edu.cn/DCIServer/.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2406.16262 [pdf, ps, other]

Large deviations for 2D Stochastic Chemotaxis-Navier-Stokes System

Authors: Yunfeng Chen, Xuhui Peng, Jianliang Zhai

Abstract: In this paper, we establish a large deviation principle for the 2D stochastic Chemotaxis-Navier-Stokes equation perturbed by a small multiplicative noise. The main difficulties come from the lack of a suitable compact embedding into the space occupied by the solutions and the inherent complexity of the equation. Finite dimensional projection arguments and the introduction of suitable stopping times play important roles.

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.15999 [pdf, other]

doi 10.1145/3643738

SmartAxe: Detecting Cross-Chain Vulnerabilities in Bridge Smart Contracts via Fine-Grained Static Analysis

Authors: Zeqin Liao, Yuhong Nan, Henglong Liang, Sicheng Hao, Juan Zhai, Jiajing Wu, Zibin Zheng

Abstract: With the increasing popularity of blockchain, different blockchain platforms coexist in the ecosystem (e.g., Ethereum, BNB, EOSIO, etc.), which prompts the high demand for cross-chain communication. A cross-chain bridge is a specific type of decentralized application for asset exchange across different blockchain platforms. Securing the smart contracts of cross-chain bridges is urgently needed, as a number of recent security incidents with heavy financial losses have been caused by vulnerabilities in bridge smart contracts, which we call Cross-Chain Vulnerabilities (CCVs). However, automatically identifying CCVs in smart contracts poses several unique challenges. Particularly, it is non-trivial to (1) identify application-specific access control constraints needed for cross-bridge asset exchange, and (2) identify inconsistent cross-chain semantics between the two sides of the bridge. In this paper, we propose SmartAxe, a new framework to identify vulnerabilities in cross-chain bridge smart contracts. Particularly, to locate vulnerable functions that have access control incompleteness, SmartAxe models the heterogeneous implementations of access control and finds necessary security checks in smart contracts through probabilistic pattern inference. Besides, SmartAxe constructs a cross-chain control-flow graph (xCFG) and data-flow graph (xDFG), which help to find semantic inconsistency during cross-chain data communication. To evaluate SmartAxe, we collect and label a dataset of 88 CCVs from cross-chain bridge contracts involved in real attacks. Evaluation results show that SmartAxe achieves a precision of 84.95% and a recall of 89.77%. In addition, SmartAxe successfully identifies 232 new/unknown CCVs from 129 real-world cross-chain bridge applications (i.e., from 1,703 smart contracts). These identified CCVs affect a total amount of digital assets worth 1,885,250 USD.

Submitted 22 June, 2024; originally announced June 2024.

Journal ref: The ACM International Conference on the Foundations of Software Engineering 2024
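
Illustration for the entry above: a short sketch of how precision and recall are computed from detection counts. The counts used below (79 true positives, 14 false positives, 9 missed) are hypothetical; they are one combination that is arithmetically consistent with the 88 labeled CCVs and the percentages quoted in the abstract, which does not itself state the underlying counts.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) for a detector given raw counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


if __name__ == "__main__":
    # Hypothetical counts chosen to match the reported figures:
    # 79 + 9 = 88 labeled vulnerabilities, 79 + 14 = 93 reported findings.
    precision, recall = precision_recall(true_pos=79, false_pos=14, false_neg=9)
    print(f"precision={precision:.2%} recall={recall:.2%}")
    # prints "precision=84.95% recall=89.77%"
```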

arXiv:2406.12196 [pdf, other]

CITADEL: Context Similarity Based Deep Learning Framework Bug Finding

Authors: Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Shiwei Wang, Chao Shen

Abstract: With deep learning (DL) technology becoming an integral part of the new intelligent software, tools of DL framework testing and bug-finding are in high demand. Existing DL framework testing tools have limited coverage on bug types. For example, they lack the capability of finding performance bugs, which are critical for DL model training and inference regarding performance, economics, and the environment. This problem is challenging due to the difficulty of getting test oracles of performance bugs. Moreover, existing tools are inefficient, generating hundreds of test cases, few of which trigger bugs. In this paper, we propose CITADEL, a method that accelerates the finding of bugs in terms of efficiency and effectiveness. We observe that many DL framework bugs are similar due to the similarity of operators and algorithms belonging to the same family (e.g., Conv2D and Conv3D). Orthogonal to existing bug-finding tools, CITADEL aims to find new bugs that are similar to reported ones that have known test oracles. It works by first collecting existing bug reports and identifying problematic APIs. CITADEL defines context similarity to measure the similarity of DL framework API pairs and automatically generates test cases with oracles for APIs that are similar to the problematic APIs in existing bug reports. CITADEL covers 1,436 PyTorch and 5,380 TensorFlow APIs and effectively detects 79 and 80 API bugs respectively, among which 58 and 68 are new, and 36 and 58 have been confirmed. Many of these, e.g., the 11 performance bugs, cannot be detected by existing tools. Moreover, a remarkable 35.40% of the test cases generated by CITADEL can trigger bugs, which significantly exceeds the ratios of 0.74%, 1.23%, and 3.90% exhibited by the state-of-the-art methods, DocTer, DeepREL, and TitanFuzz.

Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: 12 pages, 10 figures

arXiv:2406.09465 [pdf, other]

doi 10.1145/3620666.3651383

Optimal Kernel Orchestration for Tensor Programs with Korch

Authors: Muyan Hu, Ashwin Venkatram, Shreyashri Biswas, Balamurugan Marimuthu, Bohan Hou, Gabriele Oliaro, Haojie Wang, Liyan Zheng, Xupeng Miao, Jidong Zhai

Abstract: Kernel orchestration is the task of mapping the computation defined in different operators of a deep neural network (DNN) to the execution of GPU kernels on modern hardware platforms. Prior approaches optimize kernel orchestration by greedily applying operator fusion, which fuses the computation of multiple operators into a single kernel, and miss a variety of optimization opportunities in kernel orchestration. This paper presents Korch, a tensor program optimizer that discovers optimal kernel orchestration strategies for tensor programs. Instead of directly fusing operators, Korch first applies operator fission to decompose tensor operators into a small set of basic tensor algebra primitives. This decomposition enables a diversity of fine-grained, inter-operator optimizations. Next, Korch optimizes kernel orchestration by formalizing it as a constrained optimization problem, leveraging an off-the-shelf binary linear programming solver to discover an optimal orchestration strategy, and generating an executable that can be directly deployed on modern GPU platforms. Evaluation on a variety of DNNs shows that Korch outperforms existing tensor program optimizers by up to 1.7x on V100 GPUs and up to 1.6x on A100 GPUs. Korch is publicly available at https://github.com/humuyan/Korch.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Fix some typos in the ASPLOS version

Journal ref: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems 3 (2024) 755-769

arXiv:2406.05715 [pdf]

Error analysis of vertical test for CEPC 650 MHz superconducting radio-frequency cavity

Authors: Lingxi Ye, Peng Sha, Zhenghui Mi, Feisi He, Jiyuan Zhai

Abstract: Hundreds of 650 MHz superconducting radio-frequency (SRF) cavities with high intrinsic quality factor (Q0) and accelerating gradient (Eacc) will be adopted for the Circular Electron Positron Collider (CEPC). The values of Q0 and Eacc are obtained during the vertical test at 2.0 K. Hence, high accuracy of the vertical test is essential for evaluating the performance of an SRF cavity. The 650 MHz SRF cavities achieved very high Q0 (6E10) and Eacc (40 MV/m) during the vertical test. In our study, the error analysis of the vertical test was conducted in the scalar case, in order to achieve high accuracy. The uncertainties of the vertical test were obtained through calculation and were approximately 3% for Eacc and less than 5% for Q0. This result was reasonable and acceptable.

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.00699 [pdf, other]

Towards General Robustness Verification of MaxPool-based Convolutional Neural Networks via Tightening Linear Approximation

Authors: Yuan Xiao, Shiqing Ma, Juan Zhai, Chunrong Fang, Jinyuan Jia, Zhenyu Chen

Abstract: The robustness of convolutional neural networks (CNNs) is vital to modern AI-driven systems. It can be quantified by formal verification by providing a certified lower bound, within which any perturbation does not alter the original input's classification result. It is challenging due to nonlinear components, such as MaxPool. At present, many verification methods are sound but risk losing some precision to enhance efficiency and scalability, and thus, a certified lower bound is a crucial criterion for evaluating the performance of verification tools. In this paper, we present MaxLin, a robustness verifier for MaxPool-based CNNs with tight linear approximation. By tightening the linear approximation of the MaxPool function, we can certify larger certified lower bounds of CNNs. We evaluate MaxLin with open-sourced benchmarks, including LeNet and networks trained on the MNIST, CIFAR-10, and Tiny ImageNet datasets. The results show that MaxLin outperforms state-of-the-art tools with up to 110.60% improvement regarding the certified lower bound and 5.13× speedup for the same neural networks. Our code is available at https://github.com/xiaoyuanpigo/maxlin.

Submitted 2 June, 2024; originally announced June 2024.

Comments: Accepted to CVPR2024. Project page: https://github.com/xiaoyuanpigo/maxlin

arXiv:2406.00602 [pdf, other]

From Effectiveness to Efficiency: Comparative Evaluation of Code Generated by LCGMs for Bilingual Programming Questions

Authors: Weipeng Jiang, Xuanqi Gao, Juan Zhai, Shiqing Ma, Xiaoyu Zhang, Chao Shen

Abstract: Large Code Generation Models (LCGMs) have garnered significant attention and achieved promising results across various programming tasks. However, concerns arise regarding performance when using non-English prompts, as these models are primarily trained on English-centric corpora, and most programming language tokens resemble English. Existing benchmarks often rely on English programming questions and limited manual unit test cases, inadequately assessing LCGM-generated code quality. This paper investigates code quality differences, specifically effectiveness and efficiency, when employing different natural languages as inputs, focusing on Chinese and English due to their prominent corpora and LCGM availability. Evaluating LCGM-generated code quality under bilingual inputs presents three challenges: (1) lack of high-quality bilingual programming question datasets, (2) insufficient unit test cases for comprehensive correctness verification, and (3) limited support for comparing generated code performance. To address these challenges, we curated a test suite of 52 bilingual programming questions and developed automated input generators for each. We enhanced correctness verification by sampling larger unit test cases and estimated code performance by profiling execution time relative to input size growth. Using this framework, we conducted an empirical study on six state-of-the-art LCGMs. The results revealed that LCGM-generated code exhibits varying bilingual correctness on an average of 10.5% of tasks, with 39.5% of correct code showing diverse bilingual performance differences. Our findings suggested LCGMs may not consistently generate high-quality code across different languages, providing insights for future research directions.

Submitted 1 June, 2024; originally announced June 2024.

Comments: 10 and a quarter pages, 6 figures

arXiv:2405.15297 [pdf]

doi 10.1103/PhysRevB.109.184112

High-field magnetoelectric coupling and successive magnetic transitions in Mn-doped polar antiferromagnet Ni3TeO6

Authors: J. H. Zhang, L. Lin, C. Dong, Y. T. Chang, J. F. Wang, C. L. Lu, P. Z. Chen, W. J. Zhai, G. Z. Zhou, L. Huang, Y. S. Tang, S. H. Zheng, M. F. Liu, X. H. Zhou, Z. B. Yan, J. -M. Liu

Abstract: Among the 3d transition metal ions doped polar Ni3TeO6, Mn-doped Ni3TeO6 has stimulated great interest due to its high magnetic ordering temperature and complex magnetic phases, but the mechanism of magnetoelectric (ME) coupling is far from understood. Herein we report our systematic investigation of the chemical control of magnetism, metamagnetic transition, and ME properties of Ni3-xMnxTeO6 single crystals in high magnetic field (H) up to 52 T. We present a previously unreported weak ferromagnetic behavior appearing in the ab plane below 9.5 K in addition to the incommensurate helical and commensurate collinear antiferromagnetic states. In the low-field region, a spin-flop type metamagnetic transition without any hysteresis occurs at Hc1 for H // c, while another metamagnetic transition accompanied by a change in electric polarization is observed at Hc2 in the high-field region both for H // c and H // ab above 30 K, which can be attributed to the sudden rotation of magnetic moments at Ni2 sites. The ME measurements reveal that a first-order ME effect is observed in the low-T and low-H regions, while a second-order ME coupling term appears above 30 K in the magnetic field range of Hc1 < H < Hc2 for H // c and H < Hc2 for H // ab, both becoming significant with increasing temperature. Eventually, they are dominated by the second-order ME effect near the antiferromagnetic transition temperature. The present work demonstrates that Ni3-xMnxTeO6 is an exotic magnetoelectric material compared with Ni3TeO6 and its derivatives, thereby providing insights to better understand the magnetism and ME coupling in Ni3TeO6 and its derivatives.

Submitted 29 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: 30 pages with 8 figures

Journal ref: Phys. Rev. B 109, 184112 (2024)

arXiv:2405.00414 [pdf, ps, other]

Ergodicity for 2D Navier-Stokes equations with a degenerate pure jump noise

Authors: Xuhui Peng, Jianliang Zhai, Tusheng Zhang

Abstract: In this paper, we establish the ergodicity for stochastic 2D Navier-Stokes equations driven by a highly degenerate pure jump Lévy noise. The noise could appear in as few as four directions. This gives an affirmative answer to a longstanding problem. The case of Gaussian noise was treated in Hairer and Mattingly [Ann. of Math., 164(3):993-1032, 2006]. To obtain the uniqueness of invariant measure, we use Malliavin calculus and anticipating stochastic calculus to establish the equi-continuity of the semigroup, the so-called e-property, and prove some weak irreducibility of the solution process.

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.04242 [pdf, other]

Physical Property Understanding from Language-Embedded Feature Fields

Authors: Albert J. Zhai, Yuan Shen, Emily Y. Chen, Gloria X. Wang, Xinlei Wang, Sheng Wang, Kaiyu Guan, Shenlong Wang

Abstract: Can computers perceive the physical properties of objects solely through vision? Research in cognitive science and vision science has shown that humans excel at identifying materials and estimating their physical properties based purely on visual appearance. In this paper, we present a novel approach for dense prediction of the physical properties of objects using a collection of images. Inspired by how humans reason about physics through vision, we leverage large language models to propose candidate materials for each object. We then construct a language-embedded point cloud and estimate the physical properties of each 3D point using a zero-shot kernel regression approach. Our method is accurate, annotation-free, and applicable to any object in the open world. Experiments demonstrate the effectiveness of the proposed approach in various physical property reasoning tasks, such as estimating the mass of common objects, as well as other properties like friction and hardness.

Submitted 5 April, 2024; originally announced April 2024.

Comments: CVPR 2024. Project page (with code): https://ajzhai.github.io/NeRF2Physics/

arXiv:2403.11421 [pdf, other]

FastDecode: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines

Authors: Jiaao He, Jidong Zhai

Abstract: The cost of serving large language models (LLMs) is high, but expensive and scarce GPUs are used inefficiently when generating tokens sequentially, unless the batch of sequences is enlarged. However, the batch size is limited by some constantly reused intermediate results, namely the KV-Cache. They occupy too much memory to fit more sequences into a GPU simultaneously. While they could be offloaded to host memory, the CPU-GPU bandwidth is an inevitable bottleneck. We find a way to decompose the transformer models into two parts of different characteristics, one of which includes the memory-bound KV-Cache accessing. Our key insight is that the aggregated memory capacity, bandwidth, and computing power of CPUs across multiple nodes is an efficient option to process this part. Performance improvement comes from reduced data transmission overhead and boosted GPU throughput to process the other model part. Moreover, we address efficiency challenges brought by heterogeneity at both temporal and inter-device scopes using scheduling and performance modeling techniques. Evaluation results show that our system achieves 1.88x - 5.04x the throughput of vLLM when serving modern LLMs with the same GPU.

Submitted 17 March, 2024; originally announced March 2024.

Comments: 15 pages, 15 figures

ACM Class: C.4

arXiv:2403.01388 [pdf, ps, other]

Wong-Zakai approximations and support theorems for SDEs under Lyapunov conditions

Authors: Qi Li, Jianliang Zhai, Tusheng Zhang

Abstract: In this paper, we establish the Stroock-Varadhan type support theorems for stochastic differential equations (SDEs) under Lyapunov conditions, which significantly improve the existing results in the literature where the coefficients of the SDEs are required to be globally Lipschitz and of linear growth. Our conditions are mild enough to include many important models, e.g. the threshold Ornstein-Uhlenbeck process, the stochastic SIR model, stochastic Lotka-Volterra systems, and the stochastic Duffing-van der Pol oscillator model, which have polynomial coefficients. To obtain the support theorem, we prove a new Wong-Zakai approximation problem, which is of independent interest.

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2402.19274 [pdf, other]

Mixed-halide perovskite alloys $\text{CsPb}(\text{I}_{1-x}\text{Br}_x)_3$ and $\text{CsPb}(\text{Br}_{1-x}\text{Cl}_x)_3$: New insight of configuration entropy effect from first principles and phase diagrams

Authors: Fang Pan, Junni Zhai, Jinyu Chen, Lin Yang, Hua Dong, Fang Yuan, Zhuangde Jiang, Wei Ren, Zuo-Guang Ye, Guo-Xu Zhang, Jingrui Li

Abstract: Stability is one of the key issues in mixed-halide perovskite alloys which are promising in emergent optoelectronics. Previous density-functional-theory (DFT) and machine learning studies indicate that the formation-energy convex hulls of these materials are very shallow, and stable alloy compositions are rare. In this work, we revisit this problem using DFT with special focus on the effects of configuration and vibration entropies. Allowed by the $20$-atomic models for the $\text{CsPb}(\text{I}_{1-x}\text{Br}_x)_3$ and $\text{CsPb}(\text{Br}_{1-x}\text{Cl}_x)_3$ series, the partition functions and therewith thermodynamic state functions are calculated by traversing all possible mixed-halide configurations. We can thus evaluate the temperature- and system-dependent configuration entropy, which largely corrects the conventional approach based on the ideal solution model. Finally, temperature-composition phase diagrams that include $α$, $β$, $γ$ and $δ$ phases of both alloys are constructed based on the free energy data, for which the contribution of phonon vibrations is included.

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.17152 [pdf, other]

Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations

Authors: Jiaqi Zhai, Lucy Liao, Xing Liu, Yueming Wang, Rui Li, Xuan Cao, Leon Gao, Zhaojie Gong, Fangda Gu, Michael He, Yinghai Lu, Yu Shi

Abstract: Large-scale recommendation systems are characterized by their reliance on high cardinality, heterogeneous features and the need to handle tens of billions of user actions on a daily basis. Despite being trained on huge volume of data with thousands of features, most Deep Learning Recommendation Models (DLRMs) in industry fail to scale with compute. Inspired by success achieved by Transformers in language and vision domains, we revisit fundamental design choices in recommendation systems. We reformulate recommendation problems as sequential transduction tasks within a generative modeling framework ("Generative Recommenders"), and propose a new architecture, HSTU, designed for high cardinality, non-stationary streaming recommendation data. HSTU outperforms baselines over synthetic and public datasets by up to 65.8% in NDCG, and is 5.3x to 15.2x faster than FlashAttention2-based Transformers on 8192 length sequences. HSTU-based Generative Recommenders, with 1.5 trillion parameters, improve metrics in online A/B tests by 12.4% and have been deployed on multiple surfaces of a large internet platform with billions of users. More importantly, the model quality of Generative Recommenders empirically scales as a power-law of training compute across three orders of magnitude, up to GPT-3/LLaMa-2 scale, which reduces carbon footprint needed for future model developments, and further paves the way for the first foundational models in recommendations.

Submitted 5 May, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: 26 pages, 13 figures. ICML'24. Code available at https://github.com/facebookresearch/generative-recommenders

arXiv:2402.16522 [pdf, other]

Uniform large deviations and metastability of random dynamical systems

Authors: Jifa Jiang, Jian Wang, Jianliang Zhai, Tusheng Zhang

Abstract: In this paper, we first provide a criterion on uniform large deviation principles (ULDP) of stochastic differential equations under Lyapunov conditions on the coefficients, which can be applied to stochastic systems with coefficients of polynomial growth and possibly degenerate driving noises. In the second part, using the ULDP criterion we preclude the concentration of limiting measures of invariant measures of stochastic dynamical systems on repellers and acyclic saddle chains and extend Freidlin and Wentzell's asymptotics theorem to stochastic systems with unbounded coefficients. Of particular interest, we determine the limiting measures of the invariant measures of the famous stochastic van der Pol equation and van der Pol Duffing equation whose noises are naturally degenerate. We also construct two examples to match the global phase portraits of Freidlin and Wentzell's unperturbed systems and to explicitly compute their transition difficulty matrices. Other applications include the stochastic May-Leonard system and random systems with infinitely many equivalent classes.

Submitted 26 February, 2024; originally announced February 2024.

MSC Class: 60B10; 60F10; 60H10; 37A50; 37C70

arXiv:2402.12640 [pdf, ps, other]

Invertibility of local geodesic transverse and mixed ray transforms II: higher order tensors

Authors: Gunther Uhlmann, Jian Zhai

Abstract: Consider a compact Riemannian manifold in dimension $n$ with strictly convex boundary. We show the local invertibility near a boundary point of the transverse ray transform of $2$ tensors for $n\geq 3$ and the mixed ray transform of $2+2$ tensors for $n=3$. When the manifold admits a strictly convex function, this local invertibility result leads to global invertibility.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.11283 [pdf, other]

Deep adaptive sampling for surrogate modeling without labeled data

Authors: Xili Wang, Kejun Tang, Jiayu Zhai, Xiaoliang Wan, Chao Yang

Abstract: Surrogate modeling is of great practical significance for parametric differential equation systems. In contrast to classical numerical methods, using physics-informed deep learning methods to construct simulators for such systems is a promising direction due to its potential to handle high dimensionality, which requires minimizing a loss over a training set of random samples. However, the random samples introduce statistical errors, which may become the dominant errors for the approximation of low-regularity and high-dimensional problems. In this work, we present a deep adaptive sampling method for surrogate modeling ($\text{DAS}^2$), where we generalize the deep adaptive sampling (DAS) method [Tang, Wan and Yang, 2023] to build surrogate models for low-regularity parametric differential equations. In the parametric setting, the residual loss function can be regarded as an unnormalized probability density function (PDF) of the spatial and parametric variables. This PDF is approximated by a deep generative model, from which new samples are generated and added to the training set. Since the new samples match the residual-induced distribution, the refined training set can further reduce the statistical error in the current approximate solution. We demonstrate the effectiveness of $\text{DAS}^2$ with a series of numerical experiments, including the parametric lid-driven 2D cavity flow problem with a continuous range of Reynolds numbers from 100 to 1000.

Submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.03791 [pdf, other]

ZeroPP: Unleashing Exceptional Parallelism Efficiency through Tensor-Parallelism-Free Methodology

Authors: Ding Tang, Lijuan Jiang, Jiecheng Zhou, Minxi Jin, Hengjie Li, Xingcheng Zhang, Zhilin Pei, Jidong Zhai

Abstract: Large-scale models rely heavily on 3D parallelism for distributed training, which utilizes tensor parallelism (TP) as the intra-operator parallelism to partition model states across GPUs. However, TP introduces significant communication overheads and complexity in modifying single-GPU code. In this paper, we propose a TP-free distributed framework ZeroPP, which leverages the hybrid of scalable inter-operator pipeline parallelism and intra-operator fully sharded data parallelism to train models at scale, reducing memory consumption and enabling high training efficiency. Through extensive experimentation, we demonstrate that ZeroPP achieves significant performance gains of up to 33% compared to conventional 3D parallelism while maintaining comparable GPU memory consumption.

Submitted 24 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2401.11385 [pdf, ps, other]

Large deviations for locally monotone stochastic partial differential equations driven by Lévy noise

Authors: Weina Wu, Jianliang Zhai, Jiahui Zhu

Abstract: We establish a Freidlin-Wentzell type large deviation principle (LDP) for a class of stochastic partial differential equations with locally monotone coefficients driven by Lévy noise. Our results essentially improve a recent work on this topic (Bernoulli, 2018) by the second named author of this paper and his collaborator, because we drop the compactness embedding assumptions, and we also make the conditions for the coefficient of the noise term more specific and weaker. To obtain our results, we utilize an improved sufficient criterion of Budhiraja, Chen, Dupuis, and Maroulas for functions of Poisson random measures, and the techniques introduced by the first and second named authors of this paper in \cite{WZSIAM} play important roles. As an application, for the first time, the Freidlin-Wentzell type LDPs for many SPDEs driven by Lévy noise in unbounded domains of $\mathbb{R}^d$, which generally lack compact embedding properties, are achieved, e.g., the stochastic $p$-Laplace equation, stochastic Burgers-type equations, stochastic 2D Navier-Stokes equations, stochastic equations of non-Newtonian fluids, etc.

Submitted 20 January, 2024; originally announced January 2024.

Comments: 23 pages

arXiv:2401.09017 [pdf, ps, other]

Invertibility of local geodesic transverse and mixed ray transforms I: basic cases

Authors: Gunther Uhlmann, Jian Zhai

Abstract: Consider a compact Riemannian manifold in dimension $n\geq 3$ with strictly convex boundary. We show that the transverse ray transform of $1$ tensors and the mixed ray transform of $1+1$ tensors are invertible, up to natural obstructions, near a boundary point. When the manifold admits a strictly convex function, this local invertibility result leads to a global result by a layer stripping argument.

Submitted 17 January, 2024; originally announced January 2024.

MSC Class: 53C22; 53C65

arXiv:2401.00751 [pdf, other]

Machine Translation Testing via Syntactic Tree Pruning

Authors: Quanjun Zhang, Juan Zhai, Chunrong Fang, Jiawei Liu, Weisong Sun, Haichuan Hu, Qingyu Wang

Abstract: Machine translation systems have been widely adopted in our daily life, making life easier and more convenient. Unfortunately, erroneous translations may result in severe consequences, such as financial losses. This calls for improving the accuracy and the reliability of machine translation systems. However, it is challenging to test machine translation systems because of the complexity and intractability of the underlying neural models. To tackle these challenges, we propose a novel metamorphic testing approach by syntactic tree pruning (STP) to validate machine translation systems. Our key insight is that a pruned sentence should have similar crucial semantics compared with the original sentence. Specifically, STP (1) proposes a core semantics-preserving pruning strategy by basic sentence structure and dependency relations on the level of syntactic tree representation; (2) generates source sentence pairs based on the metamorphic relation; (3) reports suspicious issues whose translations break the consistency property by a bag-of-words model. We further evaluate STP on two state-of-the-art machine translation systems (i.e., Google Translate and Bing Microsoft Translator) with 1,200 source sentences as inputs. The results show that STP can accurately find 5,073 unique erroneous translations in Google Translate and 5,100 unique erroneous translations in Bing Microsoft Translator (400% more than state-of-the-art techniques), with 64.5% and 65.4% precision, respectively. The reported erroneous translations vary in types and more than 90% of them cannot be found by state-of-the-art techniques. There are 9,393 erroneous translations unique to STP, which is 711.9% more than state-of-the-art techniques. Moreover, STP is quite effective at detecting translation errors for the original sentences, with a recall reaching 74.0%, improving state-of-the-art techniques by 55.1% on average.

Submitted 1 January, 2024; originally announced January 2024.

Comments: Accepted to ACM Transactions on Software Engineering and Methodology 2024 (TOSEM'24)

arXiv:2401.00379 [pdf, other]

DREAM: Debugging and Repairing AutoML Pipelines

Authors: Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Chao Shen

Abstract: Deep Learning models have become an integrated component of modern software systems. In response to the challenge of model design, researchers proposed Automated Machine Learning (AutoML) systems, which automatically search for model architecture and hyperparameters for a given task. Like other software systems, existing AutoML systems suffer from bugs. We identify two common and severe bugs in AutoML, performance bug (i.e., searching for the desired model takes an unreasonably long time) and ineffective search bug (i.e., AutoML systems are not able to find an accurate enough model). After analyzing the workflow of AutoML, we observe that existing AutoML systems overlook potential opportunities in search space, search method, and search feedback, which results in performance and ineffective search bugs. Based on our analysis, we design and implement DREAM, an automatic debugging and repairing system for AutoML systems. It monitors the process of AutoML to collect detailed feedback and automatically repairs bugs by expanding search space and leveraging a feedback-driven search strategy. Our evaluation results show that DREAM can effectively and efficiently repair AutoML bugs.

Submitted 30 December, 2023; originally announced January 2024.

Comments: 12 pages, 10 figures

arXiv:2312.12722 [pdf, other]

Fine-Grained Knowledge Selection and Restoration for Non-Exemplar Class Incremental Learning

Authors: Jiang-Tian Zhai, Xialei Liu, Lu Yu, Ming-Ming Cheng

Abstract: Non-exemplar class incremental learning aims to learn both the new and old tasks without accessing any training data from the past. This strict restriction increases the difficulty of alleviating catastrophic forgetting since all techniques can only be applied to current task data. Considering this challenge, we propose a novel framework of fine-grained knowledge selection and restoration. The conventional knowledge distillation-based methods place overly strict constraints on the network parameters and features to prevent forgetting, which limits the training of new tasks. To loosen this constraint, we propose a novel fine-grained selective patch-level distillation to adaptively balance plasticity and stability. Some task-agnostic patches can be used to preserve the decision boundary of the old task, while some patches containing the important foreground are favorable for learning the new task. Moreover, we employ a task-agnostic mechanism to generate more realistic prototypes of old tasks with the current task sample for reducing classifier bias for fine-grained knowledge restoration. Extensive experiments on CIFAR100, TinyImageNet and ImageNet-Subset demonstrate the effectiveness of our method. Code is available at https://github.com/scok30/vit-cil.

Submitted 19 December, 2023; originally announced December 2023.

Comments: to appear at AAAI 2024

arXiv:2312.05578 [pdf]

doi 10.1021/acs.nanolett.3c03948

Large nonlinear Hall effect and Berry curvature in KTaO3 based two-dimensional electron gas

Authors: Jinfeng Zhai, Mattia Trama, Hao Liu, Zhifei Zhu, Yinyan Zhu, Carmine Antonio Perroni, Roberta Citro, Pan He, Jian Shen

Abstract: The two-dimensional electron gas (2DEG) at oxide interfaces exhibits various exotic properties stemming from interfacial inversion symmetry breaking. In this work, we report the emergence of large nonlinear Hall effects (NHE) in the LaAlO3/KTaO3(111) interface 2DEG under zero magnetic field. Skew scattering was identified as the dominant origin based on the cubic scaling of nonlinear Hall conductivity with longitudinal conductivity and the threefold symmetry. Moreover, a gate-tunable NHE with pronounced peak and dip was observed and reproduced by our theoretical calculation. These results indicate the presence of Berry curvature hotspots and thus a large Berry curvature triple at the oxide interface. Our theoretical calculations confirm the existence of large Berry curvatures from the avoided crossing of multiple 5d-orbit bands, orders of magnitude larger than that in transition-metal dichalcogenides. NHE offers a new pathway to probe the Berry curvature at oxide interfaces, and facilitates new applications in oxide nonlinear electronics.

Submitted 9 December, 2023; originally announced December 2023.

Journal ref: Nano Letters 2023

arXiv:2312.02441 [pdf, other]

MedDM:LLM-executable clinical guidance tree for clinical decision-making

Authors: Binbin Li, Tianxin Meng, Xiaoming Shi, Jie Zhai, Tong Ruan

Abstract: Increasing emphasis is being placed on the importance of LLMs participating in clinical diagnosis decision-making. However, current medical LLMs have low specialization: they cannot provide specific medical advice and behave more like medical Q&A systems. Moreover, there is no suitable clinical guidance tree dataset that can be used directly with LLMs. To address this issue, we first propose the LLM-executable clinical guidance tree (CGT), which can be directly used by large language models, and construct a medical diagnostic decision-making dataset (MedDM) from flowcharts in clinical practice guidelines. We propose an approach to screen flowcharts from medical literature, followed by their identification and conversion into standardized diagnostic decision trees. We constructed a knowledge base with 1202 decision trees, which came from 5000 medical literature and covered 12 hospital departments, including internal medicine, surgery, psychiatry, and over 500 diseases. Moreover, we propose a method for reasoning on LLM-executable CGT and a Patient-LLM multi-turn dialogue framework.

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2312.01175 [pdf]

High Q and high gradient performance of the first medium-temperature baking 1.3 GHz cryomodule

Authors: Jiyuan Zhai, Weimin Pan, Feisi He, Rui Ge, Zhenghui Mi, Peng Sha, Song Jin, Ruixiong Han, Qunyao Wang, Haiying Lin, Guangwei Wang, Mei Li, Minjing Sang, Liangrui Sun, Rui Ye, Tongxian Zhao, Shaopeng Li, Keyu Zhu, Baiqi Liu, Xiaolong Wang, Xiangchen Yang, Xiaojuan Bian, Xiangzhen Zhang, Huizhou Ma, Xuwen Dai, et al. (14 additional authors not shown)

Abstract: World's first 1.3 GHz cryomodule containing eight 9-cell superconducting radio-frequency (RF) cavities treated by medium-temperature furnace baking (mid-T bake) was developed, assembled and tested at IHEP for the Dalian Advanced Light Source (DALS) and CEPC R&D. The 9-cell cavities in the cryomodule achieved an unprecedented highest average Q0 of 3.8E10 at 16 MV/m and 3.6E10 at 21 MV/m in the horizontal test. The cryomodule can operate stably up to a total CW RF voltage greater than 191 MV, with an average cavity CW accelerating gradient of more than 23 MV/m. The results significantly exceed the specifications of CEPC, DALS and the other high repetition rate free electron laser facilities (LCLS-II, LCLS-II-HE, SHINE, S3FEL). There is evidence that the mid-T bake cavity may not require fast cool-down or long processing time in the cryomodule. This paper reviews the cryomodule performance and discusses some important issues in cryomodule assembly and testing.

Submitted 2 December, 2023; originally announced December 2023.

Comments: 5 pages, 6 figures

arXiv:2312.00324 [pdf, other]

Machine Learning for Actionable Warning Identification: A Comprehensive Survey

Authors: Xiuting Ge, Chunrong Fang, Xuanye Li, Weisong Sun, Daoyuan Wu, Juan Zhai, Shangwei Lin, Zhihong Zhao, Yang Liu, Zhenyu Chen

Abstract: Actionable Warning Identification (AWI) plays a crucial role in improving the usability of static code analyzers. With recent advances in Machine Learning (ML), various approaches have been proposed to incorporate ML techniques into AWI. These ML-based AWI approaches, benefiting from ML's strong ability to learn subtle and previously unseen patterns from historical data, have demonstrated superior performance. However, a comprehensive overview of these approaches is missing, which could hinder researchers/practitioners from understanding the current process and discovering potential for future improvement in the ML-based AWI community. In this paper, we systematically review the state-of-the-art ML-based AWI approaches. First, we employ a meticulous survey methodology and gather 50 primary studies from 2000/01/01 to 2023/09/01. Then, we outline the typical ML-based AWI workflow, including warning dataset preparation, preprocessing, AWI model construction, and evaluation stages. In such a workflow, we categorize ML-based AWI approaches based on the warning output format. Besides, we analyze the techniques used in each stage, along with their strengths, weaknesses, and distribution. Finally, we provide practical research directions for future ML-based AWI approaches, focusing on aspects like data improvement (e.g., enhancing the warning labeling strategy) and model exploration (e.g., exploring large language models for AWI).

Submitted 30 November, 2023; originally announced December 2023.

arXiv:2311.17822 [pdf, other]

Anomalous Behavior Detection in Trajectory Data of Older Drivers

Authors: Seyedeh Gol Ara Ghoreishi, Sonia Moshfeghi, Muhammad Tanveer Jan, Joshua Conniff, KwangSoo Yang, Jinwoo Jang, Borko Furht, Ruth Tappen, David Newman, Monica Rosselli, Jiannan Zhai

Abstract: Given a road network and a set of trajectory data, the anomalous behavior detection (ABD) problem is to identify drivers that show significant directional deviations, hard braking, and accelerations in their trips. The ABD problem is important in many societal applications, including Mild Cognitive Impairment (MCI) detection and safe route recommendations for older drivers. The ABD problem is computationally challenging due to the large size of temporally-detailed trajectory datasets. In this paper, we propose an Edge-Attributed Matrix that can represent the key properties of temporally-detailed trajectory datasets and identify abnormal driving behaviors. Experiments using real-world datasets demonstrated that our approach identifies abnormal driving behaviors.

Submitted 29 November, 2023; originally announced November 2023.

Comments: IEEE HONET 2023

arXiv:2311.09264 [pdf, other]

Cross-domain feature disentanglement for interpretable modeling of tumor microenvironment impact on drug response

Authors: Jia Zhai, Hui Liu

Abstract: High-throughput screening technology has facilitated the generation of large-scale drug responses across hundreds of cancer cell lines. However, there exists a significant discrepancy between in vitro cell lines and actual tumors in vivo in terms of their response to drug treatments, because tumors comprise complex cellular compositions and histopathological structure, known as the tumor microenvironment (TME), which greatly influences the drug cytotoxicity against tumor cells. To date, no study has focused on modeling the impact of the TME on clinical drug response. This paper proposes a domain adaptation network for feature disentanglement to separate representations of cancer cells and TME of a tumor in patients. Two denoising autoencoders were separately used to extract features from cell lines (source domain) and tumors (target domain) for partial domain alignment and feature decoupling. The specific encoder was enforced to extract information only about TME. Moreover, to ensure generalizability to novel drugs, we applied a graph attention network to learn the latent representation of drugs, allowing us to linearly model the drug perturbation on cellular state in latent space. We calibrated our model on a benchmark dataset and demonstrated its superior performance in predicting clinical drug response and dissecting the influence of the TME on drug efficacy.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2310.08879 [pdf, other]

A Critical Review of Large Language Model on Software Engineering: An Example from ChatGPT and Automated Program Repair

Authors: Quanjun Zhang, Tongke Zhang, Juan Zhai, Chunrong Fang, Bowen Yu, Weisong Sun, Zhenyu Chen

Abstract: Large Language Models (LLMs) have been gaining increasing attention and demonstrated promising performance across a variety of Software Engineering (SE) tasks, such as Automated Program Repair (APR), code summarization, and code completion. For example, ChatGPT, the latest black-box LLM, has been investigated by numerous recent research studies and has shown impressive performance in various tasks. However, there exists a potential risk of data leakage since these LLMs are usually closed-source with unknown specific training details, e.g., pre-training datasets. In this paper, we seek to review the bug-fixing capabilities of ChatGPT on a clean APR benchmark with different research objectives. We first introduce EvalGPTFix, a new benchmark with buggy and the corresponding fixed programs from competitive programming problems starting from 2023, after the training cutoff point of ChatGPT. The results on EvalGPTFix show that ChatGPT is able to fix 109 out of 151 buggy programs using the basic prompt within 35 independent rounds, outperforming state-of-the-art LLMs CodeT5 and PLBART by 27.5% and 62.4% prediction accuracy. We also investigate the impact of three types of prompts, i.e., problem description, error feedback, and bug localization, leading to 34 additional fixed bugs. Besides, we provide additional discussion from the interactive nature of ChatGPT to illustrate the capacity of a dialog-based repair workflow with 9 additional fixed bugs. Inspired by the findings, we further pinpoint various challenges and opportunities for advanced SE study equipped with such LLMs (e.g., ChatGPT) in the near future. More importantly, our work calls for more research on the reevaluation of the achievements obtained by existing black-box LLMs across various SE tasks, not limited to ChatGPT on APR.

Submitted 17 April, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: add EvalGPTFix URL

arXiv:2309.06645 [pdf, other]

Bregman Graph Neural Network

Authors: Jiayu Zhai, Lequan Lin, Dai Shi, Junbin Gao

Abstract: Much recent research on graph neural networks (GNNs) has focused on formulating GNN architectures as an optimization problem with the smoothness assumption. However, in node classification tasks, the smoothing effect induced by GNNs tends to assimilate representations and over-homogenize labels of connected nodes, leading to adverse effects such as over-smoothing and misclassification. In this paper, we propose a novel bilevel optimization framework for GNNs inspired by the notion of Bregman distance. We demonstrate that the GNN layer proposed accordingly can effectively mitigate the over-smoothing issue by introducing a mechanism reminiscent of the "skip connection". We validate our theoretical results through comprehensive empirical studies in which Bregman-enhanced GNNs outperform their original counterparts in both homophilic and heterophilic graphs. Furthermore, our experiments also show that Bregman GNNs can produce more robust learning accuracy even when the number of layers is high, suggesting the effectiveness of the proposed method in alleviating the over-smoothing issue.

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2308.13282 [pdf, other]

Advancing Distributed AC Optimal Power Flow for Integrated Transmission-Distribution Systems

Authors: Xinliang Dai, Junyi Zhai, Yuning Jiang, Yi Guo, Colin N. Jones, Veit Hagenmeyer

Abstract: This paper introduces a distributed operational solution for coordinating integrated transmission-distribution (ITD) systems regarding data privacy. To tackle the nonconvex challenges of AC optimal power flow (OPF) problems, our research proposes an enhanced version of the Augmented Lagrangian based Alternating Direction Inexact Newton method (ALADIN). This proposed framework incorporates a second-order correction strategy and convexification, thereby enhancing numerical robustness and computational efficiency. The theoretical studies demonstrate that the proposed distributed algorithm operates the ITD systems with a local quadratic convergence guarantee. Extensive simulations on various ITD configurations highlight the superior performance of our distributed approach in terms of convergence speed, computational efficiency, scalability, and adaptability.

Submitted 30 January, 2024; v1 submitted 25 August, 2023; originally announced August 2023.

arXiv:2308.12510 [pdf, other]

Masked Autoencoders are Efficient Class Incremental Learners

Authors: Jiang-Tian Zhai, Xialei Liu, Andrew D. Bagdanov, Ke Li, Ming-Ming Cheng

Abstract: Class Incremental Learning (CIL) aims to sequentially learn new classes while avoiding catastrophic forgetting of previous knowledge. We propose to use Masked Autoencoders (MAEs) as efficient learners for CIL. MAEs were originally designed to learn useful representations through reconstructive unsupervised learning, and they can be easily integrated with a supervised loss for classification. Moreover, MAEs can reliably reconstruct original input images from randomly selected patches, which we use to store exemplars from past tasks more efficiently for CIL. We also propose a bilateral MAE framework to learn from image-level and embedding-level fusion, which produces better-quality reconstructed images and more stable representations. Our experiments confirm that our approach performs better than the state-of-the-art on CIFAR-100, ImageNet-Subset, and ImageNet-Full. The code is available at https://github.com/scok30/MAE-CIL .

Submitted 23 August, 2023; originally announced August 2023.

Comments: Accepted at ICCV 2023

arXiv:2308.12475 [pdf, ps, other]

Determination of the density in a nonlinear elastic wave equation

Authors: Gunther Uhlmann, Jian Zhai

Abstract: This is a continuation of our study [Uhlmann-Zhai, JMPA, 2021] on an inverse boundary value problem for a nonlinear elastic wave equation. We prove that all the linear and nonlinear coefficients can be recovered from the displacement-to-traction map, including the density, under some natural geometric conditions on the wavespeeds.

Submitted 24 January, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

arXiv:2308.08459 [pdf, other]

Knowledge Prompt-tuning for Sequential Recommendation

Authors: Jianyang Zhai, Xiawu Zheng, Chang-Dong Wang, Hui Li, Yonghong Tian

Abstract: Pre-trained language models (PLMs) have demonstrated strong performance in sequential recommendation (SR), which are utilized to extract general knowledge. However, existing methods still lack domain knowledge and struggle to capture users' fine-grained preferences. Meanwhile, many traditional SR methods improve this issue by integrating side information while suffering from information loss. To summarize, we believe that a good recommendation system should utilize both general and domain knowledge simultaneously. Therefore, we introduce an external knowledge base and propose Knowledge Prompt-tuning for Sequential Recommendation (KP4SR). Specifically, we construct a set of relationship templates and transform a structured knowledge graph (KG) into knowledge prompts to solve the problem of the semantic gap. However, knowledge prompts disrupt the original data structure and introduce a significant amount of noise. We further construct a knowledge tree and propose a knowledge tree mask, which restores the data structure in a mask matrix form, thus mitigating the noise problem. We evaluate KP4SR on three real-world datasets, and experimental results show that our approach outperforms state-of-the-art methods on multiple evaluation metrics. Specifically, compared with PLM-based methods, our method improves NDCG@5 and HR@5 by 40.65% and 36.42% on the books dataset, 11.17% and 11.47% on the music dataset, and 22.17% and 19.14% on the movies dataset, respectively. Our code is publicly available at https://github.com/zhaijianyang/KP4SR.

Submitted 14 August, 2023; originally announced August 2023.

arXiv:2307.14554 [pdf, ps, other]

Large deviation principle for stochastic reaction-diffusion equations with super-linear drift on $\mathbb{R}$ driven by space-time white noise

Authors: Yue Li, Shijie Shang, Jianliang Zhai

Abstract: In this paper, we consider stochastic reaction-diffusion equations with super-linear drift on the real line $\mathbb{R}$ driven by space-time white noise. A Freidlin-Wentzell large deviation principle is established by a modified weak convergence method on the space $C([0,T], C_{tem}(\mathbb{R}))$. Obtaining the main result in this paper is challenging due to the setting of an unbounded domain, the space-time white noise, and the superlinear drift term without dissipation. To overcome these difficulties, a specially designed norm on $C([0,T], C_{tem}(\mathbb{R}))$, one order moment estimates of the stochastic convolution and two nonlinear Gronwall-type inequalities play an important role.

Submitted 26 July, 2023; originally announced July 2023.

MSC Class: 60H15; 60F10

arXiv:2307.04995 [pdf, other]

PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR

Authors: Zixuan Ma, Haojie Wang, Jingze Xing, Liyan Zheng, Chen Zhang, Huanqi Cao, Kezhao Huang, Shizhi Tang, Penghan Wang, Jidong Zhai

Abstract: Deep neural networks (DNNs) are of critical use in different domains. To accelerate DNN computation, tensor compilers are proposed to generate efficient code on different domain-specific accelerators. Existing tensor compilers mainly focus on optimizing computation efficiency. However, memory access is becoming a key performance bottleneck because the computational performance of accelerators is increasing much faster than memory performance. The lack of direct description of memory access and data dependence in current tensor compilers' intermediate representation (IR) brings significant challenges to generating memory-efficient code. In this paper, we propose IntelliGen, a tensor compiler that can generate high-performance code for memory-intensive operators by considering both computation and data movement optimizations. IntelliGen represents a DNN program using GIR, which includes primitives indicating its computation, data movement, and parallel strategies. This information will be further composed as an instruction-level dataflow graph to perform holistic optimizations by searching different memory access patterns and computation operations, and generating memory-efficient code on different hardware. We evaluate IntelliGen on NVIDIA GPU, AMD GPU, and Cambricon MLU, showing speedup up to 1.97x, 2.93x, and 16.91x (1.28x, 1.23x, and 2.31x on average), respectively, compared to the most performant current frameworks.

Submitted 10 July, 2023; originally announced July 2023.

Comments: 12 pages, 14 figures

arXiv:2306.10211 [pdf, ps, other]

Increasing stability estimates for the inverse potential scattering problems

Authors: Jian Zhai, Yue Zhao

Abstract: This paper is mainly concerned with the inverse scattering problem of determining the unknown potential for the classical Schrödinger equation in two and three dimensions. We establish the increasing stability of the inverse scattering problem from either multi-frequency near-field data or multi-frequency far-field pattern. The stability estimate consists of the Lipschitz type data discrepancy and the logarithmic high frequency tail of the potential function, where the latter decreases as the upper bound of the frequency increases. A novel method is proposed for the proof, which is based on choosing appropriate incident plane waves and an application of the quantitative analytic continuation. A key ingredient in the analysis is employing scattering theory to obtain an analytic region and resolvent estimates in this region for the resolvent in two and three dimensions. We further apply this method to study the inverse scattering problem of determining both the magnetic potential and electric potential for the three-dimensional magnetic Schrödinger equation.

Submitted 16 June, 2023; originally announced June 2023.

MSC Class: 35R30; 78A46

arXiv:2306.04039 [pdf, other]

doi 10.1145/3580305.3599897

Revisiting Neural Retrieval on Accelerators

Authors: Jiaqi Zhai, Zhaojie Gong, Yueming Wang, Xiao Sun, Zheng Yan, Fu Li, Xing Liu

Abstract: Retrieval finds a small number of relevant candidates from a large corpus for information retrieval and recommendation applications. A key component of retrieval is to model (user, item) similarity, which is commonly represented as the dot product of two learned embeddings. This formulation permits efficient inference, commonly known as Maximum Inner Product Search (MIPS). Despite its popularity, dot products cannot capture complex user-item interactions, which are multifaceted and likely high rank. We hence examine non-dot-product retrieval settings on accelerators, and propose mixture of logits (MoL), which models (user, item) similarity as an adaptive composition of elementary similarity functions. This new formulation is expressive, capable of modeling high rank (user, item) interactions, and further generalizes to the long tail. When combined with a hierarchical retrieval strategy, h-indexer, we are able to scale up MoL to 100M corpus on a single GPU with latency comparable to MIPS baselines. On public datasets, our approach leads to uplifts of up to 77.3% in hit rate (HR). Experiments on a large recommendation surface at Meta showed strong metric gains and reduced popularity bias, validating the proposed approach's performance and improved generalization.

Submitted 6 June, 2023; originally announced June 2023.

Comments: To appear in the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2023)

arXiv:2305.18702 [pdf, other]

Adversarial Adaptive Sampling: Unify PINN and Optimal Transport for the Approximation of PDEs

Authors: Kejun Tang, Jiayu Zhai, Xiaoliang Wan, Chao Yang

Abstract: Solving partial differential equations (PDEs) is a central task in scientific computing. Recently, neural network approximation of PDEs has received increasing attention due to its flexible meshless discretization and its potential for high-dimensional problems. One fundamental numerical difficulty is that random samples in the training set introduce statistical errors into the discretization of the loss functional, which may become the dominant error in the final approximation and therefore overshadow the modeling capability of the neural network. In this work, we propose a new minmax formulation to optimize simultaneously the approximate solution, given by a neural network model, and the random samples in the training set, provided by a deep generative model. The key idea is to use a deep generative model to adjust random samples in the training set such that the residual induced by the approximate PDE solution can maintain a smooth profile when it is being minimized. Such an idea is achieved by implicitly embedding the Wasserstein distance between the residual-induced distribution and the uniform distribution into the loss, which is then minimized together with the residual. A nearly uniform residual profile means that its variance is small for any normalized weight function such that the Monte Carlo approximation error of the loss functional is reduced significantly for a certain sample size. The adversarial adaptive sampling (AAS) approach proposed in this work is the first attempt to formulate two essential components, minimizing the residual and seeking the optimal training set, into one minmax objective functional for the neural network approximation of PDEs.

Submitted 14 March, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

Comments: ICLR, 2024

arXiv:2305.10430 [pdf, other]

Rethinking the Open-Loop Evaluation of End-to-End Autonomous Driving in nuScenes

Authors: Jiang-Tian Zhai, Ze Feng, Jinhao Du, Yongqiang Mao, Jiang-Jiang Liu, Zichang Tan, Yifu Zhang, Xiaoqing Ye, Jingdong Wang

Abstract: Modern autonomous driving systems are typically divided into three main tasks: perception, prediction, and planning. The planning task involves predicting the trajectory of the ego vehicle based on inputs from both internal intention and the external environment, and manipulating the vehicle accordingly. Most existing works evaluate their performance on the nuScenes dataset using the L2 error and collision rate between the predicted trajectories and the ground truth. In this paper, we reevaluate these existing evaluation metrics and explore whether they accurately measure the superiority of different methods. Specifically, we design an MLP-based method that takes raw sensor data (e.g., past trajectory, velocity, etc.) as input and directly outputs the future trajectory of the ego vehicle, without using any perception or prediction information such as camera images or LiDAR. Our simple method achieves end-to-end planning performance on the nuScenes dataset similar to that of other perception-based methods, reducing the average L2 error by about 20%. Meanwhile, the perception-based methods have an advantage in terms of collision rate. We further conduct in-depth analysis and provide new insights into the factors that are critical for the success of the planning task on the nuScenes dataset. Our observation also indicates that we need to rethink the current open-loop evaluation scheme of end-to-end autonomous driving in nuScenes. Codes are available at https://github.com/E2E-AD/AD-MLP.

Submitted 21 October, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: Technical report. Code is available

arXiv:2305.07220 [pdf, other]

Physical-layer Adversarial Robustness for Deep Learning-based Semantic Communications

Authors: Guoshun Nan, Zhichun Li, Jinli Zhai, Qimei Cui, Gong Chen, Xin Du, Xuefei Zhang, Xiaofeng Tao, Zhu Han, Tony Q. S. Quek

Abstract: End-to-end semantic communications (ESC) rely on deep neural networks (DNN) to boost communication efficiency by only transmitting the semantics of data, showing great potential for high-demand mobile applications. We argue that central to the success of ESC is the robust interpretation of conveyed semantics at the receiver side, especially for security-critical applications such as automatic driving and smart healthcare. However, robustifying semantic interpretation is challenging as ESC is extremely vulnerable to physical-layer adversarial attacks due to the openness of wireless channels and the fragility of neural models. Toward ESC robustness in practice, we ask the following two questions: Q1: For attacks, is it possible to generate semantic-oriented physical-layer adversarial attacks that are imperceptible, input-agnostic and controllable? Q2: Can we develop a defense strategy against such semantic distortions and previously proposed adversaries? To this end, we first present MobileSC, a novel semantic communication framework that considers the computation and memory efficiency in wireless environments. Equipped with this framework, we propose SemAdv, a physical-layer adversarial perturbation generator that aims to craft semantic adversaries over the air with the abovementioned criteria, thus answering Q1. To better characterize the real-world effects for robust training and evaluation, we further introduce a novel adversarial training method SemMixed to harden the ESC against SemAdv attacks and existing strong threats, thus answering Q2. Extensive experiments on three public benchmarks verify the effectiveness of our proposed methods against various physical adversarial attacks. We also show some interesting findings, e.g., our MobileSC can even be more robust than classical block-wise communication systems in the low SNR regime.

Submitted 11 May, 2023; originally announced May 2023.

Comments: 17 pages, 28 figures, accepted by IEEE JSAC

arXiv:2305.05234 [pdf, ps, other]

Large deviation principles for stochastic nonlinear Schrodinger equations driven by Levy noise

Authors: Jiahui Zhu, Wei Liu, Jianliang Zhai

Abstract: In this work we establish a Freidlin-Wentzell type large deviation principle for stochastic nonlinear Schrödinger equation, with either focusing or defocusing nonlinearity, driven by nonlinear multiplicative Lévy noise in the Marcus canonical form. This task is challenging in the current setting due to the presence of the power-type nonlinear term, the lack of regularization effect of the Schrödinger operator and the absence of compactness of embeddings. To overcome these difficulties, we employ a regularization procedure based on Yosida approximations and implement techniques such as time discretization, cut-off arguments, and relative entropy estimates of sequences of probability measures. Our innovative approach circumvents the need for compactness conditions, distinguishing our work from previous studies.

Submitted 16 August, 2024; v1 submitted 9 May, 2023; originally announced May 2023.

Showing 1–50 of 181 results for author: Zhai, J