LLaVA-SG: Leveraging Scene Graphs as Visual Semantic Expression in Vision-Language Models

Authors: Jingyi Wang, Jianzhong Ju, Jian Luan, Zhidong Deng

Abstract: Recent advances in large vision-language models (VLMs) typically employ vision encoders based on the Vision Transformer (ViT) architecture. The division of the images into patches by ViT results in a fragmented perception, thereby hindering the visual understanding capabilities of VLMs. In this paper, we propose an innovative enhancement to address this limitation by introducing a Scene Graph Expression (SGE) module in VLMs. This module extracts and structurally expresses the complex semantic information within images, thereby improving the foundational perception and understanding abilities of VLMs. Extensive experiments demonstrate that integrating our SGE module significantly enhances the VLM's performance in vision-language tasks, indicating its effectiveness in preserving intricate semantic details and facilitating better visual understanding.

Submitted 29 August, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

arXiv:2408.13459 [pdf, other]

Rethinking Video Deblurring with Wavelet-Aware Dynamic Transformer and Diffusion Model

Authors: Chen Rao, Guangyuan Li, Zehua Lan, Jiakai Sun, Junsheng Luan, Wei Xing, Lei Zhao, Huaizhong Lin, Jianfeng Dong, Dalong Zhang

Abstract: Current video deblurring methods have limitations in recovering high-frequency information since the regression losses are conservative with high-frequency details. Since Diffusion Models (DMs) have strong capabilities in generating high-frequency details, we consider introducing DMs into the video deblurring task. However, we found that directly applying DMs to the video deblurring task has the following problems: (1) DMs require many iteration steps to generate videos from Gaussian noise, which consumes many computational resources. (2) DMs are easily misled by the blurry artifacts in the video, resulting in irrational content and distortion of the deblurred video. To address the above issues, we propose a novel video deblurring framework VD-Diff that integrates the diffusion model into the Wavelet-Aware Dynamic Transformer (WADT). Specifically, we perform the diffusion model in a highly compact latent space to generate prior features containing high-frequency information that conforms to the ground truth distribution. We design the WADT to preserve and recover the low-frequency information in the video while utilizing the high-frequency information generated by the diffusion model. Extensive experiments show that our proposed VD-Diff outperforms SOTA methods on GoPro, DVD, BSD, and Real-World Video datasets.

Submitted 24 August, 2024; originally announced August 2024.

Comments: accepted by ECCV2024

ACM Class: I.4.4

arXiv:2407.05690 [pdf, other]

Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations

Authors: Bowen Shen, Zheng Lin, Daren Zha, Wei Liu, Jian Luan, Bin Wang, Weiping Wang

Abstract: Structured pruning fundamentally reduces computational and memory overheads of large language models (LLMs) and offers a feasible solution for end-side LLM deployment. Structurally pruned models remain dense and high-precision, highly compatible with further tuning and compression. However, as the coarse-grained structured pruning poses large damage to the highly interconnected model, achieving a high compression ratio for scaled-up LLMs remains a challenge. In this paper, we introduce a task-agnostic structured pruning approach coupled with a compact Transformer architecture design. The proposed approach, named TransAct, reduces transitional activations inside multi-head attention (MHA) and multi-layer perceptron (MLP) modules, while preserving the inter-module activations that are sensitive to perturbations. Hence, the LLM is pruned into an intra-module low-rank architecture, significantly reducing weights, KV Cache and attention computation. TransAct is implemented on the LLaMA model and evaluated on downstream benchmarks. Results verify the optimality of our approach at high compression with respect to both efficiency and performance. Further, ablation studies reveal the strength of activation-guided iterative pruning and provide experimental analysis on the redundancy of MHA and MLP modules.

Submitted 8 July, 2024; originally announced July 2024.

Comments: Findings of ACL 2024

arXiv:2407.00993 [pdf, other]

Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents

Authors: Shihan Deng, Weikai Xu, Hongda Sun, Wei Liu, Tao Tan, Jianfeng Liu, Ang Li, Jian Luan, Bin Wang, Rui Yan, Shuo Shang

Abstract: With the remarkable advancements of large language models (LLMs), LLM-based agents have become a research hotspot in human-computer interaction. However, there is a scarcity of benchmarks available for LLM-based mobile agents. Benchmarking these agents generally faces three main challenges: (1) The inefficiency of UI-only operations imposes limitations to task evaluation. (2) Specific instructions within a singular application lack adequacy for assessing the multi-dimensional reasoning and decision-making capacities of LLM mobile agents. (3) Current evaluation metrics are insufficient to accurately assess the process of sequential actions. To this end, we propose Mobile-Bench, a novel benchmark for evaluating the capabilities of LLM-based mobile agents. First, we expand conventional UI operations by incorporating 103 collected APIs to accelerate the efficiency of task completion. Subsequently, we collect evaluation data by combining real user queries with augmentation from LLMs. To better evaluate different levels of planning capabilities for mobile agents, our data is categorized into three distinct groups: SAST, SAMT, and MAMT, reflecting varying levels of task complexity. Mobile-Bench comprises 832 data entries, with more than 200 tasks specifically designed to evaluate multi-APP collaboration scenarios. Furthermore, we introduce a more accurate evaluation metric, named CheckPoint, to assess whether LLM-based mobile agents reach essential points during their planning and reasoning steps.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.06571 [pdf, other]

SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM

Authors: Quandong Wang, Yuxuan Yuan, Xiaoyu Yang, Ruike Zhang, Kang Zhao, Wei Liu, Jian Luan, Daniel Povey, Bin Wang

Abstract: While Large Language Models (LLMs) have achieved remarkable success in various fields, the efficiency of training and inference remains a major challenge. To address this issue, we propose SUBLLM, short for Subsampling-Upsampling-Bypass Large Language Model, an innovative architecture that extends the core decoder-only framework by incorporating subsampling, upsampling, and bypass modules. The subsampling modules are responsible for shortening the sequence, while the upsampling modules restore the sequence length, and the bypass modules enhance convergence. In comparison to LLaMA, the proposed SUBLLM exhibits significant enhancements in both training and inference speeds as well as memory usage, while maintaining competitive few-shot performance. During training, SUBLLM increases speeds by 26% and cuts memory by 10GB per GPU. In inference, it boosts speeds by up to 37% and reduces memory by 1GB per GPU. The training and inference speeds can be enhanced by 34% and 52% respectively when the context window is expanded to 8192. Our code is available at https://github.com/XiaoMi/subllm.

Submitted 23 August, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: 10 pages, 5 figures, accepted by ECAI 2024

ACM Class: I.2.7

arXiv:2406.05676 [pdf]

Chern insulator phase realized in dual-gate-tuned MnBi2Te4 thin films grown by molecular beam epitaxy

Authors: Yunhe Bai, Yuanzhao Li, Ruixuan Liu, Jianli Luan, Yang Chen, Wenyu Song, Peng-Fei Ji, Cui Ding, Zongwei Gao, Qinghua Zhang, Fanqi Meng, Bingbing Tong, Lin Li, Tianchen Zhu, Lin Gu, Lili Wang, Jinsong Zhang, Yayu Wang, Qi-Kun Xue, Ke He, Yang Feng, Xiao Feng

Abstract: The intrinsic magnetic order, large topological-magnetic gap and rich topological phases make MnBi2Te4 a wonderful platform to study exotic topological quantum states such as axion insulator and Chern insulator. To realize and manipulate these topological phases in a MnBi2Te4 thin film, precise manipulation of the electric field across the film is essential, which requires a dual-gate structure. In this work, we achieve dual-gate tuning of MnBi2Te4 thin films grown with molecular beam epitaxy on SrTiO3(111) substrates by applying the substrate and an AlOx layer as the gate dielectrics of bottom and top gates, respectively. Under a magnetic field of 9 T and a temperature of 20 mK, the Hall and longitudinal resistivities of the films show inversed gate-voltage dependence, for both top- and bottom-gates, signifying the existence of the dissipationless edge state contributed by Chern insulator phase in the ferromagnetic configuration. The maximum of the Hall resistivity only reaches 0.8 h/e^2, even with dual-gate tuning, probably due to the high density of bulk carriers introduced by secondary phases. In the antiferromagnetic state under zero magnetic field, the films show normal insulator behavior. The dual-gated MnBi2Te4 thin films lay the foundation for developing devices based on electrically tunable topological quantum states.

Submitted 9 June, 2024; originally announced June 2024.

Comments: 24 pages, 4 figures

arXiv:2405.11940 [pdf, other]

On the equivalence of two spinodal decomposition criteria with a case study of Fe${}_{15}$Co${}_{15}$Ni${}_{35}$Cu${}_{35}$ multicomponent alloy

Authors: Hengwei Luan, You Wu, Jingyi Kang, Liufei Huang, J. H. Luan, Jinfeng Li, Yang Shao, Ke-fu Yao, Jian Lu

Abstract: Spinodal decomposition in multicomponent alloys has attracted increasing attention due to its beneficial effect on their mechanical and functional properties and potential applications. Both based on the Cahn-Hilliard equation, the reference element method (REM) and the projection matrix method (PMM) are the two main methods to predict the occurrence of spinodal decomposition in multicomponent alloys. In this work, it is mathematically proven that the two methods are equivalent, and therefore the advanced results based on one method can be applied to the other. Based on these methods, the Fe${}_{15}$Co${}_{15}$Ni${}_{35}$Cu${}_{35}$ multicomponent alloy is designed as a case study. Experimental results confirm the spinodal decomposition in the heat-treated alloy, and its strength and ductility are simultaneously enhanced. This work can pave the way for further theoretical and experimental studies on the spinodal decomposition in multicomponent alloys.

Submitted 20 May, 2024; originally announced May 2024.

Comments: 27 pages, 3 figures, 1 supplementary file

arXiv:2404.11474 [pdf, other]

Towards Highly Realistic Artistic Style Transfer via Stable Diffusion with Step-aware and Layer-aware Prompt

Authors: Zhanjie Zhang, Quanwei Zhang, Huaizhong Lin, Wei Xing, Juncheng Mo, Shuaicheng Huang, Jinheng Xie, Guangyuan Li, Junsheng Luan, Lei Zhao, Dalong Zhang, Lixia Chen

Abstract: Artistic style transfer aims to transfer the learned artistic style onto an arbitrary content image, generating artistic stylized images. Existing generative adversarial network-based methods fail to generate highly realistic stylized images and always introduce obvious artifacts and disharmonious patterns. Recently, large-scale pre-trained diffusion models opened up a new way for generating highly realistic artistic stylized images. However, diffusion model-based methods generally fail to preserve the content structure of input content images well, introducing some undesired content structure and style patterns. To address the above problems, we propose a novel pre-trained diffusion-based artistic style transfer method, called LSAST, which can generate highly realistic artistic stylized images while preserving the content structure of input content images well, without bringing obvious artifacts and disharmonious style patterns. Specifically, we introduce a Step-aware and Layer-aware Prompt Space, a set of learnable prompts, which can learn the style information from the collection of artworks and dynamically adjust the input images' content structure and style pattern. To train our prompt space, we propose a novel inversion method, called Step-aware and Layer-aware Prompt Inversion, which allows the prompt space to learn the style information of the artworks collection. In addition, we inject a pre-trained conditional branch of ControlNet into our LSAST, which further improves our framework's ability to maintain content structure. Extensive experiments demonstrate that our proposed method can generate more highly realistic artistic stylized images than the state-of-the-art artistic style transfer methods.

Submitted 12 August, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

Comments: Accepted by IJCAI2024

arXiv:2404.09083 [pdf]

Interplay between electronic dephasing and localization in finite-sized Chern insulator

Authors: Yunhe Bai, Yuanzhao Li, Jianli Luan, Yang Chen, Zongwei Gao, Wenyu Song, Yitian Tong, Jinsong Zhang, Yayu Wang, Junjie Qi, Chui-Zhen Chen, Hua Jiang, X. C. Xie, Ke He, Yang Feng, Xiao Feng, Qi-Kun Xue

Abstract: Anderson localization is anticipated to play a pivotal role in the manifestation of the quantum anomalous Hall effect, akin to its role in conventional quantum Hall effects. The significance of Anderson localization is particularly pronounced in elucidating the reasons behind the fragility of the observed quantum anomalous Hall state in the intrinsic magnetic topological insulator MnBi2Te4 with a large predicted magnetic gap. Here, employing varying sized MnBi2Te4 micro/nano-structures fabricated from a single molecular-beam-epitaxy-grown thin film, we have carried out a systematic size- and temperature-dependent study on the transport properties of the films regarding the quantum anomalous Hall states. The low-temperature transport properties of the finite-sized MnBi2Te4 samples can be quantitatively understood through Anderson localization, which plays an indispensable role in stabilizing the ground states. At higher temperatures, the failure of electron localization induced by an excessively short electronic dephasing length is identified as the cause of deviation from quantization. The work reveals that electronic dephasing and localization are non-negligible factors in designing high-temperature quantum anomalous Hall systems.

Submitted 13 April, 2024; originally announced April 2024.

Comments: 20 pages, 4 figures

arXiv:2403.06551 [pdf, other]

ToolRerank: Adaptive and Hierarchy-Aware Reranking for Tool Retrieval

Authors: Yuanhang Zheng, Peng Li, Wei Liu, Yang Liu, Jian Luan, Bin Wang

Abstract: Tool learning aims to extend the capabilities of large language models (LLMs) with external tools. A major challenge in tool learning is how to support a large number of tools, including unseen tools. To address this challenge, previous studies have proposed retrieving suitable tools for the LLM based on the user query. However, previously proposed methods do not consider the differences between seen and unseen tools, nor do they take the hierarchy of the tool library into account, which may lead to suboptimal performance for tool retrieval. Therefore, to address the aforementioned issues, we propose ToolRerank, an adaptive and hierarchy-aware reranking method for tool retrieval to further refine the retrieval results. Specifically, our proposed ToolRerank includes Adaptive Truncation, which truncates the retrieval results related to seen and unseen tools at different positions, and Hierarchy-Aware Reranking, which makes retrieval results more concentrated for single-tool queries and more diverse for multi-tool queries. Experimental results show that ToolRerank can improve the quality of the retrieval results, leading to better execution results generated by the LLM.

Submitted 11 March, 2024; originally announced March 2024.

Comments: This paper is accepted for LREC-COLING 2024

Journal ref: In Proceedings of LREC-COLING 2024, pages 16263-16273

arXiv:2402.16775 [pdf, other]

A Comprehensive Evaluation of Quantization Strategies for Large Language Models

Authors: Renren Jin, Jiangcun Du, Wuwei Huang, Wei Liu, Jian Luan, Bin Wang, Deyi Xiong

Abstract: Increasing the number of parameters in large language models (LLMs) usually improves performance in downstream tasks but raises compute and memory costs, making deployment difficult in resource-limited settings. Quantization techniques, which reduce the bits needed for model weights or activations with minimal performance loss, have become popular due to the rise of LLMs. However, most quantization studies use pre-trained LLMs, and the impact of quantization on instruction-tuned LLMs and the relationship between perplexity and benchmark performance of quantized LLMs are not well understood. Evaluation of quantized LLMs is often limited to language modeling and a few classification tasks, leaving their performance on other benchmarks unclear. To address these gaps, we propose a structured evaluation framework consisting of three critical dimensions: (1) knowledge & capacity, (2) alignment, and (3) efficiency, and conduct extensive experiments across ten diverse benchmarks. Our experimental results indicate that LLMs with 4-bit quantization can retain performance comparable to their non-quantized counterparts, and perplexity can serve as a proxy metric for quantized LLMs on most benchmarks. Furthermore, quantized LLMs with larger parameter scales can outperform smaller LLMs. Despite the memory savings achieved through quantization, it can also slow down the inference speed of LLMs. Consequently, substantial engineering efforts and hardware support are imperative to achieve a balanced optimization of decoding speed and memory consumption in the context of quantized LLMs.

Submitted 6 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: ACL 2024 Findings

arXiv:2401.12544 [pdf]

Correlation between magnetic domain structures and quantum anomalous Hall effect in epitaxial MnBi2Te4 thin films

Authors: Yang Shi, Yunhe Bai, Yuanzhao Li, Yang Feng, Qiang Li, Huanyu Zhang, Yang Chen, Yitian Tong, Jianli Luan, Ruixuan Liu, Pengfei Ji, Zongwei Gao, Hangwen Guo, Jinsong Zhang, Yayu Wang, Xiao Feng, Ke He, Xiaodong Zhou, Jian Shen

Abstract: We use magnetic force microscopy (MFM) to study spatial uniformity of magnetization of epitaxially grown MnBi2Te4 thin films. Compared to films which exhibit no quantum anomalous Hall effect (QAH), films with QAH are observed to have more spatial uniformity of magnetization with larger domain size. The domain evolution upon magnetic field sweeping indicates that the magnetic domains or the spatial nonuniformity of magnetization originates from the strong pinning of the inherent sample inhomogeneity. A direct correlation between the Hall resistivity and the domain size has been established by analyzing a series of thin films with and without QAH. Our observation shows that one has to suppress the spatial nonuniformity of magnetization to allow the Hall resistivity to be quantized. The fact that a sizable longitudinal resistivity remains even for the QAH sample suggests a quantized Hall insulator scenario. Our work provides important insights to the understanding of the quantization mechanism and the dissipation of the QAH state in MnBi2Te4 system.

Submitted 23 January, 2024; originally announced January 2024.

Comments: 14 pages, 4 figures

arXiv:2401.11450 [pdf]

Reentrant quantum anomalous Hall effect in molecular beam epitaxy-grown MnBi2Te4 thin films

Authors: Yuanzhao Li, Yunhe Bai, Yang Feng, Jianli Luan, Zongwei Gao, Yang Chen, Yitian Tong, Ruixuan Liu, Su Kong Chong, Kang L. Wang, Xiaodong Zhou, Jian Shen, Jinsong Zhang, Yayu Wang, Chui-Zhen Chen, XinCheng Xie, Xiao Feng, Ke He, Qi-Kun Xue

Abstract: In this study, we investigate intrinsic magnetic topological insulator MnBi2Te4 thin films grown by molecular beam epitaxy. We observe a reentrant quantum anomalous Hall effect when the Fermi energy enters the valence band and magnetic field equals zero, indicating the emergence of the Chern Anderson insulator state. The discovery opens a new avenue for realizing the QAH effect and underscores the fundamental role of both Berry curvature and Anderson localization.

Submitted 21 January, 2024; originally announced January 2024.

Comments: 15 pages, 4 figures

arXiv:2401.05459 [pdf, other]

Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security

Authors: Yuanchun Li, Hao Wen, Weijun Wang, Xiangyu Li, Yizhen Yuan, Guohong Liu, Jiacheng Liu, Wenxing Xu, Xiang Wang, Yi Sun, Rui Kong, Yile Wang, Hanfei Geng, Jian Luan, Xuefeng Jin, Zilong Ye, Guanjing Xiong, Fan Zhang, Xiang Li, Mengwei Xu, Zhijun Li, Peng Li, Yang Liu, Ya-Qin Zhang, Yunxin Liu

Abstract: Since the advent of personal computing devices, intelligent personal assistants (IPAs) have been one of the key technologies that researchers and engineers have focused on, aiming to help users efficiently obtain information and execute tasks, and provide users with more intelligent, convenient, and rich interaction experiences. With the development of smartphones and IoT, computing and sensing devices have become ubiquitous, greatly expanding the boundaries of IPAs. However, due to the lack of capabilities such as user intent understanding, task planning, tool use, and personal data management, existing IPAs still have limited practicality and scalability. Recently, the emergence of foundation models, represented by large language models (LLMs), brings new opportunities for the development of IPAs. With their powerful semantic understanding and reasoning capabilities, LLMs can enable intelligent agents to solve complex problems autonomously. In this paper, we focus on Personal LLM Agents, which are LLM-based agents that are deeply integrated with personal data and personal devices and used for personal assistance. We envision that Personal LLM Agents will become a major software paradigm for end-users in the upcoming era. To realize this vision, we take the first step to discuss several important questions about Personal LLM Agents, including their architecture, capability, efficiency and security. We start by summarizing the key components and design choices in the architecture of Personal LLM Agents, followed by an in-depth analysis of the opinions collected from domain experts. Next, we discuss several key challenges to achieve intelligent, efficient and secure Personal LLM Agents, followed by a comprehensive survey of representative solutions to address these challenges.

Submitted 8 May, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

Comments: https://github.com/MobileLLM/Personal_LLM_Agents_Survey

arXiv:2401.04283 [pdf, ps, other]

FADI-AEC: Fast Score Based Diffusion Model Guided by Far-end Signal for Acoustic Echo Cancellation

Authors: Yang Liu, Li Wan, Yun Li, Yiteng Huang, Ming Sun, James Luan, Yangyang Shi, Xin Lei

Abstract: Despite the potential of diffusion models in speech enhancement, their deployment in Acoustic Echo Cancellation (AEC) has been restricted. In this paper, we propose DI-AEC, pioneering a diffusion-based stochastic regeneration approach dedicated to AEC. Further, we propose FADI-AEC, a fast score-based diffusion AEC framework that reduces computational demands, making it favorable for edge devices. It stands out by running the score model once per frame, achieving a significant surge in processing efficiency. Apart from that, we introduce a novel noise generation technique where far-end signals are utilized, incorporating both far-end and near-end signals to refine the score model's accuracy. We test our proposed method on the ICASSP2023 Microsoft deep echo cancellation challenge evaluation dataset, where our method outperforms some of the end-to-end methods and other diffusion based echo cancellation methods.

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2312.06135 [pdf, other]

ArtBank: Artistic Style Transfer with Pre-trained Diffusion Model and Implicit Style Prompt Bank

Authors: Zhanjie Zhang, Quanwei Zhang, Guangyuan Li, Wei Xing, Lei Zhao, Jiakai Sun, Zehua Lan, Junsheng Luan, Yiling Huang, Huaizhong Lin

Abstract: Artistic style transfer aims to repaint the content image with the learned artistic style. Existing artistic style transfer methods can be divided into two categories: small model-based approaches and pre-trained large-scale model-based approaches. Small model-based approaches can preserve the content structure, but fail to produce highly realistic stylized images and introduce artifacts and disharmonious patterns; pre-trained large-scale model-based approaches can generate highly realistic stylized images but struggle with preserving the content structure. To address the above issues, we propose ArtBank, a novel artistic style transfer framework, to generate highly realistic stylized images while preserving the content structure of the content images. Specifically, to sufficiently dig out the knowledge embedded in pre-trained large-scale models, an Implicit Style Prompt Bank (ISPB), a set of trainable parameter matrices, is designed to learn and store knowledge from the collection of artworks and behave as a visual prompt to guide pre-trained large-scale models to generate highly realistic stylized images while preserving content structure. Besides, to accelerate training the above ISPB, we propose a novel Spatial-Statistical-based self-Attention Module (SSAM). The qualitative and quantitative experiments demonstrate the superiority of our proposed method over state-of-the-art artistic style transfer methods.

Submitted 11 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI2024

arXiv:2311.15058 [pdf]

Controlled generation of Poincaré sphere beams with inverse-designed multimode meta-waveguides

Authors: Jing Luan, Shuang Zheng, Zhenyu Wan, Tiange Wu, Weijie Chang, Deming Liu, Minming Zhang

Abstract: The angular momentum of light can be described by positions on various Poincaré spheres, where different structured light beams have proven useful for numerous optical applications. However, the dynamic generation and control of arbitrary structured light on different Poincaré spheres is still handled via bulky optics in free space. Here we propose and demonstrate multimode silicon photonic integrated meta-waveguides to generate arbitrary structured light beams on polarization/orbit/higher-order/hybrid Poincaré spheres. The multimode meta-waveguides are inversely designed to map polarization states/higher-order spatial modes to orbit angular momentum, generating polarization-/charge-diverse orbit angular momentum modes. Based on the fundamental orbit angular momentum mode basis enabled by the meta-waveguides, different structured-light fields on polarization/orbit/higher-order/hybrid Poincaré spheres could be flexibly generated by controlling the relative amplitude and phase profiles of on-chip guided modes. The demonstrated photonic integrated devices hold great potential for the flexible manipulation of structured light beams in many applications.

Submitted 25 November, 2023; originally announced November 2023.

arXiv:2311.03672 [pdf, other]

CBSiMT: Mitigating Hallucination in Simultaneous Machine Translation with Weighted Prefix-to-Prefix Training

Authors: Mengge Liu, Wen Zhang, Xiang Li, Yanzhi Tian, Yuhang Guo, Jian Luan, Bin Wang, Shuoying Chen

Abstract: Simultaneous machine translation (SiMT) is a challenging task that requires starting translation before the full source sentence is available. The prefix-to-prefix framework is often applied to SiMT, which learns to predict target tokens using only a partial source prefix. However, due to the word order difference between languages, misaligned prefix pairs would make SiMT models suffer from serious hallucination problems, i.e. target outputs that are unfaithful to source inputs. Such problems can not only produce target tokens that are not supported by the source prefix, but also hinder generating the correct translation by receiving more source words. In this work, we propose a Confidence-Based Simultaneous Machine Translation (CBSiMT) framework, which uses model confidence to perceive hallucination tokens and mitigates their negative impact with weighted prefix-to-prefix training. Specifically, token-level and sentence-level weights are calculated based on model confidence and applied to the loss function. We explicitly quantify the faithfulness of the generated target tokens using the token-level weight, and employ the sentence-level weight to alleviate the disturbance of sentence pairs with serious word order differences on the model. Experimental results on MuST-C English-to-Chinese and WMT15 German-to-English SiMT tasks demonstrate that our method can consistently improve translation quality at most latency regimes, with up to 2 BLEU points of improvement at low latency.

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2310.18659 [pdf, other]

DetermLR: Augmenting LLM-based Logical Reasoning from Indeterminacy to Determinacy

Authors: Hongda Sun, Weikai Xu, Wei Liu, Jian Luan, Bin Wang, Shuo Shang, Ji-Rong Wen, Rui Yan

Abstract: Recent advances in large language models (LLMs) have revolutionized the landscape of reasoning tasks. To enhance the capabilities of LLMs to emulate human reasoning, prior studies have focused on modeling reasoning steps using various thought structures like chains, trees, or graphs. However, LLM-based reasoning still encounters the following challenges: (1) Limited adaptability of preset structures to diverse tasks; (2) Insufficient precision in exploiting known conditions to derive new ones; and (3) Inadequate consideration of historical reasoning experiences for subsequent reasoning steps. To this end, we propose DetermLR, a novel perspective that rethinks the reasoning process as an evolution from indeterminacy to determinacy. First, we categorize known conditions into two types: determinate and indeterminate premises. This provides an overall direction for the reasoning process and guides LLMs in converting indeterminate data into progressively determinate insights. Subsequently, we leverage quantitative measurements to prioritize more relevant premises to explore new insights. Furthermore, we automate the storage and extraction of available premises and reasoning paths with reasoning memory, preserving historical reasoning details for subsequent reasoning steps. Comprehensive experimental results demonstrate that DetermLR surpasses all baselines on various logical reasoning benchmarks: LogiQA, ProofWriter, FOLIO, PrOntoQA, and LogicalDeduction. Compared to previous multi-step reasoning methods, DetermLR achieves higher accuracy with fewer reasoning steps, highlighting its superior efficiency and effectiveness in solving logical reasoning tasks.

Submitted 26 May, 2024; v1 submitted 28 October, 2023; originally announced October 2023.

Comments: Accepted at ACL 2024 Main, Code repo: https://github.com/XiaoMi/DetermLR

arXiv:2307.15895 [pdf, other]

Auditing Frameworks Need Resource Isolation: A Systematic Study on the Super Producer Threat to System Auditing and Its Mitigation

Authors: Peng Jiang, Ruizhe Huang, Ding Li, Yao Guo, Xiangqun Chen, Jianhai Luan, Yuxin Ren, Xinwei Hu

Abstract: System auditing is a crucial technique for detecting APT attacks. However, attackers may try to compromise the system auditing frameworks to conceal their malicious activities. In this paper, we present a comprehensive and systematic study of the super producer threat in auditing frameworks, which enables attackers to either corrupt the auditing framework or paralyze the entire system. Our analysis shows that the main cause of the super producer threat is the lack of data isolation in the centralized architecture of existing solutions. To address this threat, we propose a novel auditing framework, NODROP, which isolates provenance data generated by different processes with a threadlet-based architecture design. Our evaluation demonstrates that NODROP can ensure the integrity of the auditing frameworks while achieving an average 6.58% higher application overhead compared to vanilla Linux and 6.30% lower application overhead compared to a state-of-the-art commercial auditing framework, Sysdig, across eight different hardware configurations.

Submitted 29 July, 2023; originally announced July 2023.

Comments: 18 pages, to appear in the 32nd USENIX Security Symposium (USENIX Security '23)

arXiv:2306.16636 [pdf, other]

CMATH: Can Your Language Model Pass Chinese Elementary School Math Test?

Authors: Tianwen Wei, Jian Luan, Wei Liu, Shuang Dong, Bin Wang

Abstract: We present the Chinese Elementary School Math Word Problems (CMATH) dataset, comprising 1.7k elementary school-level math word problems with detailed annotations, sourced from actual Chinese workbooks and exams. This dataset aims to provide a benchmark tool for assessing the following question: to what grade level of elementary school math do the abilities of popular large language models (LLMs) correspond? We evaluate a variety of popular LLMs, including both commercial and open-source options, and discover that only GPT-4 achieves success (accuracy $\geq$ 60\%) across all six elementary school grades, while other models falter at different grade levels. Furthermore, we assess the robustness of several top-performing LLMs by augmenting the original problems in the CMATH dataset with distracting information. Our findings reveal that GPT-4 is able to maintain robustness, while other models fail. We anticipate that our study will expose limitations in LLMs' arithmetic and reasoning capabilities, and promote their ongoing development and advancement.

Submitted 28 June, 2023; originally announced June 2023.

arXiv:2306.10543 [pdf, other]

UniMC: A Unified Framework for Long-Term Memory Conversation via Relevance Representation Learning

Authors: Kang Zhao, Wei Liu, Jian Luan, Minglei Gao, Li Qian, Hanlin Teng, Bin Wang

Abstract: Open-domain long-term memory conversation can establish long-term intimacy with humans, and the key is the ability to understand and memorize long-term dialogue history information. Existing works integrate multiple models for modelling through a pipeline, which ignores the coupling between different stages. In this paper, we propose a Unified framework for Long-term Memory Conversations (UniMC), which increases the connection between different stages by learning relevance representation. Specifically, we decompose the main task into three subtasks based on probability graphs: 1) conversation summarization, 2) memory retrieval, 3) memory-augmented generation. Each subtask involves learning a representation for calculating the relevance between the query and memory, which is modelled by inserting a special token at the beginning of the decoder input. The relevance representation learning strengthens the connection across subtasks through parameter sharing and joint training. Extensive experimental results show that the proposed method consistently improves over strong baselines and yields better dialogue consistency and engagingness.

Submitted 18 June, 2023; originally announced June 2023.

arXiv:2305.17415 [pdf, other]

Exploring Better Text Image Translation with Multimodal Codebook

Authors: Zhibin Lan, Jiawei Yu, Xiang Li, Wen Zhang, Jian Luan, Bin Wang, Degen Huang, Jinsong Su

Abstract: Text image translation (TIT) aims to translate the source texts embedded in the image to target translations, which has a wide range of applications and thus has important research value. However, current studies on TIT are confronted with two main bottlenecks: 1) this task lacks a publicly available TIT dataset, 2) dominant models are constructed in a cascaded manner, which tends to suffer from the error propagation of optical character recognition (OCR). In this work, we first annotate a Chinese-English TIT dataset named OCRMT30K, providing convenience for subsequent studies. Then, we propose a TIT model with a multimodal codebook, which is able to associate the image with relevant texts, providing useful supplementary information for translation. Moreover, we present a multi-stage training framework involving text machine translation, image-text alignment, and TIT tasks, which fully exploits additional bilingual texts, OCR dataset and our OCRMT30K dataset to train our model. Extensive experiments and in-depth analyses strongly demonstrate the effectiveness of our proposed model and training framework.

Submitted 2 June, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

Comments: Accepted by ACL 2023 Main Conference

arXiv:2303.00969 [pdf, other]

Rethinking the Reasonability of the Test Set for Simultaneous Machine Translation

Authors: Mengge Liu, Wen Zhang, Xiang Li, Jian Luan, Bin Wang, Yuhang Guo, Shuoying Chen

Abstract: Simultaneous machine translation (SimulMT) models start translation before the end of the source sentence, making the translation monotonically aligned with the source sentence. However, the general full-sentence translation test set is acquired by offline translation of the entire source sentence, which is not designed for SimulMT evaluation, making us rethink whether this will underestimate the performance of SimulMT models. In this paper, we manually annotate a monotonic test set based on the MuST-C English-Chinese test set, denoted as SiMuST-C. Our human evaluation confirms the acceptability of our annotated test set. Evaluations on three different SimulMT models verify that the underestimation problem can be alleviated on our test set. Further experiments show that finetuning on an automatically extracted monotonic training set improves SimulMT models by up to 3 BLEU points.

Submitted 13 March, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

Comments: Accepted by 48th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)

arXiv:2301.06745 [pdf, other]

BERT-ERC: Fine-tuning BERT is Enough for Emotion Recognition in Conversation

Authors: Xiangyu Qin, Zhiyu Wu, Jinshi Cui, Tingting Zhang, Yanran Li, Jian Luan, Bin Wang, Li Wang

Abstract: Previous works on emotion recognition in conversation (ERC) follow a two-step paradigm, which can be summarized as first producing context-independent features via fine-tuning pretrained language models (PLMs) and then analyzing contextual information and dialogue structure information among the extracted features. However, we discover that this paradigm has several limitations. Accordingly, we propose a novel paradigm, i.e., exploring contextual information and dialogue structure information in the fine-tuning step, and adapting the PLM to the ERC task in terms of input text, classification structure, and training strategy. Furthermore, we develop our model BERT-ERC according to the proposed paradigm, which improves ERC performance in three aspects, namely suggestive text, fine-grained classification module, and two-stage training. Compared to existing methods, BERT-ERC achieves substantial improvement on four datasets, indicating its effectiveness and generalization capability. Besides, we also set up the limited resources scenario and the online prediction scenario to approximate real-world scenarios. Extensive experiments demonstrate that the proposed paradigm significantly outperforms the previous one and can be adapted to various scenes.

Submitted 17 January, 2023; originally announced January 2023.

arXiv:2212.03435 [pdf, other]

Improve Bilingual TTS Using Dynamic Language and Phonology Embedding

Authors: Fengyu Yang, Jian Luan, Yujun Wang

Abstract: In most cases, bilingual TTS needs to handle three types of input scripts: first language only, second language only, and second language embedded in the first language. In the latter two situations, the pronunciation and intonation of the second language are usually quite different due to the influence of the first language. Therefore, it is a big challenge to accurately model the pronunciation and intonation of the second language in different contexts without mutual interference. This paper builds a Mandarin-English TTS system to acquire more standard spoken English speech from a monolingual Chinese speaker. We introduce phonology embedding to capture the English differences between different phonology. Embedding mask is applied to language embedding for distinguishing information between different languages and to phonology embedding for focusing on English expression. We specially design an embedding strength modulator to capture the dynamic strength of language and phonology. Experiments show that our approach can produce significantly more natural and standard spoken English speech of the monolingual Chinese speaker. From analysis, we find that suitable phonology control contributes to better performance in different scenarios.

Submitted 6 December, 2022; originally announced December 2022.

Comments: Submitted to ICASSP2023

arXiv:2206.03773 [pdf]

doi 10.1093/nsr/nwad189

Quantized anomalous Hall resistivity achieved in molecular beam epitaxy-grown MnBi2Te4 thin films

Authors: Yunhe Bai, Yuanzhao Li, Jianli Luan, Ruixuan Liu, Wenyu Song, Yang Chen, Peng-Fei Ji, Qinghua Zhang, Fanqi Meng, Bingbing Tong, Lin Li, Yuying Jiang, Zongwei Gao, Lin Gu, Jinsong Zhang, Yayu Wang, Qi-Kun Xue, Ke He, Yang Feng, Xiao Feng

Abstract: The intrinsic magnetic topological insulator MnBi2Te4 provides a feasible pathway to high temperature quantum anomalous Hall (QAH) effect as well as various novel topological quantum phases. Although quantized transport properties have been observed in exfoliated MnBi2Te4 thin flakes, it remains a big challenge to achieve molecular beam epitaxy (MBE)-grown MnBi2Te4 thin films even close to the quantized regime. In this work, we report the realization of quantized anomalous Hall resistivity in MBE-grown MnBi2Te4 thin films with the chemical potential tuned by both controlled in-situ oxygen exposure and top gating. We find that elongated post-annealing obviously elevates the temperature to achieve quantization of the Hall resistivity, but also increases the residual longitudinal resistivity, indicating a picture of high-quality QAH puddles weakly coupled by tunnel barriers. These results help to clarify the puzzles in previous experimental studies on MnBi2Te4 and to find a way out of the big difficulty in obtaining MnBi2Te4 samples showing quantized transport properties.

Submitted 17 April, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

Comments: 4 figures

Journal ref: National Science Review, nwad189 (2023)

arXiv:2205.15705 [pdf]

High-quality 8-fold self-compression of ultrashort near-UV pulses in Ar-filled ultrathin-walled photonic crystal fiber

Authors: Jie Luan, Philip St. J. Russell, David Novoa

Abstract: We demonstrate generation of 7.6 fs near-UV pulses centered at 400 nm via 8-fold soliton-effect self-compression in an Ar-filled hollow-core kagomé-style photonic crystal fiber with ultrathin core walls. Analytical calculations of the effective compression length and soliton order permit adjustment of the experimental parameters, and numerical modelling of the nonlinear pulse dynamics in the fiber accurately predicts the spectro-temporal profiles of the self-compressed pulses. After compensation of phase distortion introduced by the optical elements along the beam path from the fiber to the diagnostics, 71% of the pulse energy was in the main temporal lobe, with peak powers in excess of 0.2 GW. The convenient set-up opens up new opportunities for time-resolved studies in spectroscopy, chemistry and materials science.

Submitted 31 May, 2022; originally announced May 2022.

Comments: 7 pages, 5 figures

arXiv:2111.10002 [pdf]

High-speed and single-mode FP laser based on parity-time symmetry

Authors: Sikang Yang, Jing Luan, Yu Han, Ruigang Zhang, Qi Tian, Pengxiang He, Deming Liu, Minming Zhang

Abstract: The ability to manipulate cavity resonant modes is of critical importance in laser physics and applications. By exploiting the parity time (PT) symmetry, we propose and experimentally realize a single-mode FP laser with improved output power and high-speed modulation. The proposed PT symmetric laser consists of two coupled structurally identical FP resonators. The gain and loss in two FP resonators can be manipulated independently by changing the injection currents. In the PT symmetric FP laser, single-mode operation is accomplished by selective breaking of PT symmetry depending solely on the relation between gain-loss and coupling. Single-mode lasing with output power of 1.7 dBm and a sidemode suppression ratio (SMSR) exceeding 24 dB is demonstrated. The 3 dB bandwidth of 7.9 GHz is achieved and clear eye-openings were obtained for 2.5 Gbps and 10 Gbps NRZ operation over 10 km single-mode fibers. Furthermore, the PT symmetry breaking is experimentally confirmed with measured loss and coupling coefficient of two FP resonators. The influence of cavity length, facet reflectivity, and electrical isolation between two P-side electrodes on the side mode suppression ratio and output optical power is also demonstrated, paving the way for further improvement of the PT symmetric FP laser.

Submitted 18 November, 2021; originally announced November 2021.

Comments: 8 pages, 6 figures

arXiv:2110.09780 [pdf, other]

Improving Emotional Speech Synthesis by Using SUS-Constrained VAE and Text Encoder Aggregation

Authors: Fengyu Yang, Jian Luan, Yujun Wang

Abstract: Learning emotion embedding from reference audio is a straightforward approach for multi-emotion speech synthesis in encoder-decoder systems. But how to get better emotion embedding and how to inject it into TTS acoustic model more effectively are still under investigation. In this paper, we propose an innovative constraint to help VAE extract emotion embedding with better cluster cohesion. Besides, the obtained emotion embedding is used as query to aggregate latent representations of all encoder layers via attention. Moreover, the queries from encoder layers themselves are also helpful. Experiments prove the proposed methods can enhance the encoding of comprehensive syntactic and semantic information and produce more expressive emotional speech.

Submitted 28 January, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

Comments: accepted by ICASSP2022

arXiv:2110.04486 [pdf, other]

PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control

Authors: Yunchao He, Jian Luan, Yujun Wang

Abstract: Sequence expansion between encoder and decoder is a critical challenge in sequence-to-sequence TTS. Attention-based methods achieve great naturalness but suffer from unstable issues like missing and repeating phonemes, not to mention accurate duration control. Duration-informed methods, on the contrary, seem to easily adjust phoneme duration but show obvious degradation in speech naturalness. This paper proposes PAMA-TTS to address the problem. It takes advantage of both flexible attention and explicit duration models. Based on the monotonic attention mechanism, PAMA-TTS also leverages token duration and relative position of a frame, especially countdown information, i.e. in how many future frames the present phoneme will end. They help the attention to move forward along the token sequence in a soft but reliable control. Experimental results prove that PAMA-TTS achieves the highest naturalness, while having on-par or even better duration controllability than the duration-informed model.

Submitted 18 March, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

Comments: Accepted by ICASSP 2022. 5 pages, 4 figures, 3 tables. Audio samples are available at: https://pama-tts.github.io/

arXiv:2107.04407 [pdf]

doi 10.1007/s10237-021-01542-5

Hemodynamic effects of stent-graft introducer sheath during thoracic endovascular aortic repair

Authors: Yonghui Qiao, Le Mao, Yan Wang, Jingyang Luan, Yanlu Chen, Ting Zhu, Kun Luo, Jianren Fan

Abstract: Thoracic endovascular aortic repair (TEVAR) has become the standard treatment of a variety of aortic pathologies. The objective of this study is to evaluate the hemodynamic effects of stent-graft introducer sheath during TEVAR. Three idealized representative diseased aortas of aortic aneurysm, coarctation of the aorta, and aortic dissection were designed. Computational fluid dynamics studies were performed in the above idealized aortic geometries. An introducer sheath routinely used in the clinic was virtually delivered into diseased aortas. Comparative analysis was carried out to evaluate the hemodynamic effects of the introducer sheath. Results show that the blood flow to the supra-aortic branches would increase above 9% due to the obstruction of the introducer sheath. The region exposed to high endothelial cell activation potential (ECAP) expands in the scenarios of coarctation of the aorta and aortic dissection, which indicates that the probability of thrombus formation may increase during TEVAR. The pressure magnitude in peak systole shows an obvious rise and a similar phenomenon is not observed in early diastole. The blood viscosity in the aortic arch and descending aorta is remarkably altered by the introducer sheath. The uneven viscosity distribution confirms the necessity of using non-Newtonian models and high viscosity region with high ECAP further promotes thrombosis. Our results highlight the hemodynamic effects of stent-graft introducer sheath during TEVAR, which may be associated with perioperative complications.

Submitted 8 July, 2021; originally announced July 2021.

Journal ref: Biomechanics and Modeling in Mechanobiology (2022)

arXiv:2107.03065 [pdf, other]

Msdtron: a high-capability multi-speaker speech synthesis system for diverse data using characteristic information

Authors: Qinghua Wu, Quanbo Shen, Jian Luan, YuJun Wang

Abstract: In multi-speaker speech synthesis, data from a number of speakers usually tend to have great diversity due to the fact that the speakers may differ largely in ages, speaking styles, emotions, and so on. It is important but challenging to improve the modeling capabilities for multi-speaker speech synthesis. To address the issue, this paper proposes a high-capability speech synthesis system, called Msdtron, in which 1) a representation of the harmonic structure of speech, called excitation spectrogram, is designed to directly guide the learning of harmonics in mel-spectrogram, and 2) a conditional gated LSTM (CGLSTM) is proposed to control the flow of text content information through the network by re-weighting the gates of LSTM using speaker information. The experiments show a significant reduction in reconstruction error of mel-spectrogram in the training of the multi-speaker model, and a great improvement is observed in the subjective evaluation of speaker adapted model.

Submitted 11 February, 2022; v1 submitted 7 July, 2021; originally announced July 2021.

Comments: Accepted by ICASSP-2022

arXiv:2103.05215 [pdf]

doi 10.1088/0256-307X/38/3/037402

Gate Tunable Supercurrent in Josephson Junctions Based on Bi2Te3 Topological Insulator Thin Films

Authors: Wei-Xiong Wu, Yang Feng, Yun-He Bai, Yu-Ying Jiang, Zong-Wei Gao, Yuan-Zhao Li, Jian-Li Luan, Heng-An Zhou, Wan-Jun Jiang, Xiao Feng, Jin-Song Zhang, Hao Zhang, Ke He, Xu-Cun Ma, Qi-Kun Xue, Ya-Yu Wang

Abstract: We report transport measurements on Josephson junctions consisting of Bi2Te3 topological insulator (TI) thin films contacted by superconducting Nb electrodes. For a device with junction length L = 134 nm, the critical supercurrent Ic can be modulated by an electrical gate which tunes the carrier type and density of the TI film. Ic can reach a minimum when the TI is near the charge neutrality regime with the Fermi energy lying close to the Dirac point of the surface state. In the p-type regime the Josephson current can be well described by a short ballistic junction model. In the n-type regime the junction is ballistic at 0.7 K < T < 3.8 K while for T < 0.7 K the diffusive bulk modes emerge and contribute a larger Ic than the ballistic model. We attribute the lack of diffusive bulk modes in the p-type regime to the formation of p-n junctions. Our work provides new clues for search of Majorana zero mode in TI-based superconducting devices.

Submitted 8 March, 2021; originally announced March 2021.

Comments: 6 pages, 4 figures, The manuscript with the same title will be published by Chinese Physics Letters

Journal ref: Chinese Physics Letters 38, 037402 (2021)

arXiv:2102.11205 [pdf]

doi 10.1364/OE.422815

Efficient self-compression of ultrashort UV pulses in air-filled hollow-core photonic crystal fiber

Authors: Jie Luan, Philip St. J. Russell, David Novoa

Abstract: We report generation of ultrashort UV pulses by soliton self-compression in kagomé-style hollow-core photonic crystal fiber filled with ambient air. Pump pulses with energy 2.6 µJ and duration 54 fs at 400 nm were compressed temporally by a factor of 5, to a duration of ~11 fs. The experimental results are supported by numerical simulations, showing that both Raman and Kerr effects play a role in the compression dynamics. The convenience of using ambient air, and the absence of glass windows that would distort the compressed pulses, makes the setup highly attractive as the basis of an efficient table-top UV pulse compressor.

Submitted 22 February, 2021; originally announced February 2021.

Comments: 7 pages, 5 figures

arXiv:2102.00802 [pdf]

Liquefaction-induced Plasticity from Entropy-boosted Amorphous Ceramics

Authors: Haidong Bian, Quanfeng He, Junhua Luan, Yu Bu, Yong Yang, Zhengtao Xu, Jian Lu, Yang Yang Li

Abstract: Ceramics are easy to break, and very few generic mechanisms are available for improving their mechanical properties, e.g., the 1975-discovered anti-fracture mechanism is strictly limited to zirconia and hafnia. Here we report a general mechanism for achieving high plasticity through liquefaction of ceramics. We further disclose the general material design strategies to achieve this difficult task through entropy-boosted amorphous ceramics (EBACs), enabling fracture-resistant properties that can withstand severe plastic deformation (e.g., over 95%, deformed to a thickness of a few nanometers) while maintaining high hardness and reduced modulus. The findings reported here open a new route to ductile ceramics and many applications.

Submitted 1 February, 2021; originally announced February 2021.

Comments: 16 pages,4 figures

arXiv:2101.02382 [pdf]

Highly Distorted Lattices in Chemically Complex Alloys Produce Ultra-Elastic Materials with Extraordinary Elinvar Effects

Authors: Q. F. He, J. G. Wang, H. A. Chen, Z. Y. Ding, Z. Q. Zhou, L. H. Xiong, J. H. Luan, J. M. Pelletier, J. C. Qiao, Q. Wang, L. L. Fan, Y. Ren, Q. S. Zeng, C. T. Liu, C. W. Pao, D. J. Srolovitz, Y. Yang

Abstract: Conventional crystalline alloys usually possess a low atomic size difference in order to stabilize their crystalline structure. However, in this article, we report a single phase chemically complex alloy which possesses a large atomic size misfit usually unaffordable to conventional alloys. Consequently, this alloy develops a rather complex atomic-scale chemical order and a highly distorted crystalline structure. As a result, this crystalline alloy displays an unusually high elastic strain limit (~2%), about ten times that of conventional alloys, and an extremely low internal friction (< 2E-4) at room temperature. More interestingly, this alloy firmly maintains its elastic modulus even when the testing temperature rises from room temperature to 900 K, which is unmatched by the existing alloys hitherto reported. From an application viewpoint, our discovery may open up new opportunities to design high precision devices usable even under an extreme environment.

Submitted 7 January, 2021; originally announced January 2021.

arXiv:2009.01776 [pdf, other]

HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis

Authors: Jiawei Chen, Xu Tan, Jian Luan, Tao Qin, Tie-Yan Liu

Abstract: High-fidelity singing voices usually require higher sampling rate (e.g., 48kHz) to convey expression and emotion. However, higher sampling rate causes the wider frequency band and longer waveform sequences and throws challenges for singing voice synthesis (SVS) in both frequency and time domains. Conventional SVS systems that adopt small sampling rate cannot well address the above challenges. In this paper, we develop HiFiSinger, an SVS system towards high-fidelity singing voice. HiFiSinger consists of a FastSpeech based acoustic model and a Parallel WaveGAN based vocoder to ensure fast training and inference and also high voice quality. To tackle the difficulty of singing modeling caused by high sampling rate (wider frequency band and longer waveform), we introduce multi-scale adversarial training in both the acoustic model and vocoder to improve singing modeling. Specifically, 1) To handle the larger range of frequencies caused by higher sampling rate, we propose a novel sub-frequency GAN (SF-GAN) on mel-spectrogram generation, which splits the full 80-dimensional mel-frequency into multiple sub-bands and models each sub-band with a separate discriminator. 2) To model longer waveform sequences caused by higher sampling rate, we propose a multi-length GAN (ML-GAN) for waveform generation to model different lengths of waveform sequences with separate discriminators. 3) We also introduce several additional designs and findings in HiFiSinger that are crucial for high-fidelity voices, such as adding F0 (pitch) and V/UV (voiced/unvoiced flag) as acoustic features, choosing an appropriate window/hop size for mel-spectrogram, and increasing the receptive field in vocoder for long vowel modeling. Experiment results show that HiFiSinger synthesizes high-fidelity singing voices with much higher quality: 0.32/0.44 MOS gain over 48kHz/24kHz baseline and 0.83 MOS gain over previous SVS systems.

Submitted 3 September, 2020; originally announced September 2020.

arXiv:2008.04658 [pdf, other]

Transfer Learning for Improving Singing-voice Detection in Polyphonic Instrumental Music

Authors: Yuanbo Hou, Frank K. Soong, Jian Luan, Shengchen Li

Abstract: Detecting singing-voice in polyphonic instrumental music is critical to music information retrieval. To train a robust vocal detector, a large dataset marked with vocal or non-vocal labels at frame level is essential. However, frame-level labeling is time-consuming and labor-intensive, so few well-labeled datasets are available for singing-voice detection (S-VD). Hence, we propose a data augmentation method for S-VD by transfer learning. In this study, clean speech clips with voice activity endpoints and separate instrumental music clips are artificially added together to simulate polyphonic vocals to train a vocal/non-vocal detector. Due to the different articulation and phonation between speaking and singing, the vocal detector trained with the artificial dataset does not match well with polyphonic music, which consists of singing vocals together with instrumental accompaniments. To reduce this mismatch, transfer learning is used to transfer the knowledge learned from the artificial speech-plus-music training set to a small but matched polyphonic dataset, i.e., singing vocals with accompaniments. By transferring the related knowledge to make up for the lack of well-labeled training data in S-VD, the proposed data augmentation method by transfer learning can improve S-VD performance with an F-score improvement from 89.5% to 93.2%.

Submitted 11 August, 2020; originally announced August 2020.

Comments: Accepted by INTERSPEECH 2020

arXiv:2008.02490 [pdf]

PPSpeech: Phrase based Parallel End-to-End TTS System

Authors: Yahuan Cong, Ran Zhang, Jian Luan

Abstract: Current end-to-end autoregressive TTS systems (e.g. Tacotron 2) have outperformed traditional parallel approaches on the quality of synthesized speech. However, they introduce new problems at the same time. Due to the autoregressive nature, the time cost of inference has to be proportional to the length of text, which poses a great challenge for online serving. On the other hand, the style of synthetic speech becomes unstable and may change noticeably among sentences. In this paper, we propose a Phrase based Parallel End-to-End TTS System (PPSpeech) to address these issues. PPSpeech uses an autoregressive approach within a phrase and executes parallel strategies for different phrases. By this method, we can achieve both high quality and high efficiency. In addition, we propose acoustic embedding and text context embedding as the conditions of the encoder to maintain continuity and prevent abrupt style or timbre changes. Experiments show that the synthesis speed of PPSpeech is much faster than sentence level autoregressive Tacotron 2 when a sentence has more than 5 phrases. The speed advantage increases with the growth of sentence length. Subjective experiments show that the proposed system with acoustic embedding and context embedding as conditions can make the style transition across sentences gradual and natural, clearly outperforming Global Style Token (GST) in MOS.

Submitted 6 August, 2020; originally announced August 2020.

arXiv:2007.04590 [pdf, other]

DeepSinger: Singing Voice Synthesis with Data Mined From the Web

Authors: Yi Ren, Xu Tan, Tao Qin, Jian Luan, Zhou Zhao, Tie-Yan Liu

Abstract: In this paper, we develop DeepSinger, a multi-lingual multi-singer singing voice synthesis (SVS) system, which is built from scratch using singing training data mined from music websites. The pipeline of DeepSinger consists of several steps, including data crawling, singing and accompaniment separation, lyrics-to-singing alignment, data filtration, and singing modeling. Specifically, we design a lyrics-to-singing alignment model to automatically extract the duration of each phoneme in lyrics starting from coarse-grained sentence level to fine-grained phoneme level, and further design a multi-lingual multi-singer singing model based on a feed-forward Transformer to directly generate linear-spectrograms from lyrics, and synthesize voices using Griffin-Lim. DeepSinger has several advantages over previous SVS systems: 1) to the best of our knowledge, it is the first SVS system that directly mines training data from music websites, 2) the lyrics-to-singing alignment model further avoids any human efforts for alignment labeling and greatly reduces labeling cost, 3) the singing model based on a feed-forward Transformer is simple and efficient, by removing the complicated acoustic feature modeling in parametric synthesis and leveraging a reference encoder to capture the timbre of a singer from noisy singing data, and 4) it can synthesize singing voices in multiple languages and multiple singers. We evaluate DeepSinger on our mined singing dataset that consists of about 92 hours of data from 89 singers in three languages (Chinese, Cantonese and English). The results demonstrate that with the singing data purely mined from the Web, DeepSinger can synthesize high-quality singing voices in terms of both pitch accuracy and voice naturalness (footnote: Our audio samples are shown in https://speechresearch.github.io/deepsinger/.)

Submitted 15 July, 2020; v1 submitted 9 July, 2020; originally announced July 2020.

Comments: Accepted by KDD2020 research track

arXiv:2006.10317 [pdf, other]

Adversarially Trained Multi-Singer Sequence-To-Sequence Singing Synthesizer

Authors: Jie Wu, Jian Luan

Abstract: This paper presents a high quality singing synthesizer that is able to model a voice with limited available recordings. Based on the sequence-to-sequence singing model, we design a multi-singer framework to leverage all the existing singing data of different singers. To attenuate the issue of musical score unbalance among singers, we incorporate an adversarial task of singer classification to make encoder output less singer dependent. Furthermore, we apply multiple random window discriminators (MRWDs) on the generated acoustic features to make the network be a GAN. Both objective and subjective evaluations indicate that the proposed synthesizer can generate higher quality singing voice than baseline (4.12 vs 3.53 in MOS). Especially, the articulation of high-pitched vowels is significantly enhanced.

Submitted 18 June, 2020; originally announced June 2020.

Comments: Submitted to INTERSPEECH2020

arXiv:2006.06261 [pdf, other]

XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System

Authors: Peiling Lu, Jie Wu, Jian Luan, Xu Tan, Li Zhou

Abstract: This paper presents XiaoiceSing, a high-quality singing voice synthesis system which employs an integrated network for spectrum, F0 and duration modeling. We follow the main architecture of FastSpeech while proposing some singing-specific designs: 1) Besides phoneme ID and position encoding, features from the musical score (e.g., note pitch and length) are also added. 2) To attenuate off-key issues, we add a residual connection in F0 prediction. 3) In addition to the duration loss of each phoneme, the duration of all the phonemes in a musical note is accumulated to calculate the syllable duration loss for rhythm enhancement. Experiment results show that XiaoiceSing outperforms the baseline system of convolutional neural networks by 1.44 MOS on sound quality, 1.18 on pronunciation accuracy and 1.38 on naturalness respectively. In two A/B tests, the proposed F0 and duration modeling methods achieve 97.3% and 84.3% preference rate over baseline respectively, which demonstrates the overwhelming advantages of XiaoiceSing.

Submitted 11 June, 2020; originally announced June 2020.

arXiv:1907.09980 [pdf, other]

doi 10.1016/j.physleta.2021.127510

Tunable Dirac points and zero-energy modes in periodic curved graphene superlattices

Authors: Jianli Luan, Kaiyi Guo, Shangyang Li, Tianxing Ma, Li-Gang Wang, Hai-Qing Lin

Abstract: We combined periodic ripples and electrostatic potentials to form curved graphene superlattices and studied the effects of space-dependent Fermi velocity induced from curvature on their electronic properties. With equal periods and symmetric potentials, the Dirac points do not move, but their locations shift under asymmetric potentials. This shift can be tuned by curvature and potentials. Tunable extra gaps in band structures can appear with unequal periods. The existence of new Dirac points is proposed, such that these new Dirac points can appear under smaller potentials with curvature, and their locations can be changed even under a fixed potential by adjusting the curvature. Our results suggest that curvature provides a new possible dimension to tune the electronic properties in graphene superlattices and a platform to more easily study physics near new Dirac points.

Submitted 28 June, 2021; v1 submitted 23 July, 2019; originally announced July 2019.

Comments: 10 pages and 7 figures. Published version

Journal ref: Physics Letters A 409 (2021) 127510

arXiv:1905.03802 [pdf, other]

Titan's Dynamic Love Number Implies Stably-Stratified Ocean

Authors: Jing Luan

Abstract: The dynamic quadrupole Love number of Titan measured by Cassini is $k_\mathrm{2,obs}=0.616\pm 0.067$, strongly indicating a global subsurface ocean. However, the theoretical Love number due to equilibrium tides is at most $k_\mathrm{2,eq}^\mathrm{max}\approx 0.48$ in the absence of an ice shell on top of the ocean. In reality, there is an outer ice shell of thickness $100\,\mathrm{km}$, reducing the equilibrium-tide Love number to $k_\mathrm{2,eq}\approx 0.42$. Therefore, other types of tidal response, like dynamic tides, may be also present in Titan. We propose that the ocean is stably stratified. As a result, there exist standing ocean waves (gravity modes) with eigen-frequencies close to the tidal frequency. Such a gravity mode (g-mode) is resonantly excited. It bends the outer ice shell radially and thus enhances the dynamic Love number by $k_\mathrm{2,g}$. In order for $k_\mathrm{2,g}$ to account for the discrepancy between $k_\mathrm{2,eq}$ and $k_\mathrm{2,obs}$, the Brunt-Vaisala frequency in the ocean is required to be $3.3\times 10^{-4}\,\mathrm{rad\, s^{-1}}$. It is compatible with the volatile-rich model for Titan that was proposed to explain the methane-rich atmosphere. The three components of the tidal potential with azimuthal degrees, $m=-2,0,2$, correspond to the three components of the quadrupole Love number, $k_\mathrm{2,-2}$, $k_\mathrm{2,0}$ and $k_\mathrm{2,2}$. They can excite retrograde, axisymmetric and prograde g-modes equally in the absence of rotation. However, Coriolis force induced by Titan's rotation breaks the symmetry among these modes. Most likely, only one of the Love-number components is significantly enhanced by a g-mode, while the other two are still attributed to equilibrium tides. This prediction is testable by observation. If confirmed, the smaller components of the Love number can be used to constrain the thickness of the outer ice shell.

Submitted 9 May, 2019; originally announced May 2019.

Comments: Comments are welcome. Submit to Icarus

arXiv:1901.00578 [pdf, ps, other]

Prediction of multi-dimensional spatial variation data via Bayesian tensor completion

Authors: Jiali Luan, Zheng Zhang

Abstract: This paper presents a multi-dimensional computational method to predict the spatial variation data inside and across multiple dies of a wafer. This technique is based on tensor computation. A tensor is a high-dimensional generalization of a matrix or a vector. By exploiting the hidden low-rank property of a high-dimensional data array, the large amount of unknown variation testing data may be predicted from a few random measurement samples. The tensor rank, which decides the complexity of a tensor representation, is decided by an available variational Bayesian approach. Our approach is validated by a practical chip testing data set, and it can be easily generalized to characterize the process variations of multiple wafers. Our approach is more efficient than the previous virtual probe techniques in terms of memory and computational cost when handling high-dimensional chip testing data.

Submitted 2 January, 2019; originally announced January 2019.

arXiv:1712.04206 [pdf, other]

doi 10.1088/1361-648X/aadbe0

$\textit{Zitterbewegung}$ near new Dirac points in graphene superlattice

Authors: Jianli Luan, Shangyang Li, Tianxing Ma, Li-Gang Wang

Abstract: New Dirac points appear when periodic potentials are applied to graphene, and there are many interesting effects near these new Dirac points. Here we investigate the $\textit{Zitterbewegung}$ effect of fermions described by a Gaussian wave packet in graphene superlattice near new Dirac points. The $\textit{Zitterbewegung}$ near different Dirac points has similar characteristics, while fermions near new Dirac points have different group velocities in both $x$- and $y$-direction, which causes the different properties of the $\textit{Zitterbewegung}$ near new Dirac points. We also investigate the $\textit{Zitterbewegung}$ effect influenced by all Dirac points, and get the evolution with changing potential. Our intensive results suggest that graphene superlattice may provide an appropriate system to study $\textit{Zitterbewegung}$ effect near new Dirac points experimentally.

Submitted 24 April, 2018; v1 submitted 12 December, 2017; originally announced December 2017.

Comments: 8 pages, 11 figures

Journal ref: J. Phys.: Condens. Matter 30 395502 (2018)

arXiv:1711.06367 [pdf, other]

doi 10.3847/1538-4357/aad0f4

DAVs: Red edge and Outbursts

Authors: Jing Luan, Peter Goldreich

Abstract: As established by ground based surveys, white dwarfs with hydrogen atmospheres pulsate as they cool across the temperature range, $12500\,\mathrm{K} \gtrsim T_{\mathrm{eff}} \gtrsim 10800\,\mathrm{K}$. Known as DAVs or ZZ Ceti stars, their oscillations are attributed to overstable g-modes excited by convective driving. The effective temperature at the blue edge of the instability strip is slightly lower than that at which a surface convection zone appears. The temperature at the red edge is a two-decade old puzzle. Recently, Kepler discovered a number of cool DAVs which pulsate at higher frequencies and with much smaller photometric amplitudes than expected based on trends extrapolated from DAVs found by ground based observations. Remarkably, some of them exhibit sporadic outbursts separated by days, each lasting several hours, and releasing $\sim 10^{33}-10^{34}\,\mathrm{erg}$. We provide quantitative explanations for both the red edge and the outbursts. The minimal frequency for overstable modes rises abruptly near the red edge. Although high frequency overstable modes exist below the red edge, their photometric amplitudes are generally too small to be detected by ground based observations. Nevertheless, these overstable parent modes can manifest themselves through nonlinear mode couplings to damped daughter modes which generate limit cycles giving rise to photometric outbursts.

Submitted 2 July, 2018; v1 submitted 16 November, 2017; originally announced November 2017.

Comments: accepted to ApJ

arXiv:1707.02519 [pdf, other]

doi 10.1093/mnras/stx2714

How Cassini Can Constrain Tidal Dissipation in Saturn

Authors: Jing Luan, Jim Fuller, Eliot Quataert

Abstract: Tidal dissipation inside giant planets is important for the orbital evolution of their natural satellites. It is conventionally treated by parameterized equilibrium tidal theory, in which the tidal torque declines rapidly with distance, and orbital expansion was faster in the past. However, Lainey et al. (2017) find that some Saturnian satellites are currently migrating outward faster than predicted by equilibrium tidal theory. Resonance locking between satellites and internal oscillations of Saturn, proposed by Fuller et al. (2016), naturally matches the observed migration rates. Here, we show that the resonance locking theory predicts dynamical tidal perturbations to Saturn's gravitational field in addition to those produced by equilibrium tidal bulges. We show that these perturbations can likely be detected during Cassini's proximal orbits if migration of satellites results from resonant gravity modes, but will likely be undetectable if migration results from inertial wave attractors or dissipation of the equilibrium tide. Additionally, we show that the detection of gravity modes would place constraints on the size of the hypothetical stably stratified region in Saturn.

Submitted 28 October, 2017; v1 submitted 8 July, 2017; originally announced July 2017.

Comments: accepted to MNRAS

arXiv:1611.05923 [pdf, other]

doi 10.1109/BigData.2016.7841024

"Influence Sketching": Finding Influential Samples In Large-Scale Regressions

Authors: Mike Wojnowicz, Ben Cruz, Xuan Zhao, Brian Wallace, Matt Wolff, Jay Luan, Caleb Crable

Abstract: There is an especially strong need in modern large-scale data analysis to prioritize samples for manual inspection. For example, the inspection could target important mislabeled samples or key vulnerabilities exploitable by an adversarial attack. In order to solve the "needle in the haystack" problem of which samples to inspect, we develop a new scalable version of Cook's distance, a classical statistical technique for identifying samples which unusually strongly impact the fit of a regression model (and its downstream predictions). In order to scale this technique up to very large and high-dimensional datasets, we introduce a new algorithm which we call "influence sketching." Influence sketching embeds random projections within the influence computation; in particular, the influence score is calculated using the randomly projected pseudo-dataset from the post-convergence Generalized Linear Model (GLM). We validate that influence sketching can reliably and successfully discover influential samples by applying the technique to a malware detection dataset of over 2 million executable files, each represented with almost 100,000 features. For example, we find that randomly deleting approximately 10% of training samples reduces predictive accuracy only slightly from 99.47% to 99.45%, whereas deleting the same number of samples with high influence sketch scores reduces predictive accuracy all the way down to 90.24%. Moreover, we find that influential samples are especially likely to be mislabeled. In the case study, we manually inspect the most influential samples, and find that influence sketching pointed us to new, previously unidentified pieces of malware.

Submitted 23 March, 2017; v1 submitted 17 November, 2016; originally announced November 2016.

Comments: fixed additional typos

Journal ref: Big Data (Big Data), 2016 IEEE International Conference on, pp. 3601 - 3612. IEEE, 2016

Showing 1–50 of 84 results for author: Luan, J