
Novel ground states and emergent quantum many-body scars in a two-species Rydberg atom array

Authors: Lei-Yi-Nan Liu, Shun-Yao Yu, Shi-Rong Peng, Jie Sheng, Su Yi, Peng Xu, Shou-Shu Gong, Tao Shi, Jian Cui

Abstract: Rydberg atom arrays have been established as an appealing platform for quantum simulation and quantum computation. Recent experimental developments in trapping and controlling two species of atoms with optical tweezer arrays have brought more complex interactions into play, enabling more versatile novel quantum states and phenomena to emerge and thus creating a growing need for both theoretical and numerical investigations in this regard. In this paper we systematically calculate the ground-state phase diagram of an alternating two-species atom array and find several novel quantum states that cannot exist in traditional cold-atom platforms, for instance the period-$4$ product state $|1100\rangle^{\otimes m}$, the period-$6$ product state $|111000\rangle^{\otimes m}$, and an order-disorder separation phase. We also confirm the existence of the floating phase; however, in this system it has to be described by two interacting bosonic fields, whereas the floating phase in the single-species Rydberg atom array can be understood in terms of free bosons. More interestingly, in the quench dynamics we discover a new type of quantum many-body scar, distinct from the scar previously found in single-species atoms, which is explained by the low-energy effective theory of the PXP model. Instead, the underlying physics of the newly found quantum many-body scar can be described by a perturbation theory spanning the whole energy spectrum. A detailed analysis of how to prepare these states and observe the phenomena experimentally is provided.
Numerical evidence shows that the proposed scheme is robust against typical experimentally relevant imperfections and is thus implementable. Our work opens a new avenue for quantum simulating novel quantum many-body states both in and out of equilibrium, arising from the interplay between the competing interactions of different atom species and quantum fluctuations.
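The abstract contrasts the new states with the PXP description of single-species arrays. As a minimal illustration, assuming the idealized PXP picture in which nearest-neighbour Rydberg blockade forbids two adjacent excitations (1 = Rydberg state, 0 = ground state), the sketch below enumerates the blockade-allowed configurations of an open chain and checks whether periodic patterns such as $|1100\rangle^{\otimes m}$ lie inside that space. The helper names are hypothetical, and the sketch does not model the two-species Hamiltonian studied in the paper.

```python
from itertools import product


def blockade_allowed(config: str) -> bool:
    """True if no two neighbouring sites are both excited (open chain)."""
    return "11" not in config


def constrained_basis(n_sites: int) -> list[str]:
    """All length-n bitstrings allowed by strict nearest-neighbour blockade."""
    basis = []
    for bits in product("01", repeat=n_sites):
        config = "".join(bits)
        if blockade_allowed(config):
            basis.append(config)
    return basis


def periodic_state(unit: str, m: int) -> str:
    """Product state |unit>^(x m) written as a single bitstring."""
    return unit * m


if __name__ == "__main__":
    print(len(constrained_basis(10)))  # 144 = F(12), Fibonacci growth
    for unit in ("10", "1100", "111000"):
        print(unit, blockade_allowed(periodic_state(unit, 2)))
```

The count grows as a Fibonacci sequence, the familiar dimension of the PXP constrained space, while both $|1100\rangle^{\otimes m}$ and $|111000\rangle^{\otimes m}$ contain adjacent excitations and therefore fall outside it.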

Submitted 28 August, 2024; originally announced August 2024.

Comments: 19 pages, 19 figures

arXiv:2408.15549

WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback

Authors: Taiwei Shi, Zhuoer Wang, Longqi Yang, Ying-Chun Lin, Zexue He, Mengting Wan, Pei Zhou, Sujay Jauhar, Xiaofeng Xu, Xia Song, Jennifer Neville

Abstract: As large language models (LLMs) continue to advance, aligning these models with human preferences has emerged as a critical challenge. Traditional alignment methods, relying on human- or LLM-annotated datasets, are limited by their resource-intensive nature, inherent subjectivity, and the risk of feedback loops that amplify model biases. To overcome these limitations, we introduce WildFeedback, a novel framework that leverages real-time, in-situ user interactions to create preference datasets that more accurately reflect authentic human values. WildFeedback operates through a three-step process: feedback signal identification, preference data construction, and user-guided evaluation. We applied this framework to a large corpus of user-LLM conversations, resulting in a rich preference dataset that reflects genuine user preferences. This dataset captures the nuances of user preferences by identifying and classifying feedback signals within natural conversations, thereby enabling the construction of more representative and context-sensitive alignment data. Our extensive experiments demonstrate that LLMs fine-tuned on WildFeedback exhibit significantly improved alignment with user preferences, as evidenced by both traditional benchmarks and our proposed user-guided evaluation. By incorporating real-time feedback from actual users, WildFeedback addresses the scalability, subjectivity, and bias challenges that plague existing approaches, marking a significant step toward developing LLMs that are more responsive to the diverse and evolving needs of their users.
In summary, WildFeedback offers a robust, scalable solution for aligning LLMs with true human values, setting a new standard for the development and evaluation of user-centric language models.

Submitted 28 August, 2024; originally announced August 2024.

Comments: 24 pages

arXiv:2408.08394

doi 10.1038/s41467-024-51255-3

A topological Hund nodal line antiferromagnet

Authors: Xian P. Yang, Yueh-Ting Yao, Pengyu Zheng, Shuyue Guan, Huibin Zhou, Tyler A. Cochran, Che-Min Lin, Jia-Xin Yin, Xiaoting Zhou, Zi-Jia Cheng, Zhaohu Li, Tong Shi, Md Shafayat Hossain, Shengwei Chi, Ilya Belopolski, Yu-Xiao Jiang, Maksim Litskevich, Gang Xu, Zhaoming Tian, Arun Bansil, Zhiping Yin, Shuang Jia, Tay-Rong Chang, M. Zahid Hasan

Abstract: The interplay of topology, magnetism, and correlations gives rise to intriguing phases of matter. In this study, through state-of-the-art angle-resolved photoemission spectroscopy, density functional theory and dynamical mean-field theory calculations, we visualize a fourfold degenerate Dirac nodal line at the boundary of the bulk Brillouin zone in the antiferromagnet YMn2Ge2. We further demonstrate that this gapless, antiferromagnetic Dirac nodal line is enforced by the combination of magnetism, space-time inversion symmetry and nonsymmorphic lattice symmetry. The corresponding drumhead surface states traverse the whole surface Brillouin zone. YMn2Ge2 thus serves as a platform to exhibit the interplay of multiple degenerate nodal physics and antiferromagnetism. Interestingly, the magnetic nodal line displays a d-orbital dependent renormalization along its trajectory in momentum space, thereby manifesting Hund coupling. Our findings offer insights into the effect of electronic correlations on magnetic Dirac nodal lines, leading to an antiferromagnetic Hund nodal line.

Submitted 15 August, 2024; originally announced August 2024.

Journal ref: Nature Communications volume 15, Article number: 7052 (2024)

arXiv:2408.08209

Modeling Domain and Feedback Transitions for Cross-Domain Sequential Recommendation

Authors: Changshuo Zhang, Teng Shi, Xiao Zhang, Qi Liu, Ruobing Xie, Jun Xu, Ji-Rong Wen

Abstract: Nowadays, many recommender systems encompass various domains to cater to users' diverse needs, leading to user behaviors transitioning across different domains. In fact, user behaviors across different domains reveal changes in preference toward recommended items. For instance, a shift from negative feedback to positive feedback indicates improved user satisfaction. However, existing cross-domain sequential recommendation methods typically model user interests by focusing solely on information about domain transitions, often overlooking the valuable insights provided by users' feedback transitions. In this paper, we propose $\text{Transition}^2$, a novel method to model transitions across both domains and types of user feedback. Specifically, $\text{Transition}^2$ introduces a transition-aware graph encoder based on user history, assigning different weights to edges according to the feedback type. This enables the graph encoder to extract historical embeddings that capture the transition information between different domains and feedback types. Subsequently, we encode the user history using a cross-transition multi-head self-attention, incorporating various masks to distinguish different types of transitions. Finally, we integrate these modules to make predictions across different domains. Experimental results on two public datasets demonstrate the effectiveness of $\text{Transition}^2$.
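The abstract mentions attention masks that distinguish transition types. A minimal sketch, assuming each history item carries a domain label and a feedback label, builds one boolean mask per combination of "domain changed" and "feedback changed" over causal (earlier-to-later) position pairs. The `Interaction` type, labels, and function names are hypothetical; the paper's actual masking scheme may differ.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    item: str
    domain: str    # hypothetical domain label, e.g. "A" or "B"
    feedback: str  # hypothetical feedback label, e.g. "pos" or "neg"


def transition_masks(
    history: list[Interaction],
) -> dict[tuple[bool, bool], list[list[bool]]]:
    """Return masks keyed by (domain_changed, feedback_changed).

    masks[key][i][j] is True when position i may attend to position j
    (j <= i, i.e. causal) and the pair j -> i has that transition type.
    """
    n = len(history)
    keys = [(d, f) for d in (False, True) for f in (False, True)]
    masks = {k: [[False] * n for _ in range(n)] for k in keys}
    for i in range(n):
        for j in range(i + 1):  # only the current or earlier positions
            key = (
                history[i].domain != history[j].domain,
                history[i].feedback != history[j].feedback,
            )
            masks[key][i][j] = True
    return masks


if __name__ == "__main__":
    history = [
        Interaction("x1", "A", "neg"),
        Interaction("x2", "B", "pos"),
        Interaction("x3", "B", "pos"),
    ]
    masks = transition_masks(history)
    print(masks[(True, True)][1][0])  # A/neg -> B/pos changes both
```

Because the four masks partition the causal pairs, one could restrict each attention head (or group of heads) to a single transition type by setting masked-out scores to negative infinity before the softmax.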

Submitted 15 August, 2024; originally announced August 2024.

arXiv:2408.07791

An Efficient and Explanatory Image and Text Clustering System with Multimodal Autoencoder Architecture

Authors: Tiancheng Shi, Yuanchen Wei, John R. Kender

Abstract: We demonstrate the efficiencies and explanatory abilities of extensions to the common tools of Autoencoders and LLM interpreters, in the novel context of comparing different cultural approaches to the same international news event. We develop a new Convolutional-Recurrent Variational Autoencoder (CRVAE) model that extends the modalities of previous CVAE models, by using fully-connected latent layers to embed in parallel the CNN encodings of video frames, together with the LSTM encodings of their related text derived from audio. We incorporate the model within a larger system that includes frame-caption alignment, latent space vector clustering, and a novel LLM-based cluster interpreter. We measure, tune, and apply this system to the task of summarizing a video into three to five thematic clusters, with each theme described by ten LLM-produced phrases. We apply this system to two news topics, COVID-19 and the Winter Olympics; five other topics are in progress.

Submitted 14 August, 2024; originally announced August 2024.

arXiv:2408.04998

ProFuser: Progressive Fusion of Large Language Models

Authors: Tianyuan Shi, Fanqi Wan, Canbin Huang, Xiaojun Quan, Chenliang Li, Ming Yan, Ji Zhang

Abstract: While fusing the capacities and advantages of various large language models (LLMs) offers a pathway to construct more powerful and versatile models, a fundamental challenge is to properly select the advantageous model during training. Existing fusion methods primarily focus on the training mode that uses cross entropy on ground truth in a teacher-forcing setup to measure a model's advantage, which may provide limited insight into model advantage. In this paper, we introduce a novel approach that enhances the fusion process by incorporating both the training and inference modes. Our method evaluates model advantage not only through cross entropy during training but also by considering inference outputs, providing a more comprehensive assessment. To combine the two modes effectively, we introduce ProFuser to progressively transition from inference mode to training mode. To validate ProFuser's effectiveness, we fused three models, vicuna-7b-v1.5, Llama-2-7b-chat, and mpt-7b-8k-chat, and demonstrated improved performance in knowledge, reasoning, and safety compared to baseline methods.

Submitted 9 August, 2024; originally announced August 2024.

arXiv:2408.02559

Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information

Authors: Yauwai Yim, Chunkit Chan, Tianyu Shi, Zheye Deng, Wei Fan, Tianshi Zheng, Yangqiu Song

Abstract: Large language models (LLMs) have shown success in handling simple games with imperfect information and enabling multi-agent coordination, but their ability to facilitate practical collaboration against other agents in complex, imperfect information environments, especially in a non-English environment, still needs to be explored. This study investigates the applicability of knowledge acquired by open-source and API-based LLMs to sophisticated text-based games requiring agent collaboration under imperfect information, comparing their performance to established baselines using other types of agents. We propose a Theory of Mind (ToM) planning technique that allows LLM agents to adapt their strategy against various adversaries using only game rules, current state, and historical context as input. An external tool was incorporated to mitigate the challenge of dynamic and extensive action spaces in this card game. Our results show that although a performance gap exists between current LLMs and state-of-the-art reinforcement learning (RL) models, LLMs demonstrate ToM capabilities in this game setting. Incorporating ToM consistently improves their performance against opposing agents, suggesting their ability to understand the actions of allies and adversaries and establish collaboration with allies. To encourage further research and understanding, we have made our codebase openly accessible.

Submitted 5 August, 2024; originally announced August 2024.

arXiv:2407.19256

Stochastic Parrots or ICU Experts? Large Language Models in Critical Care Medicine: A Scoping Review

Authors: Tongyue Shi, Jun Ma, Zihan Yu, Haowei Xu, Minqi Xiong, Meirong Xiao, Yilin Li, Huiying Zhao, Guilan Kong

Abstract: With the rapid development of artificial intelligence (AI), large language models (LLMs) have shown strong capabilities in natural language understanding, reasoning, and generation, attracting considerable research interest in applying LLMs to health and medicine. Critical care medicine (CCM) provides diagnosis and treatment for critically ill patients who often require intensive monitoring and interventions in intensive care units (ICUs). Can LLMs be applied to CCM? Are LLMs just stochastic parrots, or ICU experts, in assisting clinical decision-making? This scoping review aims to provide a panoramic portrait of the application of LLMs in CCM. Literature in seven databases, including PubMed, Embase, Scopus, Web of Science, CINAHL, IEEE Xplore, and ACM Digital Library, was searched from January 1, 2019, to June 10, 2024. Peer-reviewed journal and conference articles that discussed the application of LLMs in critical care settings were included. From an initial 619 articles, 24 were selected for final review. This review grouped applications of LLMs in CCM into three categories: clinical decision support, medical documentation and reporting, and medical education and doctor-patient communication. LLMs have advantages in handling unstructured data and do not require manual feature engineering. Meanwhile, applying LLMs to CCM faces challenges, including hallucinations, poor interpretability, bias and alignment challenges, and privacy and ethics issues.
Future research should enhance model reliability and interpretability, integrate up-to-date medical knowledge, and strengthen privacy and ethical guidelines. As LLMs evolve, they could become key tools in CCM to help improve patient outcomes and optimize healthcare delivery. This study is the first review of LLMs in CCM, aiding researchers, clinicians, and policymakers to understand the current status and future potential of LLMs in CCM.

Submitted 27 July, 2024; originally announced July 2024.

Comments: 28 pages, 5 figures

arXiv:2407.17702

Universal clusters in quasi-two-dimensional ultracold Fermi mixtures

Authors: Ruijin Liu, Tingting Shi, Matteo Zaccanti, Xiaoling Cui

Abstract: We study universal clusters in quasi-two dimensions (q2D) that consist of a light (L) atom interacting with two or three heavy (H) identical fermions, forming the trimer or tetramer bound state. The axial confinement in q2D is shown to lift the three-fold degeneracy of 3D trimer (tetramer) in $p$-wave channel and uniquely select the ground state with magnetic angular momentum $|m|=1$ ($m=0$). By varying the interaction or confinement strength, we explore the dimensional crossover of these clusters from 3D to 2D, characterized by a gradual change of critical H-L mass ratio for their emergence and momentum-space distribution. Importantly, we find that a finite effective range will not alter their critical mass ratios in the weak coupling regime. There, we establish an effective 2D model to quantitatively reproduce the properties of q2D clusters, and further identify the optimal interaction strengths for their detection in experiments. Our results suggest a promising prospect for observing universal clusters and associated high-order correlation effects in realistic q2D ultracold Fermi mixtures.

Submitted 3 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

Comments: 6 pages, 4 figures, with supplementary material (8 pages, 4 figures)

arXiv:2407.16857

SECRM-2D: RL-Based Efficient and Comfortable Route-Following Autonomous Driving with Analytic Safety Guarantees

Authors: Tianyu Shi, Ilia Smirnov, Omar ElSamadisy, Baher Abdulhai

Abstract: Over the last decade, there has been increasing interest in autonomous driving systems. Reinforcement Learning (RL) shows great promise for training autonomous driving controllers, being able to directly optimize a combination of criteria such as efficiency, comfort, and stability. However, RL-based controllers typically offer no safety guarantees, making their readiness for real deployment questionable. In this paper, we propose SECRM-2D (the Safe, Efficient and Comfortable RL-based driving Model with Lane-Changing), an RL autonomous driving controller (both longitudinal and lateral) that balances optimization of efficiency and comfort and follows a fixed route, while being subject to hard analytic safety constraints. The aforementioned safety constraints are derived from the criterion that the follower vehicle must have sufficient headway to be able to avoid a crash if the leader vehicle brakes suddenly. We evaluate SECRM-2D against several learning and non-learning baselines in simulated test scenarios, including freeway driving, exiting, merging, and emergency braking. Our results confirm that representative previously published RL AV controllers may crash in both training and testing, even if they are optimizing a safety objective. By contrast, our controller SECRM-2D is successful in avoiding crashes during both training and testing, improves over the baselines in measures of efficiency and comfort, and is more faithful in following the prescribed route.
In addition, we achieve a good theoretical understanding of the longitudinal steady state of a collection of SECRM-2D vehicles.
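The safety criterion above (the follower must be able to avoid a crash if the leader brakes suddenly) can be illustrated with a simple numerical check. The sketch below assumes constant emergency decelerations for both vehicles and a fixed reaction time during which the follower keeps its speed, then simulates the stop and reports the smallest gap reached. These parameters and the simulation approach are illustrative assumptions; the paper derives analytic constraints, which the abstract does not spell out.

```python
def min_gap_if_leader_brakes(
    gap: float,        # initial bumper-to-bumper gap [m]
    v_f: float,        # follower speed [m/s]
    v_l: float,        # leader speed [m/s]
    b_f: float,        # follower's maximum deceleration [m/s^2], assumed > 0
    b_l: float,        # leader's emergency deceleration [m/s^2], assumed > 0
    reaction: float,   # follower reaction time [s] (assumption)
    dt: float = 1e-3,  # integration step [s]
) -> float:
    """Simulate an emergency stop and return the smallest gap reached."""
    if b_f <= 0.0 or b_l <= 0.0:
        raise ValueError("decelerations must be positive")
    t, min_gap = 0.0, gap
    while v_f > 0.0 or v_l > 0.0:
        # Leader brakes immediately; follower only after its reaction time.
        a_l = -b_l
        a_f = -b_f if t >= reaction else 0.0
        # Clamp at zero: vehicles stop, they never reverse.
        new_v_l = max(v_l + a_l * dt, 0.0)
        new_v_f = max(v_f + a_f * dt, 0.0)
        # Trapezoidal position update (exact while acceleration is constant).
        gap += 0.5 * (v_l + new_v_l) * dt - 0.5 * (v_f + new_v_f) * dt
        v_l, v_f, t = new_v_l, new_v_f, t + dt
        min_gap = min(min_gap, gap)
    return min_gap


def is_safe(gap, v_f, v_l, b_f, b_l, reaction, margin=0.0):
    """True if the gap stays above `margin` throughout the emergency stop."""
    return min_gap_if_leader_brakes(gap, v_f, v_l, b_f, b_l, reaction) > margin


if __name__ == "__main__":
    # Both at 20 m/s, equal braking, 1 s reaction: the follower loses 20 m.
    print(round(min_gap_if_leader_brakes(30.0, 20.0, 20.0, 6.0, 6.0, 1.0), 2))
    print(is_safe(10.0, 20.0, 20.0, 6.0, 6.0, 1.0))
```

Tracking the minimum gap over the whole stop, rather than comparing only final stopping positions, avoids missing a collision that occurs mid-manoeuvre when the two vehicles decelerate at different rates.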

Submitted 23 July, 2024; originally announced July 2024.

arXiv:2407.11290

Distributed memory parallel adaptive tensor-train cross approximation

Authors: Tianyi Shi, Daniel Hayes, Jing-Mei Qiu

Abstract: The tensor-train (TT) format is a data-sparse tensor representation commonly used in high dimensional function approximations arising from computational and data sciences. Various sequential and parallel TT decomposition algorithms have been proposed for different tensor inputs and assumptions. In this paper, we propose subtensor parallel adaptive TT cross, which partitions a tensor onto distributed memory machines with multidimensional process grids, and constructs a TT approximation iteratively with tensor elements. We derive two iterative formulations for pivot selection and TT core construction under the distributed memory setting, conduct communication and scaling analysis of the algorithm, and illustrate its performance with multiple test experiments. These include up to 6D Hilbert tensors and tensors constructed from Maxwellian distribution functions that arise in kinetic theory. Our results demonstrate significant accuracy with greatly reduced storage requirements via the TT cross approximation. Furthermore, we demonstrate good to optimal strong and weak scaling performance for the proposed parallel algorithm.
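TT cross builds each TT core from cross (skeleton) approximations of tensor unfoldings, that is, from a few sampled rows and columns selected by pivoting. The sketch below shows only that two-dimensional building block, applied to a Hilbert matrix (the 2D analogue of the Hilbert tensors mentioned above). For clarity it keeps the full residual in memory and uses full pivoting, which a practical TT cross avoids; it is not the paper's distributed algorithm, and all function names are hypothetical.

```python
def hilbert(n: int) -> list[list[float]]:
    """Hilbert matrix H[i][j] = 1 / (i + j + 1) with 0-based indices."""
    return [[1.0 / (i + j + 1) for j in range(n)] for i in range(n)]


def cross_approximation(a, max_rank, tol=1e-12):
    """Greedy cross (skeleton) approximation with full pivoting.

    Returns factors (cols, rows) with a ~= sum_k outer(cols[k], rows[k]).
    Each step takes the largest-magnitude residual entry as the pivot,
    so the result reproduces `a` exactly on the chosen rows and columns.
    """
    m, n = len(a), len(a[0])
    residual = [list(row) for row in a]
    cols, rows = [], []
    for _ in range(max_rank):
        i, j = max(
            ((r, c) for r in range(m) for c in range(n)),
            key=lambda rc: abs(residual[rc[0]][rc[1]]),
        )
        pivot = residual[i][j]
        if abs(pivot) < tol:
            break  # residual is numerically zero: stop early
        u = [residual[r][j] for r in range(m)]          # pivot column
        v = [residual[i][c] / pivot for c in range(n)]  # scaled pivot row
        cols.append(u)
        rows.append(v)
        for r in range(m):  # rank-1 update of the residual
            for c in range(n):
                residual[r][c] -= u[r] * v[c]
    return cols, rows


def reconstruct(cols, rows):
    """Assemble the low-rank approximation sum_k outer(cols[k], rows[k])."""
    m, n = len(cols[0]), len(rows[0])
    return [
        [sum(u[r] * v[c] for u, v in zip(cols, rows)) for c in range(n)]
        for r in range(m)
    ]


if __name__ == "__main__":
    h = hilbert(12)
    cols, rows = cross_approximation(h, max_rank=8)
    approx = reconstruct(cols, rows)
    err = max(abs(h[r][c] - approx[r][c]) for r in range(12) for c in range(12))
    print(len(cols), err)
```

Because the Hilbert matrix has rapidly decaying singular values, a handful of sampled rows and columns already reproduces all 144 entries closely; this is the low-rank structure that TT cross exploits dimension by dimension.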

Submitted 15 July, 2024; originally announced July 2024.

MSC Class: 15A69; 65Y05; 65F99

arXiv:2407.09066

Physical encryption and decryption for secure data transmission in optical networks leveraging the temporal Talbot effect and microwave photonics

Authors: Chulun Lin, Taixia Shi, Yiqing Liu, Yang Chen

Abstract: A novel microwave photonic scheme for secure data transmission in optical networks is proposed. The security of the scheme is guaranteed by physical encryption and decryption via the temporal Talbot effect in dispersive media. First, the original data is randomized in the digital domain by performing an exclusive OR operation with a random matrix. Subsequently, a time-varying multi-tone electrical signal, which represents the randomized data matrix, is modulated onto an optical carrier. The modulated optical signal is then phase-modulated by a temporal Talbot array illuminator (TAI) signal; after this discrete quadratic phase modulation, the optical signal loses its original appearance in the frequency domain and is further dispersed in the first dispersive medium. Because this dispersion does not exactly match the TAI signal, the waveform after the first dispersive medium is a noise-like signal. Hence, the physical encryption of the original data is achieved. As the optical signal passes through a second dispersive medium that makes the total dispersion match the TAI signal, the temporal waveform of the noise-like signal after photodetection is transformed into pulses. "1" and "0" in the randomized data matrix are represented by the presence and absence of pulses, and the physical decryption is achieved. By further processing the recovered data matrix with the random matrix, the original data can be recovered. The physical-layer security of the proposed scheme and its fiber transmission capability are demonstrated.
8-Gbit/s data is transmitted, encrypted, and decrypted using two dispersive media and optical fiber spans of 10 to 200 km, and error-free transmission is achieved. Many factors that affect the encryption, decryption, and transmission performance of the system are analyzed.
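The digital stage of the scheme (randomize the data with a random matrix by exclusive OR, then undo it after optical decryption) relies on XOR being its own inverse: $(d \oplus k) \oplus k = d$. A minimal sketch of only this digital step follows, assuming the random matrix is shared by both ends; the seeded generator that models the sharing is a hypothetical stand-in, and the optical Talbot encryption and decryption are omitted. Python's `random` module is not cryptographically secure and is used here only for illustration.

```python
import random


def random_key(rows: int, cols: int, seed: int) -> list[list[int]]:
    """Hypothetical shared random binary matrix (same seed at both ends)."""
    rng = random.Random(seed)  # illustration only, not a secure source
    return [[rng.randint(0, 1) for _ in range(cols)] for _ in range(rows)]


def xor_matrix(data: list[list[int]], key: list[list[int]]) -> list[list[int]]:
    """Element-wise XOR of two equally sized binary matrices."""
    if len(data) != len(key) or any(len(d) != len(k) for d, k in zip(data, key)):
        raise ValueError("data and key must have the same shape")
    return [[d ^ k for d, k in zip(drow, krow)] for drow, krow in zip(data, key)]


if __name__ == "__main__":
    data = [[1, 0, 1, 1], [0, 0, 1, 0]]
    key = random_key(2, 4, seed=7)
    scrambled = xor_matrix(data, key)     # randomized matrix sent optically
    recovered = xor_matrix(scrambled, key)  # applied after physical decryption
    print(recovered == data)
```

Applying the same key twice restores the original bits, which is why the receiver only needs the random matrix, not any additional inverse operation.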

Submitted 12 July, 2024; originally announced July 2024.

Comments: 19 pages, 15 figures, 1 table

arXiv:2407.06083

A Survey of Controllable Learning: Methods and Applications in Information Retrieval

Authors: Chenglei Shen, Xiao Zhang, Teng Shi, Changshuo Zhang, Guofu Xie, Jun Xu

Abstract: Controllable learning (CL) emerges as a critical component in trustworthy machine learning, ensuring that learners meet predefined targets and can adaptively adjust without retraining according to the changes in those targets. We provide a formal definition of CL, and discuss its applications in information retrieval (IR) where information needs are often complex and dynamic. The survey categorizes CL according to who controls (users or platforms), what is controllable (e.g., retrieval objectives, users' historical behaviors, controllable environmental adaptation), how control is implemented (e.g., rule-based method, Pareto optimization, Hypernetwork), and where to implement control (e.g., pre-processing, in-processing, post-processing methods). Then, we identify challenges faced by CL across training, evaluation, task setting, and deployment in online environments. Additionally, we outline promising directions for CL in theoretical analysis, efficient computation, empowering large language models, application scenarios and evaluation frameworks in IR.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.06067

Faraday laser pumped cesium beam clock

Authors: Hangbo Shi, Xiaomin Qin, Haijun Chen, Yufei Yan, Ziqi Lu, Zhiyang Wang, Zijie Liu, Xiaolei Guan, Qiang Wei, Tiantian Shi, Jingbiao Chen

Abstract: We realize a high-performance compact optically pumped cesium beam clock using a Faraday laser simultaneously as the pumping and detection laser. The Faraday laser, which is frequency stabilized by the modulation transfer spectroscopy (MTS) technique, has a narrow linewidth and superior frequency stability. Measured by the optical heterodyne method between two identical systems, the linewidth of the Faraday laser is 2.5 kHz after MTS locking, and the fractional frequency stability of the Faraday laser is optimized to $1.8\times10^{-12}/\sqrt{\tau}$. Based on this high-performance Faraday laser, the cesium beam clock realizes a signal-to-noise ratio (SNR) of $39600$ in a 1 Hz bandwidth when the cesium oven temperature is 130°C. Compared in frequency with a hydrogen maser, the fractional frequency stability of the Faraday laser pumped cesium beam clock reaches $1.3\times10^{-12}/\sqrt{\tau}$ and drops to $1.4\times10^{-14}$ at 10000 s when the cesium oven temperature is 110°C. This Faraday laser pumped cesium beam clock demonstrates excellent performance and great potential in the fields of timekeeping, navigation, and communication. Meanwhile, the Faraday laser, as a high-performance optical frequency standard, can also contribute to the development of other applications in quantum metrology, precision measurement and atomic physics.
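As a quick consistency check on the quoted figures: a stability of the form $1.3\times10^{-12}/\sqrt{\tau}$ corresponds to white frequency noise, which averages down as $\tau^{-1/2}$. If that behaviour continued unchanged out to $\tau = 10000$ s, it would predict

$$
\sigma_y(10^4\,\mathrm{s}) \approx \frac{1.3\times10^{-12}}{\sqrt{10^4}} = \frac{1.3\times10^{-12}}{100} = 1.3\times10^{-14},
$$

which is close to, and slightly below, the reported $1.4\times10^{-14}$. The long-term value is therefore consistent with averaging down at nearly the white-frequency-noise rate over the whole interval; the abstract does not say what accounts for the small excess.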

Submitted 11 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.03332

DDPM-MoCo: Advancing Industrial Surface Defect Generation and Detection with Generative and Contrastive Learning

Authors: Yangfan He, Xinyan Wang, Tianyu Shi

Abstract: The task of industrial detection based on deep learning often involves solving two problems: (1) obtaining sufficient and effective data samples and (2) using efficient and convenient model training methods. In this paper, we introduce a novel defect-generation method, named DDPM-MoCo, to address these issues. Firstly, we utilize the Denoising Diffusion Probabilistic Model (DDPM) to generate high-quality defect data samples, overcoming the problem of insufficient sample data for model learning. Furthermore, we utilize the unsupervised learning Momentum Contrast model (MoCo) with an enhanced batch contrastive loss function for training the model on unlabeled data, addressing the efficiency and consistency challenges in large-scale negative sample encoding during diffusion model training. The experimental results showcase an enhanced visual detection method for identifying defects on metal surfaces, covering the entire process, starting from generating unlabeled sample data for training the diffusion model, to utilizing the same labeled sample data for downstream detection tasks. This study offers valuable practical insights and application potential for visual detection in the metal processing industry.
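The consistency of negative-sample encodings mentioned above comes from MoCo's momentum encoder: a second "key" encoder whose parameters trail the trained "query" encoder through an exponential moving average, $\theta_k \leftarrow m\,\theta_k + (1-m)\,\theta_q$. A minimal sketch of that update over flat parameter lists follows; the value $m = 0.999$ is the default used in the original MoCo work, and the paper's enhanced batch contrastive loss is not shown.

```python
def momentum_update(
    key_params: list[float], query_params: list[float], m: float = 0.999
) -> list[float]:
    """Return EMA-updated key-encoder parameters: m * key + (1 - m) * query."""
    if not 0.0 <= m <= 1.0:
        raise ValueError("momentum m must lie in [0, 1]")
    if len(key_params) != len(query_params):
        raise ValueError("parameter lists must have the same length")
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


if __name__ == "__main__":
    key = [1.0, -2.0]
    query = [0.0, 0.0]  # pretend the query encoder stays fixed
    for _ in range(1000):
        key = momentum_update(key, query, m=0.99)
    print(key)  # slowly pulled toward the query parameters
```

A large $m$ makes the key encoder change slowly, so encodings computed at different training steps remain comparable when they are reused as negatives.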

Submitted 9 May, 2024; originally announced July 2024.

arXiv:2407.01219 [pdf, other]

Searching for Best Practices in Retrieval-Augmented Generation

Authors: Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, Ruicheng Yin, Changze Lv, Xiaoqing Zheng, Xuanjing Huang

Abstract: Retrieval-augmented generation (RAG) techniques have proven to be effective in integrating up-to-date information, mitigating hallucinations, and enhancing response quality, particularly in specialized domains. While many RAG approaches have been proposed to enhance large language models through query-dependent retrievals, these approaches still suffer from their complex implementation and prolonged response times. Typically, a RAG workflow involves multiple processing steps, each of which can be executed in various ways. Here, we investigate existing RAG approaches and their potential combinations to identify optimal RAG practices. Through extensive experiments, we suggest several strategies for deploying RAG that balance both performance and efficiency. Moreover, we demonstrate that multimodal retrieval techniques can significantly enhance question-answering capabilities about visual inputs and accelerate the generation of multimodal content using a "retrieval as generation" strategy.

Submitted 1 July, 2024; originally announced July 2024.
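A RAG workflow starts with a retrieval step whose output is packed into the generator's prompt. The toy retriever below is a hypothetical illustration, not one of the configurations benchmarked in the paper: it ranks documents by bag-of-words cosine similarity, whereas a practical system would typically use a dense or hybrid retriever. All function names and documents are made up for this sketch.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    num = sum(a[t] * b[t] for t in a.keys() & b.keys())
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query (bag-of-words cosine)."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in docs]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [d for _, d in scored[:k]]

def build_prompt(query, passages):
    """Pack retrieved passages into a numbered context block for the generator."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using the context below.\n{context}\nQuestion: {query}\nAnswer:"

DOCS = [
    "Cesium beam clocks use a heated cesium oven.",
    "RAG retrieves passages before generation.",
    "Polar molecules can be microwave-shielded.",
]
question = "How does RAG generation work?"
print(build_prompt(question, retrieve(question, DOCS, k=1)))
```

Each stage here (tokenization, scoring, top-k, prompt packing) is one of the interchangeable processing steps the abstract refers to; swapping any of them changes the performance and latency trade-off.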

arXiv:2406.17807 [pdf, other]

Enhancing Commentary Strategies for Imperfect Information Card Games: A Study of Large Language Models in Guandan Commentary

Authors: Meiling Tao, Xuechen Liang, Ziyi Wang, Yiling Tao, Tianyu Shi

Abstract: Recent advancements in large language models (LLMs) have unlocked the potential for generating high-quality game commentary. However, producing insightful and engaging commentary for complex games with incomplete information remains a significant challenge. In this paper, we introduce a novel commentary method that combines Reinforcement Learning (RL) and LLMs, tailored specifically for the Chinese card game Guandan. Our system leverages RL to generate intricate card-playing scenarios and employs LLMs to generate corresponding commentary text, effectively emulating the strategic analysis and narrative prowess of professional commentators. The framework comprises a state commentary guide, a Theory of Mind (ToM)-based strategy analyzer, and a style retrieval module, which seamlessly collaborate to deliver detailed and context-relevant game commentary in the Chinese language environment. We empower LLMs with ToM capabilities and refine both retrieval and information filtering mechanisms. This facilitates the generation of personalized commentary content. Our experimental results show a substantial performance enhancement from the proposed commentary framework when applied to open-source LLMs, surpassing the performance of GPT-4 across multiple evaluation metrics.

Submitted 3 August, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.16942 [pdf, other]

Enhancing Diagnostic Reliability of Foundation Model with Uncertainty Estimation in OCT Images

Authors: Yuanyuan Peng, Aidi Lin, Meng Wang, Tian Lin, Ke Zou, Yinglin Cheng, Tingkun Shi, Xulong Liao, Lixia Feng, Zhen Liang, Xinjian Chen, Huazhu Fu, Haoyu Chen

Abstract: The inability to express confidence levels and detect unseen classes has limited the clinical implementation of artificial intelligence in the real world. We developed a foundation model with uncertainty estimation (FMUE) to detect 11 retinal conditions on optical coherence tomography (OCT). In the internal test set, FMUE achieved a higher F1 score (96.76%) than two state-of-the-art algorithms, RETFound and UIOS, and improved further to 98.44% with a thresholding strategy. In the external test sets obtained from other OCT devices, FMUE achieved accuracies of 88.75% and 92.73% before and after thresholding, respectively. Our model outperformed two ophthalmologists with a higher F1 score (95.17% vs. 61.93% and 71.72%). Besides, our model correctly predicts high uncertainty scores for samples with ambiguous features, of non-target-category diseases, or of low quality, prompting manual checks and preventing misdiagnosis. FMUE provides a trustworthy method for automatic detection of retinal anomalies in real-world, open-set clinical environments.

Submitted 17 June, 2024; originally announced June 2024.

Comments: All code is available at https://github.com/yuanyuanpeng0129/FMUE
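FMUE flags uncertain cases for manual review by thresholding an uncertainty score. The abstract does not say how that score is computed, so the sketch below substitutes a generic stand-in: the normalized entropy of softmax probabilities, with a hypothetical threshold of 0.5. It illustrates the triage logic only, not the FMUE uncertainty head, and the function names are made up.

```python
import math

def softmax(logits):
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def normalized_entropy(probs):
    """Shannon entropy divided by log(K), so the score lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))

def triage(logits, threshold=0.5):
    """Return (predicted_class, uncertainty, needs_manual_check)."""
    probs = softmax(logits)
    u = normalized_entropy(probs)
    pred = max(range(len(probs)), key=probs.__getitem__)
    return pred, u, u > threshold

# A confident prediction is accepted; a flat one is routed to a clinician.
print(triage([10.0, 0.0, 0.0]))
print(triage([1.0, 1.0, 1.0]))
```

In practice the threshold would be tuned on validation data to trade the share of cases sent for review against the accuracy of the cases kept, which is the kind of before/after-thresholding comparison the abstract reports.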

arXiv:2406.16062 [pdf, other]

Towards Biologically Plausible Computing: A Comprehensive Comparison

Authors: Changze Lv, Yufei Gu, Zhengkang Guo, Zhibo Xu, Yixin Wu, Feiran Zhang, Tianyuan Shi, Zhenghua Wang, Ruicheng Yin, Yu Shang, Siqi Zhong, Xiaohua Wang, Muling Wu, Wenhao Liu, Tianlong Li, Jianhao Zhu, Cenyuan Zhang, Zixuan Ling, Xiaoqing Zheng

Abstract: Backpropagation is a cornerstone algorithm in training neural networks for supervised learning, which uses a gradient descent method to update network weights by minimizing the discrepancy between actual and desired outputs. Despite its pivotal role in propelling deep learning advancements, the biological plausibility of backpropagation is questioned due to its requirements for weight symmetry, global error computation, and dual-phase training. To address this long-standing challenge, many studies have endeavored to devise biologically plausible training algorithms. However, a fully biologically plausible algorithm for training multilayer neural networks remains elusive, and interpretations of biological plausibility vary among researchers. In this study, we establish criteria for biological plausibility that a desirable learning algorithm should meet. Using these criteria, we evaluate a range of existing algorithms considered to be biologically plausible, including Hebbian learning, spike-timing-dependent plasticity, feedback alignment, target propagation, predictive coding, forward-forward algorithm, perturbation learning, local losses, and energy-based learning. Additionally, we empirically evaluate these algorithms across diverse network architectures and datasets. We compare the feature representations learned by these algorithms with brain activity recorded by non-invasive devices under identical stimuli, aiming to identify which algorithm can most accurately replicate brain activity patterns.
We hope that this study will inspire the development of new biologically plausible algorithms for training multilayer networks, thereby fostering progress in both neuroscience and machine learning.

Submitted 23 June, 2024; originally announced June 2024.
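Hebbian learning, the first family in the list above, updates a weight using only the activity of the two neurons it connects, so it needs no global error signal or backward pass. The sketch below shows the plain Hebbian rule and Oja's rule, a classic variant that adds a decay term to keep the weight vector bounded. The synthetic two-dimensional data are an assumption made for illustration; on them Oja's rule converges to the unit-norm principal direction.

```python
import math
import random

def hebbian_step(w, x, lr=0.01):
    """Plain Hebbian rule: dw_i = lr * y * x_i with y = w . x.
    Uses only pre-synaptic (x) and post-synaptic (y) activity; ||w|| can grow without bound."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + lr * y * xi for wi, xi in zip(w, x)]

def oja_step(w, x, lr=0.01):
    """Oja's rule: dw = lr * y * (x - y * w); the decay term drives ||w|| toward 1."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]

def train_oja(steps=5000, lr=0.01, seed=0):
    """Train one linear neuron on 2-D inputs that share a strong common component."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)]
    for _ in range(steps):
        s = rng.gauss(0.0, 1.0)   # shared signal -> principal direction (1, 1) / sqrt(2)
        n = rng.gauss(0.0, 0.1)   # weak anti-correlated noise
        w = oja_step(w, [s + n, s - n], lr)
    return w

w = train_oja()
norm = math.hypot(*w)
alignment = abs(w[0] + w[1]) / (math.sqrt(2) * norm)  # |cos| with (1, 1) / sqrt(2)
print(f"||w|| = {norm:.3f}, alignment = {alignment:.4f}")
```

The update is local in both space and time, which is why Hebbian-style rules meet locality criteria that backpropagation's global error computation does not.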

arXiv:2406.14067 [pdf]

A microwave photonic prototype for concurrent radar detection and spectrum sensing over an 8 to 40 GHz bandwidth

Authors: Taixia Shi, Dingding Liang, Lu Wang, Lin Li, Shaogang Guo, Jiawei Gao, Xiaowei Li, Chulun Lin, Lei Shi, Baogang Ding, Shiyang Liu, Fangyi Yang, Chi Jiang, Yang Chen

Abstract: In this work, a microwave photonic prototype for concurrent radar detection and spectrum sensing is proposed, designed, built, and investigated. A direct digital synthesizer and an analog electronic circuit are integrated to generate an intermediate frequency (IF) linearly frequency-modulated (LFM) signal with a tunable center frequency from 2.5 to 9.5 GHz and an instantaneous bandwidth of 1 GHz. The IF LFM signal is converted to the optical domain via an intensity modulator and then filtered by a fiber Bragg grating (FBG) to generate only two 2nd-order optical LFM sidebands. In radar detection, the two optical LFM sidebands beat with each other to generate a frequency-and-bandwidth-quadrupled LFM signal, which is used for ranging, radial velocity measurement, and imaging. By changing the center frequency of the IF LFM signal, the radar function can be operated within 8 to 40 GHz. In spectrum sensing, one 2nd-order optical LFM sideband is selected by another FBG, which then works in conjunction with the stimulated Brillouin scattering gain spectrum to map the frequency of the signal under test to time with an instantaneous measurement bandwidth of 2 GHz. By using a frequency shift module to adjust the pump frequency, the frequency measurement range can be adjusted from 0 to 40 GHz.
The prototype is comprehensively studied and tested. It achieves a range resolution of 3.75 cm, a range error of less than $\pm$2 cm, a radial velocity error within $\pm$1 cm/s, and clear imaging of multiple small targets, while maintaining a frequency measurement error of less than $\pm$7 MHz and a frequency resolution better than 20 MHz.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 18 pages, 12 figures, 1 table
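The radar figures in this abstract are mutually consistent under the standard LFM relations. Assuming, as the abstract states, that the optical beat quadruples both the center frequency and the 1 GHz instantaneous bandwidth of the IF signal, the sketch below checks the resulting band edges and the classical range resolution $c/(2B)$. The function names are illustrative.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def multiplied_lfm(center_ghz, bandwidth_ghz, factor=4):
    """Center frequency and bandwidth after frequency multiplication."""
    return center_ghz * factor, bandwidth_ghz * factor

def band_edges(center_ghz, bandwidth_ghz):
    return center_ghz - bandwidth_ghz / 2, center_ghz + bandwidth_ghz / 2

def range_resolution_m(bandwidth_hz):
    """Classical LFM radar range resolution: c / (2B)."""
    return C / (2.0 * bandwidth_hz)

# IF tuning limits from the abstract: 2.5 and 9.5 GHz, 1 GHz instantaneous bandwidth.
for fc in (2.5, 9.5):
    f4, b4 = multiplied_lfm(fc, 1.0)
    lo, hi = band_edges(f4, b4)
    print(f"IF center {fc} GHz -> {lo:.0f} to {hi:.0f} GHz")
print(f"range resolution: {range_resolution_m(4e9) * 100:.2f} cm")
```

This prints 8 to 12 GHz and 36 to 40 GHz for the two tuning limits, matching the stated 8 to 40 GHz operating range, and a range resolution of 3.75 cm for the 4 GHz quadrupled bandwidth.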

arXiv:2406.09317 [pdf, other]

Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, Jinming Guo, Xiaolin Chen, Jingcheng Wang, Yih Chung Tham, et al. (24 additional authors not shown)

Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and a limited knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources, encompassing a diverse range of diseases across multiple ethnicities and countries. RetiZero exhibits superior performance in several downstream tasks, including zero-shot disease recognition, image-to-image retrieval, and internal- and cross-domain disease identification. In zero-shot scenarios, RetiZero achieves Top5 accuracy scores of 0.8430 for 15 fundus diseases and 0.7561 for 52 fundus diseases. For image retrieval, it achieves Top5 scores of 0.9500 and 0.8860 for the same disease sets, respectively. Clinical evaluations show that RetiZero's Top3 zero-shot performance surpasses the average of 19 ophthalmologists from Singapore, China, and the United States. Furthermore, RetiZero significantly enhances clinicians' accuracy in diagnosing fundus disease. These findings underscore the value of integrating the RetiZero foundation model into clinical settings, where a variety of fundus diseases are encountered.

Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.07590 [pdf, other]

StreamPrompt: Learnable Prompt-guided Data Selection for Efficient Stream Learning

Authors: Tongjun Shi, Shuhao Zhang

Abstract: Stream Learning (SL) requires models to rapidly adapt to continuous data streams, setting it apart from traditional Continual Learning (CL). Recent SL methods emphasize efficiency by selecting data subsets for training, but they often struggle because they rely on static, rule-based selection algorithms that cannot effectively adapt to the changing importance of data. In this work, we introduce StreamPrompt, a method that enhances data selection through dynamic, learnable prompts. Beyond guiding model inference, these dynamic prompts serve two purposes: 1) optimizing data selection, and 2) guiding updates to the rehearsal buffer. This approach addresses the challenges of adaptability and computational efficiency in processing continuous data streams. Moreover, StreamPrompt introduces Prompt Attunement, a mechanism that enhances the efficiency of prompt learning. By leveraging attention layers from vision transformers and softly combining their outputs with a gate unit, Prompt Attunement refines prompts with minimal computational resources. Comprehensive evaluations demonstrate StreamPrompt's superior performance over state-of-the-art methods, with significant improvements in accuracy and reductions in training time. These results underscore the efficacy and efficiency of StreamPrompt, establishing its potential as a scalable and effective solution for the evolving demands of SL. Our code is available at https://github.com/intellistream/Efficient-Stream-Learning.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.06412 [pdf, ps, other]

Bose-Einstein condensates of microwave-shielded polar molecules

Authors: Wei-Jian Jin, Fulin Deng, Su Yi, Tao Shi

Abstract: We investigate the ground-state properties of ultracold gases of bosonic microwave-shielded polar molecules. To account for the large shielding core of the inter-molecular potential, we adopt a variational ansatz incorporating the Jastrow correlation factor. We show that the system is always stable and supports a self-bound gas phase and an expanding gas phase. We also calculate the condensate fraction, which is significantly reduced when the size of the shielding core of the two-body potential becomes comparable to the inter-molecular distance. Our studies distinguish the molecular condensates from the atomic ones and invalidate the application of the Gross-Pitaevskii equation to microwave-shielded molecular gases. Our work paves the way for studying Bose-Einstein condensation in ultracold gases of microwave-shielded polar molecules.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.05628 [pdf, other]

Domain Generalization Guided by Large-Scale Pre-Trained Priors

Authors: Zongbin Wang, Bin Pan, Shiyu Shen, Tianyang Shi, Zhenwei Shi

Abstract: Domain generalization (DG) aims to train a model from limited source domains, allowing it to generalize to unknown target domains. Typically, DG models only employ large-scale pre-trained models during the initialization of fine-tuning. However, large-scale pre-trained models already possess the ability to resist domain shift. If we reference pre-trained models continuously during fine-tuning to maintain this ability, it could further enhance the generalization ability of the DG model. For this purpose, we introduce a new method called Fine-Tune with Large-scale pre-trained Priors (FT-LP), which incorporates the pre-trained model as a prior into the DG fine-tuning process, ensuring that the model refers to its pre-trained model at each optimization step. FT-LP comprises a theoretical framework and a simple implementation strategy. In theory, we verify the rationality of FT-LP by introducing a generalization error bound with the pre-trained priors for DG. In implementation, we utilize an encoder to simulate the model distribution, enabling the use of FT-LP when only pre-trained weights are available. In summary, we offer a new fine-tuning method for DG algorithms to utilize pre-trained models throughout the fine-tuning process. Through experiments on various datasets and DG models, our proposed method exhibits significant improvements, indicating its effectiveness.

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.04828 [pdf, other]

QAGCF: Graph Collaborative Filtering for Q&A Recommendation

Authors: Changshuo Zhang, Teng Shi, Xiao Zhang, Yanping Zheng, Ruobing Xie, Qi Liu, Jun Xu, Ji-Rong Wen

Abstract: Question and answer (Q&A) platforms usually recommend question-answer pairs to meet users' knowledge acquisition needs, unlike traditional recommendations that recommend only one item. This makes user behaviors more complex and presents two challenges for Q&A recommendation: collaborative information entanglement, meaning that user feedback is influenced by either the question or the answer; and semantic information entanglement, where questions are correlated with their corresponding answers and correlations also exist among different question-answer pairs. Traditional recommendation methods treat the question-answer pair as a whole or only consider the answer as a single item, which overlooks the two challenges and cannot effectively model user interests. To address these challenges, we introduce Question & Answer Graph Collaborative Filtering (QAGCF), a graph neural network model that creates separate graphs for collaborative and semantic views to disentangle the information in question-answer pairs. The collaborative view disentangles questions and answers to individually model collaborative information, while the semantic view captures the semantic information both within and between question-answer pairs. These views are further merged into a global graph to integrate the collaborative and semantic information. Polynomial-based graph filters are used to address the high heterophily of the global graph. Additionally, contrastive learning is used to obtain robust embeddings during training.
Extensive experiments on industrial and public datasets demonstrate that QAGCF consistently outperforms baselines and achieves state-of-the-art results.

Submitted 7 June, 2024; originally announced June 2024.
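QAGCF uses polynomial-based graph filters to cope with heterophily. The abstract does not name the polynomial basis, so the sketch below assumes the simplest one: a monomial filter $y = \sum_k \theta_k \hat{A}^k x$ on the symmetrically normalized adjacency $\hat{A} = D^{-1/2} A D^{-1/2}$. Coefficients with mixed signs give high-pass behavior, which is the usual argument for why such filters help on heterophilous graphs. The function names are hypothetical.

```python
def normalized_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; isolated nodes keep zero rows."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    inv_sqrt = [d ** -0.5 if d > 0 else 0.0 for d in deg]
    return [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]

def matvec(m, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

def polynomial_filter(adj, signal, coeffs):
    """y = sum_k coeffs[k] * A_hat^k x, computed by repeated propagation."""
    a_hat = normalized_adjacency(adj)
    out = [0.0] * len(signal)
    term = list(signal)  # A_hat^0 x
    for theta in coeffs:
        out = [o + theta * t for o, t in zip(out, term)]
        term = matvec(a_hat, term)
    return out

# Two connected nodes with a signal on node 0 only.
ADJ = [[0, 1], [1, 0]]
print(polynomial_filter(ADJ, [1.0, 0.0], [0.5, 0.5]))   # low-pass: smooths across the edge
print(polynomial_filter(ADJ, [1.0, 0.0], [1.0, -1.0]))  # high-pass: keeps neighbors distinct
```

In a learned model the coefficients are trainable, so the filter can settle anywhere between these two extremes depending on how homophilous the graph is.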

arXiv:2406.04712 [pdf, other]

AICoderEval: Improving AI Domain Code Generation of Large Language Models

Authors: Yinghui Xia, Yuyan Chen, Tianyu Shi, Jun Wang, Jinsong Yang

Abstract: Automated code generation is a pivotal capability of large language models (LLMs). However, assessing this capability in real-world scenarios remains challenging. Previous methods focus more on low-level code generation, such as model loading, than on generating high-level code for real-world tasks, such as image-to-text and text classification, in various domains. Therefore, we construct AICoderEval, a dataset focused on real-world tasks in various domains based on HuggingFace, PyTorch, and TensorFlow, along with comprehensive metrics for evaluating and enhancing LLMs' task-specific code generation capability. AICoderEval contains test cases and complete programs for automated evaluation of these tasks, covering domains such as natural language processing, computer vision, and multimodal learning. To facilitate research in this area, we open-source the AICoderEval dataset at https://huggingface.co/datasets/vixuowis/AICoderEval. We then propose CoderGen, an agent-based framework that helps LLMs generate code for real-world tasks on the constructed AICoderEval. Moreover, we train a more powerful task-specific code generation model, named AICoder, which is refined from llama-3 on AICoderEval. Our experiments demonstrate the effectiveness of CoderGen in improving LLMs' task-specific code generation capability (by 12.00% on pass@1 for the original model and 9.50% on pass@1 for the ReAct Agent). AICoder also outperforms current code generation LLMs, indicating the high quality of the AICoderEval benchmark.

Submitted 7 June, 2024; originally announced June 2024.
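AICoderEval reports gains in pass@1. The abstract does not spell out the estimator, so the sketch below assumes the unbiased pass@k estimator introduced alongside the HumanEval benchmark, $1 - \binom{n-c}{k} / \binom{n}{k}$ for $n$ samples of which $c$ pass the tests; for $k = 1$ it reduces to $c/n$.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def mean_pass_at_k(results, k):
    """Average pass@k over problems given as (n_samples, n_correct) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)

print(mean_pass_at_k([(10, 3), (10, 5)], k=1))
```

Using the combinatorial form rather than $1 - (1 - c/n)^k$ avoids the upward bias of the naive estimate when $k > 1$.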

arXiv:2405.16847 [pdf, other]

TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction

Authors: Yinda Chen, Haoyuan Shi, Xiaoyu Liu, Te Shi, Ruobing Zhang, Dong Liu, Zhiwei Xiong, Feng Wu

Abstract: Autoregressive next-token prediction is a standard pretraining method for large-scale language models, but its application to vision tasks is hindered by the non-sequential nature of image data, which leads to cumulative errors. Most vision models employ masked autoencoder (MAE) based pretraining, which faces scalability issues. To address these challenges, we introduce TokenUnify, a novel pretraining method that integrates random token prediction, next-token prediction, and next-all token prediction. We provide theoretical evidence demonstrating that TokenUnify mitigates cumulative errors in visual autoregression. In conjunction with TokenUnify, we have assembled a large-scale electron microscopy (EM) image dataset with ultra-high resolution, ideal for creating spatially correlated long sequences. This dataset includes over 120 million annotated voxels, making it the largest neuron segmentation dataset to date and providing a unified benchmark for experimental validation. Leveraging on this dataset the Mamba network, which is inherently suited to long-sequence modeling, TokenUnify not only reduces computational complexity but also yields a significant 45% improvement in segmentation performance on downstream EM neuron segmentation tasks compared to existing methods. Furthermore, TokenUnify demonstrates superior scalability over MAE and traditional autoregressive methods, effectively bridging the gap between pretraining strategies for language and vision models. Code is available at https://github.com/ydchen0806/TokenUnify.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16701 [pdf, other]

Detail-Enhanced Intra- and Inter-modal Interaction for Audio-Visual Emotion Recognition

Authors: Tong Shi, Xuri Ge, Joemon M. Jose, Nicolas Pugeault, Paul Henderson

Abstract: Capturing complex temporal relationships between video and audio modalities is vital for Audio-Visual Emotion Recognition (AVER). However, existing methods lack attention to local details, such as facial state changes between video frames, which can reduce the discriminability of features and thus lower recognition accuracy. In this paper, we propose a Detail-Enhanced Intra- and Inter-modal Interaction network (DE-III) for AVER, incorporating several novel aspects. We introduce optical flow information to enrich video representations with texture details that better capture facial state changes. A fusion module integrates the optical flow estimation with the corresponding video frames to enhance the representation of facial texture variations. We also design attentive intra- and inter-modal feature enhancement modules to further improve the richness and discriminability of video and audio representations. A detailed quantitative evaluation shows that our proposed model outperforms all existing methods on three benchmark datasets for both concrete and continuous emotion recognition. To encourage further research and ensure replicability, we will release our full code upon acceptance.

Submitted 26 May, 2024; originally announced May 2024.

Comments: Submitted to 27th International Conference on Pattern Recognition (ICPR 2024)

arXiv:2405.16553 [pdf, ps, other]

Unveiling quantum phases in quasi-one-dimensional dipolar gases using continuous matrix product state

Authors: Li Peng, Junqiao Pan, Su Yi, Tao Shi

Abstract: We investigate the ground-state properties of quasi-one-dimensional dipolar gases using continuous matrix product state techniques. Using the first- and second-order correlation functions, we find that the system supports superfluid, super-Tonks-Girardeau, and quasicrystal phases according to Luttinger liquid theory. We also map out the phase diagram on the parameter plane spanned by the contact and dipolar interaction strengths. Furthermore, we compute the Luttinger parameter, the structure factor, and the momentum distribution of the system. Finally, we show that the predicted dipolar effect can potentially be observed in quasi-one-dimensional gases of polar molecules.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.15271 [pdf]

Seamless Integration and Implementation of Distributed Contact and Contactless Vital Sign Monitoring

Authors: Dingding Liang, Yang Chen, Jiawei Gao, Taixia Shi, Jianping Yao

Abstract: Real-time vital sign monitoring is gaining significance not only in the medical field but also in personal health management. To meet the needs of diverse application scenarios in future smart and healthy cities, a low-cost, large-scale, scalable, and distributed vital sign monitoring system is of great importance. In this work, we propose a seamlessly integrated contact and contactless vital sign monitoring system that can simultaneously monitor respiration and heartbeat. In contact vital sign monitoring, chest wall movement due to respiration and heartbeat is translated into changes in the optical output intensity of a fiber Bragg grating (FBG). The FBG is also an important part of radar signal generation for contactless vital sign monitoring, in which chest wall movement is translated into phase changes of the radar de-chirped signal. By analyzing the intensity of the FBG output and the phase of the radar de-chirped signal, real-time respiration and heartbeat monitoring are realized. In addition, owing to its distributed structure and its good integration with wavelength-division multiplexing optical networks, the system can be massively scaled by employing more wavelengths. A proof-of-concept experiment is carried out, in which contact and contactless respiration and heartbeat monitoring of three people are realized simultaneously. During a monitoring time of 60 s, the maximum absolute measurement errors of respiration and heartbeat rates are 1.6 respirations per minute and 2.3 beats per minute, respectively.
The measurement error does not change noticeably even when the monitoring time is decreased to 5 s.

Submitted 24 May, 2024; originally announced May 2024.

Comments: 14 pages, 9 figures
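The system recovers respiration and heartbeat from the FBG output intensity and from the phase of the radar de-chirped signal. The abstract does not give the estimator, so the sketch below assumes two generic steps: converting round-trip radar phase to chest displacement with $d = \lambda \phi / (4\pi)$, and reading each rate off the largest DFT peak inside an assumed physiological band (0.1 to 0.5 Hz for respiration, 0.8 to 2.0 Hz for heartbeat). The 20 Hz sampling rate, the synthetic signal, and the function names are hypothetical.

```python
import math

def phase_to_displacement(phase_rad, wavelength_m):
    """Round-trip radar phase to displacement: d = lambda * phi / (4 * pi)."""
    return wavelength_m * phase_rad / (4.0 * math.pi)

def dominant_rate_per_min(samples, fs_hz, f_lo_hz, f_hi_hz):
    """Frequency (cycles per minute) of the largest DFT magnitude within [f_lo, f_hi]."""
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]  # remove the DC component
    best_f, best_mag = 0.0, -1.0
    for k in range(1, n // 2):
        f = k * fs_hz / n
        if not (f_lo_hz <= f <= f_hi_hz):
            continue  # only evaluate DFT bins inside the band of interest
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
        im = -sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_f, best_mag = f, mag
    return best_f * 60.0

FS = 20.0   # sampling rate, Hz (assumed)
T = 60.0    # observation window, s (the 60 s case in the abstract)
N = int(FS * T)
# Synthetic chest motion: 15 breaths/min plus a weaker 72 beats/min component.
signal = [1.0 * math.sin(2 * math.pi * 0.25 * i / FS)
          + 0.1 * math.sin(2 * math.pi * 1.2 * i / FS) for i in range(N)]
resp = dominant_rate_per_min(signal, FS, 0.1, 0.5)
heart = dominant_rate_per_min(signal, FS, 0.8, 2.0)
print(f"respiration ~ {resp:.1f} /min, heartbeat ~ {heart:.1f} /min")
```

With a T-second window the DFT bin spacing is 60/T per minute, so 1 per minute for the 60 s window here. A plain peak pick over a 5 s window, the shortest case in the abstract, would be limited to a 12 per minute spacing, so a finer estimator (interpolation, zero-padding, or a parametric method) would be needed there.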

arXiv:2405.13645 [pdf, other]

Formation and Dissociation of Field-Linked Tetramers

Authors: Fulin Deng, Xing-Yan Chen, Xin-Yu Luo, Wenxian Zhang, Su Yi, Tao Shi

Abstract: We investigate the static and dynamic properties of tetratomic molecules formed by two microwave-shielded polar molecules across field-linked resonances. In particular, we focus on two-body physics and experimental techniques unexplored in the recent experiment [X.-Y. Chen et al., Nature 626, 283 (2024)]. We show that, compared to the lowest tetramer state, higher tetramer states typically have longer lifetimes, which may facilitate further cooling of tetramer gases towards quantum degeneracy. To detect tetramers, we identify distinctive time-of-flight images from ramp dissociation, which can be observed by lowering the ramp rate of the microwave. Remarkably, in the modulational dissociation of tetramers, we find that multi-photon processes induce dissociation even below the threshold modulation frequency when the modulation amplitude is sufficiently high. Given the universal form of the inter-molecular potential for microwave-shielded polar molecules, our results also apply to other molecular gases widely explored in recent experiments.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.04135 [pdf, other]

In-context Learning for Automated Driving Scenarios

Authors: Ziqi Zhou, Jingyue Zhang, Jingyuan Zhang, Boyue Wang, Tianyu Shi, Alaa Khamis

Abstract: One of the key challenges in current Reinforcement Learning (RL)-based Automated Driving (AD) agents is achieving flexible, precise, and human-like behavior cost-effectively. This paper introduces an innovative approach that uses Large Language Models (LLMs) to intuitively and effectively optimize RL reward functions in a human-centric way. We developed a framework in which instructions and dynamic environment descriptions are input into the LLM. The LLM then uses this information to assist in generating rewards, thereby steering the behavior of RL agents towards patterns that more closely resemble human driving. The experimental results demonstrate that this approach not only makes RL agents more anthropomorphic but also achieves better performance. Additionally, various strategies for reward-proxy and reward-shaping are investigated, revealing the significant impact of prompt design on shaping an AD vehicle's behavior. These findings offer a promising direction for the development of more advanced and human-like automated driving systems. Our experimental data and source code can be found here.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 7 pages, 6 figures, 35 references

arXiv:2405.02289 [pdf, other]

TSDiT: Traffic Scene Diffusion Models With Transformers

Authors: Chen Yang, Tianyu Shi

Abstract: In this paper, we introduce a novel approach to trajectory generation for autonomous driving, combining the strengths of Diffusion models and Transformers. First, we use the historical trajectory data for efficient preprocessing and generate action latents using a diffusion model with DiT (Diffusion with Transformers) blocks to increase scene diversity and the stochasticity of agent actions. Then, we combine the action latents, historical trajectories, and HD map features and feed them into different transformer blocks. Finally, we use a trajectory decoder to generate future trajectories of agents in the traffic scene. The method exhibits superior performance in generating smooth turning trajectories, enhancing the model's capability to fit complex steering patterns. The experimental results demonstrate the effectiveness of our method in producing realistic and diverse trajectories, showcasing its potential for application in autonomous vehicle navigation systems.

Submitted 21 December, 2023; originally announced May 2024.

arXiv:2405.01063 [pdf, other]

Fair Recommendations with Limited Sensitive Attributes: A Distributionally Robust Optimization Approach

Authors: Tianhao Shi, Yang Zhang, Jizhi Zhang, Fuli Feng, Xiangnan He

Abstract: As recommender systems are indispensable in various domains such as job searching and e-commerce, providing equitable recommendations to users with different sensitive attributes becomes an imperative requirement. Prior approaches for enhancing fairness in recommender systems presume the availability of all sensitive attributes, which can be difficult to obtain due to privacy concerns or inadequate means of capturing these attributes. In practice, the efficacy of these approaches is limited, pushing us to investigate ways of promoting fairness with limited sensitive attribute information. Toward this goal, it is important to reconstruct missing sensitive attributes. Nevertheless, reconstruction errors are inevitable due to the complexity of real-world sensitive attribute reconstruction problems and legal regulations. Thus, we pursue fair learning methods that are robust to reconstruction errors. To this end, we propose Distributionally Robust Fair Optimization (DRFO), which minimizes the worst-case unfairness over all potential probability distributions of missing sensitive attributes, rather than only the reconstructed one, to account for the impact of the reconstruction errors. We provide theoretical and empirical evidence to demonstrate that our method can effectively ensure fairness in recommender systems when only limited sensitive attributes are accessible.

Submitted 27 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: 8 pages, 5 figures, accepted by SIGIR'24

arXiv:2405.00478 [pdf, ps, other]

Dual-frequency optical-microwave atomic clocks based on cesium atoms

Authors: Tiantian Shi, Qiang Wei, Xiaomin Qin, Zhenfeng Liu, Kunkun Chen, Shiying Cao, Hangbo Shi, Zijie Liu, Jingbiao Chen

Abstract: $^{133}$Cs, which is the only stable cesium (Cs) isotope, is one of the most investigated elements in atomic spectroscopy and was used to realize the first cesium atomic clock in 1955. Among all atomic clocks, the cesium atomic clock has a special place, since the current unit of time is based on a microwave transition in the Cs atom. In addition, the long lifetime of the $6\text{P}_{3/2}$ state and the simple preparation of Cs vapor cells make Cs highly relevant to quantum and atom optics experiments, which suggests the use of the $6\text{S} - 6\text{P}$ D2 transition as an optical frequency standard. In this work, using one laser as the local oscillator and Cs atoms as the quantum reference, we realized two atomic clocks, one at optical and one at microwave frequencies. The two clocks can be freely switched or output simultaneously. The optical clock, based on a vapor cell and frequency-stabilized by modulation transfer spectroscopy, operated continuously with a frequency stability of $3.89 \times 10^{-13}$ at 1 s, decreasing to $2.17 \times 10^{-13}$ at 32 s, as evaluated with an optical frequency comb. Then, by applying this stabilized laser to an optically pumped Cs beam atomic clock to reduce the laser frequency noise, we obtained a microwave clock with a frequency stability of $1.84 \times 10^{-12}/\sqrt{\tau}$, reaching $5.99 \times 10^{-15}$ at $10^5$ s. This study demonstrates features attractive for the commercialization and deployment of optical and microwave clocks and will guide further development of integrated atomic clocks with better stability.
Thus, this study lays the groundwork for future quantum metrology and laser physics.

Submitted 1 May, 2024; originally announced May 2024.

Comments: 8 pages, 4 figures
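A short sketch for reading the stability figures quoted above: the non-overlapping Allan deviation of fractional-frequency data, plus the pure white-frequency-noise law $\sigma_y(\tau) = \sigma_{1\,\text{s}}/\sqrt{\tau}$. The function names are our own, and the authors may well have used an overlapping estimator; this is illustrative only.

```python
import math


def allan_deviation(y, m):
    """Non-overlapping Allan deviation of fractional-frequency samples y at tau = m * tau0.

    sigma_y^2(tau) = sum_k (ybar_{k+1} - ybar_k)^2 / (2 (M - 1)),
    where ybar_k are the averages of M consecutive, non-overlapping blocks of m samples.
    """
    blocks = [sum(y[i:i + m]) / m for i in range(0, len(y) - m + 1, m)]
    if len(blocks) < 2:
        raise ValueError("need at least two averaging blocks")
    squared_diffs = [(b - a) ** 2 for a, b in zip(blocks, blocks[1:])]
    return math.sqrt(sum(squared_diffs) / (2 * len(squared_diffs)))


def white_fm_deviation(sigma_at_1s, tau):
    """Allan deviation predicted by pure white frequency noise: sigma_at_1s / sqrt(tau)."""
    return sigma_at_1s / math.sqrt(tau)


extrapolated = white_fm_deviation(1.84e-12, 1e5)  # about 5.8e-15
```

Extrapolating the reported $1.84 \times 10^{-12}/\sqrt{\tau}$ law to $\tau = 10^5$ s gives about $5.8 \times 10^{-15}$, close to the measured $5.99 \times 10^{-15}$. By contrast, the optical clock's stability falls from $3.89 \times 10^{-13}$ to only $2.17 \times 10^{-13}$ between 1 s and 32 s, far more slowly than the roughly $6.9 \times 10^{-14}$ a pure $1/\sqrt{\tau}$ law would predict at 32 s.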

arXiv:2404.12529 [pdf, other]

A Survey of Bluetooth Indoor Localization

Authors: Taolei Shi, Wei Gong

Abstract: Nowadays, indoor localization has received extensive research interest because more and more applications need location information to provide more precise and effective services [1], [2]. Various wireless techniques and mechanisms have been proposed; some of them, such as Wi-Fi, RFID, and sensor networks, have been studied in depth and come into use. In comparison, Bluetooth location technology has developed slowly and there are few papers and surveys in this field, although the performance and market value of Bluetooth are increasing steadily. In this paper, we aim to provide a detailed survey of various indoor localization systems with Bluetooth. In contrast with the existing surveys, we categorize the localization techniques proposed in the literature in order to sketch the development of Bluetooth localization compared to other technologies. We also evaluate different systems from the perspective of availability, cost, scalability, and accuracy. Finally, we discuss the remaining problems and challenges in accurate Bluetooth localization.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 8 pages, 2 figures
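One technique family commonly covered by Bluetooth localization surveys is RSSI-based ranging followed by trilateration. A minimal sketch, assuming the log-distance path-loss model $\text{RSSI}(d) = P_0 - 10\,n\log_{10} d$ with a hypothetical $P_0 = -59$ dBm at 1 m and path-loss exponent $n = 2$; real deployments calibrate both parameters and typically use more anchors with a least-squares solve.

```python
import math


def rssi_to_distance(rssi_dbm, tx_power_dbm=-59.0, path_loss_exp=2.0):
    """Invert the log-distance model RSSI = P0 - 10 n log10(d), d in metres (hypothetical P0, n)."""
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10.0 * path_loss_exp))


def trilaterate(anchors, distances):
    """Position from three non-collinear anchors by linearising the circle equations."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    # Subtracting circle 1 from circles 2 and 3 leaves two linear equations in (x, y).
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors must not be collinear")
    # Cramer's rule for the 2x2 system.
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


# Hypothetical beacons at three room corners and a tag at (3, 4) m, noise-free RSSI.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
rssi = [-59.0 - 20.0 * math.log10(math.dist(a, (3.0, 4.0))) for a in anchors]
estimate = trilaterate(anchors, [rssi_to_distance(r) for r in rssi])
```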

arXiv:2404.11168 [pdf]

Microwave photonic short-time Fourier transform based on stabilized period-one nonlinear laser dynamics and stimulated Brillouin scattering

Authors: Sunan Zhang, Taixia Shi, Lizhong Jiang, Yang Chen

Abstract: A microwave photonic short-time Fourier transform (STFT) system based on stabilized period-one (P1) nonlinear laser dynamics and stimulated Brillouin scattering (SBS) is proposed. Using an optoelectronic feedback loop, the frequency-sweep optical signal generated by the P1 nonlinear laser dynamics is stabilized; this signal is then used in conjunction with an optical bandpass filter implemented by SBS to achieve frequency-to-time mapping of microwave signals and, ultimately, the STFT. Comparing the experimental results with and without optoelectronic feedback shows that, with feedback, the time-frequency diagram of the signal under test (SUT) obtained by the STFT is clearer and more regular, and the frequency of the SUT measured in each frequency-sweep period is more accurate. The mean absolute error is reduced by 50% under the optimal filter bandwidth.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 9 pages, 6 figures

arXiv:2404.09520 [pdf, other]

UniSAR: Modeling User Transition Behaviors between Search and Recommendation

Authors: Teng Shi, Zihua Si, Jun Xu, Xiao Zhang, Xiaoxue Zang, Kai Zheng, Dewei Leng, Yanan Niu, Yang Song

Abstract: Nowadays, many platforms provide users with both search and recommendation services as important tools for accessing information. This has led to a correlation between user search and recommendation behaviors, providing an opportunity to model user interests in a fine-grained way. Existing approaches either model user search and recommendation behaviors separately or overlook the different transitions between them. In this paper, we propose a framework named UniSAR that effectively models the different types of fine-grained behavior transitions to provide users with a Unified Search And Recommendation service. Specifically, UniSAR models the user transition behaviors between search and recommendation through three steps: extraction, alignment, and fusion, which are respectively implemented by transformers equipped with pre-defined masks, contrastive learning that aligns the extracted fine-grained user transitions, and cross-attention that fuses the different transitions. To provide users with a unified service, the learned representations are fed into the downstream search and recommendation models. Joint learning on both search and recommendation data is employed so that the two tasks share knowledge and enhance each other. Experimental results on two public datasets demonstrate the effectiveness of UniSAR in enhancing both search and recommendation simultaneously.
The experimental analysis further validates that UniSAR improves the results by successfully modeling the user transition behaviors between search and recommendation.

Submitted 15 April, 2024; originally announced April 2024.

Comments: Accepted by SIGIR 2024
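UniSAR's alignment step uses contrastive learning, but its exact loss is not given in the abstract. As a generic, hypothetical sketch, the InfoNCE objective with in-batch negatives treats each anchor representation (for instance, a search-side transition embedding) and its paired positive (the matching recommendation-side embedding) as the correct class among all positives in the batch. The temperature of 0.1 and the toy 2D vectors are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] must pick out positives[i] among all positives."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target class i
    return total / len(anchors)


search_side = [[1.0, 0.0], [0.0, 1.0]]                         # toy anchor embeddings
aligned = info_nce(search_side, [[1.0, 0.0], [0.0, 1.0]])      # pairs match: loss near 0
misaligned = info_nce(search_side, [[0.0, 1.0], [1.0, 0.0]])   # pairs swapped: loss near 10
```

Minimizing this loss pulls each matched pair together and pushes the other in-batch items apart, which is the general sense in which contrastive learning "aligns" two views.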

arXiv:2404.08226 [pdf, other]

Improving Continuous Sign Language Recognition with Adapted Image Models

Authors: Lianyu Hu, Tongkai Shi, Liqing Gao, Zekang Liu, Wei Feng

Abstract: The growth of web-scale, weakly labelled image-text pairs has greatly facilitated the development of large-scale vision-language models (e.g., CLIP), which have shown impressive generalization performance over a series of downstream tasks. However, the massive model size and the scarcity of available data limit the feasibility of fine-tuning the whole model on downstream tasks. Besides, fully fine-tuning the model easily forgets the generic essential knowledge acquired in the pretraining stage and overfits the downstream data. To efficiently adapt these large vision-language models (e.g., CLIP) to continuous sign language recognition (CSLR) while preserving their generalizability, we propose a novel strategy (AdaptSign). Specifically, CLIP is adopted as the visual backbone to extract frame-wise features with its parameters frozen, and a set of learnable modules is introduced to model spatial sign variations or capture temporal sign movements. The introduced modules are lightweight, adding only 3.2% extra computation. In this process, the generic knowledge acquired in the pretraining stage is well preserved in the frozen CLIP backbone. Extensive experiments show that despite being efficient, AdaptSign achieves superior performance compared to existing methods across a series of CSLR benchmarks, including PHOENIX14, PHOENIX14-T, CSL-Daily, and CSL.
Visualizations show that AdaptSign learns to dynamically focus on the informative spatial regions and cross-frame trajectories in sign videos.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.05680 [pdf, other]

SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation

Authors: Heyuan Li, Ce Chen, Tianhao Shi, Yuda Qiu, Sizhe An, Guanying Chen, Xiaoguang Han

Abstract: While recent advances in 3D-aware Generative Adversarial Networks (GANs) have aided the development of near-frontal view human face synthesis, the challenge of comprehensively synthesizing a full 3D head viewable from all angles still persists. Although PanoHead demonstrates the possibility of using a large-scale dataset with images of both frontal and back views for full-head synthesis, it often causes artifacts in back views. Based on our in-depth analysis, we found the reasons are mainly twofold. First, from the network architecture perspective, we found that each plane in the utilized tri-plane/tri-grid representation space tends to confuse the features from both sides, causing "mirroring" artifacts (e.g., the glasses appear in the back). Second, from the data supervision perspective, we found that existing discriminator training in 3D GANs mainly focuses on the quality of the rendered image itself and does not care much about its plausibility with respect to the viewpoint from which it was rendered. This makes it possible to generate a "face" in non-frontal views, because doing so easily fools the discriminator. In response, we propose SphereHead, a novel tri-plane representation in the spherical coordinate system that fits the human head's geometric characteristics and efficiently mitigates many of the generated artifacts. We further introduce a view-image consistency loss for the discriminator to emphasize the correspondence between the camera parameters and the images. The combination of these efforts results in visually superior outcomes with significantly fewer artifacts.
Our code and dataset are publicly available at https://lhyfst.github.io/spherehead.

Submitted 16 July, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

Comments: Accepted by ECCV 2024. Project page: https://lhyfst.github.io/spherehead
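A minimal illustration of the mirroring issue, under our own simplifying assumptions (head centred at the origin, z axis pointing towards the face; this is not SphereHead's actual parameterization): an axis-aligned plane that drops the z coordinate assigns the same 2D feature location to a point on the face and its mirror image on the back of the head, whereas spherical coordinates keep the two apart through the polar angle.

```python
import math


def triplane_coords(x, y, z):
    """Axis-aligned tri-plane lookup: each plane simply drops one coordinate."""
    return {"xy": (x, y), "xz": (x, z), "yz": (y, z)}


def spherical_coords(x, y, z):
    """Cartesian -> (radius r, polar angle theta in [0, pi], azimuth phi in [-pi, pi])."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r) if r > 0 else 0.0
    phi = math.atan2(y, x)
    return r, theta, phi


# A point on the face (z > 0) and its mirror image on the back of the head (z < 0).
front, back = (0.2, 0.1, 0.5), (0.2, 0.1, -0.5)
same_xy = triplane_coords(*front)["xy"] == triplane_coords(*back)["xy"]  # True: xy features collide
theta_front = spherical_coords(*front)[1]
theta_back = spherical_coords(*back)[1]  # differs: theta_back = pi - theta_front
```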

arXiv:2404.05374 [pdf]

Seamlessly merging radar ranging/imaging, wireless communications, and spectrum sensing, for 6G empowered by microwave photonics

Authors: Taixia Shi, Yang Chen, Jianping Yao

Abstract: Integration of radar, wireless communications, and spectrum sensing is being investigated for 6G to increase spectral efficiency. Microwave photonics (MWP), a technique that combines microwave engineering and photonic technology to take advantage of the wide bandwidth offered by photonics for microwave signal generation and processing, is considered an effective solution for implementing this integration. In this paper, an MWP-assisted joint radar, wireless communications, and spectrum sensing (JRCSS) system that enables precise perception of the surrounding physical and electromagnetic environments while maintaining high-speed data communication is proposed and demonstrated. Communication signals and frequency-sweep signals are merged in the optical domain to achieve high-speed radar ranging and imaging, high-data-rate wireless communications, and wideband spectrum sensing. In an experimental demonstration, a JRCSS system supporting radar ranging with a measurement error within $\pm$4 cm, two-dimensional imaging with a resolution of 25 mm $\times$ 24.7 mm, wireless communications at a symbol rate of 2 Gbaud, and spectrum sensing with a frequency measurement error within $\pm$10 MHz over a 6-GHz bandwidth is demonstrated.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 18 pages, 10 figures
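A sketch of the standard de-chirp ranging relation for linear frequency-sweep (FMCW-style) radar, given only as background: mixing the echo with the transmitted sweep yields a beat frequency proportional to range, and the sweep bandwidth sets the range resolution. The 4 GHz bandwidth and 10 µs sweep below are hypothetical and are not the parameters of the JRCSS experiment.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def beat_frequency(target_range_m, bandwidth_hz, chirp_duration_s):
    """Beat (de-chirped) frequency of a linear sweep echo: f_b = 2 R S / c, slope S = B / T."""
    slope = bandwidth_hz / chirp_duration_s
    return 2.0 * target_range_m * slope / SPEED_OF_LIGHT


def range_from_beat(f_beat_hz, bandwidth_hz, chirp_duration_s):
    """Invert the de-chirp relation: R = c f_b / (2 S)."""
    slope = bandwidth_hz / chirp_duration_s
    return SPEED_OF_LIGHT * f_beat_hz / (2.0 * slope)


def range_resolution(bandwidth_hz):
    """Classical range resolution of a linear sweep: c / (2 B)."""
    return SPEED_OF_LIGHT / (2.0 * bandwidth_hz)


# Hypothetical sweep: 4 GHz over 10 us, target at 1.5 m.
f_b = beat_frequency(1.5, 4e9, 10e-6)       # roughly 4 MHz
recovered = range_from_beat(f_b, 4e9, 10e-6)
resolution = range_resolution(4e9)          # about 3.75 cm
```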

arXiv:2404.02082 [pdf, other]

WcDT: World-centric Diffusion Transformer for Traffic Scene Generation

Authors: Chen Yang, Aaron Xuxiang Tian, Dong Chen, Tianyu Shi, Arsalan Heydarian

Abstract: In this paper, we introduce a novel approach for autonomous driving trajectory generation by harnessing the complementary strengths of diffusion probabilistic models (a.k.a. diffusion models) and transformers. Our proposed framework, termed the "World-Centric Diffusion Transformer" (WcDT), optimizes the entire trajectory generation process, from feature extraction to model inference. To enhance scene diversity and stochasticity, the historical trajectory data is first preprocessed and encoded into latent space using Denoising Diffusion Probabilistic Models (DDPM) enhanced with Diffusion with Transformer (DiT) blocks. Then, the latent features, historical trajectories, HD map features, and historical traffic signal information are fused with various transformer-based encoders. The encoded traffic scenes are then decoded by a trajectory decoder to generate multimodal future trajectories. Comprehensive experimental results show that the proposed approach exhibits superior performance in generating both realistic and diverse trajectories, showing its potential for integration into autonomous driving simulation systems.

Submitted 2 April, 2024; originally announced April 2024.

Comments: 12 pages, 6 figures
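The DDPM encoding step in this abstract relies on the closed-form forward (noising) process $q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar\alpha_t}\,x_0, (1-\bar\alpha_t) I)$ with $\bar\alpha_t = \prod_{s \le t}(1-\beta_s)$. A minimal sketch follows; the linear schedule from $\beta = 10^{-4}$ to $0.02$ over 1000 steps is the common default from the original DDPM paper and is an assumption here, not a detail reported for WcDT.

```python
import math
import random


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, noise=None, rng=random):
    """Draw x_t ~ q(x_t | x_0) in closed form: sqrt(ab_t) x_0 + sqrt(1 - ab_t) eps."""
    ab = alpha_bars[t]
    if noise is None:
        noise = [rng.gauss(0.0, 1.0) for _ in x0]  # eps ~ N(0, I)
    return [math.sqrt(ab) * a + math.sqrt(1.0 - ab) * e for a, e in zip(x0, noise)]


alpha_bars = linear_alpha_bars()
x_noisy = q_sample([1.0, 2.0], 500, alpha_bars)  # a partially noised 2D "latent"
```

Because $\bar\alpha_t$ shrinks towards zero (about $4 \times 10^{-5}$ at the last step of this schedule), late-step samples are nearly pure Gaussian noise, which is what the reverse (denoising) network learns to undo.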

arXiv:2404.01663 [pdf, other]

CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models

Authors: Xuechen Liang, Meiling Tao, Yinghui Xia, Tianyu Shi, Jun Wang, JingSong Yang

Abstract: Open large language models (LLMs) have significantly advanced the field of natural language processing, showcasing impressive performance across various tasks. Despite these advancements, their effective operation still relies heavily on human input to accurately guide the dialogue flow, with agent tuning being a crucial optimization technique that involves human adjustments to the model for better response to such guidance. Addressing this dependency, our work introduces the TinyAgent model, trained on a meticulously curated high-quality dataset. We also present the Collaborative Multi-Agent Tuning (CMAT) framework, an innovative system designed to augment language agent capabilities through adaptive weight updates based on environmental feedback. This framework fosters collaborative learning and real-time adaptation among multiple intelligent agents, enhancing their context-awareness and long-term memory. In this research, we propose a new communication agent framework that integrates multi-agent systems with environmental feedback mechanisms, offering a scalable method to explore cooperative behaviors. Notably, our TinyAgent-7B model exhibits performance on par with GPT-3.5, despite having fewer parameters, signifying a substantial improvement in the efficiency and effectiveness of LLMs.

Submitted 26 August, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.15986 [pdf, other]

doi 10.1103/PhysRevE.110.024128

Exact Work Distribution and Jarzynski's Equality of a Relativistic Particle in an Expanding Piston

Authors: Xianghang Zhang, Tingzhang Shi, H. T. Quan

Abstract: We study the non-equilibrium work in a pedagogical model of a relativistic ideal gas. We obtain the exact work distribution and verify Jarzynski's equality. In the non-relativistic limit, our results recover the non-relativistic results [arXiv:cond-mat/0502434]. We also find that, unlike in the non-relativistic case, the work distribution no longer has zeros and the number of collisions in this relativistic gas model is finite. In addition, based on an analysis of the experimental parameters, we conclude that it is difficult to detect the relativistic effects on the work distribution of the ideal gas in a piston system with current experimental techniques.

Submitted 23 March, 2024; originally announced March 2024.
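For reference, Jarzynski's equality in its standard form (not the paper's relativistic derivation, whose exact $P(W)$ is not reproduced here) relates the work distribution $P(W)$ of a process started from equilibrium at inverse temperature $\beta = 1/(k_B T)$ to the free-energy difference $\Delta F$ between the initial and final equilibrium states; Jensen's inequality then recovers the second-law bound on the mean work:

```latex
\left\langle e^{-\beta W} \right\rangle
  = \int \mathrm{d}W \, P(W) \, e^{-\beta W}
  = e^{-\beta \Delta F},
\qquad
e^{-\beta \langle W \rangle} \le \left\langle e^{-\beta W} \right\rangle
\;\Longrightarrow\;
\langle W \rangle \ge \Delta F .
```

Verifying the equality for a model therefore amounts to showing that the exponential average of its exact work distribution reproduces $e^{-\beta \Delta F}$.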

arXiv:2403.06384 [pdf, other]

Precision Spectroscopy and Nuclear Structure Parameters in 7Li+ ion

Authors: Hua Guan, Xiao-Qiu Qi, Peng-Peng Zhou, Wei Sun, Shao-Long Chen, Xu-Rui Chang, Yao Huang, Pei-Pei Zhang, Zong-Chao Yan, G. W. F. Drake, Ai-Xi Chen, Zhen-Xiang Zhong, Ting-Yun Shi, Ke-Lin Gao

Abstract: The optical Ramsey technique is used to obtain precise measurements of the hyperfine splittings in the $2\,^3\!S_1$ and $2\,^3\!P_J$ states of $^7$Li$^+$. Together with bound-state quantum electrodynamic theory, the Zemach radius and quadrupole moment of the $^7$Li nucleus are determined to be $3.35(1)$~fm and $-3.86(5)$~fm$^2$, respectively, with the quadrupole moment deviating from the recommended value of $-4.00(3)$~fm$^2$ by $1.75\sigma$. Furthermore, we determine the quadrupole moment ratio of $^6$Li to $^7$Li to be $0.101(13)$, a $6\sigma$ deviation from the previously measured value of $0.020161(13)$ obtained by LiF molecular spectroscopy. Taken together, the results provide a sensitive test of nuclear structure models.

Submitted 10 March, 2024; originally announced March 2024.
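A quick arithmetic check of the quoted significance levels (our own reading, not a statement from the paper): the $1.75\sigma$ figure is reproduced when the two uncertainties are added linearly, whereas combining them in quadrature would give about $2.4\sigma$; the quadrupole-ratio discrepancy comes out near $6.2\sigma$ either way.

```python
import math


def deviation_in_sigma(value, err, reference, ref_err, combine="quadrature"):
    """Number of standard deviations separating two results with uncertainties err and ref_err."""
    diff = abs(value - reference)
    if combine == "linear":
        return diff / (err + ref_err)
    if combine == "quadrature":
        return diff / math.hypot(err, ref_err)
    raise ValueError("combine must be 'linear' or 'quadrature'")


q_linear = deviation_in_sigma(-3.86, 0.05, -4.00, 0.03, combine="linear")  # 1.75
q_quad = deviation_in_sigma(-3.86, 0.05, -4.00, 0.03)                      # about 2.40
ratio = deviation_in_sigma(0.101, 0.013, 0.020161, 0.000013)               # about 6.2
```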

arXiv:2402.11725 [pdf, other]

How Susceptible are Large Language Models to Ideological Manipulation?

Authors: Kai Chen, Zihao He, Jun Yan, Taiwei Shi, Kristina Lerman

Abstract: Large Language Models (LLMs) possess the potential to exert substantial influence on public perceptions and interactions with information. This raises concerns about the societal impact that could arise if the ideologies within these models can be easily manipulated. In this work, we investigate how effectively LLMs can learn and generalize ideological biases from their instruction-tuning data. Our findings reveal a concerning vulnerability: exposure to only a small amount of ideologically driven samples significantly alters the ideology of LLMs. Notably, LLMs demonstrate a startling ability to absorb ideology from one topic and generalize it to even unrelated ones. The ease with which LLMs' ideologies can be skewed underscores the risks associated with intentionally poisoned training data by malicious actors or inadvertently introduced biases by data annotators. It also emphasizes the imperative for robust safeguards to mitigate the influence of ideological manipulations on LLMs.

Submitted 18 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

arXiv:2402.04159 [pdf, ps, other]

Optimal transport in the frame of abstract Lax-Oleinik operator revisited

Authors: Wei Cheng, Jiahui Hong, Tianqi Shi

Abstract: This is our first paper extending our recent work on Lax-Oleinik commutators and their applications to the intrinsic approach to the propagation of singularities of viscosity solutions of Hamilton-Jacobi equations. We reformulate the Kantorovich-Rubinstein duality theorem in the theory of optimal transport in terms of abstract Lax-Oleinik operators, and analyze the relevant optimal transport problem in the case where the cost function $c(x,y)=h(t_1,t_2,x,y)$ is the fundamental solution of a Hamilton-Jacobi equation. For further applications to the problem of cut locus and propagation of singularities in optimal transport, we introduce corresponding random Lax-Oleinik operators. We also study the problem of singularities for $c$-concave functions and its dynamical implications when $c$ is the fundamental solution with $t_2-t_1\ll1$ and $t_2-t_1<\infty$, and when $c$ is the Peierls' barrier, respectively.

Submitted 6 February, 2024; originally announced February 2024.
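As a concrete and much simplified instance of the operators discussed in this abstract: for the Hamiltonian $H(p) = |p|^2/2$ on $\mathbb{R}$, the fundamental solution is $h_t(x,y) = |x-y|^2/(2t)$, and the backward Lax-Oleinik operator reduces to the Hopf-Lax formula $T_t^- u(x) = \min_y \{u(y) + |x-y|^2/(2t)\}$. The grid discretization below is our own illustrative choice; the paper works with abstract operators and general costs.

```python
def lax_oleinik(u_values, grid, t):
    """Discrete backward Lax-Oleinik (Hopf-Lax) operator with quadratic cost.

    (T_t u)(x) = min_y [u(y) + |x - y|^2 / (2 t)], with y restricted to the grid points.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    return [min(u + (x - y) ** 2 / (2.0 * t) for u, y in zip(u_values, grid)) for x in grid]


grid = [0.5 * i for i in range(-10, 11)]   # points -5.0, -4.5, ..., 5.0
u0 = [y * y / 2.0 for y in grid]           # initial data u(y) = y^2 / 2
u1 = lax_oleinik(u0, grid, 1.0)            # apply T_t^- with t = 1
```

For $u(y) = y^2/2$ the exact result is $x^2/(2(1+t))$, attained at $y = x/(1+t)$; with $t = 1$ the value at $x = 2$ is $1$, which the grid reproduces because that minimizer $y = 1$ lies on the grid.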

arXiv:2401.16501 [pdf]

AFSD-Physics: Exploring the governing equations of temperature evolution during additive friction stir deposition by a human-AI teaming approach

Authors: Tony Shi, Mason Ma, Jiajie Wu, Chase Post, Elijah Charles, Tony Schmitz

Abstract: This paper presents a modeling effort to explore the underlying physics of temperature evolution during additive friction stir deposition (AFSD) by a human-AI teaming approach. AFSD is an emerging solid-state additive manufacturing technology that deposits material without melting. However, both process modeling and modeling of the AFSD tool are at an early stage. In this paper, a human-AI teaming approach is proposed to combine models based on first principles with AI. The resulting human-informed machine learning method, denoted AFSD-Physics, can effectively learn the governing equations of temperature evolution at the tool and the build from in-process measurements. Experiments are designed and conducted to collect in-process measurements for the deposition of aluminum 7075 with a total of 30 layers. The acquired governing equations are physically interpretable models with low computational cost and high accuracy. Model predictions show good agreement with the measurements. Experimental validation with new process parameters demonstrates the model's generalizability and potential for use in tool temperature control and process optimization.

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.10785 [pdf, ps, other]

Composite learning backstepping control with guaranteed exponential stability and robustness

Authors: Tian Shi, Changyun Wen, Yongping Pan

Abstract: Adaptive backstepping control provides a feasible solution to achieve asymptotic tracking for mismatched uncertain nonlinear systems. However, input-to-state stability depends on high-gain feedback generated by nonlinear damping terms, and closed-loop exponential stability with parameter convergence involves a stringent condition named persistent excitation (PE). This paper proposes a composite learning backstepping control (CLBC) strategy based on modular backstepping and high-order tuners to compensate for the transient process of parameter estimation and achieve closed-loop exponential stability without the nonlinear damping terms and the PE condition. A novel composite learning mechanism that maximizes the staged exciting strength is designed for parameter estimation, such that parameter convergence can be achieved under a condition of interval excitation (IE) or even partial IE that is strictly weaker than PE. An extra prediction error is employed in the adaptive law to ensure the transient performance without nonlinear damping terms. The exponential stability of the closed-loop system is proved rigorously under the partial IE or IE condition. Simulations have demonstrated the effectiveness and superiority of the proposed method in both parameter estimation and control compared to state-of-the-art methods.

Submitted 19 January, 2024; originally announced January 2024.

arXiv:2401.09432 [pdf, other]

RoleCraft-GLM: Advancing Personalized Role-Playing in Large Language Models

Authors: Meiling Tao, Xuechen Liang, Tianyu Shi, Lei Yu, Yiting Xie

Abstract: This study presents RoleCraft-GLM, an innovative framework aimed at enhancing personalized role-playing with Large Language Models (LLMs). RoleCraft-GLM addresses the key issue of the lack of personalized interactions in conversational AI, and offers a solution with detailed and emotionally nuanced character portrayals. We contribute a unique conversational dataset that shifts from conventional celebrity-centric characters to diverse, non-celebrity personas, thus enhancing the realism and complexity of language modeling interactions. Additionally, our approach includes meticulous character development, ensuring dialogues are both realistic and emotionally resonant. The effectiveness of RoleCraft-GLM is validated through various case studies, highlighting its versatility and skill in different scenarios. Our framework excels in generating dialogues that accurately reflect characters' personality traits and emotions, thereby boosting user engagement. In conclusion, RoleCraft-GLM marks a significant leap in personalized AI interactions, and paves the way for more authentic and immersive AI-assisted role-playing experiences by enabling more nuanced and emotionally rich dialogues.

Submitted 4 April, 2024; v1 submitted 17 December, 2023; originally announced January 2024.

Showing 1–50 of 336 results for author: Shi, T