-
High-Efficiency Urban 3D Radio Map Estimation Based on Sparse Measurements
Authors:
Xinwei Chen,
Xiaofeng Zhong,
Zijian Zhang,
Linglong Dai,
Shidong Zhou
Abstract:
Recent widespread applications for unmanned aerial vehicles (UAVs) -- from infrastructure inspection to urban logistics -- have prompted an urgent need for high-accuracy three-dimensional (3D) radio maps. However, existing methods designed for two-dimensional radio maps face challenges of high measurement costs and limited data availability when extended to 3D scenarios. To tackle these challenges…
▽ More
Recent widespread applications for unmanned aerial vehicles (UAVs) -- from infrastructure inspection to urban logistics -- have prompted an urgent need for high-accuracy three-dimensional (3D) radio maps. However, existing methods designed for two-dimensional radio maps face challenges of high measurement costs and limited data availability when extended to 3D scenarios. To tackle these challenges, we first build a real-world large-scale 3D radio map dataset, covering over 4.2 million m^3 and over 4 thousand data points in complex urban environments. We propose a Gaussian Process Regression-based scheme for 3D radio map estimation, allowing us to realize more accurate map recovery with a lower RMSE than state-of-the-art schemes by over 2.5 dB. To further enhance data efficiency, we propose two methods for training point selection, including an offline clustering-based method and an online maximum a posterior (MAP)-based method. Extensive experiments demonstrate that the proposed scheme not only achieves full-map recovery with only 2% of UAV measurements, but also sheds light on future studies on 3D radio maps.
△ Less
Submitted 8 August, 2024;
originally announced August 2024.
-
MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents
Authors:
Yanqi Dai,
Huanran Hu,
Lei Wang,
Shengjie Jin,
Xu Chen,
Zhiwu Lu
Abstract:
Recently, Role-Playing Agents (RPAs) have garnered increasing attention for their potential to deliver emotional value and facilitate sociological research. However, existing studies are primarily confined to the textual modality, unable to simulate humans' multimodal perceptual capabilities. To bridge this gap, we introduce the concept of Multimodal Role-Playing Agents (MRPAs), and propose a comp…
▽ More
Recently, Role-Playing Agents (RPAs) have garnered increasing attention for their potential to deliver emotional value and facilitate sociological research. However, existing studies are primarily confined to the textual modality, unable to simulate humans' multimodal perceptual capabilities. To bridge this gap, we introduce the concept of Multimodal Role-Playing Agents (MRPAs), and propose a comprehensive framework, MMRole, for their development and evaluation, which comprises a personalized multimodal dataset and a robust evaluation method. Specifically, we construct a large-scale, high-quality dataset, MMRole-Data, consisting of 85 characters, 11K images, and 14K single or multi-turn dialogues. Additionally, we present a robust evaluation method, MMRole-Eval, encompassing eight metrics across three dimensions, where a reward model is trained to score MRPAs with the constructed ground-truth data for comparison. Moreover, we develop the first specialized MRPA, MMRole-Agent. Extensive evaluation results demonstrate the improved performance of MMRole-Agent and highlight the primary challenges in developing MRPAs, emphasizing the need for enhanced multimodal understanding and role-playing consistency. The data, code, and models will be available at https://github.com/YanqiDai/MMRole.
△ Less
Submitted 7 August, 2024;
originally announced August 2024.
-
EdgeShield: A Universal and Efficient Edge Computing Framework for Robust AI
Authors:
Duo Zhong,
Bojing Li,
Xiang Chen,
Chenchen Liu
Abstract:
The increasing prevalence of adversarial attacks on Artificial Intelligence (AI) systems has created a need for innovative security measures. However, the current methods of defending against these attacks often come with a high computing cost and require back-end processing, making real-time defense challenging. Fortunately, there have been remarkable advancements in edge-computing, which make it…
▽ More
The increasing prevalence of adversarial attacks on Artificial Intelligence (AI) systems has created a need for innovative security measures. However, the current methods of defending against these attacks often come with a high computing cost and require back-end processing, making real-time defense challenging. Fortunately, there have been remarkable advancements in edge-computing, which make it easier to deploy neural networks on edge devices. Building upon these advancements, we propose an edge framework design to enable universal and efficient detection of adversarial attacks. This framework incorporates an attention-based adversarial detection methodology and a lightweight detection network formation, making it suitable for a wide range of neural networks and can be deployed on edge devices. To assess the effectiveness of our proposed framework, we conducted evaluations on five neural networks. The results indicate an impressive 97.43% F-score can be achieved, demonstrating the framework's proficiency in detecting adversarial attacks. Moreover, our proposed framework also exhibits significantly reduced computing complexity and cost in comparison to previous detection methods. This aspect is particularly beneficial as it ensures that the defense mechanism can be efficiently implemented in real-time on-edge devices.
△ Less
Submitted 7 August, 2024;
originally announced August 2024.
-
EcoFollower: An Environment-Friendly Car Following Model Considering Fuel Consumption
Authors:
Hui Zhong,
Xianda Chen,
PakHin Tiu,
Hongliang Lu,
Meixin Zhu
Abstract:
To alleviate energy shortages and environmental impacts caused by transportation, this study introduces EcoFollower, a novel eco-car-following model developed using reinforcement learning (RL) to optimize fuel consumption in car-following scenarios. Employing the NGSIM datasets, the performance of EcoFollower was assessed in comparison with the well-established Intelligent Driver Model (IDM). The…
▽ More
To alleviate energy shortages and environmental impacts caused by transportation, this study introduces EcoFollower, a novel eco-car-following model developed using reinforcement learning (RL) to optimize fuel consumption in car-following scenarios. Employing the NGSIM datasets, the performance of EcoFollower was assessed in comparison with the well-established Intelligent Driver Model (IDM). The findings demonstrate that EcoFollower excels in simulating realistic driving behaviors, maintaining smooth vehicle operations, and closely matching the ground truth metrics of time-to-collision (TTC), headway, and comfort. Notably, the model achieved a significant reduction in fuel consumption, lowering it by 10.42\% compared to actual driving scenarios. These results underscore the capability of RL-based models like EcoFollower to enhance autonomous vehicle algorithms, promoting safer and more energy-efficient driving strategies.
△ Less
Submitted 22 July, 2024;
originally announced August 2024.
-
Trustworthy Image Semantic Communication with GenAI: Explainablity, Controllability, and Efficiency
Authors:
Xijun Wang,
Dongshan Ye,
Chenyuan Feng,
Howard H. Yang,
Xiang Chen,
Tony Q. S. Quek
Abstract:
Image semantic communication (ISC) has garnered significant attention for its potential to achieve high efficiency in visual content transmission. However, existing ISC systems based on joint source-channel coding face challenges in interpretability, operability, and compatibility. To address these limitations, we propose a novel trustworthy ISC framework. This approach leverages text extraction a…
▽ More
Image semantic communication (ISC) has garnered significant attention for its potential to achieve high efficiency in visual content transmission. However, existing ISC systems based on joint source-channel coding face challenges in interpretability, operability, and compatibility. To address these limitations, we propose a novel trustworthy ISC framework. This approach leverages text extraction and segmentation mapping techniques to convert images into explainable semantics, while employing Generative Artificial Intelligence (GenAI) for multiple downstream inference tasks. We also introduce a multi-rate ISC transmission protocol that dynamically adapts to both the received explainable semantic content and specific task requirements at the receiver. Simulation results demonstrate that our framework achieves explainable learning, decoupled training, and compatible transmission in various application scenarios. Finally, some intriguing research directions and application scenarios are identified.
△ Less
Submitted 7 August, 2024;
originally announced August 2024.
-
Neural Network Modeling of Heavy-Quark Potential from Holography
Authors:
Ou-Yang Luo,
Xun Chen,
Fu-Peng Li,
Xiao-Hua Li,
Kai Zhou
Abstract:
Using Multi-Layer Perceptrons (MLP) and Kolmogorov-Arnold Networks (KAN), we construct a holographic model based on lattice QCD data for the heavy-quark potential in the 2+1 system. The deformation factor $w(r)$ in the metric is obtained using the two types of neural network. First, we numerically obtain $w(r)$ using MLP, accurately reproducing the QCD results of the lattice, and calculate the hea…
▽ More
Using Multi-Layer Perceptrons (MLP) and Kolmogorov-Arnold Networks (KAN), we construct a holographic model based on lattice QCD data for the heavy-quark potential in the 2+1 system. The deformation factor $w(r)$ in the metric is obtained using the two types of neural network. First, we numerically obtain $w(r)$ using MLP, accurately reproducing the QCD results of the lattice, and calculate the heavy quark potential at finite temperature and the chemical potential. Subsequently, we employ KAN within the Andreev-Zakharov model for validation purpose, which can analytically reconstruct $w(r)$, matching the Andreev-Zakharov model exactly and confirming the validity of MLP. Finally, we construct an analytical holographic model using KAN and study the heavy-quark potential at finite temperature and chemical potential using the KAN-based holographic model. This work demonstrates the potential of KAN to derive analytical expressions for high-energy physics applications.
△ Less
Submitted 7 August, 2024;
originally announced August 2024.
-
AgentsCoMerge: Large Language Model Empowered Collaborative Decision Making for Ramp Merging
Authors:
Senkang Hu,
Zhengru Fang,
Zihan Fang,
Yiqin Deng,
Xianhao Chen,
Yuguang Fang,
Sam Kwong
Abstract:
Ramp merging is one of the bottlenecks in traffic systems, which commonly cause traffic congestion, accidents, and severe carbon emissions. In order to address this essential issue and enhance the safety and efficiency of connected and autonomous vehicles (CAVs) at multi-lane merging zones, we propose a novel collaborative decision-making framework, named AgentsCoMerge, to leverage large language…
▽ More
Ramp merging is one of the bottlenecks in traffic systems, which commonly cause traffic congestion, accidents, and severe carbon emissions. In order to address this essential issue and enhance the safety and efficiency of connected and autonomous vehicles (CAVs) at multi-lane merging zones, we propose a novel collaborative decision-making framework, named AgentsCoMerge, to leverage large language models (LLMs). Specifically, we first design a scene observation and understanding module to allow an agent to capture the traffic environment. Then we propose a hierarchical planning module to enable the agent to make decisions and plan trajectories based on the observation and the agent's own state. In addition, in order to facilitate collaboration among multiple agents, we introduce a communication module to enable the surrounding agents to exchange necessary information and coordinate their actions. Finally, we develop a reinforcement reflection guided training paradigm to further enhance the decision-making capability of the framework. Extensive experiments are conducted to evaluate the performance of our proposed method, demonstrating its superior efficiency and effectiveness for multi-agent collaborative decision-making under various ramp merging scenarios.
△ Less
Submitted 7 August, 2024;
originally announced August 2024.
-
Measurement of the Branching Fraction of \boldmath{$ψ(2S) \to γπ^0$}
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
Based on $(2712.4\pm14.1)\times10^{6}~ψ(2S)$ events, 7.9 fb$^{-1}$ $ψ(3773)$ data, and 0.8 fb$^{-1}$ off-resonance data samples collected with the BESIII detector, we measure the branching fraction of $ψ(2S)\rightarrowγπ^{0}$ and $e^{+}e^{-}\rightarrowγπ^{0}$ form factor at momentum transfers $Q^{2}\sim13$ GeV$^{2}$. The $e^{+}e^{-}\rightarrowγπ^{0}$ cross section is fitted with considering the in…
▽ More
Based on $(2712.4\pm14.1)\times10^{6}~ψ(2S)$ events, 7.9 fb$^{-1}$ $ψ(3773)$ data, and 0.8 fb$^{-1}$ off-resonance data samples collected with the BESIII detector, we measure the branching fraction of $ψ(2S)\rightarrowγπ^{0}$ and $e^{+}e^{-}\rightarrowγπ^{0}$ form factor at momentum transfers $Q^{2}\sim13$ GeV$^{2}$. The $e^{+}e^{-}\rightarrowγπ^{0}$ cross section is fitted with considering the interference between the $ψ(2S)$ and continuum amplitudes and two solutions are found, ${\cal B}=3.74\times10^{-7}$ with $φ=3.93$ rad and ${\cal B}=7.87\times10^{-7}$ with $φ=2.08$ rad. Here, ${\cal B}$ is the branching fraction of $ψ(2S)\rightarrowγπ^{0}$ and $φ$ is the relative phase angle between the $ψ(2S)$ and continuum amplitudes. Due to insufficient off-resonance data, the branching fraction ${\cal B}(ψ(2S)\rightarrowγπ^{0})$ is determined to be in the range $[2.7, 9.7]\times10^{-7}$ within one standard deviation of the contour region.
△ Less
Submitted 7 August, 2024; v1 submitted 7 August, 2024;
originally announced August 2024.
-
PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements
Authors:
Xueyan Li,
Xinyan Chen,
Yazhe Niu,
Shuai Hu,
Yu Liu
Abstract:
In the field of psychology, traditional assessment methods, such as standardized scales, are frequently critiqued for their static nature, lack of personalization, and reduced participant engagement, while comprehensive counseling evaluations are often inaccessible. The complexity of quantifying psychological traits further limits these methods. Despite advances with large language models (LLMs),…
▽ More
In the field of psychology, traditional assessment methods, such as standardized scales, are frequently critiqued for their static nature, lack of personalization, and reduced participant engagement, while comprehensive counseling evaluations are often inaccessible. The complexity of quantifying psychological traits further limits these methods. Despite advances with large language models (LLMs), many still depend on single-round Question-and-Answer interactions. To bridge this gap, we introduce PsyDI, a personalized and progressively in-depth chatbot designed for psychological measurements, exemplified by its application in the Myers-Briggs Type Indicator (MBTI) framework. PsyDI leverages user-related multi-modal information and engages in customized, multi-turn interactions to provide personalized, easily accessible measurements, while ensuring precise MBTI type determination. To address the challenge of unquantifiable psychological traits, we introduce a novel training paradigm that involves learning the ranking of proxy variables associated with these traits, culminating in a robust score model for MBTI measurements. The score model enables PsyDI to conduct comprehensive and precise measurements through multi-turn interactions within a unified estimation context. Through various experiments, we validate the efficacy of both the score model and the PsyDI pipeline, demonstrating its potential to serve as a general framework for psychological measurements. Furthermore, the online deployment of PsyDI has garnered substantial user engagement, with over 3,000 visits, resulting in the collection of numerous multi-turn dialogues annotated with MBTI types, which facilitates further research. The source code for the training and web service components is publicly available as a part of OpenDILab at: https://github.com/opendilab/PsyDI
△ Less
Submitted 15 August, 2024; v1 submitted 22 July, 2024;
originally announced August 2024.
-
Measurement of $Σ^+$ transverse polarization in $e^+e^-$ collisions at $\sqrt{s} = 3.68-3.71$ GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
Using $e^+e^-$ collision data collected with the BESIII detector at seven energy points ranging from 3.68 to 3.71 GeV and corresponding to an integrated luminosity of $652.1~{\rm pb^{-1}}$, we present an energy-dependent measurement of the transverse polarization, relative phase and modulus ratio of the electromagnetic form factors of the $Σ^+$ hyperon in the $e^+e^- \to Σ^+ \barΣ^-$ reaction. The…
▽ More
Using $e^+e^-$ collision data collected with the BESIII detector at seven energy points ranging from 3.68 to 3.71 GeV and corresponding to an integrated luminosity of $652.1~{\rm pb^{-1}}$, we present an energy-dependent measurement of the transverse polarization, relative phase and modulus ratio of the electromagnetic form factors of the $Σ^+$ hyperon in the $e^+e^- \to Σ^+ \barΣ^-$ reaction. These results are helpful to understand the production mechanism of the $Σ^+$-$\barΣ^-$ pairs.
△ Less
Submitted 7 August, 2024; v1 submitted 6 August, 2024;
originally announced August 2024.
-
Valence Quark Distributions in Pions: Insights from Tsallis Entropy
Authors:
Jingxuan Chen,
Xiaopeng Wang,
Yanbing Cai,
Xurong Chen,
Qian Wang
Abstract:
We investigate the valence quark distributions of pions at a low initial scale ($Q^2_0$) through the application of Tsallis entropy, a non-extensive measure adept at encapsulating long-range correlations among internal constituents. Utilizing the maximum entropy approach, we derive the valence quark distributions at elevated resolution scales via a modified DGLAP equation, which integrates GLR-MQ-…
▽ More
We investigate the valence quark distributions of pions at a low initial scale ($Q^2_0$) through the application of Tsallis entropy, a non-extensive measure adept at encapsulating long-range correlations among internal constituents. Utilizing the maximum entropy approach, we derive the valence quark distributions at elevated resolution scales via a modified DGLAP equation, which integrates GLR-MQ-ZRS corrections for the $Q^2$ evolution. Our findings indicate that the resulting $Q^2$-dependent valence quark distributions yield an optimal fit to experimental data, with an inferred parameter value of $q$ ($q = 0.91$), diverging from unity. This deviation highlights the significant role that correlations among valence quarks play in shaping our understanding of pion internal structure. Additionally, our computations of the first three moments of pion quark distributions at $ Q^2 = 4 \, \mathrm{GeV}^2$ display consistency with alternative theoretical models, thereby reinforcing the importance of incorporating valence quark correlations within this analytical framework.
△ Less
Submitted 6 August, 2024;
originally announced August 2024.
-
FAST detection of OH emission in the carbon-rich planetary nebula NGC 7027
Authors:
Xu-Jia Ouyang,
Yong Zhang,
Chuan-Peng Zhang,
Peng Jiang,
Jun-ichi Nakashima,
Xi Chen,
Hai-Hua Qiao,
Xu-Ying Zhang,
Hao-Min Sun,
Xiao-Hu Li,
Albert Zijlstra
Abstract:
We present the first detection of the ground-state OH emission line at 1612 MHz toward the prototypical carbon-rich planetary nebula (PN) NGC 7027, utilizing the newly installed ultra-wideband (UWB) receiver of the Five-hundred-meter Aperture Spherical radio Telescope (FAST). This emission is likely to originate from the interface of the neutral shell and the ionized region. The other three ground…
▽ More
We present the first detection of the ground-state OH emission line at 1612 MHz toward the prototypical carbon-rich planetary nebula (PN) NGC 7027, utilizing the newly installed ultra-wideband (UWB) receiver of the Five-hundred-meter Aperture Spherical radio Telescope (FAST). This emission is likely to originate from the interface of the neutral shell and the ionized region. The other three ground-state OH lines at 1665, 1667, and 1721 MHz are observed in absorption and have velocities well matched with that of HCO$^+$ absorption. We infer that the OH absorption is from the outer shell of NGC 7027, although the possibility that they are associated with a foreground cloud cannot be completely ruled out. All the OH lines exhibit a single blue-shifted component with respect to the central star. The formation of OH in carbon-rich environments might be via photodissociation-induced chemical processes. Our observations offer significant constraints for chemical simulations, and they underscore the potent capability of the UWB receiver of FAST to search for nascent PNe.
△ Less
Submitted 6 August, 2024;
originally announced August 2024.
-
Recent Advances in Data-driven Intelligent Control for Wireless Communication: A Comprehensive Survey
Authors:
Wei Huo,
Huiwen Yang,
Nachuan Yang,
Zhaohua Yang,
Jiuzhou Zhang,
Fuhai Nan,
Xingzhou Chen,
Yifan Mao,
Suyang Hu,
Pengyu Wang,
Xuanyu Zheng,
Mingming Zhao,
Ling Shi
Abstract:
The advent of next-generation wireless communication systems heralds an era characterized by high data rates, low latency, massive connectivity, and superior energy efficiency. These systems necessitate innovative and adaptive strategies for resource allocation and device behavior control in wireless networks. Traditional optimization-based methods have been found inadequate in meeting the complex…
▽ More
The advent of next-generation wireless communication systems heralds an era characterized by high data rates, low latency, massive connectivity, and superior energy efficiency. These systems necessitate innovative and adaptive strategies for resource allocation and device behavior control in wireless networks. Traditional optimization-based methods have been found inadequate in meeting the complex demands of these emerging systems. As the volume of data continues to escalate, the integration of data-driven methods has become indispensable for enabling adaptive and intelligent control mechanisms in future wireless communication systems. This comprehensive survey explores recent advancements in data-driven methodologies applied to wireless communication networks. It focuses on developments over the past five years and their application to various control objectives within wireless cyber-physical systems. It encompasses critical areas such as link adaptation, user scheduling, spectrum allocation, beam management, power control, and the co-design of communication and control systems. We provide an in-depth exploration of the technical underpinnings that support these data-driven approaches, including the algorithms, models, and frameworks developed to enhance network performance and efficiency. We also examine the challenges that current data-driven algorithms face, particularly in the context of the dynamic and heterogeneous nature of next-generation wireless networks. The paper provides a critical analysis of these challenges and offers insights into potential solutions and future research directions. This includes discussing the adaptability, integration with 6G, and security of data-driven methods in the face of increasing network complexity and data volume.
△ Less
Submitted 6 August, 2024;
originally announced August 2024.
-
Observation of $η_{c}(2S) \to K^{+}K^{-}η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
By analyzing $(27.12 \pm 0.14)\times10^{8}$ $ψ(3686)$ events accumulated with the BESIII detector, the decay $η_{c}(2S) \to K^{+} K^{-} η$ is observed for the first time with a significance of $6.2σ$ after considering systematic uncertainties. The product of the branching fractions of $ψ(3686) \to γη_{c}(2S)$ and $η_{c}(2S) \to K^{+} K^{-} η$ is measured to be…
▽ More
By analyzing $(27.12 \pm 0.14)\times10^{8}$ $ψ(3686)$ events accumulated with the BESIII detector, the decay $η_{c}(2S) \to K^{+} K^{-} η$ is observed for the first time with a significance of $6.2σ$ after considering systematic uncertainties. The product of the branching fractions of $ψ(3686) \to γη_{c}(2S)$ and $η_{c}(2S) \to K^{+} K^{-} η$ is measured to be $\mathcal{B}(ψ(3686) \toγη_{c}(2S))\times \mathcal{B}(η_{c}(2S)\to K^{+} K^{-}η)=(2.39 \pm 0.32 \pm 0.34) \times 10^{-6}$, where the first uncertainty is statistical, and the second one is systematic. The branching fraction of $η_{c}(2S)\to K^{+} K^{-}η$ is determined to be $\mathcal{B}(η_{c}(2S)\to K^{+} K^{-}η) = (3.42 \pm 0.46 \pm 0.48 \pm 2.44) \times 10^{-3}$, where the third uncertainty is due to the branching fraction of $ψ(3686) \to γη_{c}(2S)$. Using a recent BESIII measurement of $\mathcal{B} (η_{c}(2S) \to K^{+} K^{-}π^{0})$, we also determine the ratio between the branching fractions of $η_{c}(2S) \to K^{+} K^{-}η$ and $η_{c}(2S) \to K^{+} K^{-}π^{0}$ to be $1.49 \pm 0.22 \pm 0.25$, which is consistent with the previous result of BaBar at a comparable precision level.
△ Less
Submitted 5 August, 2024;
originally announced August 2024.
-
A Framework for Fine-Tuning LLMs using Heterogeneous Feedback
Authors:
Ryan Aponte,
Ryan A. Rossi,
Shunan Guo,
Franck Dernoncourt,
Tong Yu,
Xiang Chen,
Subrata Mitra,
Nedim Lipka
Abstract:
Large language models (LLMs) have been applied to a wide range of tasks, including text summarization, web navigation, and chatbots. They have benefitted from supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) following an unsupervised pretraining. These datasets can be difficult to collect, limited in scope, and vary in sample quality. Additionally, datasets can va…
▽ More
Large language models (LLMs) have been applied to a wide range of tasks, including text summarization, web navigation, and chatbots. They have benefitted from supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) following an unsupervised pretraining. These datasets can be difficult to collect, limited in scope, and vary in sample quality. Additionally, datasets can vary extensively in supervision format, from numerical to binary as well as multi-dimensional with many different values. We present a framework for fine-tuning LLMs using heterogeneous feedback, which has two main components. First, we combine the heterogeneous feedback data into a single supervision format, compatible with methods like SFT and RLHF. Next, given this unified feedback dataset, we extract a high-quality and diverse subset to obtain performance increases potentially exceeding the full dataset. We conduct extensive experiments to understand the effectiveness of these techniques for incorporating heterogeneous feedback, and demonstrate improvements from using a high-quality and diverse subset of the data. We find that our framework is able to improve models in multiple areas simultaneously, such as in instruction following and bias reduction.
△ Less
Submitted 5 August, 2024;
originally announced August 2024.
-
SiCo: A Size-Controllable Virtual Try-On Approach for Informed Decision-Making
Authors:
Sherry X. Chen,
Alex Christopher Lim,
Yimeng Liu,
Pradeep Sen,
Misha Sra
Abstract:
Virtual try-on (VTO) applications aim to improve the online shopping experience by allowing users to preview garments, before making purchase decisions. However, many VTO tools fail to consider the crucial relationship between a garment's size and the user's body size, often employing a one-size-fits-all approach when visualizing a clothing item. This results in poor size recommendations and purch…
▽ More
Virtual try-on (VTO) applications aim to improve the online shopping experience by allowing users to preview garments, before making purchase decisions. However, many VTO tools fail to consider the crucial relationship between a garment's size and the user's body size, often employing a one-size-fits-all approach when visualizing a clothing item. This results in poor size recommendations and purchase decisions leading to increased return rates. To address this limitation, we introduce SiCo, an online VTO system, where users can upload images of themselves and visualize how different sizes of clothing would look on their body to help make better-informed purchase decisions. Our user study shows SiCo's superiority over baseline VTO. The results indicate that our approach significantly enhances user ability to gauge the appearance of outfits on their bodies and boosts their confidence in selecting clothing sizes that match desired goals. Based on our evaluation, we believe our VTO design has the potential to reduce return rates and enhance the online clothes shopping experience. Our code is available at https://github.com/SherryXTChen/SiCo.
△ Less
Submitted 5 August, 2024;
originally announced August 2024.
-
GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
Authors:
Xianyu Chen,
Ming Jiang,
Qi Zhao
Abstract:
While exploring visual scenes, humans' scanpaths are driven by their underlying attention processes. Understanding visual scanpaths is essential for various applications. Traditional scanpath models predict the where and when of gaze shifts without providing explanations, creating a gap in understanding the rationale behind fixations. To bridge this gap, we introduce GazeXplain, a novel study of v…
▽ More
While exploring visual scenes, humans' scanpaths are driven by their underlying attention processes. Understanding visual scanpaths is essential for various applications. Traditional scanpath models predict the where and when of gaze shifts without providing explanations, creating a gap in understanding the rationale behind fixations. To bridge this gap, we introduce GazeXplain, a novel study of visual scanpath prediction and explanation. This involves annotating natural-language explanations for fixations across eye-tracking datasets and proposing a general model with an attention-language decoder that jointly predicts scanpaths and generates explanations. It integrates a unique semantic alignment mechanism to enhance the consistency between fixations and explanations, alongside a cross-dataset co-training approach for generalization. These novelties present a comprehensive and adaptable solution for explainable human visual scanpath prediction. Extensive experiments on diverse eye-tracking datasets demonstrate the effectiveness of GazeXplain in both scanpath prediction and explanation, offering valuable insights into human visual attention and cognitive processes.
△ Less
Submitted 5 August, 2024;
originally announced August 2024.
-
Language Model Can Listen While Speaking
Authors:
Ziyang Ma,
Yakun Song,
Chenpeng Du,
Jian Cong,
Zhuo Chen,
Yuping Wang,
Yuxuan Wang,
Xie Chen
Abstract:
Dialogue serves as the most natural manner of human-computer interaction (HCI). Recent advancements in speech language models (SLM) have significantly enhanced speech-based conversational AI. However, these models are limited to turn-based conversation, lacking the ability to interact with humans in real-time spoken scenarios, for example, being interrupted when the generated content is not satisf…
▽ More
Dialogue serves as the most natural manner of human-computer interaction (HCI). Recent advancements in speech language models (SLM) have significantly enhanced speech-based conversational AI. However, these models are limited to turn-based conversation, lacking the ability to interact with humans in real-time spoken scenarios, for example, being interrupted when the generated content is not satisfactory. To address these limitations, we explore full duplex modeling (FDM) in interactive speech language models (iSLM), focusing on enhancing real-time interaction and, more explicitly, exploring the quintessential ability of interruption. We introduce a novel model design, namely listening-while-speaking language model (LSLM), an end-to-end system equipped with both listening and speaking channels. Our LSLM employs a token-based decoder-only TTS for speech generation and a streaming self-supervised learning (SSL) encoder for real-time audio input. LSLM fuses both channels for autoregressive generation and detects turn-taking in real time. Three fusion strategies -- early fusion, middle fusion, and late fusion -- are explored, with middle fusion achieving an optimal balance between speech generation and real-time interaction. Two experimental settings, command-based FDM and voice-based FDM, demonstrate LSLM's robustness to noise and sensitivity to diverse instructions. Our results highlight LSLM's capability to achieve duplex communication with minimal impact on existing systems. This study aims to advance the development of interactive speech dialogue systems, enhancing their applicability in real-world contexts.
△ Less
Submitted 5 August, 2024;
originally announced August 2024.
-
An Evaluation of Requirements Modeling for Cyber-Physical Systems via LLMs
Authors:
Dongming Jin,
Shengxin Zhao,
Zhi Jin,
Xiaohong Chen,
Chunhui Wang,
Zheng Fang,
Hongbin Xiao
Abstract:
Cyber-physical systems (CPSs) integrate cyber and physical components and enable them to interact with each other to meet user needs. The needs for CPSs span rich application domains such as healthcare and medicine, smart home, smart building, etc. This indicates that CPSs are all about solving real-world problems. With the increasing abundance of sensing devices and effectors, the problems wanted…
▽ More
Cyber-physical systems (CPSs) integrate cyber and physical components and enable them to interact with each other to meet user needs. The needs for CPSs span rich application domains such as healthcare and medicine, smart home, smart building, etc. This indicates that CPSs are all about solving real-world problems. With the increasing abundance of sensing devices and effectors, the problems wanted to solve with CPSs are becoming more and more complex. It is also becoming increasingly difficult to extract and express CPS requirements accurately. Problem frame approach aims to shape real-world problems by capturing the characteristics and interconnections of components, where the problem diagram is central to expressing the requirements. CPSs requirements are generally presented in domain-specific documents that are normally expressed in natural language. There is currently no effective way to extract problem diagrams from natural language documents. CPSs requirements extraction and modeling are generally done manually, which is time-consuming, labor-intensive, and error-prone. Large language models (LLMs) have shown excellent performance in natural language understanding. It can be interesting to explore the abilities of LLMs to understand domain-specific documents and identify modeling elements, which this paper is working on. To achieve this goal, we first formulate two tasks (i.e., entity recognition and interaction extraction) and propose a benchmark called CPSBench. Based on this benchmark, extensive experiments are conducted to evaluate the abilities and limitations of seven advanced LLMs. We find some interesting insights. Finally, we establish a taxonomy of LLMs hallucinations in CPSs requirements modeling using problem diagrams. These results will inspire research on the use of LLMs for automated CPSs requirements modeling.
△ Less
Submitted 5 August, 2024;
originally announced August 2024.
-
SR-CIS: Self-Reflective Incremental System with Decoupled Memory and Reasoning
Authors:
Biqing Qi,
Junqi Gao,
Xinquan Chen,
Dong Li,
Weinan Zhang,
Bowen Zhou
Abstract:
The ability of humans to rapidly learn new knowledge while retaining old memories poses a significant challenge for current deep learning models. To handle this challenge, we draw inspiration from human memory and learning mechanisms and propose the Self-Reflective Complementary Incremental System (SR-CIS). Comprising the deconstructed Complementary Inference Module (CIM) and Complementary Memory…
▽ More
The ability of humans to rapidly learn new knowledge while retaining old memories poses a significant challenge for current deep learning models. To handle this challenge, we draw inspiration from human memory and learning mechanisms and propose the Self-Reflective Complementary Incremental System (SR-CIS). Comprising the deconstructed Complementary Inference Module (CIM) and Complementary Memory Module (CMM), SR-CIS features a small model for fast inference and a large model for slow deliberation in CIM, enabled by the Confidence-Aware Online Anomaly Detection (CA-OAD) mechanism for efficient collaboration. CMM consists of task-specific Short-Term Memory (STM) region and a universal Long-Term Memory (LTM) region. By setting task-specific Low-Rank Adaptive (LoRA) and corresponding prototype weights and biases, it instantiates external storage for parameter and representation memory, thus deconstructing the memory module from the inference module. By storing textual descriptions of images during training and combining them with the Scenario Replay Module (SRM) post-training for memory combination, along with periodic short-to-long-term memory restructuring, SR-CIS achieves stable incremental memory with limited storage requirements. Balancing model plasticity and memory stability under constraints of limited storage and low data resources, SR-CIS surpasses existing competitive baselines on multiple standard and few-shot incremental learning benchmarks.
△ Less
Submitted 4 August, 2024;
originally announced August 2024.
-
BEVPlace++: Fast, Robust, and Lightweight LiDAR Global Localization for Unmanned Ground Vehicles
Authors:
Lun Luo,
Si-Yuan Cao,
Xiaorui Li,
Jintao Xu,
Rui Ai,
Zhu Yu,
Xieyuanli Chen
Abstract:
This article introduces BEVPlace++, a novel, fast, and robust LiDAR global localization method for unmanned ground vehicles. It uses lightweight convolutional neural networks (CNNs) on Bird's Eye View (BEV) image-like representations of LiDAR data to achieve accurate global localization through place recognition followed by 3-DoF pose estimation. Our detailed analyses reveal an interesting fact th…
▽ More
This article introduces BEVPlace++, a novel, fast, and robust LiDAR global localization method for unmanned ground vehicles. It uses lightweight convolutional neural networks (CNNs) on Bird's Eye View (BEV) image-like representations of LiDAR data to achieve accurate global localization through place recognition followed by 3-DoF pose estimation. Our detailed analyses reveal an interesting fact that CNNs are inherently effective at extracting distinctive features from LiDAR BEV images. Remarkably, keypoints of two BEV images with large translations can be effectively matched using CNN-extracted features. Building on this insight, we design a rotation equivariant module (REM) to obtain distinctive features while enhancing robustness to rotational changes. A Rotation Equivariant and Invariant Network (REIN) is then developed by cascading REM and a descriptor generator, NetVLAD, to sequentially generate rotation equivariant local features and rotation invariant global descriptors. The global descriptors are used first to achieve robust place recognition, and the local features are used for accurate pose estimation. Experimental results on multiple public datasets demonstrate that BEVPlace++, even when trained on a small dataset (3000 frames of KITTI) only with place labels, generalizes well to unseen environments, performs consistently across different days and years, and adapts to various types of LiDAR scanners. BEVPlace++ achieves state-of-the-art performance in subtasks of global localization including place recognition, loop closure detection, and global localization. Additionally, BEVPlace++ is lightweight, runs in real-time, and does not require accurate pose supervision, making it highly convenient for deployment. The source codes are publicly available at https://github.com/zjuluolun/BEVPlace.
△ Less
Submitted 9 August, 2024; v1 submitted 3 August, 2024;
originally announced August 2024.
-
Search for $X(3872)\toπ^0π^0χ_{c1,2}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using 10.1 fb$^{-1}$ of $e^+e^-$ collision data collected by the BESIII detector with center-of-mass energies between 4.15 GeV and 4.30 GeV, we search for the decays $X(3872)\toπ^0π^0χ_{c1,2}$, where the $X(3872)$ is produced in $e^+e^-\toγX(3872)$. No evidence above $3σ$ is found for either decay. Upper limits at the $90\%$ C.L. on the branching fractions of $X(3872)\toπ^0π^0χ_{c1,2}$ normalized…
▽ More
Using 10.1 fb$^{-1}$ of $e^+e^-$ collision data collected by the BESIII detector with center-of-mass energies between 4.15 GeV and 4.30 GeV, we search for the decays $X(3872)\toπ^0π^0χ_{c1,2}$, where the $X(3872)$ is produced in $e^+e^-\toγX(3872)$. No evidence above $3σ$ is found for either decay. Upper limits at the $90\%$ C.L. on the branching fractions of $X(3872)\toπ^0π^0χ_{c1,2}$ normalized to the branching fraction of $X(3872)\toπ^+π^-J/ψ$ are set to be $\mathcal{B}(X(3872)\toπ^0π^0χ_{c1})/\mathcal{B}(X(3872)\toπ^+π^-J/ψ) < 1.1$ and $\mathcal{B}(X(3872)\toπ^0π^0χ_{c2})/\mathcal{B}(X(3872)\toπ^+π^-J/ψ) < 0.5$, taking into account both statistical and systematic uncertainties.
△ Less
Submitted 2 August, 2024;
originally announced August 2024.
-
A Backbone for Long-Horizon Robot Task Understanding
Authors:
Xiaoshuai Chen,
Wei Chen,
Dongmyoung Lee,
Yukun Ge,
Nicolas Rojas,
Petar Kormushev
Abstract:
End-to-end robot learning, particularly for long-horizon tasks, often results in unpredictable outcomes and poor generalization. To address these challenges, we propose a novel Therblig-based Backbone Framework (TBBF) to enhance robot task understanding and transferability. This framework uses therbligs (basic action elements) as the backbone to decompose high-level robot tasks into elemental robo…
▽ More
End-to-end robot learning, particularly for long-horizon tasks, often results in unpredictable outcomes and poor generalization. To address these challenges, we propose a novel Therblig-based Backbone Framework (TBBF) to enhance robot task understanding and transferability. This framework uses therbligs (basic action elements) as the backbone to decompose high-level robot tasks into elemental robot configurations, which are then integrated with current foundation models to improve task understanding. The approach consists of two stages: offline training and online testing. During the offline training stage, we developed the Meta-RGate SynerFusion (MGSF) network for accurate therblig segmentation across various tasks. In the online testing stage, after a one-shot demonstration of a new task is collected, our MGSF network extracts high-level knowledge, which is then encoded into the image using Action Registration (ActionREG). Additionally, the Large Language Model (LLM)-Alignment Policy for Visual Correction (LAP-VC) is employed to ensure precise action execution, facilitating trajectory transfer in novel robot scenarios. Experimental results validate these methods, achieving 94.37% recall in therblig segmentation and success rates of 94.4% and 80% in real-world online robot testing for simple and complex scenarios, respectively. Supplementary material is available at: https://sites.google.com/view/therbligsbasedbackbone/home
△ Less
Submitted 7 August, 2024; v1 submitted 2 August, 2024;
originally announced August 2024.
-
DERA: Dense Entity Retrieval for Entity Alignment in Knowledge Graphs
Authors:
Zhichun Wang,
Xuan Chen
Abstract:
Entity Alignment (EA) aims to match equivalent entities in different Knowledge Graphs (KGs), which is essential for knowledge fusion and integration. Recently, embedding-based EA has attracted significant attention and many approaches have been proposed. Early approaches primarily focus on learning entity embeddings from the structural features of KGs, defined by relation triples. Later methods in…
▽ More
Entity Alignment (EA) aims to match equivalent entities in different Knowledge Graphs (KGs), which is essential for knowledge fusion and integration. Recently, embedding-based EA has attracted significant attention and many approaches have been proposed. Early approaches primarily focus on learning entity embeddings from the structural features of KGs, defined by relation triples. Later methods incorporated entities' names and attributes as auxiliary information to enhance embeddings for EA. However, these approaches often used different techniques to encode structural and attribute information, limiting their interaction and mutual enhancement. In this work, we propose a dense entity retrieval framework for EA, leveraging language models to uniformly encode various features of entities and facilitate nearest entity search across KGs. Alignment candidates are first generated through entity retrieval, which are subsequently reranked to determine the final alignments. We conduct comprehensive experiments on both cross-lingual and monolingual EA datasets, demonstrating that our approach achieves state-of-the-art performance compared to existing EA methods.
△ Less
Submitted 2 August, 2024;
originally announced August 2024.
-
A deep learning-enabled smart garment for versatile sleep behaviour monitoring
Authors:
Chenyu Tang,
Wentian Yi,
Muzi Xu,
Yuxuan Jin,
Zibo Zhang,
Xuhang Chen,
Caizhi Liao,
Peter Smielewski,
Luigi G. Occhipinti
Abstract:
Continuous monitoring and accurate detection of complex sleep patterns associated to different sleep-related conditions is essential, not only for enhancing sleep quality but also for preventing the risk of developing chronic illnesses associated to unhealthy sleep. Despite significant advances in research, achieving versatile recognition of various unhealthy and sub-healthy sleep patterns with si…
▽ More
Continuous monitoring and accurate detection of complex sleep patterns associated to different sleep-related conditions is essential, not only for enhancing sleep quality but also for preventing the risk of developing chronic illnesses associated to unhealthy sleep. Despite significant advances in research, achieving versatile recognition of various unhealthy and sub-healthy sleep patterns with simple wearable devices at home remains a significant challenge. Here, we report a robust and durable ultrasensitive strain sensor array printed on a smart garment, in its collar region. This solution allows detecting subtle vibrations associated with multiple sleep patterns at the extrinsic laryngeal muscles. Equipped with a deep learning neural network, it can precisely identify six sleep states-nasal breathing, mouth breathing, snoring, bruxism, central sleep apnea (CSA), and obstructive sleep apnea (OSA)-with an impressive accuracy of 98.6%, all without requiring specific positioning. We further demonstrate its explainability and generalization capabilities in practical applications. Explainable artificial intelligence (XAI) visualizations reflect comprehensive signal pattern analysis with low bias. Transfer learning tests show that the system can achieve high accuracy (overall accuracy of 95%) on new users with very few-shot learning (less than 15 samples per class). The scalable manufacturing process, robustness, high accuracy, and excellent generalization of the smart garment make it a promising tool for next-generation continuous sleep monitoring.
△ Less
Submitted 1 August, 2024;
originally announced August 2024.
-
Dark Matter Search Results from 1.54 Tonne$\cdot$Year Exposure of PandaX-4T
Authors:
PandaX Collaboration,
Zihao Bo,
Wei Chen,
Xun Chen,
Yunhua Chen,
Zhaokan Cheng,
Xiangyi Cui,
Yingjie Fan,
Deqing Fang,
Zhixing Gao,
Lisheng Geng,
Karl Giboni,
Xunan Guo,
Xuyuan Guo,
Zichao Guo,
Chencheng Han,
Ke Han,
Changda He,
Jinrong He,
Di Huang,
Houqi Huang,
Junting Huang,
Ruquan Hou,
Yu Hou,
Xiangdong Ji
, et al. (77 additional authors not shown)
Abstract:
In this letter, we report the dark matter search results from the commissioning run and the first science run of the PandaX-4T experiment. A blind analysis is carried out on the entire data set. The data processing is improved compared to previous work, unifying the low-level signal reconstruction in a wide energy range up to 120 keV. With a total exposure of 1.54 tonne$\cdot$year, no significant…
▽ More
In this letter, we report the dark matter search results from the commissioning run and the first science run of the PandaX-4T experiment. A blind analysis is carried out on the entire data set. The data processing is improved compared to previous work, unifying the low-level signal reconstruction in a wide energy range up to 120 keV. With a total exposure of 1.54 tonne$\cdot$year, no significant excess of nuclear recoil events is found. The lowest 90% confidence level exclusion on the spin-independent cross section is $1.6 \times 10^{-47} \mathrm{cm}^2$ at a dark matter mass of 40 GeV$/c^2$. Our results represent the most stringent constraint for a dark matter mass above 100 GeV$/c^2$.
△ Less
Submitted 1 August, 2024;
originally announced August 2024.
-
Partial wave analysis of $ψ(3686)\toΛ\barΣ^0π^0+c.c.$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
Based on a sample of $(2712.4\pm14.3)\times10^6\;ψ(3686)$ events collected with the BESIII detector, a partial wave analysis of the decay $ψ(3686)\toΛ\barΣ^0π^0+c.c.$ is performed to investigate $Λ^*$ and $Σ^*$ resonances in the $π^0\barΣ^0$ and $π^0Λ$ invariant mass distributions. Significant contributions are found from the $Λ(1405)$, $Λ(1520)$, $Λ(1600)$, $Λ(1670)$, $Λ(1690)$, $Λ(1800)$,…
▽ More
Based on a sample of $(2712.4\pm14.3)\times10^6\;ψ(3686)$ events collected with the BESIII detector, a partial wave analysis of the decay $ψ(3686)\toΛ\barΣ^0π^0+c.c.$ is performed to investigate $Λ^*$ and $Σ^*$ resonances in the $π^0\barΣ^0$ and $π^0Λ$ invariant mass distributions. Significant contributions are found from the $Λ(1405)$, $Λ(1520)$, $Λ(1600)$, $Λ(1670)$, $Λ(1690)$, $Λ(1800)$, $Λ(1890)$, $Λ(2325)$, $Σ(1385)$, $Σ(1660)$, $Σ(1670)$, $Σ(1750)$, and $Σ(1910)$. The masses, widths, and production branching fractions for each component are determined. In addition, the branching fraction of $ψ(3686)\toΛ\barΣ^0π^0+c.c.$ is measured to be $(1.544\pm0.013\pm0.069)\times10^{-4}$ for the first time, where the first uncertainty is statistical and the second systematic.
△ Less
Submitted 1 August, 2024;
originally announced August 2024.
-
Conformal Trajectory Prediction with Multi-View Data Integration in Cooperative Driving
Authors:
Xi Chen,
Rahul Bhadani,
Larry Head
Abstract:
Current research on trajectory prediction primarily relies on data collected by onboard sensors of an ego vehicle. With the rapid advancement in connected technologies, such as vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication, valuable information from alternate views becomes accessible via wireless networks. The integration of information from alternative views has the p…
▽ More
Current research on trajectory prediction primarily relies on data collected by onboard sensors of an ego vehicle. With the rapid advancement in connected technologies, such as vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication, valuable information from alternate views becomes accessible via wireless networks. The integration of information from alternative views has the potential to overcome the inherent limitations associated with a single viewpoint, such as occlusions and limited field of view. In this work, we introduce V2INet, a novel trajectory prediction framework designed to model multi-view data by extending existing single-view models. Unlike previous approaches where the multi-view data is manually fused or formulated as a separate training stage, our model supports end-to-end training, enhancing both flexibility and performance. Moreover, the predicted multimodal trajectories are calibrated by a post-hoc conformal prediction module to get valid and efficient confidence regions. We evaluated the entire framework using the real-world V2I dataset V2X-Seq. Our results demonstrate superior performance in terms of Final Displacement Error (FDE) and Miss Rate (MR) using a single GPU. The code is publicly available at: \url{https://github.com/xichennn/V2I_trajectory_prediction}.
△ Less
Submitted 2 August, 2024; v1 submitted 1 August, 2024;
originally announced August 2024.
-
Systems of curves on non-orientable surfaces
Authors:
Xiao Chen
Abstract:
We show that the order of the cardinality of maximal complete $1$-systems of loops on non-orientable surfaces is $\sim |χ|^{2}$. In particular, we determine the exact cardinality of maximal complete $1$-systems of loops on punctured projective planes. To prove these results, we show that the cardinality of maximal systems of arcs pairwise-intersecting at most once on a non-orientable surface is…
▽ More
We show that the order of the cardinality of maximal complete $1$-systems of loops on non-orientable surfaces is $\sim |χ|^{2}$. In particular, we determine the exact cardinality of maximal complete $1$-systems of loops on punctured projective planes. To prove these results, we show that the cardinality of maximal systems of arcs pairwise-intersecting at most once on a non-orientable surface is $2|χ|(|χ|+1)$.
△ Less
Submitted 1 August, 2024;
originally announced August 2024.
-
Construction of various time-dependent Hamiltonians on a single photonic chip
Authors:
Rui Ye,
Guangzhen Li,
Shuai Wan,
Xiaotian Xue,
Piyu Wang,
Xin Qiao,
Hao Li,
Shijie Liu,
Jiayu Wang,
Rui Ma,
Fang Bo,
Yuanlin Zheng,
Chunhua Dong,
Luqi Yuan,
Xianfeng Chen
Abstract:
Integrated photonics provides an important platform for simulating physical models with high-performance chip-scale devices, where the lattice size and the time-dependence of a model are key ingredients for further enriching the functionality of a photonic chip. Here, we propose and demonstrate the construction of various time-dependent Hamiltonian models using a single microresonator on thin-film…
▽ More
Integrated photonics provides an important platform for simulating physical models with high-performance chip-scale devices, where the lattice size and the time-dependence of a model are key ingredients for further enriching the functionality of a photonic chip. Here, we propose and demonstrate the construction of various time-dependent Hamiltonian models using a single microresonator on thin-film lithium niobate chip. Such an integrated microresonator holds high quality factor to 10^6, and supports the construction of the synthetic frequency lattice with effective lattice sites up to 152 under the electro-optic modulation. By further applying a bichromatic modulation composed of two radio-frequency signals oppositely detuned from the resonant frequency in the microresonator, we build different time-dependent Hamiltonians with the time-varying nearest-neighbor coupling strength in synthetic frequency lattice. We measure the temporal features from capturing the dynamic band structures of the lattice and demonstrate a variety of time-dependent synthetic lattice models by engineering the driven pattern of the modulation, highlighting great flexibility of the microresonator. Our work shows a photonic chip for simulating versatile time-dependent Hamiltonians, which pushes forward quantum simulations in integrated photonics with great experimental tunability and reconfigurability.
△ Less
Submitted 1 August, 2024;
originally announced August 2024.
-
No-Go Theorems for Universal Entanglement Purification
Authors:
Allen Zang,
Xinan Chen,
Eric Chitambar,
Martin Suchara,
Tian Zhong
Abstract:
An entanglement purification protocol (EPP) aims to transform multiple noisy entangled states into a single entangled state with higher fidelity. In this work we consider EPPs that always yield an output fidelity no worse than each of the original noisy states, a property we call universality. We prove there is no $n$-to-1 EPP implementable by local operations and classical communication that is u…
▽ More
An entanglement purification protocol (EPP) aims to transform multiple noisy entangled states into a single entangled state with higher fidelity. In this work we consider EPPs that always yield an output fidelity no worse than each of the original noisy states, a property we call universality. We prove there is no $n$-to-1 EPP implementable by local operations and classical communication that is universal for all two-qubit entangled states, whereas such an EPP is possible using more general positive partial transpose-preserving (PPT) operations. We also show that universality is not possible by bilocal Clifford EPPs even when restricted to states with fidelities above an arbitrarily high threshold.
△ Less
Submitted 31 July, 2024;
originally announced July 2024.
-
Mixed precision HODLR matrices
Authors:
Erin Carson,
Xinye Chen,
Xiaobo Liu
Abstract:
Hierarchical matrix computations have attracted significant attention in the science and engineering community as exploiting data-sparse structures can significantly reduce the computational complexity of many important kernels. One particularly popular option within this class is the Hierarchical Off-Diagonal Low-Rank (HODLR) format. In this paper, we show that the off-diagonal blocks of HODLR ma…
▽ More
Hierarchical matrix computations have attracted significant attention in the science and engineering community as exploiting data-sparse structures can significantly reduce the computational complexity of many important kernels. One particularly popular option within this class is the Hierarchical Off-Diagonal Low-Rank (HODLR) format. In this paper, we show that the off-diagonal blocks of HODLR matrices that are approximated by low-rank matrices can be represented in low precision without degenerating the quality of the overall approximation (with the error growth bounded by a factor of $2$). We also present an adaptive-precision scheme for constructing and storing HODLR matrices, and we prove that the use of mixed precision does not compromise the numerical stability of the resulting HOLDR matrix--vector product and LU factorization. That is, the resulting error in these computations is not significantly greater than the case where we use one precision (say, double) for constructing and storing the HOLDR matrix. Our analyses further give insight on how one must choose the working precision in HOLDR matrix computations relative to the approximation error in order to not observe the effects of finite precision. Intuitively, when a HOLDR matrix is subject to a high degree of approximation error, subsequent computations can be performed in a lower precision without detriment. We demonstrate the validity of our theoretical results through a range of numerical experiments.
△ Less
Submitted 10 August, 2024; v1 submitted 31 July, 2024;
originally announced July 2024.
-
Breaking the Hourglass Phenomenon of Residual Quantization: Enhancing the Upper Bound of Generative Retrieval
Authors:
Zhirui Kuai,
Zuxu Chen,
Huimu Wang,
Mingming Li,
Dadong Miao,
Binbin Wang,
Xusong Chen,
Li Kuang,
Yuxing Han,
Jiaxing Wang,
Guoyu Tang,
Lin Liu,
Songlin Wang,
Jingwei Zhuo
Abstract:
Generative retrieval (GR) has emerged as a transformative paradigm in search and recommender systems, leveraging numeric-based identifier representations to enhance efficiency and generalization. Notably, methods like TIGER employing Residual Quantization-based Semantic Identifiers (RQ-SID), have shown significant promise in e-commerce scenarios by effectively managing item IDs. However, a critica…
▽ More
Generative retrieval (GR) has emerged as a transformative paradigm in search and recommender systems, leveraging numeric-based identifier representations to enhance efficiency and generalization. Notably, methods like TIGER employing Residual Quantization-based Semantic Identifiers (RQ-SID), have shown significant promise in e-commerce scenarios by effectively managing item IDs. However, a critical issue termed the "\textbf{Hourglass}" phenomenon, occurs in RQ-SID, where intermediate codebook tokens become overly concentrated, hindering the full utilization of generative retrieval methods. This paper analyses and addresses this problem by identifying data sparsity and long-tailed distribution as the primary causes. Through comprehensive experiments and detailed ablation studies, we analyze the impact of these factors on codebook utilization and data distribution. Our findings reveal that the "Hourglass" phenomenon substantially impacts the performance of RQ-SID in generative retrieval. We propose effective solutions to mitigate this issue, thereby significantly enhancing the effectiveness of generative retrieval in real-world E-commerce applications.
△ Less
Submitted 31 July, 2024;
originally announced July 2024.
-
Deep Learning-Based Longitudinal Prediction of Childhood Myopia Progression Using Fundus Image Sequences and Baseline Refraction Data
Authors:
Mengtian Kang,
Yansong Hu,
Shuo Gao,
Yuanyuan Liu,
Hongbei Meng,
Xuemeng Li,
Xuhang Chen,
Hubin Zhao,
Jing Fu,
Guohua Hu,
Wei Wang,
Yanning Dai,
Arokia Nathan,
Peter Smielewski,
Ningli Wang,
Shiming Li
Abstract:
Childhood myopia constitutes a significant global health concern. It exhibits an escalating prevalence and has the potential to evolve into severe, irreversible conditions that detrimentally impact familial well-being and create substantial economic costs. Contemporary research underscores the importance of precisely predicting myopia progression to enable timely and effective interventions, there…
▽ More
Childhood myopia constitutes a significant global health concern. It exhibits an escalating prevalence and has the potential to evolve into severe, irreversible conditions that detrimentally impact familial well-being and create substantial economic costs. Contemporary research underscores the importance of precisely predicting myopia progression to enable timely and effective interventions, thereby averting severe visual impairment in children. Such predictions predominantly rely on subjective clinical assessments, which are inherently biased and resource-intensive, thus hindering their widespread application. In this study, we introduce a novel, high-accuracy method for quantitatively predicting the myopic trajectory and myopia risk in children using only fundus images and baseline refraction data. This approach was validated through a six-year longitudinal study of 3,408 children in Henan, utilizing 16,211 fundus images and corresponding refractive data. Our method based on deep learning demonstrated predictive accuracy with an error margin of 0.311D per year and AUC scores of 0.944 and 0.995 for forecasting the risks of developing myopia and high myopia, respectively. These findings confirm the utility of our model in supporting early intervention strategies and in significantly reducing healthcare costs, particularly by obviating the need for additional metadata and repeated consultations. Furthermore, our method was designed to rely only on fundus images and refractive error data, without the need for meta data or multiple inquiries from doctors, strongly reducing the associated medical costs and facilitating large-scale screening. Our model can even provide good predictions based on only a single time measurement. Consequently, the proposed method is an important means to reduce medical inequities caused by economic disparities.
△ Less
Submitted 31 July, 2024;
originally announced July 2024.
-
An Extended Kalman Filter Integrated Latent Feature Model on Dynamic Weighted Directed Graphs
Authors:
Hongxun Zhou,
Xiangyu Chen,
Ye Yuan
Abstract:
A dynamic weighted directed graph (DWDG) is commonly encountered in various application scenarios. It involves extensive dynamic interactions among numerous nodes. Most existing approaches explore the intricate temporal patterns hidden in a DWDG from the purely data-driven perspective, which suffers from accuracy loss when a DWDG exhibits strong fluctuations over time. To address this issue, this…
▽ More
A dynamic weighted directed graph (DWDG) is commonly encountered in various application scenarios. It involves extensive dynamic interactions among numerous nodes. Most existing approaches explore the intricate temporal patterns hidden in a DWDG from the purely data-driven perspective, which suffers from accuracy loss when a DWDG exhibits strong fluctuations over time. To address this issue, this study proposes a novel Extended-Kalman-Filter-Incorporated Latent Feature (EKLF) model to represent a DWDG from the model-driven perspective. Its main idea is divided into the following two-fold ideas: a) adopting a control model, i.e., the Extended Kalman Filter (EKF), to track the complex temporal patterns precisely with its nonlinear state-transition and observation functions; and b) introducing an alternating least squares (ALS) algorithm to train the latent features (LFs) alternatively for precisely representing a DWDG. Empirical studies on DWDG datasets demonstrate that the proposed EKLF model outperforms state-of-the-art models in prediction accuracy and computational efficiency for missing edge weights of a DWDG. It unveils the potential for precisely representing a DWDG by incorporating a control model.
△ Less
Submitted 31 July, 2024;
originally announced July 2024.
-
Small Object Few-shot Segmentation for Vision-based Industrial Inspection
Authors:
Zilong Zhang,
Chang Niu,
Zhibin Zhao,
Xingwu Zhang,
Xuefeng Chen
Abstract:
Vision-based industrial inspection (VII) aims to locate defects quickly and accurately. Supervised learning under a close-set setting and industrial anomaly detection, as two common paradigms in VII, face different problems in practical applications. The former is that various and sufficient defects are difficult to obtain, while the latter is that specific defects cannot be located. To solve thes…
▽ More
Vision-based industrial inspection (VII) aims to locate defects quickly and accurately. Supervised learning under a close-set setting and industrial anomaly detection, as two common paradigms in VII, face different problems in practical applications. The former is that various and sufficient defects are difficult to obtain, while the latter is that specific defects cannot be located. To solve these problems, in this paper, we focus on the few-shot semantic segmentation (FSS) method, which can locate unseen defects conditioned on a few annotations without retraining. Compared to common objects in natural images, the defects in VII are small. This brings two problems to current FSS methods: 1 distortion of target semantics and 2 many false positives for backgrounds. To alleviate these problems, we propose a small object few-shot segmentation (SOFS) model. The key idea for alleviating 1 is to avoid the resizing of the original image and correctly indicate the intensity of target semantics. SOFS achieves this idea via the non-resizing procedure and the prototype intensity downsampling of support annotations. To alleviate 2, we design an abnormal prior map in SOFS to guide the model to reduce false positives and propose a mixed normal Dice loss to preferentially prevent the model from predicting false positives. SOFS can achieve FSS and few-shot anomaly detection determined by support masks. Diverse experiments substantiate the superior performance of SOFS. Code is available at https://github.com/zhangzilongc/SOFS.
△ Less
Submitted 31 July, 2024;
originally announced July 2024.
-
MSMA: Multi-agent Trajectory Prediction in Connected and Autonomous Vehicle Environment with Multi-source Data Integration
Authors:
Xi Chen,
Rahul Bhadani,
Zhanbo Sun,
Larry Head
Abstract:
The prediction of surrounding vehicle trajectories is crucial for collision-free path planning. In this study, we focus on a scenario where a connected and autonomous vehicle (CAV) serves as the central agent, utilizing both sensors and communication technologies to perceive its surrounding traffics consisting of autonomous vehicles (AVs), connected vehicles (CVs), and human-driven vehicles (HDVs)…
▽ More
The prediction of surrounding vehicle trajectories is crucial for collision-free path planning. In this study, we focus on a scenario where a connected and autonomous vehicle (CAV) serves as the central agent, utilizing both sensors and communication technologies to perceive its surrounding traffics consisting of autonomous vehicles (AVs), connected vehicles (CVs), and human-driven vehicles (HDVs). Our trajectory prediction task is aimed at all the detected surrounding vehicles. To effectively integrate the multi-source data from both sensor and communication technologies, we propose a deep learning framework called MSMA utilizing a cross-attention module for multi-source data fusion. Vector map data is utilized to provide contextual information. The trajectory dataset is collected in CARLA simulator with synthesized data errors introduced. Numerical experiments demonstrate that in a mixed traffic flow scenario, the integration of data from different sources enhances our understanding of the environment. This notably improves trajectory prediction accuracy, particularly in situations with a high CV market penetration rate. The code is available at: https://github.com/xichennn/MSMA.
△ Less
Submitted 2 August, 2024; v1 submitted 30 July, 2024;
originally announced July 2024.
-
Integrated Sensing and Communication in IRS-assisted High-Mobility Systems: Design, Analysis and Optimization
Authors:
Xingyu Peng,
Qin Tao,
Xiaoling Hu,
Richeng Jin,
Chongwen Huang,
Xiaoming Chen
Abstract:
In this paper, we investigate integrated sensing and communication (ISAC) in high-mobility systems with the aid of an intelligent reflecting surface (IRS). To exploit the benefits of Delay-Doppler (DD) spread caused by high mobility, orthogonal time frequency space (OTFS)-based frame structure and transmission framework are proposed. {In such a framework,} we first design a low-complexity ratio-ba…
▽ More
In this paper, we investigate integrated sensing and communication (ISAC) in high-mobility systems with the aid of an intelligent reflecting surface (IRS). To exploit the benefits of Delay-Doppler (DD) spread caused by high mobility, orthogonal time frequency space (OTFS)-based frame structure and transmission framework are proposed. {In such a framework,} we first design a low-complexity ratio-based sensing algorithm for estimating the velocity of mobile user. Then, we analyze the performance of sensing and communication in terms of achievable mean square error (MSE) and achievable rate, respectively, and reveal the impact of key parameters. Next, with the derived performance expressions, we jointly optimize the phase shift matrix of IRS and the receive combining vector at the base station (BS) to improve the overall performance of integrated sensing and communication. Finally, extensive simulation results confirm the effectiveness of the proposed algorithms in high-mobility systems.
△ Less
Submitted 30 July, 2024;
originally announced July 2024.
-
Bimodal Scaling Law and Size Effect In Superelastic Nanopillars
Authors:
Mostafa Karami,
Xian Chen
Abstract:
Shape memory alloys that can deform and then spring back to their original shape, have found a wide range of applications in the medical field, from heart valves to stents. As we push the boundaries of technology creating smaller, more precise tools for delicate surgery treatments, the behavior of these alloys at tiny scales becomes increasingly crucial. In this study, we discover that the size ef…
▽ More
Shape memory alloys that can deform and then spring back to their original shape, have found a wide range of applications in the medical field, from heart valves to stents. As we push the boundaries of technology creating smaller, more precise tools for delicate surgery treatments, the behavior of these alloys at tiny scales becomes increasingly crucial. In this study, we discover that the size effect of critical stress required for stress-induced phase transformation is not universal. We propose an orientation-dependent power decay law, indicating a specific increase in critical stress for pillars smaller than 1 micron meter for the nominally soft [001] and hard [111] orientations. Additionally, we observe high transformability with 11\% recoverable strain under high stress (2 GPa) through lattice frustration at 200nm scale. This research opens new avenues for exploring the superior elastic behavior of shape memory alloys for nanodevices.
△ Less
Submitted 30 July, 2024;
originally announced July 2024.
-
Shortcuts for Adiabatic and Variational Algorithms in Molecular Simulation
Authors:
Julián Ferreiro-Vélez,
Iñaki Iriarte-Zendoia,
Yue Ban,
Xi Chen
Abstract:
Quantum algorithms are prominent in the pursuit of achieving quantum advantage in various computational tasks. However, addressing challenges, such as limited qubit coherence and high error rate in near-term devices, requires extensive efforts. In this paper, we present a substantial stride in quantum chemistry by integrating shortcuts-to-adiabaticity techniques into adiabatic and variational algo…
▽ More
Quantum algorithms are prominent in the pursuit of achieving quantum advantage in various computational tasks. However, addressing challenges, such as limited qubit coherence and high error rate in near-term devices, requires extensive efforts. In this paper, we present a substantial stride in quantum chemistry by integrating shortcuts-to-adiabaticity techniques into adiabatic and variational algorithms for calculating the molecular ground state. Our approach includes the counter-diabatic driving that accelerates adiabatic evolution by mitigating adiabatic errors. Additionally, we introduce the counter-diabatic terms as the adiabatic gauge ansatz for the variational quantum eigensolver, which exhibits favorable convergence properties with a fewer number of parameters, thereby reducing the circuit depth. Our approach achieves comparable accuracy to other established ansatzes, while enhancing the potential for applications in material science, drug discovery, and molecular simulations.
△ Less
Submitted 30 July, 2024;
originally announced July 2024.
-
On-the-fly Communication-and-Computing to Enable Representation Learning for Distributed Point Clouds
Authors:
Xu Chen,
Hai Wu,
Kaibin Huang
Abstract:
The advent of sixth-generation (6G) mobile networks introduces two groundbreaking capabilities: sensing and artificial intelligence (AI). Sensing leverages multi-modal sensors to capture real-time environmental data, while AI brings powerful models to the network edge, enabling intelligent Internet-of-Things (IoT) applications. These features converge in the Integrated Sensing and Edge AI (ISEA) p…
▽ More
The advent of sixth-generation (6G) mobile networks introduces two groundbreaking capabilities: sensing and artificial intelligence (AI). Sensing leverages multi-modal sensors to capture real-time environmental data, while AI brings powerful models to the network edge, enabling intelligent Internet-of-Things (IoT) applications. These features converge in the Integrated Sensing and Edge AI (ISEA) paradigm, where edge devices collect and locally process sensor data before aggregating it centrally for AI tasks. Point clouds (PtClouds), generated by depth sensors, are crucial in this setup, supporting applications such as autonomous driving and mixed reality. However, the heavy computational load and communication demands of PtCloud fusion pose challenges. To address these, the FlyCom$^2$ framework is proposed, optimizing distributed PtCloud fusion through on-the-fly communication and computing, namely streaming on-sensor processing, progressive data uploading integrated communication-efficient AirComp, and the progressive output of a global PtCloud representation. FlyCom$^2$ distinguishes itself by aligning PtCloud fusion with Gaussian process regression (GPR), ensuring that global PtCloud representation progressively improves as more observations are received. Joint optimization of local observation synthesis and AirComp receiver settings is based on minimizing prediction error, balancing communication distortions, data heterogeneity, and temporal correlation. This framework enhances PtCloud fusion by balancing local processing demands with efficient central aggregation, paving the way for advanced 6G applications. Validation on real-world datasets demonstrates the efficacy of FlyCom$^2$, highlighting its potential in next-generation mobile networks.
△ Less
Submitted 30 July, 2024;
originally announced July 2024.
-
Observation of $D^0\to b_1(1235)^- e^+ν_e$ and evidence for $D^+\to b_1(1235)^0 e^+ν_e$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (647 additional authors not shown)
Abstract:
By analyzing a data sample of $e^+e^-$ collisions with center-of-mass energy $\sqrt{s}=3.773$ GeV, corresponding to an integrated luminosity of $7.9~\rm {fb}^{-1}$ collected with the BESIII detector operating at the BEPCII collider, we study semileptonic decays of the $D^{0(+)}$ mesons into the axial-vector meson $b_1(1235)$ via the decay $b_1(1235)\to ωπ$. The decay…
▽ More
By analyzing a data sample of $e^+e^-$ collisions with center-of-mass energy $\sqrt{s}=3.773$ GeV, corresponding to an integrated luminosity of $7.9~\rm {fb}^{-1}$ collected with the BESIII detector operating at the BEPCII collider, we study semileptonic decays of the $D^{0(+)}$ mesons into the axial-vector meson $b_1(1235)$ via the decay $b_1(1235)\to ωπ$. The decay $D^0\to b_1(1235)^-e^{+}ν_{e}$ is observed with a significance of 5.2$σ$ after considering systematic uncertainty, while evidence for the decay $D^+\to b_1(1235)^0 e^+ν_e$ is obtained with a 3.1$σ$ significance. The product branching fractions are determined to be ${\mathcal B}(D^0\to b_{1}(1235)^-e^{+}ν_{e})\times {\mathcal B} (b_1(1235)^-\to ωπ^-) = (0.72\pm0.18^{+0.06}_{-0.08})\times10^{-4}$ and ${\mathcal B}(D^+\to b_{1}(1235)^0e^{+}ν_{e})\times {\mathcal B} (b_1(1235)^0~\to ωπ^0) = (1.16\pm0.44\pm0.16)\times10^{-4}$, where the first uncertainties are statistical and the second systematic. The ratio of their partial decay widths is determined to be $\frac{Γ(D^0\to b_{1}(1235)^-e^{+}ν_{e})}{2Γ(D^+\to b_{1}(1235)^0e^{+}ν_{e})}=0.78\pm0.19^{+0.04}_{-0.05}$, which is consistent with unity, predicted by isospin invariance, within uncertainties.
△ Less
Submitted 30 July, 2024;
originally announced July 2024.
-
DuA: Dual Attentive Transformer in Long-Term Continuous EEG Emotion Analysis
Authors:
Yue Pan,
Qile Liu,
Qing Liu,
Li Zhang,
Gan Huang,
Xin Chen,
Fali Li,
Peng Xu,
Zhen Liang
Abstract:
Affective brain-computer interfaces (aBCIs) are increasingly recognized for their potential in monitoring and interpreting emotional states through electroencephalography (EEG) signals. Current EEG-based emotion recognition methods perform well with short segments of EEG data. However, these methods encounter significant challenges in real-life scenarios where emotional states evolve over extended…
▽ More
Affective brain-computer interfaces (aBCIs) are increasingly recognized for their potential in monitoring and interpreting emotional states through electroencephalography (EEG) signals. Current EEG-based emotion recognition methods perform well with short segments of EEG data. However, these methods encounter significant challenges in real-life scenarios where emotional states evolve over extended periods. To address this issue, we propose a Dual Attentive (DuA) transformer framework for long-term continuous EEG emotion analysis. Unlike segment-based approaches, the DuA transformer processes an entire EEG trial as a whole, identifying emotions at the trial level, referred to as trial-based emotion analysis. This framework is designed to adapt to varying signal lengths, providing a substantial advantage over traditional methods. The DuA transformer incorporates three key modules: the spatial-spectral network module, the temporal network module, and the transfer learning module. The spatial-spectral network module simultaneously captures spatial and spectral information from EEG signals, while the temporal network module detects temporal dependencies within long-term EEG data. The transfer learning module enhances the model's adaptability across different subjects and conditions. We extensively evaluate the DuA transformer using a self-constructed long-term EEG emotion database, along with two benchmark EEG emotion databases. On the basis of the trial-based leave-one-subject-out cross-subject cross-validation protocol, our experimental results demonstrate that the proposed DuA transformer significantly outperforms existing methods in long-term continuous EEG emotion analysis, with an average enhancement of 5.28%.
△ Less
Submitted 29 July, 2024;
originally announced July 2024.
-
Stronger sum uncertainty relations for non-Hermitian operators
Authors:
Xiao-Feng Song,
Yi-Fang Ren,
Shuang Liu,
Xi-Hao Chen,
Yusuf Turek
Abstract:
Unlike the uncertainty relationships of two arbitrary incompatible observables represented by the product of variances in the past, representing them by the sum of variances is better as it guarantees to be nontrivial for two incompatible operators in some special cases. Although the uncertainty relation is formulated as the sum of variances for unitary operators has been confirmed, its general fo…
▽ More
Unlike the uncertainty relationships of two arbitrary incompatible observables represented by the product of variances in the past, representing them by the sum of variances is better as it guarantees to be nontrivial for two incompatible operators in some special cases. Although the uncertainty relation is formulated as the sum of variances for unitary operators has been confirmed, its general forms for arbitrary non-Hermitian operators have not been yet investigated in detail. Thus, this study develops four sum uncertainty relations for arbitrary non-Hermitian operators acting on system states by utilizing an appropriate Hilbert-space metric. The compatible forms of our sum inequalities with the conventional quantum mechanics are also provided via $G$-metric formalism. Concrete examples demonstrate the validity of the purposed sum uncertainty relations in both $\mathcal{PT}$-symmetric and $\mathcal{PT}$-broken phases. The proposed methods and results can help the reader to understand in-depth the usefulness of $G$-metric formalism in non-Hermitian quantum mechanics and the sum uncertainty relations of incompatible operators within.
△ Less
Submitted 29 July, 2024;
originally announced July 2024.
-
CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models
Authors:
Junda Wu,
Xintong Li,
Tong Yu,
Yu Wang,
Xiang Chen,
Jiuxiang Gu,
Lina Yao,
Jingbo Shang,
Julian McAuley
Abstract:
Instruction tuning in multimodal large language models (MLLMs) aims to smoothly integrate a backbone LLM with a pre-trained feature encoder for downstream tasks. The major challenge is how to efficiently find the synergy through cooperative learning where LLMs adapt their reasoning abilities in downstream tasks while feature encoders adjust their encoding to provide more relevant modal information…
▽ More
Instruction tuning in multimodal large language models (MLLMs) aims to smoothly integrate a backbone LLM with a pre-trained feature encoder for downstream tasks. The major challenge is how to efficiently find the synergy through cooperative learning where LLMs adapt their reasoning abilities in downstream tasks while feature encoders adjust their encoding to provide more relevant modal information. In this paper, we analyze the MLLM instruction tuning from both theoretical and empirical perspectives, where we find unbalanced learning between the two components, i.e., the feature encoder and the LLM, can cause diminishing learning gradients that slow the model convergence and often lead to sub-optimal results due to insufficient learning. Inspired by our findings, we propose a measurement to quantitatively evaluate the learning balance, based on which we further design a dynamic learning scheduler that better coordinates the learning. In addition, we introduce an auxiliary loss regularization method to promote updating of the generation distribution of MLLMs considering the learning state of each model component, which potentially prevents each component from gradient diminishing and enables a more accurate estimation of the learning balance coefficient. We conduct experiments with multiple LLM backbones and feature encoders, where our techniques are model-agnostic and can be generically integrated with various MLLM backbones. Experiment results on multiple downstream tasks and modalities in vision and audio, demonstrate the proposed method's better efficiency and effectiveness in MLLM instruction tuning.
△ Less
Submitted 29 July, 2024;
originally announced July 2024.
-
Measurement of the $\boldsymbol{e^{+}e^{-}\to K^+K^-ψ(2S)}$ Cross Section at Center-of-Mass Energies from 4.699 to 4.951 GeV and Search for $\boldsymbol{Z_{cs}^{\pm}}$ in the $\boldsymbol{Z_{cs}^\pm\to K^\pmψ(2S)}$ Decay
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (646 additional authors not shown)
Abstract:
We perform the first investigation of the process $e^{+}e^{-}\to K^+K^-ψ(2S)$ and report its Born cross sections over a range of center-of-mass energies from 4.699 to 4.951~GeV. The measurements are carried out using several partial reconstruction techniques using data samples collected by the BESIII detector with a total integrated luminosity of 2.5~fb$^{-1}$. We search for new tetraquark candida…
▽ More
We perform the first investigation of the process $e^{+}e^{-}\to K^+K^-ψ(2S)$ and report its Born cross sections over a range of center-of-mass energies from 4.699 to 4.951~GeV. The measurements are carried out using several partial reconstruction techniques using data samples collected by the BESIII detector with a total integrated luminosity of 2.5~fb$^{-1}$. We search for new tetraquark candidates $Z_{cs}^\pm$ in the decays $Z_{cs}^\pm\to K^\pmψ(2S)$. No significant $Z_{cs}^\pm$ signals are observed.
△ Less
Submitted 29 July, 2024;
originally announced July 2024.
-
Cool-Fusion: Fuse Large Language Models without Training
Authors:
Cong Liu,
Xiaojun Quan,
Yan Pan,
Liang Lin,
Weigang Wu,
Xu Chen
Abstract:
We focus on the problem of fusing two or more heterogeneous large language models (LLMs) to facilitate their complementary strengths. One of the challenges on model fusion is high computational load, i.e. to fine-tune or to align vocabularies via combinatorial optimization. To this end, we propose \emph{Cool-Fusion}, a simple yet effective approach that fuses the knowledge of heterogeneous source…
▽ More
We focus on the problem of fusing two or more heterogeneous large language models (LLMs) to facilitate their complementary strengths. One of the challenges on model fusion is high computational load, i.e. to fine-tune or to align vocabularies via combinatorial optimization. To this end, we propose \emph{Cool-Fusion}, a simple yet effective approach that fuses the knowledge of heterogeneous source LLMs to leverage their complementary strengths. \emph{Cool-Fusion} is the first method that does not require any type of training like the ensemble approaches. But unlike ensemble methods, it is applicable to any set of source LLMs that have different vocabularies. The basic idea is to have each source LLM individually generate tokens until the tokens can be decoded into a text segment that ends at word boundaries common to all source LLMs. Then, the source LLMs jointly rerank the generated text segment and select the best one, which is the fused text generation in one step. Extensive experiments are conducted across a variety of benchmark datasets. On \emph{GSM8K}, \emph{Cool-Fusion} increases accuracy from three strong source LLMs by a significant 8\%-17.8\%.
△ Less
Submitted 29 July, 2024;
originally announced July 2024.
-
Contextuality Helps Representation Learning for Generalized Category Discovery
Authors:
Tingzhang Luo,
Mingxuan Du,
Jiatao Shi,
Xinxiang Chen,
Bingchen Zhao,
Shaoguang Huang
Abstract:
This paper introduces a novel approach to Generalized Category Discovery (GCD) by leveraging the concept of contextuality to enhance the identification and classification of categories in unlabeled datasets. Drawing inspiration from human cognition's ability to recognize objects within their context, we propose a dual-context based method.
Our model integrates two levels of contextuality: instan…
▽ More
This paper introduces a novel approach to Generalized Category Discovery (GCD) by leveraging the concept of contextuality to enhance the identification and classification of categories in unlabeled datasets. Drawing inspiration from human cognition's ability to recognize objects within their context, we propose a dual-context based method.
Our model integrates two levels of contextuality: instance-level, where nearest-neighbor contexts are utilized for contrastive learning, and cluster-level, employing prototypical contrastive learning based on category prototypes. The integration of the contextual information effectively improves the feature learning and thereby the classification accuracy of all categories, which better deals with the real-world datasets. Different from the traditional semi-supervised and novel category discovery techniques, our model focuses on a more realistic and challenging scenario where both known and novel categories are present in the unlabeled data. Extensive experimental results on several benchmark data sets demonstrate that the proposed model outperforms the state-of-the-art. Code is available at: https://github.com/Clarence-CV/Contexuality-GCD
△ Less
Submitted 29 July, 2024;
originally announced July 2024.
-
Robust Beamforming Design for Integrated Satellite-Terrestrial Maritime Communications in the Presence of Wave Fluctuation
Authors:
Kaiwei Xiong,
Xiaoming Chen,
Ming Ying
Abstract:
In order to provide wireless services for wide sea area, this paper designs an integrated satellite-terrestrial maritime communication framework. Specifically, the terrestrial base station (TBS) serves near-shore users, while the low earth orbit (LEO) satellite communicates with off-shore users. We aim to improve the overall performance of integrated satellite-terrestrial maritime communication sy…
▽ More
In order to provide wireless services for wide sea area, this paper designs an integrated satellite-terrestrial maritime communication framework. Specifically, the terrestrial base station (TBS) serves near-shore users, while the low earth orbit (LEO) satellite communicates with off-shore users. We aim to improve the overall performance of integrated satellite-terrestrial maritime communication system. Thus, it makes sense to jointly optimize transmit beamforming at the TBS and LEO satellite. Due to sea wave fluctuation, the obtained channel state information (CSI) is often imperfect. In this context, a robust beamforming design algorithm is proposed with the goal of minimizing the total power consumption of integrated satellite-terrestrial maritime communication system while satisfying quality of service (QoS) requirements. Both theoretical analysis and simulation results confirm the effectiveness of proposed algorithm in maritime communications.
△ Less
Submitted 29 July, 2024;
originally announced July 2024.
-
Synthetic monopole with half-integer magnetic charge in Bose-Einstein condensates
Authors:
Xi-Yu Chen,
Lijia Jiang,
Wen-Kai Bai,
Tao Yang,
Jun-Hui Zheng
Abstract:
We propose a scheme to create monopoles with half-integer magnetic charges in a spinful cold atom system. With a minimal monopole in the center, we derive the ground-state single-vortex wave function on the sphere and develop the vortex's kinematic equation in the presence of an external electromagnetic field. The vortex's trajectory is generally depicted by the precession of the system. We furthe…
▽ More
We propose a scheme to create monopoles with half-integer magnetic charges in a spinful cold atom system. With a minimal monopole in the center, we derive the ground-state single-vortex wave function on the sphere and develop the vortex's kinematic equation in the presence of an external electromagnetic field. The vortex's trajectory is generally depicted by the precession of the system. We further formulate the inter-vortex interaction and build up a theory of multi-vortex dynamics in high-charge monopole systems. We predict the vortices'trajectory in the bi-vortex system and figure out stable vortex (line) patterns in multi-vortex systems. Our study provides deep insights into properties of magnetic monopoles and vortices and paves the way for experimental verification.
△ Less
Submitted 29 July, 2024;
originally announced July 2024.