
Modeling Domain and Feedback Transitions for Cross-Domain Sequential Recommendation

Authors: Changshuo Zhang, Teng Shi, Xiao Zhang, Qi Liu, Ruobing Xie, Jun Xu, Ji-Rong Wen

Abstract: Nowadays, many recommender systems encompass various domains to cater to users' diverse needs, leading to user behaviors transitioning across different domains. In fact, user behaviors across different domains reveal changes in preference toward recommended items. For instance, a shift from negative feedback to positive feedback indicates improved user satisfaction. However, existing cross-domain sequential recommendation methods typically model user interests by focusing solely on information about domain transitions, often overlooking the valuable insights provided by users' feedback transitions. In this paper, we propose $\text{Transition}^2$, a novel method to model transitions across both domains and types of user feedback. Specifically, $\text{Transition}^2$ introduces a transition-aware graph encoder based on user history, assigning different weights to edges according to the feedback type. This enables the graph encoder to extract historical embeddings that capture the transition information between different domains and feedback types. Subsequently, we encode the user history using a cross-transition multi-head self-attention, incorporating various masks to distinguish different types of transitions. Finally, we integrate these modules to make predictions across different domains. Experimental results on two public datasets demonstrate the effectiveness of $\text{Transition}^2$.
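
The abstract mentions masks that distinguish different types of transitions but does not define them. As a rough illustration only, the sketch below builds causal boolean masks over a user history, split by whether two positions share a domain or a feedback type. The function name `transition_masks`, the four mask names, and the causal (lower-triangular) restriction are all assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List


def transition_masks(domains: List[str], feedback: List[int]) -> Dict[str, List[List[bool]]]:
    """Build illustrative causal attention masks for a user history.

    masks[name][i][j] is True when position i may attend to position j
    under that transition type. Only j <= i is allowed (causal assumption).
    """
    n = len(domains)
    if len(feedback) != n:
        raise ValueError("domains and feedback must have the same length")

    names = ("same_domain", "cross_domain", "same_feedback", "cross_feedback")
    masks = {name: [[False] * n for _ in range(n)] for name in names}

    for i in range(n):
        for j in range(i + 1):  # causal: look only at the present and the past
            same_d = domains[i] == domains[j]
            same_f = feedback[i] == feedback[j]
            masks["same_domain"][i][j] = same_d
            masks["cross_domain"][i][j] = not same_d
            masks["same_feedback"][i][j] = same_f
            masks["cross_feedback"][i][j] = not same_f
    return masks


# Hypothetical history: domains "A"/"B", feedback 1 = positive, 0 = negative.
example = transition_masks(["A", "B", "A"], [1, 0, 1])
```

Each mask could then gate a separate attention head, which is one plausible reading of "cross-transition multi-head self-attention"; the paper itself should be consulted for the actual construction.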

Submitted 15 August, 2024; originally announced August 2024.

arXiv:2408.08192 [pdf, other]

Stochastic Semi-Gradient Descent for Learning Mean Field Games with Population-Aware Function Approximation

Authors: Chenyu Zhang, Xu Chen, Xuan Di

Abstract: Mean field games (MFGs) model the interactions within a large-population multi-agent system using the population distribution. Traditional learning methods for MFGs are based on fixed-point iteration (FPI), which calculates best responses and induced population distribution separately and sequentially. However, FPI-type methods suffer from inefficiency and instability, due to oscillations caused by the forward-backward procedure. This paper considers an online learning method for MFGs, where an agent updates its policy and population estimates simultaneously and fully asynchronously, resulting in a simple stochastic gradient descent (SGD) type method called SemiSGD. Not only does SemiSGD exhibit numerical stability and efficiency, but it also provides a novel perspective by treating the value function and population distribution as a unified parameter. We theoretically show that SemiSGD directs this unified parameter along a descent direction to the mean field equilibrium. Motivated by this perspective, we develop a linear function approximation (LFA) for both the value function and the population distribution, resulting in the first population-aware LFA for MFGs on continuous state-action space. Finite-time convergence and approximation error analysis are provided for SemiSGD equipped with population-aware LFA.

Submitted 15 August, 2024; originally announced August 2024.

arXiv:2408.08120 [pdf, other]

Study of non-diffusive thermal behaviors in nanoscale transistors under different heating strategies

Authors: Chuang Zhang, Ziyang Xin, Qin Lou, Hong Liang

Abstract: Understanding the phonon transport mechanisms and efficiently capturing the spatiotemporal distributions of temperature is of great significance for alleviating hotspot issues in electronic devices. Most previous simulations mainly focused on the steady-state problem with continuous heating, and the effective Fourier's law (EFL) is widely used for practical multiscale thermal engineering due to its simplicity and efficiency, although it still follows the diffusive rule. However, non-continuous heating is more common in electronic devices, and few comparative studies have been conducted to estimate how much deviation the EFL would produce. To answer the above questions, the heat conduction in nanoscale bulk or silicon-on-insulator (SOI) transistors is investigated by the phonon Boltzmann transport equation (BTE) under three heating strategies, namely, 'Continuous', 'Intermittent' and 'Alternating' heating. Numerical results in the quasi-2D or 3D hotspot systems show that it is not easy to accurately capture the micro/nano scale heat conduction by the EFL, especially near the hotspot regions. Different heating strategies have great influence on the temperature rise and transient thermal dissipation process. Compared to 'Intermittent' heating, the temperature variance of 'Alternating' heating is smaller.

Submitted 15 August, 2024; originally announced August 2024.

Comments: 25 pages, 52 figures

MSC Class: 82D37; 80A05 80A19

arXiv:2408.08105 [pdf, other]

Multimodal Causal Reasoning Benchmark: Challenging Vision Large Language Models to Infer Causal Links Between Siamese Images

Authors: Zhiyuan Li, Heng Wang, Dongnan Liu, Chaoyi Zhang, Ao Ma, Jieting Long, Weidong Cai

Abstract: Large Language Models (LLMs) have showcased exceptional ability in causal reasoning from textual information. However, will these causalities remain straightforward for Vision Large Language Models (VLLMs) when only visual hints are provided? Motivated by this, we propose a novel Multimodal Causal Reasoning benchmark, namely MuCR, to challenge VLLMs to infer semantic cause-and-effect relationship when solely relying on visual cues such as action, appearance, clothing, and environment. Specifically, we introduce a prompt-driven image synthesis approach to create siamese images with embedded semantic causality and visual cues, which can effectively evaluate VLLMs' causal reasoning capabilities. Additionally, we develop tailored metrics from multiple perspectives, including image-level match, phrase-level understanding, and sentence-level explanation, to comprehensively assess VLLMs' comprehension abilities. Our extensive experiments reveal that the current state-of-the-art VLLMs are not as skilled at multimodal causal reasoning as we might have hoped. Furthermore, we perform a comprehensive analysis to understand these models' shortcomings from different views and suggest directions for future research. We hope MuCR can serve as a valuable resource and foundational benchmark in multimodal causal reasoning research. The project is available at: https://github.com/Zhiyuan-Li-John/MuCR

Submitted 15 August, 2024; originally announced August 2024.

Comments: 20 pages

arXiv:2408.07733 [pdf, other]

Enhancing Adversarial Attacks via Parameter Adaptive Adversarial Attack

Authors: Zhibo Jin, Jiayu Zhang, Zhiyu Zhu, Chenyu Zhang, Jiahao Huang, Jianlong Zhou, Fang Chen

Abstract: In recent times, the swift evolution of adversarial attacks has captured widespread attention, particularly concerning their transferability and other performance attributes. These techniques are primarily executed at the sample level, frequently overlooking the intrinsic parameters of models. Such neglect suggests that the perturbations introduced in adversarial samples might have the potential for further reduction. Given the essence of adversarial attacks is to impair model integrity with minimal noise on original samples, exploring avenues to maximize the utility of such perturbations is imperative. Against this backdrop, we have delved into the complexities of adversarial attack algorithms, dissecting the adversarial process into two critical phases: the Directional Supervision Process (DSP) and the Directional Optimization Process (DOP). While DSP determines the direction of updates based on the current samples and model parameters, it has been observed that existing model parameters may not always be conducive to adversarial attacks. The impact of models on adversarial efficacy is often overlooked in current research, leading to the neglect of DSP. For the first time, we propose that under certain conditions, fine-tuning model parameters can significantly improve the quality of the DSP. We provide, for the first time, rigorous mathematical definitions and proofs for these conditions, and introduce multiple methods for fine-tuning model parameters within DSP. Our extensive experiments substantiate the effectiveness of the proposed P3A method. Our code is accessible at: https://anonymous.4open.science/r/P3A-A12C/

Submitted 14 August, 2024; originally announced August 2024.

arXiv:2408.07605 [pdf, other]

Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving

Authors: Yuqing Wen, Yucheng Zhao, Yingfei Liu, Binyuan Huang, Fan Jia, Yanhui Wang, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang

Abstract: The field of autonomous driving increasingly demands high-quality annotated video training data. In this paper, we propose Panacea+, a powerful and universally applicable framework for generating video data in driving scenes. Built upon the foundation of our previous work, Panacea, Panacea+ adopts a multi-view appearance noise prior mechanism and a super-resolution module for enhanced consistency and increased resolution. Extensive experiments show that the generated video samples from Panacea+ greatly benefit a wide range of tasks on different datasets, including 3D object tracking, 3D object detection, and lane detection tasks on the nuScenes and Argoverse 2 datasets. These results strongly prove Panacea+ to be a valuable data generation framework for autonomous driving.

Submitted 14 August, 2024; originally announced August 2024.

Comments: Project page: https://panacea-ad.github.io/. arXiv admin note: text overlap with arXiv:2311.16813

arXiv:2408.07403 [pdf, ps, other]

Generating Fock-state superposition from coherent state by quantum measurement

Authors: Chen-yi Zhang, Jun Jing

Abstract: High-level Fock states and their superpositions are essentially exotic testbeds for nonclassical physics and valuable resources for quantum technologies. We provide a simple protocol on quantum measurement to generate an arbitrary Fock state and selected superposed Fock states from a coherent state of a target resonator, without any carefully tailored external driving. This conditional protocol can be efficiently constructed by a sequence of joint free-evolution of the resonator and an ancillary qubit, that are coupled via a Jaynes-Cummings interaction, and projective measurements on the qubit. By properly choosing the duration of each evolution-measurement cycle and the initial state of the resonator, we can generate a desired Fock state $|n\rangle$ and a superposed Fock state $(|0\rangle+|n\rangle)/\sqrt{2}$, $n\sim10$, with a fidelity over $99\%$ in less than $30$ measurements. Moreover, our protocol can be straightforwardly extended to the generation of a multi-excitation Bell state $(|00\rangle+|nn\rangle)/\sqrt{2}$ in a double-resonator system.

Submitted 14 August, 2024; originally announced August 2024.

Comments: 11 pages, 11 figures

arXiv:2408.07401 [pdf, other]

DataVisT5: A Pre-trained Language Model for Jointly Understanding Text and Data Visualization

Authors: Zhuoyue Wan, Yuanfeng Song, Shuaimin Li, Chen Jason Zhang, Raymond Chi-Wing Wong

Abstract: Data visualization (DV) is the fundamental and premise tool to improve the efficiency in conveying the insights behind the big data, which has been widely accepted in existing data-driven world. Task automation in DV, such as converting natural language queries to visualizations (i.e., text-to-vis), generating explanations from visualizations (i.e., vis-to-text), answering DV-related questions in free form (i.e., FeVisQA), and explicating tabular data (i.e., table-to-text), is vital for advancing the field. Despite their potential, the application of pre-trained language models (PLMs) like T5 and BERT in DV has been limited by high costs and challenges in handling cross-modal information, leading to few studies on PLMs for DV. We introduce DataVisT5, a novel PLM tailored for DV that enhances the T5 architecture through a hybrid objective pre-training and multi-task fine-tuning strategy, integrating text and DV datasets to effectively interpret cross-modal semantics. Extensive evaluations on public datasets show that DataVisT5 consistently outperforms current state-of-the-art models on various DV-related tasks. We anticipate that DataVisT5 will not only inspire further research on vertical PLMs but also expand the range of applications for PLMs.

Submitted 14 August, 2024; originally announced August 2024.

arXiv:2408.07342 [pdf]

Evidence of P-wave Pairing in K2Cr3As3 Superconductors from Phase-sensitive Measurement

Authors: Zhiyuan Zhang, Ziwei Dou, Anqi Wang, Cuiwei Zhang, Yu Hong, Xincheng Lei, Yue Pan, Zhongchen Xu, Zhipeng Xu, Yupeng Li, Guoan Li, Xiaofan Shi, Xingchen Guo, Xiao Deng, Zhaozheng Lyu, Peiling Li, Faming Qu, Guangtong Liu, Dong Su, Kun Jiang, Youguo Shi, Li Lu, Jie Shen, Jiangping Hu

Abstract: P-wave superconductors hold immense promise for both fundamental physics and practical applications due to their unusual pairing symmetry and potential topological superconductivity. However, the exploration of the p-wave superconductors has proved to be a complex endeavor. Not only are they rare in nature but also the identification of p-wave superconductors has been an arduous task in history. For example, phase-sensitive measurement, an experimental technique which can provide conclusive evidence for unconventional pairing, has not been implemented successfully to identify p-wave superconductors. Here, we study a recently discovered family of superconductors, A2Cr3As3 (A = K, Rb, Cs), which were proposed theoretically to be a candidate of p-wave superconductors. We fabricate superconducting quantum interference devices (SQUIDs) on exfoliated K2Cr3As3, and perform the phase-sensitive measurement. We observe that such SQUIDs exhibit a pronounced second-order harmonic component sin(2φ) in the current-phase relation, suggesting the admixture of 0- and π-phase. By carefully examining the magnetic field dependence of the oscillation patterns of critical current and Shapiro steps under microwave irradiation, we reveal a crossover from 0- to π-dominating phase state and conclude that the existence of the π-phase is in favor of the p-wave pairing symmetry in K2Cr3As3.

Submitted 14 August, 2024; originally announced August 2024.

arXiv:2408.06901 [pdf, other]

Divide and Conquer: Improving Multi-Camera 3D Perception with 2D Semantic-Depth Priors and Input-Dependent Queries

Authors: Qi Song, Qingyong Hu, Chi Zhang, Yongquan Chen, Rui Huang

Abstract: 3D perception tasks, such as 3D object detection and Bird's-Eye-View (BEV) segmentation using multi-camera images, have drawn significant attention recently. Despite the fact that accurately estimating both semantic and 3D scene layouts is crucial for this task, existing techniques often neglect the synergistic effects of semantic and depth cues, leading to the occurrence of classification and position estimation errors. Additionally, the input-independent nature of initial queries also limits the learning capacity of Transformer-based models. To tackle these challenges, we propose an input-aware Transformer framework that leverages Semantics and Depth as priors (named SDTR). Our approach involves the use of an S-D Encoder that explicitly models semantic and depth priors, thereby disentangling the learning process of object categorization and position estimation. Moreover, we introduce a Prior-guided Query Builder that incorporates the semantic prior into the initial queries of the Transformer, resulting in more effective input-aware queries. Extensive experiments on the nuScenes and Lyft benchmarks demonstrate the state-of-the-art performance of our method in both 3D object detection and BEV segmentation tasks.

Submitted 13 August, 2024; originally announced August 2024.

Comments: Accepted by TIP 2024

arXiv:2408.06677 [pdf, other]

Search for $η_c(2S)\toωω$ and $ωφ$ decays and measurements of $χ_{cJ}\toωω$ and $ωφ$ in $ψ(2S)$ radiative processes

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (643 additional authors not shown)

Abstract: Using $(2712\pm 14)$ $\times$ 10$^{6}$ $ψ(2S)$ events collected with the BESIII detector at the BEPCII collider, we search for the decays $η_{c}(2S)\toωω$ and $η_{c}(2S)\toωφ$ via the process $ψ(2S)\toγη_{c}(2S)$. Evidence of $η_{c}(2S)\toωω$ is found with a statistical significance of $3.2σ$. The branching fraction is measured to be $\mathcal{B}(η_{c}(2S)\toωω)=(5.65\pm3.77(\rm stat.)\pm5.32(\rm syst.))\times10^{-4}$. No statistically significant signal is observed for the decay $η_{c}(2S)\toωφ$. The upper limit of the branching fraction at the 90\% confidence level is determined to be $\mathcal{B}(ψ(2S)\toγη_{c}(2S),η_{c}(2S)\toωφ)<2.24\times 10^{-7}$. We also update the branching fractions of $χ_{cJ}\to ωω$ and $χ_{cJ}\toωφ$ decays via the $ψ(2S)\toγχ_{cJ}$ transition. The branching fractions are determined to be $\mathcal{B}(χ_{c0}\toωω)=(10.63\pm0.11\pm0.46)\times 10^{-4}$, $\mathcal{B}(χ_{c1}\toωω)=(6.39\pm0.07\pm0.29)\times 10^{-4}$, $\mathcal{B}(χ_{c2}\toωω)=(8.50\pm0.08\pm0.38)\times 10^{-4}$, $\mathcal{B}(χ_{c0}\toωφ)=(1.18\pm0.03\pm0.05)\times 10^{-4}$, $\mathcal{B}(χ_{c1}\toωφ)=(2.03\pm0.15\pm0.12)\times 10^{-5}$, and $\mathcal{B}(χ_{c2}\toωφ)=(9.37\pm1.07\pm0.59)\times 10^{-6}$, where the first uncertainties are statistical and the second are systematic.

Submitted 13 August, 2024; originally announced August 2024.

arXiv:2408.06660 [pdf, other]

The maximal coarse Baum-Connes conjecture for spaces that admit an A-by-FCE coarse fibration structure

Authors: Liang Guo, Qin Wang, Chen Zhang

Abstract: In this paper, we introduce the concept of an A-by-FCE coarse fibration structure for metric spaces, which serves as a generalization of the A-by-CE structure for a sequence of group extensions proposed by Deng, Wang, and Yu. We show that the maximal coarse Baum-Connes conjecture holds for metric spaces with bounded geometry that admit an A-by-FCE coarse fibration structure. As an application, the relative expanders constructed by Arzhantseva and Tessera, as well as the box space derived from an extension of Haagerup groups by amenable groups, are shown to exhibit the A-by-FCE coarse fibration structure. Consequently, their maximal coarse Baum-Connes conjectures are affirmed.

Submitted 13 August, 2024; originally announced August 2024.

MSC Class: 19K56; 46L80

arXiv:2408.06654 [pdf, other]

Advancing Nonadiabatic Molecular Dynamics Simulations for Solids: Achieving Supreme Accuracy and Efficiency with Machine Learning

Authors: Changwei Zhang, Yang Zhong, Zhi-Guo Tao, Xinming Qing, Honghui Shang, Zhenggang Lan, Oleg V. Prezhdo, Xin-Gao Gong, Weibin Chu, Hongjun Xiang

Abstract: Non-adiabatic molecular dynamics (NAMD) simulations have become an indispensable tool for investigating excited-state dynamics in solids. In this work, we propose a general framework, N$^2$AMD which employs an E(3)-equivariant deep neural Hamiltonian to boost the accuracy and efficiency of NAMD simulations. The preservation of Euclidean symmetry of Hamiltonian enables N$^2$AMD to achieve state-of-the-art performance. Distinct from conventional machine learning methods that predict key quantities in NAMD, N$^2$AMD computes these quantities directly with a deep neural Hamiltonian, ensuring supreme accuracy, efficiency, and consistency. Furthermore, N$^2$AMD demonstrates excellent generalizability and enables seamless integration with advanced NAMD techniques and infrastructures. Taking several extensively investigated semiconductors as the prototypical system, we successfully simulate carrier recombination in both pristine and defective systems at large scales where conventional NAMD often significantly underestimates or even qualitatively incorrectly predicts lifetimes. This framework not only boosts the efficiency and precision of NAMD simulations but also opens new avenues to advance materials research.

Submitted 13 August, 2024; originally announced August 2024.

Comments: 25 pages, 7 figures

arXiv:2408.06385 [pdf, other]

ViC: Virtual Compiler Is All You Need For Assembly Code Search

Authors: Zeyu Gao, Hao Wang, Yuanda Wang, Chao Zhang

Abstract: Assembly code search is vital for reducing the burden on reverse engineers, allowing them to quickly identify specific functions using natural language within vast binary programs. Despite its significance, this critical task is impeded by the complexities involved in building high-quality datasets. This paper explores training a Large Language Model (LLM) to emulate a general compiler. By leveraging Ubuntu packages to compile a dataset of 20 billion tokens, we further continue pre-train CodeLlama as a Virtual Compiler (ViC), capable of compiling any source code of any language to assembly code. This approach allows for virtual compilation across a wide range of programming languages without the need for a real compiler, preserving semantic equivalency and expanding the possibilities for assembly code dataset construction. Furthermore, we use ViC to construct a sufficiently large dataset for assembly code search. Employing this extensive dataset, we achieve a substantial improvement in assembly code search performance, with our model surpassing the leading baseline by 26%.

Submitted 10 August, 2024; originally announced August 2024.

arXiv:2408.06294 [pdf, other]

AniBalloons: Animated Chat Balloons as Affective Augmentation for Social Messaging and Chatbot Interaction

Authors: Pengcheng An, Chaoyu Zhang, Haichen Gao, Ziqi Zhou, Yage Xiao, Jian Zhao

Abstract: Despite being prominent and ubiquitous, message-based interaction is limited in nonverbally conveying emotions. Besides emoticons or stickers, messaging users continue seeking richer options for affective communication. Recent research explored using chat balloons' shape and color to communicate emotional states. However, little work explored whether and how chat-balloon animations could be designed to convey emotions. We present the design of AniBalloons, 30 chat-balloon animations conveying Joy, Anger, Sadness, Surprise, Fear, and Calmness. Using AniBalloons as a research means, we conducted three studies to assess the animations' affect recognizability and emotional properties (N = 40), and probe how animated chat balloons would influence communication experience in typical scenarios including instant messaging (N = 72) and chatbot service (N = 70). Our exploration contributes a set of chat-balloon animations to complement nonverbal affective communication for a range of message-based interfaces, and empirical insights into how animated chat balloons might mediate particular conversation experiences (e.g., perceived interpersonal closeness, or chatbot personality).

Submitted 14 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

Comments: under the 2nd review after minor revision by International Journal of Human-Computer Studies

arXiv:2408.06164 [pdf, other]

Prototyping and Experimental Results for ISAC-based Channel Knowledge Map

Authors: Chaoyue Zhang, Zhiwen Zhou, Xiaoli Xu, Yong Zeng, Zaichen Zhang, Shi Jin

Abstract: Channel knowledge map (CKM) is a novel approach for achieving environment-aware communication and sensing. This paper presents an integrated sensing and communication (ISAC)-based CKM prototype system, demonstrating the mutualistic relationship between ISAC and CKM. The system consists of an ISAC base station (BS), a user equipment (UE), and a server. By using a shared orthogonal frequency division multiplexing (OFDM) waveform over the millimeter wave (mmWave) band, the ISAC BS is able to communicate with the UE while simultaneously sensing the environment and acquiring the UE's location. The prototype showcases the complete process of the construction and application of the ISAC-based CKM. For CKM construction phase, the BS stores the UE's channel feedback information in a database indexed by the UE's location, including beam indices and channel gain. For CKM application phase, the BS looks up the best beam index from the CKM based on the UE's location to achieve training-free mmWave beam alignment. The experimental results show that ISAC can be used to construct or update CKM while communicating with UEs, and the pre-learned CKM can assist ISAC for training-free beam alignment.
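
The construction and application phases described in this abstract amount to a location-indexed store with a lookup. The sketch below is a minimal illustration of that idea, not the authors' prototype: the class `ChannelKnowledgeMap`, the grid quantization with `cell_size_m`, and the rule of keeping only the strongest-gain beam per cell are all assumptions introduced for this example.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

GridCell = Tuple[int, int]


@dataclass
class BeamRecord:
    beam_index: int
    channel_gain_db: float


class ChannelKnowledgeMap:
    """Toy location-indexed store of the best beam seen per grid cell."""

    def __init__(self, cell_size_m: float = 1.0) -> None:
        self.cell_size_m = cell_size_m
        self._best: Dict[GridCell, BeamRecord] = {}

    def _cell(self, x: float, y: float) -> GridCell:
        # Quantize a continuous UE location to a grid cell (assumed indexing).
        return (math.floor(x / self.cell_size_m), math.floor(y / self.cell_size_m))

    def update(self, x: float, y: float, beam_index: int, channel_gain_db: float) -> None:
        # Construction phase: keep the feedback entry with the highest gain.
        cell = self._cell(x, y)
        current = self._best.get(cell)
        if current is None or channel_gain_db > current.channel_gain_db:
            self._best[cell] = BeamRecord(beam_index, channel_gain_db)

    def best_beam(self, x: float, y: float) -> Optional[int]:
        # Application phase: look up a beam without any beam training.
        record = self._best.get(self._cell(x, y))
        return None if record is None else record.beam_index


ckm = ChannelKnowledgeMap(cell_size_m=1.0)
ckm.update(2.3, 4.7, beam_index=5, channel_gain_db=-70.0)
ckm.update(2.8, 4.1, beam_index=9, channel_gain_db=-62.5)
beam = ckm.best_beam(2.5, 4.5)  # same cell as both updates
```

A location with no stored entry returns `None`, in which case a real system would presumably fall back to conventional beam training.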

Submitted 12 August, 2024; originally announced August 2024.

arXiv:2408.05878 [pdf]

Drone based superconducting single photon detection system with detection efficiency more than 90%

Authors: Ruoyan Ma, Zhimin Guo, Dai Chen, Xiaojun Dai, You Xiao, ChengJun Zhang, Jiamin Xiong, Jia Huang, Xingyu Zhang, Xiaoyu Liu, Liangliang Rong, Hao Li, Xiaofu Zhang, Lixing You

Abstract: Bounded by the size, weight, and power consumption (SWaP) of conventional superconducting single photon detectors (SSPD), applications of SSPDs were commonly confined to the laboratory. However, booming demands for high efficiency single photon detectors incorporated with avionic platforms arise with the development of remote imaging and sensing or long-haul quantum communication without topographical constraints. We herein designed and manufactured the first drone based SSPD system with an SDE as high as 91.8%. This drone based SSPD system is established with high performance NbTiN SSPDs, self-developed miniature liquid helium dewar, and homemade integrated electric setups, which is able to be launched in complex topographical conditions. Such a drone based SSPD system may open the use of SSPDs for applications that demand high-SDE in complex environments.

Submitted 11 August, 2024; originally announced August 2024.

arXiv:2408.05758 [pdf, other]

VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing

Authors: Chunyu Qiang, Wang Geng, Yi Zhao, Ruibo Fu, Tao Wang, Cheng Gong, Tianrui Wang, Qiuyu Liu, Jiangyan Yi, Zhengqi Wen, Chen Zhang, Hao Che, Longbiao Wang, Jianwu Dang, Jianhua Tao

Abstract: Deep learning has brought significant improvements to the field of cross-modal representation learning. For tasks such as text-to-speech (TTS), voice conversion (VC), and automatic speech recognition (ASR), a cross-modal fine-grained (frame-level) sequence representation is desired, emphasizing the semantic content of the text modality while de-emphasizing the paralinguistic information of the speech modality. We propose a method called "Vector Quantized Contrastive Token-Acoustic Pre-training (VQ-CTAP)", which uses the cross-modal aligned sequence transcoder to bring text and speech into a joint multimodal space, learning how to connect text and speech at the frame level. The proposed VQ-CTAP is a paradigm for cross-modal sequence representation learning, offering a promising solution for fine-grained generation and recognition tasks in speech processing. The VQ-CTAP can be directly applied to VC and ASR tasks without fine-tuning or additional structures. We propose a sequence-aware semantic connector, which connects multiple frozen pre-trained modules for the TTS task, exhibiting a plug-and-play capability. We design a stepping optimization strategy to ensure effective model convergence by gradually injecting and adjusting the influence of various loss components. Furthermore, we propose a semantic-transfer-wise paralinguistic consistency loss to enhance representational capabilities, allowing the model to better generalize to unseen data and capture the nuances of paralinguistic information.
In addition, VQ-CTAP achieves high-compression speech coding at a rate of 25Hz from 24kHz input waveforms, which is a 960-fold reduction in the sampling rate. The audio demo is available at https://qiangchunyu.github.io/VQCTAP/
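
Illustrative note: the 960-fold figure quoted in this abstract follows directly from the two rates it states. The short check below only reproduces that arithmetic. The variable names are hypothetical, and nothing here reflects the VQ-CTAP implementation.

```python
# Rates quoted in the abstract.
input_rate_hz = 24_000   # 24 kHz input waveform
token_rate_hz = 25       # 25 Hz coded representation

# Each coded frame therefore spans this many input samples, which is also
# the reduction factor in the sampling rate.
samples_per_frame = input_rate_hz // token_rate_hz

# Duration covered by one coded frame, in milliseconds.
frame_ms = 1000 / token_rate_hz
```

Under these figures, one coded frame covers 960 input samples, or 40 ms of audio.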

Submitted 11 August, 2024; originally announced August 2024.

arXiv:2408.05726 [pdf]

Superconductivity Discovered in Niobium Polyhydride at High Pressures

Authors: X. He, C. L. Zhang, Z. W. Li, K. Lu, S. J. Zhang, B. S. Min, J. Zhang, L. C. Shi, S. M. Feng, Q. Q. Liu, J. Song, X. C. Wang, Y. Peng, L. H. Wang, V. B. Prakapenka, S. Chariton, H. Z. Liu, C. Q. Jin

Abstract: Niobium polyhydride was synthesized at high pressure and high temperature conditions using a diamond anvil cell combined with in situ high pressure laser heating techniques. High pressure electric transport experiments demonstrate that a superconducting transition occurs with a critical temperature (Tc) of 42 K at 187 GPa. The shift of Tc as a function of the externally applied magnetic field is consistent with the nature of superconductivity. The upper critical field at zero temperature, Hc2(0), is estimated to be ~16.8 Tesla, and the GL coherence length is estimated to be ~57 angstrom. The structure investigation using synchrotron radiation implies that the observed superconductivity may come from the Fm-3m phase of NbH3.

Submitted 19 August, 2024; v1 submitted 11 August, 2024; originally announced August 2024.

Comments: Accepted by Materials Today Physics

arXiv:2408.05705 [pdf, other]

TC-KANRecon: High-Quality and Accelerated MRI Reconstruction via Adaptive KAN Mechanisms and Intelligent Feature Scaling

Authors: Ruiquan Ge, Xiao Yu, Yifei Chen, Fan Jia, Shenghao Zhu, Guanyu Zhou, Yiyu Huang, Chenyan Zhang, Dong Zeng, Changmiao Wang, Qiegen Liu, Shanzhou Niu

Abstract: Magnetic Resonance Imaging (MRI) has become essential in clinical diagnosis due to its high resolution and multiple contrast mechanisms. However, the relatively long acquisition time limits its broader application. To address this issue, this study presents an innovative conditional guided diffusion model, named as TC-KANRecon, which incorporates the Multi-Free U-KAN (MF-UKAN) module and a dynamic clipping strategy. TC-KANRecon model aims to accelerate the MRI reconstruction process through deep learning methods while maintaining the quality of the reconstructed images. The MF-UKAN module can effectively balance the tradeoff between image denoising and structure preservation. Specifically, it presents the multi-head attention mechanisms and scalar modulation factors, which significantly enhances the model's robustness and structure preservation capabilities in complex noise environments. Moreover, the dynamic clipping strategy in TC-KANRecon adjusts the cropping interval according to the sampling steps, thereby mitigating image detail loss typically caused by traditional cropping methods and enriching the visual features of the images. Furthermore, the MC-Model module incorporates full-sampling k-space information, realizing efficient fusion of conditional information, enhancing the model's ability to process complex data, and improving the realism and detail richness of reconstructed images. Experimental results demonstrate that the proposed method outperforms other MRI reconstruction methods in both qualitative and quantitative evaluations.
Notably, TC-KANRecon method exhibits excellent reconstruction results when processing high-noise, low-sampling-rate MRI data. Our source code is available at https://github.com/lcbkmm/TC-KANRecon.

Submitted 11 August, 2024; originally announced August 2024.

Comments: 10 pages, 3 figures

arXiv:2408.05584 [pdf]

Dynamical causality under invisible confounders

Authors: Jinling Yan, Shao-Wu Zhang, Chihao Zhang, Weitian Huang, Jifan Shi, Luonan Chen

Abstract: Causality inference is prone to spurious causal interactions, due to the substantial confounders in a complex system. While many existing methods based on statistical or dynamical approaches attempt to address misidentification challenges, there remains a notable lack of effective methods to infer causality, in particular in the presence of invisible/unobservable confounders. As a result, accurately inferring causation with invisible confounders remains a largely unexplored and outstanding issue in data science and AI fields. In this work, we propose a method to overcome such challenges to infer dynamical causality under invisible confounders (CIC method) and further reconstruct the invisible confounders from time-series data by developing an orthogonal decomposition theorem in a delay embedding space. The core of our CIC method lies in its ability to decompose the observed variables not in their original space but in their delay embedding space into the common and private subspaces respectively, thereby quantifying causality between those variables both theoretically and computationally. This theoretical foundation ensures the causal detection for any high-dimensional system even with only two observed variables under many invisible confounders, which is actually a long-standing problem in the field. In addition to the invisible confounder problem, such a decomposition actually makes the intertwined variables separable in the embedding space, thus also solving the non-separability problem of causal inference.
Extensive validation of the CIC method is carried out using various real datasets, and the experimental results demonstrate its effectiveness in reconstructing real biological networks even with unobserved confounders.

Submitted 10 August, 2024; originally announced August 2024.

Comments: 23 pages, 5 figures

arXiv:2408.05508 [pdf, other]

PointMT: Efficient Point Cloud Analysis with Hybrid MLP-Transformer Architecture

Authors: Qiang Zheng, Chao Zhang, Jian Sun

Abstract: In recent years, point cloud analysis methods based on the Transformer architecture have made significant progress, particularly in the context of multimedia applications such as 3D modeling, virtual reality, and autonomous systems. However, the high computational resource demands of the Transformer architecture hinder its scalability, real-time processing capabilities, and deployment on mobile devices and other platforms with limited computational resources. This limitation remains a significant obstacle to its practical application in scenarios requiring on-device intelligence and multimedia processing. To address this challenge, we propose an efficient point cloud analysis architecture, Point MLP-Transformer (PointMT). This study tackles the quadratic complexity of the self-attention mechanism by introducing a linear complexity local attention mechanism for effective feature aggregation. Additionally, to counter the Transformer's focus on token differences while neglecting channel differences, we introduce a parameter-free channel temperature adaptation mechanism that adaptively adjusts the attention weight distribution in each channel, enhancing the precision of feature aggregation. To improve the Transformer's slow convergence speed due to the limited scale of point cloud datasets, we propose an MLP-Transformer hybrid module, which significantly enhances the model's convergence speed.
Furthermore, to boost the feature representation capability of point tokens, we refine the classification head, enabling point tokens to directly participate in prediction. Experimental results on multiple evaluation benchmarks demonstrate that PointMT achieves performance comparable to state-of-the-art methods while maintaining an optimal balance between performance and accuracy.

Submitted 10 August, 2024; originally announced August 2024.

arXiv:2408.05233 [pdf, other]

Large Language Model based Agent Framework for Electric Vehicle Charging Behavior Simulation

Authors: Junkang Feng, Chenggang Cui, Chuanlin Zhang, Zizhu Fan

Abstract: This paper introduces a new LLM based agent framework for simulating electric vehicle (EV) charging behavior, integrating user preferences, psychological characteristics, and environmental factors to optimize the charging process. The framework comprises several modules, enabling sophisticated, adaptive simulations. Dynamic decision making is supported by continuous reflection and memory updates, ensuring alignment with user expectations and enhanced efficiency. The framework's ability to generate personalized user profiles and real-time decisions offers significant advancements for urban EV charging management. Future work could focus on incorporating more intricate scenarios and expanding data sources to enhance predictive accuracy and practical utility.

Submitted 2 August, 2024; originally announced August 2024.

Comments: 7 pages, 3 figures

arXiv:2408.05134 [pdf, other]

Observation of muonic Dalitz decays of $χ_{b}$ mesons and precise spectroscopy of hidden-beauty states

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1114 additional authors not shown)

Abstract: The decays of the $χ_{b1}(1P)$, $χ_{b2}(1P)$, $χ_{b1}(2P)$ and $χ_{b2}(2P)$ mesons into the $Υ(1S)μ^+μ^-$ final state are observed with a high significance using proton-proton collision data collected with the LHCb detector and corresponding to an integrated luminosity of 9 fb$^{-1}$. The newly observed decays together with the $Υ(2S)\rightarrow Υ(1S)π^+π^-$ and $Υ(3S)\rightarrow Υ(2S)π^+π^-$ decay modes are used for precision measurements of the mass and mass splittings for the hidden-beauty states.

Submitted 9 August, 2024; originally announced August 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2024-025.html

Report number: LHCb-PAPER-2024-025,CERN-EP-2024-207

arXiv:2408.05058 [pdf, other]

Variational Bayesian Phylogenetic Inference with Semi-implicit Branch Length Distributions

Authors: Tianyu Xie, Frederick A. Matsen IV, Marc A. Suchard, Cheng Zhang

Abstract: Reconstructing the evolutionary history relating a collection of molecular sequences is the main subject of modern Bayesian phylogenetic inference. However, the commonly used Markov chain Monte Carlo methods can be inefficient due to the complicated space of phylogenetic trees, especially when the number of sequences is large. An alternative approach is variational Bayesian phylogenetic inference (VBPI) which transforms the inference problem into an optimization problem. While effective, the default diagonal lognormal approximation for the branch lengths of the tree used in VBPI is often insufficient to capture the complexity of the exact posterior. In this work, we propose a more flexible family of branch length variational posteriors based on semi-implicit hierarchical distributions using graph neural networks. We show that this semi-implicit construction emits straightforward permutation equivariant distributions, and therefore can handle the non-Euclidean branch length space across different tree topologies with ease. To deal with the intractable marginal probability of semi-implicit variational distributions, we develop several alternative lower bounds for stochastic optimization. We demonstrate the effectiveness of our proposed method over baseline methods on benchmark data examples, in terms of both marginal likelihood estimation and branch length posterior approximation.

Submitted 9 August, 2024; originally announced August 2024.

Comments: 26 pages, 7 figures

arXiv:2408.04967 [pdf, other]

ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild

Authors: Jiangyan Yi, Chu Yuan Zhang, Jianhua Tao, Chenglong Wang, Xinrui Yan, Yong Ren, Hao Gu, Junzuo Zhou

Abstract: The growing prominence of the field of audio deepfake detection is driven by its wide range of applications, notably in protecting the public from potential fraud and other malicious activities, prompting the need for greater attention and research in this area. The ADD 2023 challenge goes beyond binary real/fake classification by emulating real-world scenarios, such as the identification of manipulated intervals in partially fake audio and determining the source responsible for generating any fake audio, both with real-life implications, notably in audio forensics, law enforcement, and construction of reliable and trustworthy evidence. To further foster research in this area, in this article, we describe the dataset that was used in the fake game, manipulation region location and deepfake algorithm recognition tracks of the challenge. We also focus on the analysis of the technical methodologies by the top-performing participants in each task and note the commonalities and differences in their approaches. Finally, we discuss the current technical limitations as identified through the technical analysis, and provide a roadmap for future research directions. The dataset is available for download.

Submitted 9 August, 2024; originally announced August 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2408.04889 [pdf, other]

Deep joint source-channel coding for wireless point cloud transmission

Authors: Cixiao Zhang, Mufan Liu, Wenjie Huang, Yin Xu, Yiling Xu, Dazhi He

Abstract: The growing demand for high-quality point cloud transmission over wireless networks presents significant challenges, primarily due to the large data sizes and the need for efficient encoding techniques. In response to these challenges, we introduce a novel system named Deep Point Cloud Semantic Transmission (PCST), designed for end-to-end wireless point cloud transmission. Our approach employs a progressive resampling framework using sparse convolution to project point cloud data into a semantic latent space. These semantic features are subsequently encoded through a deep joint source-channel (JSCC) encoder, generating the channel-input sequence. To enhance transmission efficiency, we use an adaptive entropy-based approach to assess the importance of each semantic feature, allowing transmission lengths to vary according to their predicted entropy. PCST is robust across diverse Signal-to-Noise Ratio (SNR) levels and supports an adjustable rate-distortion (RD) trade-off, ensuring flexible and efficient transmission. Experimental results indicate that PCST significantly outperforms traditional separate source-channel coding (SSCC) schemes, delivering superior reconstruction quality while achieving over a 50% reduction in bandwidth usage.

Submitted 9 August, 2024; originally announced August 2024.

arXiv:2408.04708 [pdf, other]

MulliVC: Multi-lingual Voice Conversion With Cycle Consistency

Authors: Jiawei Huang, Chen Zhang, Yi Ren, Ziyue Jiang, Zhenhui Ye, Jinglin Liu, Jinzheng He, Xiang Yin, Zhou Zhao

Abstract: Voice conversion aims to modify the source speaker's voice to resemble the target speaker while preserving the original speech content. Despite notable advancements in voice conversion these days, multi-lingual voice conversion (including both monolingual and cross-lingual scenarios) has yet to be extensively studied. It faces two main challenges: 1) the considerable variability in prosody and articulation habits across languages; and 2) the rarity of paired multi-lingual datasets from the same speaker. In this paper, we propose MulliVC, a novel voice conversion system that only converts timbre and keeps original content and source language prosody without multi-lingual paired data. Specifically, each training step of MulliVC contains three substeps: In step one the model is trained with monolingual speech data; then, steps two and three take inspiration from back translation, construct a cyclical process to disentangle the timbre and other information (content, prosody, and other language-related information) in the absence of multi-lingual data from the same speaker. Both objective and subjective results indicate that MulliVC significantly surpasses other methods in both monolingual and cross-lingual contexts, demonstrating the system's efficacy and the viability of the three-step approach with cycle consistency. Audio samples can be found on our demo page (mullivc.github.io).