
Two-neutrino double electron capture of $^{124}$Xe in the first LUX-ZEPLIN exposure

Authors: J. Aalbers, D. S. Akerib, A. K. Al Musalhi, F. Alder, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, S. Balashov, J. Bang, J. W. Bargemann, E. E. Barillier, K. Beattie, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, E. Bishop, G. M. Blockinger, B. Boxer, C. A. J. Brew, et al. (180 additional authors not shown)

Abstract: The broad physics reach of the LUX-ZEPLIN (LZ) experiment covers rare phenomena beyond the direct detection of dark matter. We report precise measurements of the extremely rare decay of $^{124}$Xe through the process of two-neutrino double electron capture (2$\nu$2EC), utilizing a $1.39\,\mathrm{kg} \times \mathrm{yr}$ isotopic exposure from the first LZ science run. A half-life of $T_{1/2}^{2\nu2\mathrm{EC}} = (1.09 \pm 0.14_{\text{stat}} \pm 0.05_{\text{sys}}) \times 10^{22}\,\mathrm{yr}$ is observed with a statistical significance of $8.3\,\sigma$, in agreement with the literature. First empirical measurements of the KK capture fraction relative to other K-shell modes were conducted and are consistent with recent signal models at the $1.4\,\sigma$ level.

Submitted 30 August, 2024; originally announced August 2024.

Comments: 15 pages, 3 figures
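As a rough, back-of-the-envelope illustration of what a $10^{22}$-year half-life means for a $1.39\,\mathrm{kg} \times \mathrm{yr}$ isotopic exposure, the expected number of 2$\nu$2EC decays follows from $N_\text{decays} \approx N_\text{atoms}\, t \ln 2 / T_{1/2}$, which is valid because $t \ll T_{1/2}$. The sketch assumes a $^{124}$Xe molar mass of about 123.906 g/mol and ignores detection efficiency, selection cuts and backgrounds, so it is not the LZ analysis and does not reproduce the observed event count.

```python
import math

AVOGADRO = 6.02214076e23      # atoms per mole (exact SI value)
MOLAR_MASS_XE124 = 123.906    # g/mol, approximate atomic mass of 124Xe (assumption)

def expected_decays(exposure_kg_yr: float, half_life_yr: float) -> float:
    """Expected decays for an isotopic exposure, before any efficiency or cuts.

    Uses the small-time limit N_decays ~= N_atoms * t * ln(2) / T_half,
    valid because the exposure time is negligible compared to T_half.
    """
    atoms_per_kg = 1000.0 / MOLAR_MASS_XE124 * AVOGADRO
    return atoms_per_kg * exposure_kg_yr * math.log(2) / half_life_yr

n = expected_decays(exposure_kg_yr=1.39, half_life_yr=1.09e22)
print(f"{n:.0f}")  # roughly 430 decays before efficiency and cuts
```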

arXiv:2408.17224

Hadronic cross section measurements with the DAMPE space mission using 20GeV-10TeV cosmic-ray protons and $^4$He

Authors: F. Alemanno, Q. An, P. Azzarello, F. C. T. Barbato, P. Bernardini, X. J. Bi, I. Cagnoli, M. S. Cai, E. Casilli, E. Catanzani, J. Chang, D. Y. Chen, J. L. Chen, Z. F. Chen, P. Coppin, M. Y. Cui, T. S. Cui, Y. X. Cui, H. T. Dai, A. De Benedittis, I. De Mitri, F. de Palma, A. Di Giovanni, Q. Ding, T. K. Dong, et al. (126 additional authors not shown)

Abstract: Precise direct cosmic-ray (CR) measurements provide an important probe to study the energetic particle sources in our Galaxy, and the interstellar environment through which these particles propagate. Uncertainties in hadronic models, ion-nucleon cross sections in particular, are currently the limiting factor towards obtaining more accurate CR ion flux measurements with calorimetric space-based experiments. We present an energy-dependent measurement of the inelastic cross section of protons and helium-4 nuclei (alpha particles) on a Bi$_4$Ge$_3$O$_{12}$ target, using 88 months of data collected by the DAMPE space mission. The measurement points span kinetic energies per nucleon from 18 GeV to 9 TeV for protons, and from 5 GeV/n to 3 TeV/n for helium-4 nuclei. Our results lead to a significant improvement of the CR flux normalisation. In the case of helium-4, these results correspond to the first cross section measurements on a heavy target material at energies above 10 GeV/n.

Submitted 30 August, 2024; originally announced August 2024.

Comments: 17 pages, submitted to PRD
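The inelastic cross section enters through the textbook attenuation law: the fraction of incident particles crossing a target of number density $n$ and thickness $L$ without interacting is $P = e^{-n\sigma L}$, so $\sigma = -\ln P/(nL)$. The sketch below only illustrates that relation with hypothetical function names and made-up inputs; DAMPE's actual analysis (event selection, energy unfolding, and the treatment of the compound BGO target) is considerably more involved.

```python
import math

def survival_fraction(sigma_cm2, number_density_per_cm3, thickness_cm):
    """Fraction of particles crossing the target without an inelastic interaction."""
    return math.exp(-number_density_per_cm3 * sigma_cm2 * thickness_cm)

def inelastic_cross_section(survival, number_density_per_cm3, thickness_cm):
    """Invert P = exp(-n * sigma * L) for sigma, in cm^2."""
    if not 0.0 < survival <= 1.0:
        raise ValueError("survival fraction must be in (0, 1]")
    return -math.log(survival) / (number_density_per_cm3 * thickness_cm)

# Round trip with hypothetical numbers: sigma = 1.5e-24 cm^2, n = 3e21 cm^-3, L = 10 cm.
p = survival_fraction(1.5e-24, 3e21, 10.0)
print(inelastic_cross_section(p, 3e21, 10.0))  # ~1.5e-24
```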

arXiv:2408.17071

Search for $h_c \to \pi^+\pi^-J/\psi$ via $\psi(3686)\to \pi^0h_c$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, et al. (653 additional authors not shown)

Abstract: Using $(2712.4 \pm 14.3) \times 10^6~\psi(3686)$ events collected with the BESIII detector operating at the BEPCII collider, we search for the hadronic transition $h_c \to \pi^+\pi^-J/\psi$ via $\psi(3686)\to \pi^0 h_c$. No significant signal is observed. We set the most stringent upper limits to date on the branching fractions $\mathcal{B}(\psi(3686)\to \pi^0 h_c)\times\mathcal{B}(h_c\to\pi^+\pi^-J/\psi)$ and $\mathcal{B}(h_c \to \pi^+\pi^-J/\psi)$ at the 90% confidence level, which are determined to be $6.7\times 10^{-7}$ and $9.4 \times10^{-4}$, respectively.

Submitted 30 August, 2024; originally announced August 2024.

arXiv:2408.16867

CalTag: Robust calibration of mmWave Radar and LiDAR using backscatter tags

Authors: Junyi Xu, Kshitiz Bansal, Dinesh Bharadia

Abstract: The rise of automation in robotics necessitates high-quality perception systems, often built from multiple sensors. A crucial aspect of successfully deploying a multi-sensor system is calibration against a known object, typically called a fiducial. In this work, we propose a novel fiducial system for millimeter wave radars, termed CalTag. CalTag addresses the limitations of traditional corner reflector-based calibration methods in extremely cluttered environments. CalTag leverages millimeter wave backscatter technology to achieve more reliable calibration than corner reflectors, enhancing the overall performance of multi-sensor perception systems. We compare the performance in several real-world environments and show the improvement achieved by using CalTag as the radar fiducial over a corner reflector.

Submitted 29 August, 2024; originally announced August 2024.
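Fiducial-based extrinsic calibration ultimately reduces to estimating the rigid transform between sensor frames from matched fiducial detections. As a hedged illustration, the sketch below is the standard least-squares (Procrustes/Kabsch) solution in 2D, not CalTag's own pipeline; the function name and the idea of feeding it radar-frame and LiDAR-frame fiducial positions in a common ground plane are assumptions.

```python
import math

def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src points onto dst.

    src, dst: equal-length lists of (x, y) tuples for the same fiducials,
    e.g. radar-frame and LiDAR-frame detections (hypothetical inputs).
    Returns (theta, (tx, ty)) such that dst ~= R(theta) @ src + t.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matched point pairs")
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    # Cross and dot sums of the centred point sets give the optimal angle.
    s_cross = s_dot = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy
        s_cross += px * qy - py * qx
        s_dot += px * qx + py * qy
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # The translation maps the rotated source centroid onto the target centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, (tx, ty)
```

The closed form follows from maximising $\sum_i q_i \cdot R(\theta) p_i$ over the centred points, which gives $\theta = \operatorname{atan2}\big(\sum_i (p_x q_y - p_y q_x), \sum_i (p_x q_x + p_y q_y)\big)$.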

arXiv:2408.16654

Measurement of the Decay $\Xi^{0}\to\Lambda\gamma$ with Entangled $\Xi^{0}\bar{\Xi}^{0}$ Pairs

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (638 additional authors not shown)

Abstract: In this Letter, a systematic study of the weak radiative hyperon decay $\Xi^{0}\to\Lambda\gamma$ at an electron-positron collider using entangled $\Xi^{0}\bar{\Xi}^{0}$ pair events is presented. The absolute branching fraction for this decay has been measured for the first time, and is $\left(1.347 \pm 0.066_{\mathrm{stat.}}\pm0.054_{\mathrm{syst.}}\right)\times 10^{-3}$. The decay asymmetry parameter, which characterizes the effect of parity violation in the decay, is determined to be $-0.741 \pm 0.062_{\mathrm{stat.}}\pm 0.019_{\mathrm{syst.}}$. The obtained results are consistent with the world average values within the uncertainties, offering valuable insights into the underlying mechanism governing weak radiative hyperon decays. Charge-conjugation-parity ($CP$) symmetry of the branching fraction and of the decay asymmetry parameter is also studied. No statistically significant violation of $CP$ symmetry is observed.

Submitted 29 August, 2024; v1 submitted 29 August, 2024; originally announced August 2024.

Comments: 10 pages, 3 figures
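The decay asymmetry parameter $\alpha$ shapes the angular distribution of the daughter baryon. In the simplest textbook case of a single hyperon with known polarization $P$, $dN/d\cos\theta \propto 1 + \alpha P \cos\theta$, and since $\langle\cos\theta\rangle = \alpha P/3$, a method-of-moments estimate is $\hat\alpha = 3\langle\cos\theta\rangle / P$. BESIII instead fits the full joint angular distribution of the entangled $\Xi^{0}\bar{\Xi}^{0}$ pairs, so the following is only a simplified, hypothetical illustration on toy data:

```python
import random

def sample_cos_theta(alpha, polarization, n, rng):
    """Draw cos(theta) from the density 1 + alpha*P*cos(theta) on [-1, 1] by accept-reject."""
    k = alpha * polarization
    if abs(k) > 1.0:
        raise ValueError("|alpha * P| must not exceed 1 for a valid density")
    out = []
    while len(out) < n:
        x = rng.uniform(-1.0, 1.0)
        # Accept with probability (1 + k*x) / (1 + |k|), which lies in [0, 1].
        if rng.random() * (1.0 + abs(k)) < 1.0 + k * x:
            out.append(x)
    return out

def estimate_alpha(cos_thetas, polarization):
    """Method-of-moments estimator using E[cos(theta)] = alpha * P / 3."""
    mean = sum(cos_thetas) / len(cos_thetas)
    return 3.0 * mean / polarization

rng = random.Random(1)
data = sample_cos_theta(alpha=-0.741, polarization=1.0, n=100_000, rng=rng)
print(estimate_alpha(data, 1.0))  # an estimate near the input value -0.741
```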

arXiv:2408.16646

Study of the rare decay $J/\psi\to \mu^+\mu^-\mu^+\mu^-$

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1096 additional authors not shown)

Abstract: The rare electromagnetic $J/\psi\to \mu^+\mu^-\mu^+\mu^-$ decay is observed with a significance greatly exceeding the discovery threshold, using proton-proton collision data collected by the LHCb experiment during 2016-2018 at a center-of-mass energy of 13 TeV, corresponding to an integrated luminosity of $5.4\,\text{fb}^{-1}$. The rate of this decay is measured relative to that of the $J/\psi\to \mu^+\mu^-$ mode. Using the QED model for the four-muon decay in the efficiency estimation, its branching fraction is determined to be $\mathcal{B}(J/\psi\to \mu^+\mu^-\mu^+\mu^-) = (1.13\pm0.10\pm0.05\pm0.01)\times 10^{-6}$, where the uncertainties are statistical, systematic, and due to the uncertainty on the branching fraction of the $J/\psi\to \mu^+\mu^-$ decay.

Submitted 29 August, 2024; originally announced August 2024.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://lbfence.cern.ch/alcm/public/analysis/full-details/3453 (LHCb public pages)

Report number: LHCb-PAPER-2024-016, CERN-EP-2024-201
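Measuring a rate relative to a normalisation channel means the branching fraction follows from a yield ratio, an efficiency ratio and the known normalisation branching fraction, $\mathcal{B}_{4\mu} = \frac{N_{4\mu}}{N_{2\mu}} \frac{\varepsilon_{2\mu}}{\varepsilon_{4\mu}} \mathcal{B}_{2\mu}$. The sketch below illustrates this generic relation and the quadrature sum of the three quoted uncertainties; the yields, efficiencies and normalisation branching fraction in the usage lines are made-up placeholders, not LHCb inputs, and the real analysis involves further corrections.

```python
import math

def relative_branching_fraction(n_signal, n_norm, eff_signal, eff_norm, bf_norm):
    """B(signal) = (N_sig / N_norm) * (eff_norm / eff_sig) * B(norm)."""
    return (n_signal / n_norm) * (eff_norm / eff_signal) * bf_norm

def quadrature(*uncertainties):
    """Combine independent uncertainties in quadrature."""
    return math.sqrt(sum(u * u for u in uncertainties))

# Hypothetical placeholder inputs, chosen only to exercise the formula.
bf = relative_branching_fraction(n_signal=100, n_norm=1_000_000,
                                 eff_signal=0.5, eff_norm=1.0, bf_norm=0.06)
print(bf)  # about 1.2e-05 for these made-up inputs

# The three quoted uncertainties on B(J/psi -> 4 mu), in units of 1e-6.
print(round(quadrature(0.10, 0.05, 0.01), 2))  # 0.11
```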

arXiv:2408.16515

doi 10.1145/3658644.3690269

CanCal: Towards Real-time and Lightweight Ransomware Detection and Response in Industrial Environments

Authors: Shenao Wang, Feng Dong, Hangfeng Yang, Jingheng Xu, Haoyu Wang

Abstract: Ransomware attacks have emerged as one of the most significant cybersecurity threats. Despite numerous proposed detection and defense methods, existing approaches face two fundamental limitations in large-scale industrial applications: intolerable system overheads and notorious alert fatigue. To address these challenges, we propose CanCal, a real-time and lightweight ransomware detection system. Specifically, CanCal selectively filters suspicious processes by the monitoring layers and then performs in-depth behavioral analysis to isolate ransomware activities from benign operations, minimizing alert fatigue while ensuring lightweight computational and storage overhead. Experimental results in a large-scale industrial environment (1,761 ransomware samples, ~3 million events, continuous testing over 5 months) indicate that CanCal is as effective as state-of-the-art techniques while enabling rapid inference within 30 ms and real-time response within a maximum of 3 seconds. CanCal dramatically reduces average CPU utilization by 91.04% (from 6.7% to 0.6%) and peak CPU utilization by 76.69% (from 26.6% to 6.2%), while avoiding 76.50% (from 3,192 to 750) of the inspection efforts from security analysts. At the time of writing, CanCal has been integrated into a commercial product and successfully deployed on 3.32 million endpoints for over a year. From March 2023 to April 2024, CanCal successfully detected and thwarted 61 ransomware attacks, demonstrating its effectiveness in combating sophisticated ransomware threats in real-world scenarios.

Submitted 29 August, 2024; originally announced August 2024.

Comments: To appear in the 2024 ACM SIGSAC Conference on Computer and Communications Security (CCS'24), October 14-18, 2024, Salt Lake City
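The percentages quoted for CanCal are relative reductions, $(x_\text{before} - x_\text{after})/x_\text{before}$. A quick check using only the numbers stated in the abstract:

```python
def relative_reduction(before, after):
    """Relative reduction in percent: (before - after) / before * 100."""
    return (before - after) / before * 100.0

for label, before, after in [
    ("average CPU utilization (%)", 6.7, 0.6),
    ("peak CPU utilization (%)", 26.6, 6.2),
    ("analyst inspections", 3192, 750),
]:
    print(f"{label}: {relative_reduction(before, after):.2f}% reduction")
# average CPU utilization (%): 91.04% reduction
# peak CPU utilization (%): 76.69% reduction
# analyst inspections: 76.50% reduction
```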

arXiv:2408.16451

Weakly Supervised Object Detection for Automatic Tooth-marked Tongue Recognition

Authors: Yongcun Zhang, Jiajun Xu, Yina He, Shaozi Li, Zhiming Luo, Huangwei Lei

Abstract: Tongue diagnosis in Traditional Chinese Medicine (TCM) is a crucial diagnostic method that can reflect an individual's health status. Traditional methods for identifying tooth-marked tongues are subjective and inconsistent because they rely on practitioner experience. We propose WSVM, a novel, fully automated Weakly Supervised method using a Vision transformer and Multiple instance learning, for tongue extraction and tooth-marked tongue recognition. Our approach first accurately detects and extracts the tongue region from clinical images, removing any irrelevant background information. Then, we implement an end-to-end weakly supervised object detection method. We utilize a Vision Transformer (ViT) to process tongue images in patches and employ a multiple instance loss to identify tooth-marked regions with only image-level annotations. WSVM achieves high accuracy in tooth-marked tongue classification, and visualization experiments demonstrate its effectiveness in pinpointing these regions. This automated approach enhances the objectivity and accuracy of tooth-marked tongue diagnosis. It provides significant clinical value by assisting TCM practitioners in making precise diagnoses and treatment recommendations. Code is available at https://github.com/yc-zh/WSVM.

Submitted 29 August, 2024; originally announced August 2024.
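With only image-level labels, multiple instance learning treats each image as a bag of ViT patches and derives the image prediction from patch scores. A common textbook choice, sketched below, is max pooling over per-patch probabilities followed by binary cross-entropy; this is an assumption for illustration, WSVM's actual multiple instance loss may aggregate differently, and the function names are hypothetical.

```python
import math

def bag_probability(patch_logits):
    """Image-level probability = max over per-patch sigmoid probabilities."""
    return max(1.0 / (1.0 + math.exp(-z)) for z in patch_logits)

def mil_bce_loss(patch_logits, image_label, eps=1e-7):
    """Binary cross-entropy between the pooled bag probability and the image label.

    image_label is 1 if the image contains any tooth-marked region, else 0.
    """
    p = min(max(bag_probability(patch_logits), eps), 1.0 - eps)
    return -(image_label * math.log(p) + (1 - image_label) * math.log(1.0 - p))

# One confident "tooth-marked" patch is enough to explain a positive image label.
print(mil_bce_loss([-4.0, -3.5, 5.0, -2.0], image_label=1))  # small loss
print(mil_bce_loss([-4.0, -3.5, 5.0, -2.0], image_label=0))  # large loss
```

Because the gradient flows only through the highest-scoring patch, the arg-max patch also gives a coarse localisation of the tooth-marked region.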

arXiv:2408.16279

Model-independent determination of the strong-phase difference between $D^0$ and $\bar{D}^0 \to \pi^+\pi^-\pi^+\pi^-$ decays

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (647 additional authors not shown)

Abstract: Measurements of the strong-phase difference between $D^0$ and $\bar{D}^0\to\pi^+\pi^-\pi^+\pi^-$ are performed in bins of phase space. The study exploits a sample of quantum-correlated $D\bar{D}$ mesons collected by the BESIII experiment in $e^+e^-$ collisions at a center-of-mass energy of 3.773 GeV, corresponding to an integrated luminosity of 2.93 fb$^{-1}$. Here, $D$ denotes a neutral charm meson in a superposition of flavor eigenstates. The reported results are valuable for measurements of the $C\!P$-violating phase $\gamma$ (also denoted $\phi_3$) in $B^\pm \to DK^\pm$, $D \to \pi^+\pi^-\pi^+\pi^-$ decays, and the binning schemes are designed to provide good statistical sensitivity to this parameter. The expected uncertainty on $\gamma$ arising from the precision of the strong-phase measurements, when applied to very large samples of $B$-meson decays, is around $1.5^\circ$ or $2^\circ$, depending on the binning scheme. The binned strong-phase parameters are combined to give a value of $F_+^{4\pi} = 0.746 \pm 0.010 \pm 0.004$ for the $C\!P$-even fraction of $D^0 \to \pi^+\pi^-\pi^+\pi^-$ decays, which is around 30% more precise than the previous best measurement of this quantity.

Submitted 29 August, 2024; originally announced August 2024.

arXiv:2408.15966

More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding

Authors: Yuan Tang, Xu Han, Xianzhi Li, Qiao Yu, Jinfeng Xu, Yixue Hao, Long Hu, Min Chen

Abstract: Enabling Large Language Models (LLMs) to comprehend the 3D physical world remains a significant challenge. Due to the lack of large-scale 3D-text pair datasets, the success of LLMs has yet to be replicated in 3D understanding. In this paper, we rethink this issue and propose a new task: 3D Data-Efficient Point-Language Understanding. The goal is to enable LLMs to achieve robust 3D object understanding with minimal 3D point cloud and text data pairs. To address this task, we introduce GreenPLM, which leverages more text data to compensate for the lack of 3D data. First, inspired by using CLIP to align images and text, we utilize a pre-trained point cloud-text encoder to map the 3D point cloud space to the text space. This mapping allows us to seamlessly connect the text space with LLMs. Once the point-text-LLM connection is established, we further enhance text-LLM alignment by expanding the intermediate text space, thereby reducing the reliance on 3D point cloud data. Specifically, we generate 6M free-text descriptions of 3D objects, and design a three-stage training strategy to help LLMs better explore the intrinsic connections between different modalities. To achieve efficient modality alignment, we design a zero-parameter cross-attention module for token pooling. Extensive experimental results show that GreenPLM requires only 12% of the 3D training data used by existing state-of-the-art models to achieve superior 3D understanding. Remarkably, GreenPLM also achieves competitive performance using text-only data. The code and weights are available at: https://github.com/TangYuan96/GreenPLM.

Submitted 28 August, 2024; originally announced August 2024.
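The abstract mentions a zero-parameter cross-attention module for token pooling without giving its form. One plausible parameter-free construction, sketched here purely as an assumption (not GreenPLM's published module), splits the $N$ point tokens into $M$ contiguous groups, uses each group's mean as the query, and lets it attend over that group's tokens with scaled dot-product attention and no learned projections, yielding $M$ pooled tokens. The function names are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def parameter_free_pool(tokens, num_out):
    """Pool N token vectors into num_out vectors with no learnable weights.

    tokens: list of N equal-length lists of floats; requires 1 <= num_out <= N.
    Each output is a group's mean (the query) attending over that group's tokens.
    """
    n, d = len(tokens), len(tokens[0])
    if not 1 <= num_out <= n:
        raise ValueError("num_out must be between 1 and the number of tokens")
    pooled = []
    for g in range(num_out):
        # Contiguous, non-empty group boundaries for any N >= num_out.
        group = tokens[g * n // num_out:(g + 1) * n // num_out]
        query = [sum(t[j] for t in group) / len(group) for j in range(d)]
        scores = [sum(q * k for q, k in zip(query, t)) / math.sqrt(d) for t in group]
        weights = softmax(scores)
        pooled.append([sum(w * t[j] for w, t in zip(weights, group)) for j in range(d)])
    return pooled
```

Because keys and values are the raw tokens, the module adds no trainable parameters; each pooled token is a convex combination of its group's tokens.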

arXiv:2408.15599

Precompact Sets in Matrix Weighted Lebesgue Spaces with Variable Exponent

Authors: Shengrong Wang, Pengfei Guo, Jingshi Xu

Abstract: In this paper, we first give a sufficient condition for precompactness in matrix-weighted Lebesgue spaces with variable exponent in terms of the translation operator. Then we obtain a criterion for precompactness in the same spaces in terms of the average operator. Next, we give a criterion for precompactness in these spaces in terms of approximate identities. Finally, precompactness in matrix-weighted Sobolev spaces with variable exponent is also considered.

Submitted 28 August, 2024; originally announced August 2024.
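For orientation, the translation-operator condition generalises the classical Kolmogorov-Riesz theorem for unweighted, constant-exponent $L^p$ spaces, stated below; the matrix-weighted, variable-exponent conditions proved in the paper differ in their precise hypotheses. For $1 \le p < \infty$, a set $\mathcal{F} \subset L^p(\mathbb{R}^n)$ is precompact if and only if

```latex
\begin{aligned}
&\text{(i)}\quad \sup_{f \in \mathcal{F}} \|f\|_{L^p} < \infty, \\
&\text{(ii)}\quad \lim_{R \to \infty} \sup_{f \in \mathcal{F}} \int_{|x| > R} |f(x)|^p \, dx = 0, \\
&\text{(iii)}\quad \lim_{|h| \to 0} \sup_{f \in \mathcal{F}} \|f(\cdot + h) - f\|_{L^p} = 0.
\end{aligned}
```

Condition (iii), uniform continuity of translation in norm, is where the translation operator appears.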

arXiv:2408.15488

Legilimens: Practical and Unified Content Moderation for Large Language Model Services

Authors: Jialin Wu, Jiangyi Deng, Shengyuan Pang, Yanjiao Chen, Jiayang Xu, Xinfeng Li, Wenyuan Xu

Abstract: Given the societal impact of unsafe content generated by large language models (LLMs), ensuring that LLM services comply with safety standards is a crucial concern for LLM service providers. Common content moderation methods are limited by an effectiveness-and-efficiency dilemma, where simple models are fragile while sophisticated models consume excessive computational resources. In this paper, we reveal for the first time that effective and efficient content moderation can be achieved by extracting conceptual features from chat-oriented LLMs, despite their initial fine-tuning for conversation rather than content moderation. We propose a practical and unified content moderation framework for LLM services, named Legilimens, which features both effectiveness and efficiency. Our red-team model-based data augmentation enhances the robustness of Legilimens against state-of-the-art jailbreaking attacks. Additionally, we develop a framework to theoretically analyze the cost-effectiveness of Legilimens compared to other methods. We have conducted extensive experiments on five host LLMs, seventeen datasets, and nine jailbreaking methods to verify the effectiveness, efficiency, and robustness of Legilimens against normal and adaptive adversaries. A comparison of Legilimens with both commercial and academic baselines demonstrates the superior performance of Legilimens. Furthermore, we confirm that Legilimens can be applied to few-shot scenarios and extended to multi-label classification tasks.

Submitted 27 August, 2024; originally announced August 2024.

Comments: Accepted by ACM Conference on Computer and Communications Security (CCS) 2024

arXiv:2408.15276

A Survey of Deep Learning for Group-level Emotion Recognition

Authors: Xiaohua Huang, Jinke Xu, Wenming Zheng, Qirong Mao, Abhinav Dhall

Abstract: With the advancement of artificial intelligence (AI) technology, group-level emotion recognition (GER) has emerged as an important area in analyzing human behavior. Early GER methods primarily relied on handcrafted features. However, with the proliferation of Deep Learning (DL) techniques and their remarkable success in diverse tasks, neural networks have garnered increasing interest in GER. Unlike individual emotions, group emotions exhibit diversity and dynamics. Presently, several DL approaches have been proposed to effectively leverage the rich information inherent in group-level images and enhance GER performance significantly. In this survey, we present a comprehensive review of DL techniques applied to GER, proposing a new taxonomy for the field that covers all aspects of GER based on DL. The survey overviews datasets, the deep GER pipeline, and performance comparisons of the state-of-the-art methods of the past decade. Moreover, it summarizes and discusses the fundamental approaches and advanced developments for each aspect. Furthermore, we identify outstanding challenges and suggest potential avenues for the design of robust GER systems. To the best of our knowledge, this survey represents the first comprehensive review of deep GER methods, serving as a pivotal reference for future GER research endeavors.

Submitted 13 August, 2024; originally announced August 2024.

Comments: 16 pages, 2 figures

arXiv:2408.15079

BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline

Authors: Guosheng Dong, Da Pan, Yiding Sun, Shusen Zhang, Zheng Liang, Xin Wu, Yanjun Shen, Fan Yang, Haoze Sun, Tianpeng Li, Mingan Lin, Jianhua Xu, Yufan Zhang, Xiaonan Nie, Lei Su, Bingning Wang, Wentao Zhang, Jiaxin Mao, Zenan Zhou, Weipeng Chen

Abstract: The general capabilities of Large Language Models (LLMs) rely heavily on the composition and selection of extensive pretraining datasets, which several institutions treat as commercial secrets. To mitigate this issue, we open-source the details of a universally applicable data processing pipeline and validate its effectiveness and potential by introducing a competitive LLM baseline. Specifically, the data processing pipeline consists of broad collection to scale up and reweighting to improve quality. We then pretrain a 7B model, BaichuanSEED, with 3T tokens processed by our pipeline without any deliberate downstream task-related optimization, followed by a simple but effective supervised fine-tuning stage. BaichuanSEED demonstrates consistency and predictability throughout training and achieves performance comparable to several advanced commercial large language models, such as Qwen1.5 and Llama3, on comprehensive benchmarks. We also conduct several heuristic experiments to discuss the potential for further optimization of downstream tasks, such as mathematics and coding.

Submitted 27 August, 2024; originally announced August 2024.

Comments: 19 pages, 6 figures
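The abstract does not spell out the deduplication step named in the title. As a minimal, hypothetical sketch of its simplest variant, exact document-level deduplication, one can normalise each document, hash it, and keep the first occurrence while counting repeats (such counts could then feed a reweighting step). Large-scale pipelines typically also use fuzzy near-duplicate detection such as MinHash, which is not shown here, and nothing below is taken from the BaichuanSEED pipeline itself.

```python
import hashlib
import re
from collections import Counter

def normalise(text):
    """Lower-case and collapse whitespace so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text.strip().lower())

def exact_dedup(documents):
    """Return (unique documents in first-seen order, occurrence count per hash)."""
    seen = set()
    counts = Counter()
    unique = []
    for doc in documents:
        key = hashlib.sha256(normalise(doc).encode("utf-8")).hexdigest()
        counts[key] += 1
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique, counts

docs = ["Hello  world", "hello world", "Another doc", "Hello world\n"]
unique, counts = exact_dedup(docs)
print(unique)  # ['Hello  world', 'Another doc']
```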

arXiv:2408.14625

A Bayesian approach for fitting semi-Markov mixture models of cancer latency to individual-level data

Authors: Raphael Morsomme, Shannon Holloway, Marc Ryser, Jason Xu

Abstract: Multi-state models of cancer natural history are widely used for designing and evaluating cancer early detection strategies. Calibrating such models against longitudinal data from screened cohorts is challenging, especially when fitting non-Markovian mixture models against individual-level data. Here, we consider a family of semi-Markov mixture models of cancer natural history and introduce an efficient data-augmented Markov chain Monte Carlo sampling algorithm for fitting these models to individual-level screening and cancer diagnosis histories. Our fully Bayesian approach supports rigorous uncertainty quantification and model selection through leave-one-out cross-validation, and it enables the estimation of screening-related overdiagnosis rates. We demonstrate the effectiveness of our approach using synthetic data, showing that the sampling algorithm efficiently explores the joint posterior distribution of model parameters and latent variables. Finally, we apply our method to data from the US Breast Cancer Surveillance Consortium and estimate the extent of breast cancer overdiagnosis associated with mammography screening. The sampler and model comparison method are available in the R package baclava.

Submitted 26 August, 2024; originally announced August 2024.

Comments: Submitted for review
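To make the model class concrete, the following is a deliberately simplified, hypothetical forward simulator of a semi-Markov mixture model of cancer latency: each person enters a preclinical state at a random age, a fraction of tumours are indolent and never surface clinically, and the rest progress after a Weibull-distributed sojourn (non-exponential, hence semi-Markov). Screen-detected indolent tumours are counted as overdiagnosed, ignoring competing mortality among progressive tumours. All parameter values are illustrative assumptions. This does not implement the paper's data-augmented MCMC sampler or the baclava package; it only shows the kind of generative model such a sampler is fitted to.

```python
import random

def simulate_overdiagnosis(n_people, p_indolent, screen_ages, sensitivity,
                           onset_scale=130.0, onset_shape=4.0,
                           sojourn_scale=4.0, sojourn_shape=1.5,
                           death_age=85.0, seed=0):
    """Return the fraction of screen-detected cancers that are indolent.

    Ages are in years; every default value is an illustrative assumption.
    """
    rng = random.Random(seed)
    screen_detected = overdiagnosed = 0
    for _ in range(n_people):
        # Age at entry into the preclinical state (Weibull onset time).
        onset = rng.weibullvariate(onset_scale, onset_shape)
        if onset >= death_age:
            continue  # no cancer during this person's lifetime
        # Mixture component: indolent tumours never reach clinical diagnosis.
        indolent = rng.random() < p_indolent
        if indolent:
            clinical = float("inf")
        else:
            clinical = onset + rng.weibullvariate(sojourn_scale, sojourn_shape)
        for age in sorted(screen_ages):
            if age >= death_age or age >= clinical:
                break  # dead, or already diagnosed clinically
            if age >= onset and rng.random() < sensitivity:
                screen_detected += 1
                overdiagnosed += int(indolent)
                break
    return overdiagnosed / screen_detected if screen_detected else 0.0

print(round(simulate_overdiagnosis(10_000, 0.2, range(50, 76, 2), 0.85), 3))
```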

arXiv:2408.14199

doi 10.13140/RG.2.2.16176.74248/1

A Survey on Small-Scale Testbeds for Connected and Automated Vehicles and Robot Swarms

Authors: Armin Mokhtarian, Jianye Xu, Patrick Scheffe, Maximilian Kloock, Simon Schäfer, Heeseung Bang, Viet-Anh Le, Sangeet Ulhas, Johannes Betz, Sean Wilson, Spring Berman, Liam Paull, Amanda Prorok, Bassam Alrifaee

Abstract: Connected and automated vehicles and robot swarms hold transformative potential for enhancing safety, efficiency, and sustainability in the transportation and manufacturing sectors. Extensive testing and validation of these technologies is crucial for their deployment in the real world. While simulations are essential for initial testing, they often have limitations in capturing the complex dynamics of real-world interactions. This limitation underscores the importance of small-scale testbeds. These testbeds provide a realistic, cost-effective, and controlled environment for testing and validating algorithms, acting as an essential intermediary between simulation and full-scale experiments. This work aims to help researchers identify existing small-scale testbeds suitable for their experiments and to provide insights for those who want to build their own. In addition, it delivers a comprehensive survey of the current landscape of these testbeds. We derive 62 characteristics of testbeds based on the well-known sense-plan-act paradigm and offer an online table comparing 22 small-scale testbeds based on these characteristics. The online table is hosted on our designated public webpage www.cpm-remote.de/testbeds, and we invite testbed creators and developers to contribute to it. We closely examine nine testbeds in this paper, demonstrating how the derived characteristics can be used to present testbeds. Furthermore, we discuss three ongoing challenges concerning small-scale testbeds that we identified: the small-scale to full-scale transition, sustainability, and power and resource management.

Submitted 26 August, 2024; originally announced August 2024.

Comments: 16 pages, 11 figures, 1 table. This work has been submitted to the IEEE Robotics & Automation Magazine for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2408.14156

Integrated Sensing, Communication, and Powering over Multi-antenna OFDM Systems

Authors: Yilong Chen, Chao Hu, Zixiang Ren, Han Hu, Jie Xu, Lexi Xu, Lei Liu, Shuguang Cui

Abstract: This paper considers a multi-functional orthogonal frequency division multiplexing (OFDM) system with integrated sensing, communication, and powering (ISCAP), in which a multi-antenna base station (BS) transmits OFDM signals to simultaneously deliver information to multiple information receivers (IRs), provide energy supply to multiple energy receivers (ERs), and sense potential targets based on the echo signals. To facilitate ISCAP, the BS employs the joint transmit beamforming design by sending dedicated sensing/energy beams jointly with information beams. Furthermore, we consider the beam scanning for sensing, in which the joint beams scan in different directions over time to sense potential targets. In order to ensure the sensing beam scanning performance and meet the communication and powering requirements, it is essential to properly schedule IRs and ERs and design the resource allocation over time, frequency, and space. More specifically, we optimize the joint transmit beamforming over multiple OFDM symbols and subcarriers, with the objective of minimizing the average beampattern matching error of beam scanning for sensing, subject to the constraints on the average communication rates at IRs and the average harvested power at ERs. We find converged high-quality solutions to the formulated problem by proposing efficient iterative algorithms based on advanced optimization techniques. We also develop various heuristic designs based on the principles of zero-forcing (ZF) beamforming, round-robin user scheduling, and time switching, respectively.
Numerical results show that our proposed algorithms adaptively generate information and sensing/energy beams at each time-frequency slot to match the scheduled IRs/ERs with the desired scanning beam, significantly outperforming the heuristic designs.

Submitted 26 August, 2024; originally announced August 2024.

Comments: 13 pages, 12 figures
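The sensing objective in the abstract above is a beampattern matching error. As a rough sketch of that kind of metric, not the authors' formulation, the snippet below assumes a uniform linear array with half-wavelength spacing, a single time-frequency slot, and the common least-squares form $\frac{1}{|\Theta|}\sum_{\theta}\big(P(\theta)-\alpha D(\theta)\big)^2$ with $P(\theta)=\mathbf{a}^H(\theta)\mathbf{R}\mathbf{a}(\theta)$. The function names and the least-squares choice of the scale $\alpha$ are illustrative assumptions.

```python
import cmath
import math


def steering_vector(theta, n_antennas):
    """ULA steering vector a(theta), assuming half-wavelength element spacing."""
    return [cmath.exp(1j * math.pi * n * math.sin(theta)) for n in range(n_antennas)]


def beampattern(R, theta):
    """Transmit beampattern P(theta) = a(theta)^H R a(theta) for a Hermitian covariance R."""
    a = steering_vector(theta, len(R))
    total = 0j
    for i, ai in enumerate(a):
        for j, aj in enumerate(a):
            total += ai.conjugate() * R[i][j] * aj
    return total.real  # the imaginary part is ~0 when R is Hermitian


def matching_error(R, angles, desired):
    """Mean squared error between P(theta) and alpha * D(theta), alpha fitted by least squares."""
    P = [beampattern(R, th) for th in angles]
    denom = sum(d * d for d in desired)
    alpha = sum(d * p for d, p in zip(desired, P)) / denom if denom > 0 else 0.0
    err = sum((p - alpha * d) ** 2 for p, d in zip(P, desired)) / len(angles)
    return err, alpha


# Example: an isotropic covariance (R = I) radiates the same power N in every direction,
# so it matches a flat desired pattern exactly with alpha = N.
N = 4
angles = [math.radians(deg) for deg in range(-90, 91, 5)]
R_iso = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]
err, alpha = matching_error(R_iso, angles, [1.0] * len(angles))
```

For beam scanning, the desired pattern $D(\theta)$ would change from slot to slot, and the abstract's objective averages such errors over OFDM symbols and subcarriers; this sketch evaluates only one slot.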

arXiv:2408.13674 [pdf, other]

GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars

Authors: Keqiang Sun, Amin Jourabloo, Riddhish Bhalodia, Moustafa Meshry, Yu Rong, Zhengyu Yang, Thu Nguyen-Phuoc, Christian Haene, Jiu Xu, Sam Johnson, Hongsheng Li, Sofien Bouaziz

Abstract: Photo-realistic and controllable 3D avatars are crucial for various applications such as virtual and mixed reality (VR/MR), telepresence, gaming, and film production. Traditional methods for avatar creation often involve time-consuming scanning and reconstruction processes for each avatar, which limits their scalability. Furthermore, these methods do not offer the flexibility to sample new identities or modify existing ones. On the other hand, by learning a strong prior from data, generative models provide a promising alternative to traditional reconstruction methods, easing the time constraints for both data capture and processing. Additionally, generative methods enable downstream applications beyond reconstruction, such as editing and stylization. Nonetheless, the research on generative 3D avatars is still in its infancy, and therefore current methods still have limitations such as creating static avatars, lacking photo-realism, having incomplete facial details, or having limited drivability. To address this, we propose a text-conditioned generative model that can generate photo-realistic facial avatars of diverse identities, with more complete details like hair, eyes and mouth interior, and which can be driven through a powerful non-parametric latent expression space. Specifically, we integrate the generative and editing capabilities of latent diffusion models with a strong prior model for avatar expression driving. Our model can generate and control high-fidelity avatars, even those out-of-distribution.
We also highlight its potential for downstream applications, including avatar editing and single-shot avatar reconstruction.

Submitted 24 August, 2024; originally announced August 2024.

arXiv:2408.13610 [pdf, ps, other]

The Boltzmann equation in the homogeneous critical regularity framework

Authors: Jing Liu, Ling-Yun Shou, Jiang Xu

Abstract: We construct a unique global solution to the Cauchy problem of the 3D Boltzmann equation for initial data around the Maxwellian in the spatially critical homogeneous Besov space $\widetilde{L}^2_ξ(\dot{B}_{2,1}^{1/2}\cap\dot{B}_{2,1}^{3/2})$. In addition, under the condition that the low-frequency part of the initial perturbation is bounded in $\widetilde{L}^2_ξ(\dot{B}_{2,\infty}^{σ_{0}})$ with $-3/2\leqσ_{0}<1/2$, it is shown that the solution converges to its equilibrium at large times with the optimal rate of $\mathcal{O}(t^{-(σ-σ_{0})/2})$ in $\widetilde{L}^2_ξ(\dot{B}_{2,1}^σ)$ for some $σ>σ_0$, and that the microscopic part decays at an enhanced rate of $\mathcal{O}(t^{-(σ-σ_{0})/2-1/2})$. In contrast to [19], the usual $L^2$ estimates are not necessary in our approach, which provides a new understanding of hypocoercivity theory for the Boltzmann equation, allowing one to construct the Lyapunov functional with different dissipation rates at low and high frequencies. Furthermore, a time-weighted Lyapunov energy argument can be developed to deduce the optimal time-decay estimates.

Submitted 24 August, 2024; originally announced August 2024.

Comments: 35 pages

MSC Class: 35Q20; 76N10

arXiv:2408.12162 [pdf, ps, other]

Empowering Over-the-Air Personalized Federated Learning via RIS

Authors: Wei Shi, Jiacheng Yao, Jindan Xu, Wei Xu, Lexi Xu, Chunming Zhao

Abstract: Over-the-air computation (AirComp) integrates analog communication with task-oriented computation, serving as a key enabling technique for communication-efficient federated learning (FL) over wireless networks. However, AirComp-enabled FL (AirFL) with a single global consensus model fails to address the data heterogeneity in real-life FL scenarios with non-independent and identically distributed local datasets. In this paper, we introduce reconfigurable intelligent surface (RIS) technology to enable efficient personalized AirFL, mitigating the data heterogeneity issue. First, we achieve statistical interference elimination across different clusters in the personalized AirFL framework via RIS phase shift configuration. Then, we propose two personalized aggregation schemes involving power control and denoising factor design from the perspectives of first- and second-order moments, respectively, to enhance the FL convergence. Numerical results validate the superior performance of our proposed schemes over existing baselines.

Submitted 22 August, 2024; originally announced August 2024.

Comments: Accepted by SCIENCE CHINA Information Sciences
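To make the AirComp idea in the abstract above concrete, here is a minimal sketch of the textbook channel-inversion scheme, assuming perfect channel state information, real-valued model updates, and a single cluster. It is not the paper's RIS phase-shift configuration or its first-/second-order-moment power-control designs; the function name and parameters are hypothetical. Each device pre-scales by $b_k=\sqrt{\eta}/h_k$, the channel sums the transmissions, and the server applies the denoising factor $1/\sqrt{\eta}$.

```python
import math


def aircomp_average(updates, channels, eta, noise=None):
    """Average real-valued local updates via over-the-air superposition.

    Assumes perfect CSI and channel-inversion precoding b_k = sqrt(eta) / h_k,
    so the received signal is y = sqrt(eta) * sum_k x_k + n.
    """
    dim = len(updates[0])
    if noise is None:
        noise = [0j] * dim
    scale = math.sqrt(eta)
    # Each device pre-inverts its own (complex) channel gain.
    tx = [[scale / h * x for x in u] for u, h in zip(updates, channels)]
    # The multiple-access channel adds all transmissions, plus receiver noise.
    y = [sum(h * t[i] for h, t in zip(channels, tx)) + noise[i] for i in range(dim)]
    # Server-side denoising factor 1/sqrt(eta), then divide by K for the mean.
    return [(yi / scale).real / len(updates) for yi in y]


avg = aircomp_average([[1.0, 2.0], [3.0, 4.0]], [0.5 + 0.5j, 1 - 1j], eta=4.0)
```

In practice $\eta$ is capped by the transmit-power budget of the device with the weakest channel, which is one reason deep fades degrade plain AirComp and motivate more careful power-control and denoising designs.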

arXiv:2408.12054 [pdf, other]

Spin relaxation in graphite due to spin-orbital-phonon interaction from first-principles density-matrix approach

Authors: Junqing Xu

Abstract: We predict "intrinsic" spin relaxation times ($T_{1}$) of graphite due to spin-orbit-phonon interaction, i.e., the combination of spin-orbit coupling and electron-phonon interaction, using our developed first-principles density-matrix approach. We obtain ultralong $T_{1}$, e.g., $\sim$600 ns at 300 K, which leads to an ultralong in-plane spin diffusion length of $\sim$110 $μ$m within the drift-diffusion model. Our prediction sets the upper bound of $T_{1}$ of graphite at each given temperature and Fermi level. The anisotropy ratios of $T_{1}$, i.e., the values of $T_{1z}/T_{1x}$, are found to be small, around 0.6. We examine the applicability of the well-known Elliott-Yafet (EY) relation, which states that the spin relaxation rate $T_{1α}^{-1}$ ($α=x,y,z$) is proportional to the product of the ensemble average of the spin mixing parameter $\left\langle b_α^{2}\right\rangle $ and the carrier relaxation rate $τ_{p}^{-1}$. Our numerical tests suggest that the EY relation works qualitatively if the degeneracy threshold $t^{\mathrm{deg}}$ for evaluating $b_α^{2}$ is relatively large (comparable to or not much smaller than $k_{B}T$), e.g., $10^{-3}$ eV or larger, but fails if $t^{\mathrm{deg}}$ is too tiny (much smaller than $k_{B}T$), e.g., $10^{-6}$ eV or smaller.

Submitted 21 August, 2024; originally announced August 2024.

Comments: 9 pages, 4 figures
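The abstract above quotes $T_1 \sim 600$ ns and an in-plane spin diffusion length $\sim 110\ μ$m from the drift-diffusion model. Assuming the standard drift-diffusion relation $l_s = \sqrt{D\,T_1}$ (the abstract does not give the diffusion coefficient $D$ or say which component of $T_1$ enters), the two numbers imply $D \approx 2 \times 10^2\ \mathrm{cm^2/s}$. The helper names below are illustrative, and because both inputs are approximate, the result is only an order-of-magnitude consistency check.

```python
import math


def spin_diffusion_length(diffusion_coeff_m2_s, t1_s):
    """Drift-diffusion estimate l_s = sqrt(D * T1), in metres."""
    return math.sqrt(diffusion_coeff_m2_s * t1_s)


def implied_diffusion_coeff(l_s_m, t1_s):
    """Invert l_s = sqrt(D * T1) to obtain D = l_s**2 / T1, in m^2/s."""
    return l_s_m ** 2 / t1_s


# Approximate values quoted in the abstract: l_s ~ 110 um, T1 ~ 600 ns at 300 K.
D = implied_diffusion_coeff(110e-6, 600e-9)
print(f"D = {D:.4f} m^2/s = {D * 1e4:.1f} cm^2/s")
```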

arXiv:2408.11982 [pdf, other]

AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results

Authors: Maksim Smirnov, Aleksandr Gushchin, Anastasia Antsiferova, Dmitry Vatolin, Radu Timofte, Ziheng Jia, Zicheng Zhang, Wei Sun, Jiaying Qian, Yuqin Cao, Yinan Sun, Yuxin Zhu, Xiongkuo Min, Guangtao Zhai, Kanjar De, Qing Luo, Ao-Xiang Zhang, Peng Zhang, Haibo Lei, Linyan Jiang, Yaqing Li, Wenhui Meng, Xiaoheng Tan, Haiqiang Wang, Xiaozhong Xu , et al. (11 additional authors not shown)

Abstract: Video quality assessment (VQA) is a crucial task in the development of video compression standards, as it directly impacts the viewer experience. This paper presents the results of the Compressed Video Quality Assessment challenge, held in conjunction with the Advances in Image Manipulation (AIM) workshop at ECCV 2024. The challenge aimed to evaluate the performance of VQA methods on a diverse dataset of 459 videos, encoded with 14 codecs of various compression standards (AVC/H.264, HEVC/H.265, AV1, and VVC/H.266) and containing a comprehensive collection of compression artifacts. To measure the methods' performance, we computed traditional correlation coefficients between their predictions and subjective scores, which were collected via large-scale crowdsourced pairwise human comparisons. For training purposes, participants were provided with the Compressed Video Quality Assessment Dataset (CVQAD), a previously developed dataset of 1022 videos. Up to 30 teams registered for the challenge; we report the results of the 6 teams that submitted valid final solutions and code for reproducing the results. Moreover, we report the performance of state-of-the-art VQA methods on the developed dataset, providing a comprehensive benchmark for future research. The dataset, results, and online leaderboard are publicly available at https://challenges.videoprocessing.ai/challenges/compressedvideo-quality-assessment.html.

Submitted 28 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.
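The abstract above does not name the correlation coefficients used; the customary pair in VQA benchmarks is Spearman's rank-order (SROCC) and Pearson's linear (PLCC) correlation, so this sketch assumes those two. It is a dependency-free illustration with average ranks for ties; it omits the nonlinear (e.g. logistic) mapping that many benchmarks fit before computing PLCC, and the toy scores are made up for illustration only.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg_rank
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank-order correlation (SROCC) = PLCC of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy example: a metric that is monotone but nonlinear in the subjective score.
subjective = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 4.0, 9.0, 16.0]
srocc, plcc = spearman(predicted, subjective), pearson(predicted, subjective)
```

SROCC rewards a correct ordering regardless of scale, while PLCC also penalizes nonlinearity, which is why the toy example scores 1 on the former but not the latter.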

arXiv:2408.10996 [pdf, ps, other]

Approximation Rates for Shallow ReLU$^k$ Neural Networks on Sobolev Spaces via the Radon Transform

Authors: Tong Mao, Jonathan W. Siegel, Jinchao Xu

Abstract: Let $Ω\subset \mathbb{R}^d$ be a bounded domain. We consider the problem of how efficiently shallow neural networks with the ReLU$^k$ activation function can approximate functions from Sobolev spaces $W^s(L_p(Ω))$ with error measured in the $L_q(Ω)$-norm. Utilizing the Radon transform and recent results from discrepancy theory, we provide a simple proof of nearly optimal approximation rates in a variety of cases, including when $q\leq p$, $p\geq 2$, and $s \leq k + (d+1)/2$. The rates we derive are optimal up to logarithmic factors, and significantly generalize existing results. An interesting consequence is that the adaptivity of shallow ReLU$^k$ neural networks enables them to obtain optimal approximation rates for smoothness up to order $s = k + (d+1)/2$, even though they represent piecewise polynomials of fixed degree $k$.

Submitted 20 August, 2024; originally announced August 2024.

MSC Class: 62M45; 41A25; 41A30
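For reference, the function class studied above is usually written as shallow networks with $n$ neurons, $f_n(x)=\sum_{i=1}^{n} a_i\,\sigma_k(\omega_i\cdot x+b_i)$ with $\sigma_k(t)=\max(0,t)^k$. The sketch below simply evaluates such a network; the function names are illustrative, and it says nothing about how the paper constructs approximants via the Radon transform. It does show the "piecewise polynomial of fixed degree $k$" point: two ReLU$^1$ neurons give $|x|$, and two ReLU$^2$ neurons give $x^2$.

```python
def relu_k(t, k):
    """ReLU^k activation: max(0, t) ** k, for an integer k >= 1."""
    return max(0.0, t) ** k


def shallow_relu_k(x, weights, biases, coeffs, k):
    """Evaluate f(x) = sum_i a_i * max(0, w_i . x + b_i) ** k for a point x in R^d."""
    total = 0.0
    for w, b, a in zip(weights, biases, coeffs):
        pre_activation = sum(wj * xj for wj, xj in zip(w, x)) + b
        total += a * relu_k(pre_activation, k)
    return total


# Two neurons in d = 1 with directions +1 and -1.
abs_value = shallow_relu_k([-2.0], [[1.0], [-1.0]], [0.0, 0.0], [1.0, 1.0], k=1)  # |x| at x = -2
square = shallow_relu_k([3.0], [[1.0], [-1.0]], [0.0, 0.0], [1.0, 1.0], k=2)      # x^2 at x = 3
```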

arXiv:2408.10581 [pdf, other]

Multi-view Hand Reconstruction with a Point-Embedded Transformer

Authors: Lixin Yang, Licheng Zhong, Pengxiang Zhu, Xinyu Zhan, Junxiao Kong, Jian Xu, Cewu Lu

Abstract: This work introduces a novel and generalizable multi-view Hand Mesh Reconstruction (HMR) model, named POEM, designed for practical use in real-world hand motion capture scenarios. The advances of the POEM model consist of two main aspects. First, concerning the modeling of the problem, we propose embedding a static basis point within the multi-view stereo space. A point represents a natural form of 3D information and serves as an ideal medium for fusing features across different views, given its varied projections across these views. Consequently, our method harnesses a simple yet effective idea: a complex 3D hand mesh can be represented by a set of 3D basis points that 1) are embedded in the multi-view stereo, 2) carry features from the multi-view images, and 3) encompass the hand in it. The second advance lies in the training strategy. We utilize a combination of five large-scale multi-view datasets and employ randomization in the number, order, and poses of the cameras. By processing such a vast amount of data and a diverse array of camera configurations, our model demonstrates notable generalizability in the real-world applications. As a result, POEM presents a highly practical, plug-and-play solution that enables user-friendly, cost-effective multi-view motion capture for both left and right hands. The model and source codes are available at https://github.com/JubSteven/POEM-v2.

Submitted 20 August, 2024; originally announced August 2024.

Comments: Generalizable multi-view Hand Mesh Reconstruction (HMR) model. Extension of the original work at CVPR2023

arXiv:2408.10531 [pdf, other]

Leveraging Temporal Contexts to Enhance Vehicle-Infrastructure Cooperative Perception

Authors: Jiaru Zhong, Haibao Yu, Tianyi Zhu, Jiahui Xu, Wenxian Yang, Zaiqing Nie, Chao Sun

Abstract: Infrastructure sensors installed at elevated positions offer a broader perception range and encounter fewer occlusions. Integrating both infrastructure and ego-vehicle data through V2X communication, known as vehicle-infrastructure cooperation, has shown considerable advantages in enhancing perception capabilities and addressing corner cases encountered in single-vehicle autonomous driving. However, cooperative perception still faces numerous challenges, including limited communication bandwidth and practical communication interruptions. In this paper, we propose CTCE, a novel framework for cooperative 3D object detection. This framework transmits queries with temporal contexts enhancement, effectively balancing transmission efficiency and performance to accommodate real-world communication conditions. Additionally, we propose a temporal-guided fusion module to further improve performance. The roadside temporal enhancement and vehicle-side spatial-temporal fusion together constitute a multi-level temporal contexts integration mechanism, fully leveraging temporal information to enhance performance. Furthermore, a motion-aware reconstruction module is introduced to recover lost roadside queries due to communication interruptions. Experimental results on V2X-Seq and V2X-Sim datasets demonstrate that CTCE outperforms the baseline QUEST, achieving improvements of 3.8% and 1.3% in mAP, respectively. Experiments under communication interruption conditions validate CTCE's robustness to communication interruptions.

Submitted 20 August, 2024; originally announced August 2024.

Comments: Accepted by IEEE ITSC 2024

arXiv:2408.10230 [pdf, other]

A General-Purpose Device for Interaction with LLMs

Authors: Jiajun Xu, Qun Wang, Yuhang Cao, Baitao Zeng, Sicheng Liu

Abstract: This paper investigates integrating large language models (LLMs) with advanced hardware, focusing on developing a general-purpose device designed for enhanced interaction with LLMs. Initially, we analyze the current landscape, where virtual assistants and LLMs are reshaping human-technology interactions, highlighting pivotal advancements and setting the stage for a new era of intelligent hardware. Despite substantial progress in LLM technology, a significant gap exists in hardware development, particularly concerning scalability, efficiency, affordability, and multimodal capabilities. This disparity presents both challenges and opportunities, underscoring the need for hardware that is not only powerful but also versatile and capable of managing the sophisticated demands of modern computation. Our proposed device addresses these needs by emphasizing scalability, multimodal data processing, enhanced user interaction, and privacy considerations, offering a comprehensive platform for LLM integration in various applications.

Submitted 2 August, 2024; originally announced August 2024.

arXiv:2408.09787 [pdf, other]

Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation

Authors: Yunxin Li, Haoyuan Shi, Baotian Hu, Longyue Wang, Jiashun Zhu, Jinyi Xu, Zhen Zhao, Min Zhang

Abstract: Traditional animation generation methods depend on training generative models with human-labelled data, entailing a sophisticated multi-stage pipeline that demands substantial human effort and incurs high training costs. Due to limited prompting plans, these methods typically produce brief, information-poor, and context-incoherent animations. To overcome these limitations and automate the animation process, we pioneer the introduction of large multimodal models (LMMs) as the core processor to build an autonomous animation-making agent, named Anim-Director. This agent mainly harnesses the advanced understanding and reasoning capabilities of LMMs and generative AI tools to create animated videos from concise narratives or simple instructions. Specifically, it operates in three main stages: Firstly, the Anim-Director generates a coherent storyline from user inputs, followed by a detailed director's script that encompasses settings of character profiles and interior/exterior descriptions, and context-coherent scene descriptions that include appearing characters, interiors or exteriors, and scene events. Secondly, we employ LMMs with the image generation tool to produce visual images of settings and scenes. These images are designed to maintain visual consistency across different scenes using a visual-language prompting method that combines scene descriptions and images of the appearing character and setting. Thirdly, scene images serve as the foundation for producing animated videos, with LMMs generating prompts to guide this process.
The whole process is notably autonomous without manual intervention, as the LMMs interact seamlessly with generative tools to generate prompts, evaluate visual quality, and select the best one to optimize the final output.

Submitted 19 August, 2024; originally announced August 2024.

Comments: Accepted by SIGGRAPH Asia 2024, Project and Codes: https://github.com/HITsz-TMG/Anim-Director

arXiv:2408.09748 [pdf, other]

doi 10.1145/3637528.3671734

Revisiting Reciprocal Recommender Systems: Metrics, Formulation, and Method

Authors: Chen Yang, Sunhao Dai, Yupeng Hou, Wayne Xin Zhao, Jun Xu, Yang Song, Hengshu Zhu

Abstract: Reciprocal recommender systems (RRS), conducting bilateral recommendations between two involved parties, have gained increasing attention for enhancing matching efficiency. However, the majority of existing methods in the literature still reuse conventional ranking metrics to separately assess the performance on each side of the recommendation process. These methods overlook the fact that the ranking outcomes of both sides collectively influence the effectiveness of the RRS, neglecting the necessity of a more holistic evaluation and a capable systemic solution. In this paper, we systematically revisit the task of reciprocal recommendation by introducing new metrics, a new formulation, and a new method. Firstly, we propose five new evaluation metrics that comprehensively and accurately assess the performance of RRS from three distinct perspectives: overall coverage, bilateral stability, and balanced ranking. These metrics provide a more holistic understanding of the system's effectiveness and enable a comprehensive evaluation. Furthermore, we formulate the RRS from a causal perspective, modeling recommendations as bilateral interventions, which can better model the decoupled effects of potential influencing factors. By utilizing the potential outcome framework, we further develop a model-agnostic causal reciprocal recommendation method that considers the causal effects of recommendations. Additionally, we introduce a reranking strategy to maximize matching outcomes, as measured by the proposed metrics.
Extensive experiments on two real-world datasets from recruitment and dating scenarios demonstrate the effectiveness of our proposed metrics and approach. The code and dataset are available at: https://github.com/RUCAIBox/CRRS.

Submitted 19 August, 2024; originally announced August 2024.

Comments: KDD 2024

arXiv:2408.09539 [pdf, other]

Byzantine-resilient Federated Learning Employing Normalized Gradients on Non-IID Datasets

Authors: Shiyuan Zuo, Xingrun Yan, Rongfei Fan, Li Shen, Puning Zhao, Jie Xu, Han Hu

Abstract: In practical federated learning (FL) systems, the presence of malicious Byzantine attacks and data heterogeneity often introduces biases into the learning process. However, existing Byzantine-robust methods typically only achieve a compromise between adaptability to different loss function types (including both strongly convex and non-convex) and robustness to heterogeneous datasets, and still leave a non-zero optimality gap. Moreover, this compromise often comes at the cost of high computational complexity for aggregation, which significantly slows down training. To address this challenge, we propose a federated learning approach called Federated Normalized Gradients Algorithm (Fed-NGA). Fed-NGA simply normalizes the uploaded local gradients to be unit vectors before aggregation, achieving a time complexity of $\mathcal{O}(pM)$, where $p$ represents the dimension of model parameters and $M$ is the number of participating clients. This complexity is the best among all existing Byzantine-robust methods. Furthermore, through rigorous proof, we demonstrate that Fed-NGA transcends both the trade-off between adaptability to loss function type and robustness to data heterogeneity and the limitation of a non-zero optimality gap in the existing literature. Specifically, Fed-NGA can adapt to both non-convex loss functions and non-IID datasets simultaneously, with zero optimality gap at a rate of $\mathcal{O} (1/T^{\frac{1}{2} - δ})$, where $T$ is the iteration number and $δ\in (0,\frac{1}{2})$.
In cases where the loss function is strongly convex, the rate of achieving a zero optimality gap can be improved to linear. Experimental results provide evidence of the superiority of our proposed Fed-NGA over baseline methods in time complexity and convergence performance.

Submitted 18 August, 2024; originally announced August 2024.
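A minimal sketch of the aggregation rule described in the abstract above: every uploaded gradient is rescaled to unit Euclidean norm, and the server then takes a weighted average, which costs $\mathcal{O}(pM)$ for $M$ clients and $p$ parameters. The uniform default weights, the zero-gradient guard, the plain update `params - lr * direction`, and all function names are assumptions for illustration; the abstract does not specify the aggregation weights or step-size schedule used in Fed-NGA.

```python
import math


def normalize(g, eps=1e-12):
    """Scale a gradient to unit Euclidean norm; (near-)zero vectors map to zeros."""
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g] if norm > eps else [0.0] * len(g)


def normalized_gradient_aggregate(gradients, weights=None):
    """Weighted average of unit-normalized client gradients: O(p * M) work."""
    m = len(gradients)
    if weights is None:
        weights = [1.0 / m] * m  # assumed uniform weighting
    agg = [0.0] * len(gradients[0])
    for w, g in zip(weights, gradients):
        for i, v in enumerate(normalize(g)):
            agg[i] += w * v
    return agg


def server_step(params, gradients, lr, weights=None):
    """One hypothetical server update using the aggregated normalized direction."""
    direction = normalized_gradient_aggregate(gradients, weights)
    return [x - lr * d for x, d in zip(params, direction)]


honest = [[3.0, 4.0], [0.0, 2.0]]
agg = normalized_gradient_aggregate(honest)                # normalized: [0.6, 0.8] and [0, 1]
with_attacker = normalized_gradient_aggregate(honest + [[-1e6, 0.0]])
```

Because each normalized gradient has norm 1, a client with weight $w_k$ can shift the aggregate by at most $w_k$ in norm no matter how large a vector it uploads; this bounded influence is the intuition behind the robustness to Byzantine clients.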

arXiv:2408.09502 [pdf, other]

doi 10.61977/ati2024023

The Electrical Design of a Membrane Antenna for Lunar-based Low-frequency Radio Telescope

Authors: Suonanben, Fengquan Wu, Kai He, Shijie Sun, Wei Zhou, Minquan Zhou, Cong Zhang, Jiaqin Xu, Qisen Yan, Shenzhe Xu, Jiacong Zhu, Zhao Wang, Ke Zhang, Haitao Miao, Jixia Li, Yougang Wang, Tianlu Chen, Xuelei Chen

Abstract: Detecting primordial fluctuations from the cosmic dark ages requires extremely large low-frequency radio telescope arrays deployed on the far side of the Moon. The antenna of such an array must be lightweight, easily storable and transportable, deployable on a large scale, durable, and capable of good electrical performance. A membrane antenna is an excellent candidate to meet these criteria. We study the design of a membrane antenna for a lunar-based low-frequency (<30 MHz) radio telescope, built on polyimide film, a substrate material widely used in aerospace applications owing to its excellent dielectric properties and high stability. We first design and optimize an antenna in free space through dipole deformation and coupling principles, then simulate an antenna on the lunar surface with a simple lunar soil model, yielding an efficiency greater than 90% in the range of 12-19 MHz and greater than 10% in the range of 5-35 MHz. The antenna inherits the omni-directional radiation pattern of a simple dipole antenna in the 5-30 MHz frequency band, giving a large field of view and allowing detection of the 21 cm global signal when used alone. A demonstration prototype is constructed, and its electrical properties, measured via |S11|, are found to be consistent with the simulated results. This membrane antenna can potentially fulfill the requirements of a lunar low-frequency array, establishing a solid technical foundation for future large-scale arrays for exploring the cosmic dark ages.

Submitted 18 August, 2024; originally announced August 2024.

Comments: 14 pages, 19 figures

Journal ref: Astronomical Techniques and Instruments, 1(4): 227-238
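The prototype in the abstract above is characterized through |S11|. As a reminder of how a reflection measurement relates to efficiency, the sketch below uses the standard conversion $|\Gamma| = 10^{|S_{11}|_{\mathrm{dB}}/20}$ and mismatch efficiency $1-|\Gamma|^2$, so $-10$ dB corresponds to 90% of incident power being accepted. This captures only impedance mismatch; the efficiencies quoted in the abstract come from the authors' simulations, which may also include losses such as those in the lunar soil model, so the sketch does not reproduce them. Function names are illustrative.

```python
def s11_db_to_mismatch_efficiency(s11_db):
    """Fraction of incident power accepted by the antenna port: 1 - |Gamma|^2.

    Assumes a passive antenna, i.e. |S11| <= 0 dB.
    """
    gamma_mag = 10 ** (s11_db / 20.0)  # |S11| in dB -> linear reflection magnitude
    return 1.0 - gamma_mag ** 2


def total_efficiency(radiation_efficiency, s11_db):
    """Total efficiency modeled as radiation efficiency times mismatch efficiency."""
    return radiation_efficiency * s11_db_to_mismatch_efficiency(s11_db)


for s11 in (-3.0, -10.0, -20.0):
    print(f"|S11| = {s11:.0f} dB -> mismatch efficiency {s11_db_to_mismatch_efficiency(s11):.3f}")
```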

arXiv:2408.09476 [pdf, other]

Advances in Multiple Instance Learning for Whole Slide Image Analysis: Techniques, Challenges, and Future Directions

Authors: Jun Wang, Yu Mao, Nan Guan, Chun Jason Xue

Abstract: Whole slide images (WSIs) are gigapixel-scale digital images of H&E-stained tissue samples widely used in pathology. The substantial size and complexity of WSIs pose unique analytical challenges. Multiple Instance Learning (MIL) has emerged as a powerful approach for addressing these challenges, particularly in cancer classification and detection. This survey provides a comprehensive overview of the challenges and methodologies associated with applying MIL to WSI analysis, including attention mechanisms, pseudo-labeling, transformers, pooling functions, and graph neural networks. Additionally, it explores the potential of MIL in discovering cancer cell morphology, constructing interpretable machine learning models, and quantifying cancer grading. By summarizing the current challenges, methodologies, and potential applications of MIL in WSI analysis, this survey aims to inform researchers about the state of the field and inspire future research directions.

Submitted 18 August, 2024; originally announced August 2024.
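One attention mechanism commonly used for WSI-level MIL is attention-based pooling in the style of Ilse et al. (2018): each patch embedding h_k receives a score w · tanh(V h_k), the scores are softmax-normalized over the bag, and the slide embedding is the attention-weighted sum of patches. The pure-Python sketch below illustrates that pooling step only; V, w, and the toy embeddings are hypothetical values that would be learned or extracted in practice, and individual methods in the survey differ in their details.

```python
import math
from typing import List, Tuple

Vector = List[float]


def matvec(m: List[Vector], v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def attention_mil_pool(patches: List[Vector], V: List[Vector], w: Vector) -> Tuple[Vector, List[float]]:
    """score_k = w . tanh(V h_k); a = softmax(scores); slide embedding z = sum_k a_k h_k."""
    scores = [sum(wi * math.tanh(u) for wi, u in zip(w, matvec(V, h))) for h in patches]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    dim = len(patches[0])
    slide = [sum(a * h[d] for a, h in zip(attn, patches)) for d in range(dim)]
    return slide, attn


if __name__ == "__main__":
    # Three toy 2-D patch embeddings from one hypothetical slide (the "bag")
    patches = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
    V = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical projection (learned in practice)
    w = [1.0, 1.0]                # hypothetical attention vector (learned in practice)
    slide, attn = attention_mil_pool(patches, V, w)
    print("attention weights:", [round(a, 3) for a in attn])
    print("slide embedding:", [round(x, 3) for x in slide])
```

The attention weights double as a patch-level saliency map, which is one reason attention pooling is associated with the interpretability goals the survey mentions.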

arXiv:2408.09439 [pdf, other]

Towards Boosting LLMs-driven Relevance Modeling with Progressive Retrieved Behavior-augmented Prompting

Authors: Zeyuan Chen, Haiyan Wu, Kaixin Wu, Wei Chen, Mingjie Zhong, Jia Xu, Zhongyi Liu, Wei Zhang

Abstract: Relevance modeling is a critical component for enhancing user experience in search engines, with the primary objective of identifying items that align with users' queries. Traditional models rely only on the semantic congruence between queries and items to ascertain relevance. However, this captures merely one aspect of relevance judgement and is insufficient in isolation. Even powerful Large Language Models (LLMs) still cannot accurately judge the relevance of a query and an item from a semantic perspective alone. To augment LLM-driven relevance modeling, this study proposes leveraging user interactions recorded in search logs to yield insights into users' implicit search intentions. The challenge lies in effectively prompting LLMs to capture dynamic search intentions, which faces several obstacles in real-world relevance scenarios, namely the absence of domain-specific knowledge, the inadequacy of an isolated prompt, and the prohibitive costs of deploying LLMs. In response, we propose ProRBP, a novel Progressive Retrieved Behavior-augmented Prompting framework for effectively integrating search-scenario-oriented knowledge with LLMs. Specifically, we retrieve user-driven behavior neighbors from the daily search logs to obtain timely domain-specific knowledge, retrieving candidates that users consider to meet their expectations. Then, we guide LLMs for relevance modeling by employing advanced prompting techniques that progressively improve the outputs of the LLMs, followed by a progressive aggregation that comprehensively considers diverse aspects. For online serving, we have developed an industrial application framework tailored for deploying LLMs in relevance modeling. Experiments on real-world industry data and online A/B testing demonstrate that our proposal achieves promising performance.

Submitted 18 August, 2024; originally announced August 2024.
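The retrieval step described above (pulling candidates that users considered satisfying from search logs) can be sketched as follows. This is a minimal illustration, not ProRBP's pipeline: it assumes a hypothetical log schema of (query, item, clicked) rows, treats "behavior neighbors" as the items most often clicked for the same query, and uses a hypothetical prompt template.

```python
from collections import Counter
from typing import Iterable, List, Tuple

LogRow = Tuple[str, str, bool]  # hypothetical schema: (query, item title, clicked?)


def behavior_neighbors(logs: Iterable[LogRow], query: str, k: int = 3) -> List[str]:
    """Return the k items most often clicked for `query` (ties keep first-seen order)."""
    clicks = Counter(item for q, item, clicked in logs if q == query and clicked)
    return [item for item, _ in clicks.most_common(k)]


def build_prompt(query: str, item: str, neighbors: List[str]) -> str:
    """Hypothetical prompt that injects retrieved neighbors as domain knowledge."""
    examples = "\n".join(f"- {n}" for n in neighbors) or "- (none)"
    return (
        f"Query: {query}\n"
        f"Items users clicked for this query:\n{examples}\n"
        f"Candidate item: {item}\n"
        "Is the candidate relevant to the query? Answer yes or no."
    )


if __name__ == "__main__":
    logs = [
        ("running shoes", "trail runner X", True),
        ("running shoes", "road racer Y", True),
        ("running shoes", "trail runner X", True),
        ("running shoes", "office chair", False),
    ]
    neighbors = behavior_neighbors(logs, "running shoes", k=2)
    print(build_prompt("running shoes", "marathon flat Z", neighbors))
```

Rebuilding the click counts from each day's logs is one simple way to keep the injected knowledge current, in the spirit of the "daily search logs" the abstract mentions.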

arXiv:2408.09138 [pdf, other]

StylePrompter: Enhancing Domain Generalization with Test-Time Style Priors

Authors: Jiao Zhang, Jian Xu, Xu-Yao Zhang, Cheng-Lin Liu

Abstract: In real-world applications, the sample distribution at the inference stage often differs from that at the training stage, causing performance degradation of trained deep models. Research on domain generalization (DG) aims to develop robust algorithms that improve generalization performance in unseen domains by training on only a few domains. However, a domain-agnostic vision model trained on a limited number of domains with traditional DG methods cannot guarantee its effectiveness on unseen domains. Introducing language can break the closed cognition space of the vision model, providing additional semantic information that cannot be inferred from vision-only datasets. In this paper, we propose to overcome this limitation of previous DG methods by introducing a style prompt in the language modality to adapt the trained model dynamically. In particular, we train a style prompter to extract the style information of the current image into an embedding in the token embedding space and place it in front of the candidate category words as prior knowledge to prompt the model. Our open-space partition of the style token embedding space and the hand-crafted style regularization enable the trained style prompter to handle data from unknown domains effectively. Extensive experiments verify the effectiveness of our method and demonstrate state-of-the-art performance on multiple public datasets. Code will be made available after the acceptance of this paper.

Submitted 17 August, 2024; originally announced August 2024.

arXiv:2408.08826 [pdf, other]

Search for the rare decay $J/ψ\to γD^0+c.c.$ at BESIII

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (642 additional authors not shown)

Abstract: Using $(10087\pm44)\times10^6$ $J/ψ$ events collected with the BESIII detector, we search for the rare decay $J/ψ\to γD^0+c.c.$ for the first time. No obvious signal is observed, and the upper limit on the branching fraction is determined to be ${\cal B}(J/ψ\to γD^{0}+c.c.)< 9.1 \times 10^{-8}$ at the 90% confidence level.

Submitted 16 August, 2024; originally announced August 2024.

arXiv:2408.08736 [pdf, other]

Task-Aware Dynamic Transformer for Efficient Arbitrary-Scale Image Super-Resolution

Authors: Tianyi Xu, Yiji Zhou, Xiaotao Hu, Kai Zhang, Anran Zhang, Xingye Qiu, Jun Xu

Abstract: Arbitrary-scale super-resolution (ASSR) aims to learn a single model for image super-resolution at arbitrary magnification scales. Existing ASSR networks typically comprise an off-the-shelf scale-agnostic feature extractor and an arbitrary-scale upsampler. These feature extractors often use fixed network architectures to address different ASSR inference tasks, each of which is characterized by an input image and an upsampling scale. However, this overlooks the variation in super-resolution difficulty across inference scenarios, where simple images or small SR scales could be resolved with less computational effort than difficult images or large SR scales. To tackle this difficulty variability, in this paper we propose a Task-Aware Dynamic Transformer (TADT) as an input-adaptive feature extractor for efficient image ASSR. Our TADT consists of a multi-scale feature extraction backbone built upon groups of Multi-Scale Transformer Blocks (MSTBs) and a Task-Aware Routing Controller (TARC). The TARC predicts the inference paths within the feature extraction backbone, specifically selecting MSTBs based on the input images and SR scales. The prediction of the inference path is guided by a new loss function that trades off SR accuracy against efficiency. Experiments demonstrate that, when working with three popular arbitrary-scale upsamplers, our TADT achieves state-of-the-art ASSR performance compared with mainstream feature extractors, but at a relatively lower computational cost. The code will be publicly released.

Submitted 25 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

Comments: ECAI 2024
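The routing idea above (a controller that decides which MSTBs to execute, trained with a loss that balances accuracy and cost) can be illustrated with a toy sketch. Everything here is a hypothetical stand-in: blocks are scalar functions, the controller's output is given as per-block keep probabilities, inference keeps blocks above a 0.5 threshold, and the loss is simply SR error plus λ times the expected fraction of blocks used. TADT's actual controller and loss may differ.

```python
from typing import Callable, List, Sequence

Block = Callable[[float], float]  # toy stand-in for an MSTB acting on a scalar "feature"


def select_blocks(keep_probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Inference-time routing: indices of blocks whose predicted keep probability exceeds the threshold."""
    return [i for i, p in enumerate(keep_probs) if p > threshold]


def run_backbone(x: float, blocks: Sequence[Block], keep_probs: Sequence[float]) -> float:
    """Apply only the selected blocks in order; skipped blocks behave as identity (bypassed)."""
    for i in select_blocks(keep_probs):
        x = blocks[i](x)
    return x


def tradeoff_loss(sr_error: float, keep_probs: Sequence[float], lam: float = 0.1) -> float:
    """Hypothetical objective: SR error plus lam times the expected fraction of blocks executed."""
    expected_usage = sum(keep_probs) / len(keep_probs)
    return sr_error + lam * expected_usage


if __name__ == "__main__":
    blocks = [lambda v: v + 1, lambda v: v * 10, lambda v: v * 2, lambda v: v - 100]
    probs = [0.9, 0.2, 0.7, 0.4]  # e.g. an "easy" input: the controller skips two blocks
    print("selected blocks:", select_blocks(probs))
    print("output:", run_backbone(1.0, blocks, probs))
    print("loss:", round(tradeoff_loss(0.05, probs), 4))
```

Raising λ pushes the controller toward shorter inference paths, which is the accuracy/efficiency dial the abstract describes.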

arXiv:2408.08485 [pdf, other]

Generalized code index modulation-aided frequency offset realign multiple-antenna spatial modulation approach for next-generation green communication systems

Authors: Bang Huang, Jiajie Xu, Mohamed-Slim Alouini

Abstract: For next-generation green communication systems, this article proposes an innovative communication system based on frequency-diverse array multiple-input multiple-output (FDA-MIMO) technology, which aims to achieve high data rates while maintaining low power consumption. This system utilizes frequency offset index realign modulation, multiple-antenna spatial index modulation, and spreading code index modulation techniques. In the proposed generalized code index modulation-aided frequency offset realign multiple-antenna spatial modulation (GCIM-FORMASM) system, the incoming bits are divided into five parts: spatial modulation bits, conveyed by activating multiple transmit antennas; frequency offset index bits of the FDA antennas, comprising frequency offset combination bits and frequency offset realign bits; spreading code index modulation bits; and modulated symbol bits. Subsequently, this paper utilizes the orthogonal waveforms transmitted by the FDA to design the corresponding transmitter and receiver structures and provides specific expressions for the received signals. Meanwhile, to reduce the decoding complexity of the maximum likelihood (ML) algorithm, we propose a three-stage despreading-based low-complexity (DBLC) algorithm that leverages the orthogonality of the spreading codes. Additionally, a closed-form expression for the upper bound of the average bit error probability (ABEP) of the DBLC algorithm is derived. An analysis of metrics such as energy efficiency and data rate shows that the proposed system features low power consumption and high data transmission rates, aligning better with the concept of future green communications. The effectiveness of the proposed methods is validated through comprehensive numerical results.

Submitted 15 August, 2024; originally announced August 2024.
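The five-part bit split described above can be turned into a per-channel-use bit budget using the usual index-modulation rule that a field with P valid patterns carries floor(log2 P) bits. The sketch below rests on stated assumptions that may not match GCIM-FORMASM exactly: Na of Nt antennas are active, Na of Nf frequency offsets are selected, the realign bits index the Na! orderings of the selected offsets, one of N_codes spreading codes is chosen, and each active antenna carries one M-ary symbol. All parameter values are hypothetical.

```python
from math import comb, factorial, floor, log2


def index_bits(patterns: int) -> int:
    """Bits carried by choosing one of `patterns` index patterns (truncated to a power of two)."""
    return floor(log2(patterns)) if patterns > 1 else 0


def bits_per_channel_use(nt: int, na: int, nf: int, n_codes: int, m: int) -> dict:
    """Hypothetical bit budget for the five fields; m is assumed to be a power of two."""
    parts = {
        "spatial": index_bits(comb(nt, na)),           # which na of nt antennas are active
        "freq_combination": index_bits(comb(nf, na)),  # which na of nf frequency offsets are used
        "freq_realign": index_bits(factorial(na)),     # ordering of selected offsets over active antennas
        "spreading_code": index_bits(n_codes),         # which spreading code is used
        "symbols": na * int(log2(m)),                  # one M-ary symbol per active antenna (assumption)
    }
    parts["total"] = sum(parts.values())
    return parts


if __name__ == "__main__":
    # 4 antennas (2 active), 4 offsets, 8 codes, QPSK
    print(bits_per_channel_use(nt=4, na=2, nf=4, n_codes=8, m=4))  # total: 12
```

The index fields add bits without extra transmit power, which is the mechanism behind the energy-efficiency argument in the abstract.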

arXiv:2408.08416 [pdf, other]

Spin Relaxation and Diffusion in Monolayer 1T'-WTe$_2$ from First-Principles

Authors: Junqing Xu, Hiroyuki Takenaka, Andrew Grieder, Jacopo Simoni, Ravishankar Sundraraman, Yuan Ping

Abstract: Understanding spin relaxation in topological systems such as quantum spin Hall (QSH) insulators is critical for realizing coherent transport at high temperature. WTe$_{2}$, known as a QSH insulator with a high transition temperature of 100 K, is an important test bed for unveiling spin relaxation mechanisms in topological materials. In this work, we employ our recently developed ab initio density-matrix dynamics approach to investigate the spin relaxation mechanism and to calculate the spin lifetime and diffusion length of monolayer 1T'-WTe$_{2}$ at finite temperature under an external electric field. We find that the spin lifetime of electrons has the largest anisotropy when measured along the canted-spin-texture direction. Moreover, we find opposite trends for spin and carrier relaxation as a function of the applied electric field. Most importantly, the relaxation mechanism under an intermediate electric field of around 1 V/nm cannot be explained by either the Elliott-Yafet or the Dyakonov-Perel model, which highlights the generality of our ab initio density-matrix framework. We then propose analytical models that explain this mechanism and compare well with the ab initio results at small and large electric fields. We predict that the spin lifetime and spin diffusion length of bulk-state electrons are $\sim$1 ps and $\sim$30 nm, respectively, at room temperature, suggesting the material's promise for spintronic applications.

Submitted 15 August, 2024; originally announced August 2024.
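The two room-temperature predictions above can be cross-checked with the standard diffusive relation L_s = sqrt(D τ_s). Inverting it gives the diffusion coefficient implied by τ_s ~ 1 ps and L_s ~ 30 nm. This is an order-of-magnitude sketch only: the paper's diffusion length may be defined with field-driven drift included, which this purely diffusive formula ignores.

```python
import math


def spin_diffusion_length(diffusion_coeff_m2_s: float, spin_lifetime_s: float) -> float:
    """Diffusive spin diffusion length L_s = sqrt(D * tau_s), in metres."""
    return math.sqrt(diffusion_coeff_m2_s * spin_lifetime_s)


def implied_diffusion_coeff(length_m: float, spin_lifetime_s: float) -> float:
    """Invert L_s = sqrt(D * tau_s) to get D = L_s^2 / tau_s, in m^2/s."""
    return length_m ** 2 / spin_lifetime_s


if __name__ == "__main__":
    d = implied_diffusion_coeff(30e-9, 1e-12)  # ~30 nm and ~1 ps, as quoted in the abstract
    print(f"implied D = {d:.1e} m^2/s = {d * 1e4:.0f} cm^2/s")
    print(f"round trip: L_s = {spin_diffusion_length(d, 1e-12) * 1e9:.0f} nm")
```

The implied D of about 9 cm²/s is the carrier diffusion coefficient that would make the two quoted numbers mutually consistent under this simple model.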

arXiv:2408.08209 [pdf, other]

Modeling Domain and Feedback Transitions for Cross-Domain Sequential Recommendation

Authors: Changshuo Zhang, Teng Shi, Xiao Zhang, Qi Liu, Ruobing Xie, Jun Xu, Ji-Rong Wen

Abstract: Nowadays, many recommender systems encompass various domains to cater to users' diverse needs, leading to user behaviors transitioning across different domains. In fact, user behaviors across different domains reveal changes in preference toward recommended items. For instance, a shift from negative feedback to positive feedback indicates improved user satisfaction. However, existing cross-domain sequential recommendation methods typically model user interests by focusing solely on information about domain transitions, often overlooking the valuable insights provided by users' feedback transitions. In this paper, we propose $\text{Transition}^2$, a novel method to model transitions across both domains and types of user feedback. Specifically, $\text{Transition}^2$ introduces a transition-aware graph encoder based on user history, assigning different weights to edges according to the feedback type. This enables the graph encoder to extract historical embeddings that capture the transition information between different domains and feedback types. Subsequently, we encode the user history using a cross-transition multi-head self-attention, incorporating various masks to distinguish different types of transitions. Finally, we integrate these modules to make predictions across different domains. Experimental results on two public datasets demonstrate the effectiveness of $\text{Transition}^2$.

Submitted 15 August, 2024; originally announced August 2024.
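The cross-transition attention above uses masks to separate transition types. A minimal sketch of how such masks could be built, assuming a hypothetical four-way categorization (same or cross domain, crossed with same or changed feedback) and causal attention where step i may attend only to earlier steps j < i; the paper's actual mask definitions are not given in the abstract.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, str]  # (domain, feedback), e.g. ("books", "pos")


def transition_type(prev: Event, cur: Event) -> str:
    """Label the transition prev -> cur by whether the domain and the feedback type change."""
    domain = "same_domain" if prev[0] == cur[0] else "cross_domain"
    feedback = "same_feedback" if prev[1] == cur[1] else "feedback_change"
    return f"{domain}/{feedback}"


def transition_masks(history: List[Event]) -> Dict[str, List[List[bool]]]:
    """One causal mask per (hypothetical) transition type: masks[t][i][j] is True when
    step i may attend to an earlier step j (j < i) and the transition j -> i is of type t.
    Row 0 is empty in every mask; a real model would add self-attention or padding there."""
    types = [f"{d}/{f}" for d in ("same_domain", "cross_domain") for f in ("same_feedback", "feedback_change")]
    n = len(history)
    masks = {t: [[False] * n for _ in range(n)] for t in types}
    for i in range(n):
        for j in range(i):
            masks[transition_type(history[j], history[i])][i][j] = True
    return masks


if __name__ == "__main__":
    history = [("books", "pos"), ("books", "neg"), ("movies", "neg")]
    for name, mask in transition_masks(history).items():
        print(name, [[int(c) for c in row] for row in mask])
```

Each mask would feed a separate attention head (or head group), so that, for example, a within-domain shift from positive to negative feedback is modeled separately from a cross-domain move with unchanged feedback.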

arXiv:2408.07644 [pdf, other]

doi 10.13140/RG.2.2.24505.17769

SigmaRL: A Sample-Efficient and Generalizable Multi-Agent Reinforcement Learning Framework for Motion Planning

Authors: Jianye Xu, Pan Hu, Bassam Alrifaee

Abstract: This paper introduces an open-source, decentralized framework named SigmaRL, designed to enhance both the sample efficiency and the generalization of multi-agent Reinforcement Learning (RL) for motion planning of connected and automated vehicles. Most RL agents exhibit a limited capacity to generalize, often focusing narrowly on specific scenarios, and are usually evaluated in scenarios similar or even identical to those seen during training. Various methods have been proposed to address these challenges, including experience replay and regularization. However, how observation design in RL affects sample efficiency and generalization remains under-explored. We address this gap by proposing five strategies for designing information-dense observations, focusing on general features that are applicable to most traffic scenarios. We train our RL agents using these strategies on an intersection and evaluate their generalization through numerical experiments across completely unseen traffic scenarios, including a new intersection, an on-ramp, and a roundabout. Incorporating these information-dense observations reduces training time to under one hour on a single CPU, and the evaluation results show that our RL agents generalize effectively in a zero-shot manner. Code: github.com/cas-lab-munich/SigmaRL

Submitted 14 August, 2024; originally announced August 2024.

Comments: 8 pages, 5 figures, accepted for presentation at the IEEE International Conference on Intelligent Transportation Systems (ITSC) 2024
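One plausible example of a "general feature applicable to most traffic scenarios" is an ego-centric observation: other vehicles' poses expressed relative to the ego vehicle, so the observation does not depend on absolute map coordinates. The abstract does not list SigmaRL's five strategies, so this is a hypothetical illustration; the function names, the nearest-k selection, and zero padding are all assumptions.

```python
import math
from typing import List, Tuple

Pose = Tuple[float, float, float]  # (x, y, heading in radians) in the world frame


def to_ego_frame(ego: Pose, other: Pose) -> Tuple[float, float, float]:
    """Express `other` in the ego frame: rotate the world-frame offset by -ego_heading,
    and wrap the relative heading to [-pi, pi)."""
    ex, ey, eh = ego
    dx, dy = other[0] - ex, other[1] - ey
    c, s = math.cos(-eh), math.sin(-eh)
    rel_x = c * dx - s * dy  # longitudinal offset (positive = ahead)
    rel_y = s * dx + c * dy  # lateral offset (positive = to the left)
    rel_h = (other[2] - eh + math.pi) % (2 * math.pi) - math.pi
    return rel_x, rel_y, rel_h


def ego_observation(ego: Pose, others: List[Pose], k: int = 2) -> List[float]:
    """Flattened relative poses of the k nearest vehicles, nearest first, zero-padded if fewer exist."""
    rel = sorted((to_ego_frame(ego, o) for o in others), key=lambda r: math.hypot(r[0], r[1]))[:k]
    rel += [(0.0, 0.0, 0.0)] * (k - len(rel))
    return [v for r in rel for v in r]


if __name__ == "__main__":
    ego = (10.0, 5.0, math.pi / 2)  # heading north
    others = [(10.0, 8.0, math.pi / 2), (7.0, 5.0, 0.0)]
    print([round(v, 3) for v in ego_observation(ego, others)])
```

Because the same vehicle ahead produces the same observation whether the scene is an intersection, an on-ramp, or a roundabout, features of this kind are a natural candidate for zero-shot transfer.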

arXiv:2408.07060 [pdf, other]

Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents

Authors: Kexun Zhang, Weiran Yao, Zuxin Liu, Yihao Feng, Zhiwei Liu, Rithesh Murthy, Tian Lan, Lei Li, Renze Lou, Jiacheng Xu, Bo Pang, Yingbo Zhou, Shelby Heinecke, Silvio Savarese, Huan Wang, Caiming Xiong

Abstract: Large language model (LLM) agents have shown great potential in solving real-world software engineering (SWE) problems. The most advanced open-source SWE agent can resolve over 27% of real GitHub issues in SWE-Bench Lite. However, these sophisticated agent frameworks exhibit varying strengths, excelling in certain tasks while underperforming in others. To fully harness the diversity of these agents, we propose DEI (Diversity Empowered Intelligence), a framework that leverages their unique expertise. DEI functions as a meta-module atop existing SWE agent frameworks, managing agent collectives for enhanced problem-solving. Experimental results show that a DEI-guided committee of agents is able to surpass the best individual agent's performance by a large margin. For instance, a group of open-source SWE agents, with a maximum individual resolve rate of 27.3% on SWE-Bench Lite, can achieve a 34.3% resolve rate with DEI, making a 25% improvement and beating most closed-source solutions. Our best-performing group excels with a 55% resolve rate, securing the highest ranking on SWE-Bench Lite. Our findings contribute to the growing body of research on collaborative AI systems and their potential to solve complex software engineering challenges.

Submitted 13 August, 2024; originally announced August 2024.
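A committee-style meta-module can be reduced, for illustration, to picking one candidate patch per issue from several agents' outputs using reviewer scores. The sketch below assumes hypothetical 0-10 scores and a mean-score selection rule (not necessarily how DEI ranks candidates), and also checks the quoted gain: going from 27.3% to 34.3% is a relative improvement of about 25.6%, matching the "25% improvement" in the abstract.

```python
from statistics import mean
from typing import Dict, List


def pick_candidate(scores: Dict[str, List[float]]) -> str:
    """Return the candidate (e.g. one agent's patch) with the highest mean reviewer score;
    on ties, the first listed candidate wins."""
    return max(scores, key=lambda name: mean(scores[name]))


def relative_improvement(new_rate: float, base_rate: float) -> float:
    """Relative gain of new_rate over base_rate, as a fraction."""
    return (new_rate - base_rate) / base_rate


if __name__ == "__main__":
    # Hypothetical reviewer scores (0-10) for three agents' patches on one issue
    scores = {"agent_a": [6, 7, 5], "agent_b": [8, 9, 7], "agent_c": [4, 6, 5]}
    print("selected:", pick_candidate(scores))
    # Resolve rates quoted in the abstract: 27.3% best single agent vs 34.3% with DEI
    print(f"relative improvement: {relative_improvement(34.3, 27.3):.1%}")
```

The committee can only help when agents fail on different issues, which is the diversity the framework is built to exploit.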

arXiv:2408.06710 [pdf, other]

Variational Learning of Gaussian Process Latent Variable Models through Stochastic Gradient Annealed Importance Sampling

Authors: Jian Xu, Shian Du, Junmei Yang, Qianli Ma, Delu Zeng

Abstract: Gaussian Process Latent Variable Models (GPLVMs) have become increasingly popular for unsupervised tasks such as dimensionality reduction and missing-data recovery due to their flexibility and nonlinear nature. An importance-weighted version of Bayesian GPLVMs has been proposed to obtain a tighter variational bound. However, this approach is primarily limited to analyzing simple data structures, as constructing an effective proposal distribution becomes challenging in high-dimensional spaces or with complex datasets. In this work, we propose an Annealed Importance Sampling (AIS) approach to address these issues. By transforming the posterior into a sequence of intermediate distributions via annealing, we combine the strengths of Sequential Monte Carlo samplers and variational inference (VI) to explore a wider range of posterior distributions and gradually approach the target distribution. We further propose an efficient algorithm by reparameterizing all variables in the evidence lower bound (ELBO). Experimental results on both toy and image datasets demonstrate that our method outperforms state-of-the-art methods in terms of tighter variational bounds, higher log-likelihoods, and more robust convergence.

Submitted 13 August, 2024; originally announced August 2024.
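The annealing idea above (moving from an easy distribution to the target through a sequence of intermediate distributions while accumulating importance weights) is shown below for plain AIS on a 1-D problem whose normalizing constant is known. This is a sketch of AIS alone, not of the paper's GPLVM method: it uses a geometric path prior^(1-β) · target^β and one random-walk Metropolis move per temperature, whereas the paper combines annealing with VI and reparameterized gradients, possibly with different transition kernels. The densities and settings are hypothetical.

```python
import math
import random


def log_prior(x: float) -> float:
    """Normalized standard normal N(0, 1), the starting distribution (its log Z is 0)."""
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)


def log_target(x: float) -> float:
    """Unnormalized target exp(-(x - 2)^2 / (2 * 0.5^2)); exact log Z = log(0.5 * sqrt(2 * pi))."""
    return -((x - 2.0) ** 2) / (2 * 0.5 ** 2)


def log_bridge(x: float, beta: float) -> float:
    """Geometric annealing path: log of prior^(1 - beta) * target^beta."""
    return (1.0 - beta) * log_prior(x) + beta * log_target(x)


def ais_log_z(n_particles: int = 1000, n_steps: int = 200, step: float = 0.5, seed: int = 0) -> float:
    """Estimate log Z of the target by AIS with one Metropolis move per temperature."""
    rng = random.Random(seed)
    betas = [k / n_steps for k in range(n_steps + 1)]
    log_weights = []
    for _ in range(n_particles):
        x = rng.gauss(0.0, 1.0)  # exact draw from the prior
        log_w = 0.0
        for b_prev, b in zip(betas[:-1], betas[1:]):
            # incremental importance weight f_b(x) / f_{b_prev}(x)
            log_w += log_bridge(x, b) - log_bridge(x, b_prev)
            # random-walk Metropolis move that leaves f_b invariant
            prop = x + rng.gauss(0.0, step)
            if rng.random() < math.exp(min(0.0, log_bridge(prop, b) - log_bridge(x, b))):
                x = prop
        log_weights.append(log_w)
    m = max(log_weights)  # log-mean-exp for numerical stability
    return m + math.log(sum(math.exp(lw - m) for lw in log_weights) / n_particles)


if __name__ == "__main__":
    exact = math.log(0.5 * math.sqrt(2 * math.pi))
    print(f"AIS estimate: {ais_log_z():.3f}  exact: {exact:.3f}")
```

The averaged weights are an unbiased estimate of Z, so their logarithm is a stochastic lower bound on log Z in expectation; this is what lets annealed weights tighten a variational bound such as the ELBO.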

arXiv:2408.06699 [pdf, other]

Information Geometry and Beta Link for Optimizing Sparse Variational Student-t Processes

Authors: Jian Xu, Delu Zeng, John Paisley

Abstract: Recently, a sparse version of Student-t Processes, termed sparse variational Student-t Processes, has been proposed to enhance computational efficiency and flexibility for real-world datasets using stochastic gradient descent. However, traditional gradient descent methods such as Adam may not fully exploit the geometry of the parameter space, potentially leading to slower convergence and suboptimal performance. To mitigate these issues, we adopt natural gradient methods from information geometry for variational parameter optimization of Student-t Processes. This approach leverages the curvature and structure of the parameter space, utilizing tools such as the Fisher information matrix, which is linked to the Beta function in our model. This method provides robust mathematical support for the natural gradient algorithm when using Student's t-distribution as the variational distribution. Additionally, we present a mini-batch algorithm for efficiently computing natural gradients. Experimental results across four benchmark datasets demonstrate that our method consistently accelerates convergence.

Submitted 13 August, 2024; originally announced August 2024.
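The core idea above, preconditioning the gradient with the inverse Fisher information, can be seen in the simplest Student-t case: fitting only the location μ of a 1-D Student-t with fixed ν and σ, whose per-observation Fisher information has the closed form (ν + 1) / ((ν + 3) σ²). This is a deliberately reduced sketch, not the paper's method: it does not reproduce the Fisher matrix for sparse variational Student-t Process parameters or its Beta-function expression, and the data and step size are hypothetical.

```python
from typing import Sequence


def t_location_grad(mu: float, data: Sequence[float], nu: float, sigma: float) -> float:
    """Mean gradient in mu of the Student-t log-density (nu, sigma fixed):
    d/dmu log p(x) = (nu + 1) (x - mu) / (nu sigma^2 + (x - mu)^2)."""
    return sum((nu + 1) * (x - mu) / (nu * sigma ** 2 + (x - mu) ** 2) for x in data) / len(data)


def t_location_fisher(nu: float, sigma: float) -> float:
    """Per-observation Fisher information for the Student-t location: (nu + 1) / ((nu + 3) sigma^2)."""
    return (nu + 1) / ((nu + 3) * sigma ** 2)


def fit_location(data: Sequence[float], nu: float = 4.0, sigma: float = 1.0,
                 mu: float = 0.0, lr: float = 0.5, steps: int = 100) -> float:
    """Natural-gradient ascent on the mean log-likelihood: mu <- mu + lr * F^{-1} * grad."""
    fisher = t_location_fisher(nu, sigma)
    for _ in range(steps):
        mu += lr * t_location_grad(mu, data, nu, sigma) / fisher
    return mu


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]  # symmetric about 3, so the fitted location should approach 3
    print(f"fitted location: {fit_location(data):.6f}")
```

Dividing by the Fisher information rescales the step to the local statistical geometry (here, inversely to how informative each observation is about μ), which is what makes natural-gradient steps largely insensitive to the parameterization.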

arXiv:2408.06677 [pdf, other]

Search for $η_c(2S)\toωω$ and $ωφ$ decays and measurements of $χ_{cJ}\toωω$ and $ωφ$ in $ψ(2S)$ radiative processes

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (643 additional authors not shown)

Abstract: Using $(2712\pm 14)$ $\times$ 10$^{6}$ $ψ(2S)$ events collected with the BESIII detector at the BEPCII collider, we search for the decays $η_{c}(2S)\toωω$ and $η_{c}(2S)\toωφ$ via the process $ψ(2S)\toγη_{c}(2S)$. Evidence of $η_{c}(2S)\toωω$ is found with a statistical significance of $3.2σ$. The branching fraction is measured to be $\mathcal{B}(η_{c}(2S)\toωω)=(5.65\pm3.77(\rm stat.)\pm5.32(\rm syst.))\times10^{-4}$. No statistically significant signal is observed for the decay $η_{c}(2S)\toωφ$. The upper limit on the branching fraction at the 90% confidence level is determined to be $\mathcal{B}(ψ(2S)\toγη_{c}(2S),η_{c}(2S)\toωφ)<2.24\times 10^{-7}$. We also update the branching fractions of $χ_{cJ}\to ωω$ and $χ_{cJ}\toωφ$ decays via the $ψ(2S)\toγχ_{cJ}$ transition. The branching fractions are determined to be $\mathcal{B}(χ_{c0}\toωω)=(10.63\pm0.11\pm0.46)\times 10^{-4}$, $\mathcal{B}(χ_{c1}\toωω)=(6.39\pm0.07\pm0.29)\times 10^{-4}$, $\mathcal{B}(χ_{c2}\toωω)=(8.50\pm0.08\pm0.38)\times 10^{-4}$, $\mathcal{B}(χ_{c0}\toωφ)=(1.18\pm0.03\pm0.05)\times 10^{-4}$, $\mathcal{B}(χ_{c1}\toωφ)=(2.03\pm0.15\pm0.12)\times 10^{-5}$, and $\mathcal{B}(χ_{c2}\toωφ)=(9.37\pm1.07\pm0.59)\times 10^{-6}$, where the first uncertainties are statistical and the second are systematic.

Submitted 13 August, 2024; originally announced August 2024.
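The results above quote statistical and systematic uncertainties separately. A common quick summary combines them in quadrature, sqrt(stat² + syst²), which assumes the two are independent and approximately Gaussian. The sketch below applies this to three of the quoted values. It is illustrative only: the collaboration may treat correlated or asymmetric uncertainties differently, and the quoted 3.2σ significance comes from their fit, not from a ratio like this.

```python
import math


def total_uncertainty(stat: float, syst: float) -> float:
    """Combine statistical and systematic uncertainties in quadrature (assumes independence)."""
    return math.hypot(stat, syst)


# (central value, stat, syst) in units of 1e-4, copied from the abstract
results = {
    "eta_c(2S) -> omega omega": (5.65, 3.77, 5.32),
    "chi_c0 -> omega omega": (10.63, 0.11, 0.46),
    "chi_c0 -> omega phi": (1.18, 0.03, 0.05),
}

if __name__ == "__main__":
    for name, (value, stat, syst) in results.items():
        print(f"{name}: ({value} +/- {total_uncertainty(stat, syst):.2f}) x 1e-4")
```

For the $χ_{c0}$ modes the systematic term dominates the combined uncertainty, while for $η_{c}(2S)\toωω$ both contributions are comparable to the central value itself.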

arXiv:2408.06592 [pdf, other]

ActiveNeRF: Learning Accurate 3D Geometry by Active Pattern Projection

Authors: Jianyu Tao, Changping Hu, Edward Yang, Jing Xu, Rui Chen

Abstract: NeRFs have achieved incredible success in novel view synthesis. However, the accuracy of the implicit geometry is unsatisfactory because passive, static environmental illumination has low spatial frequency and cannot provide enough information for accurate geometry reconstruction. In this work, we propose ActiveNeRF, a 3D geometry reconstruction framework that improves the geometry quality of NeRF by actively projecting patterns of high spatial frequency onto the scene using a projector with a constant pose relative to the camera. We design a learnable active pattern rendering pipeline that jointly learns the scene geometry and the active pattern. We find that, by adding the active pattern and imposing its consistency across different views, our proposed method outperforms state-of-the-art geometry reconstruction methods qualitatively and quantitatively in both simulated and real experiments. Code is available at https://github.com/hcp16/active_nerf