Search | arXiv e-print repository

arXiv:2407.21439 [pdf, other]

MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training

Authors: Zhanpeng Chen, Chengjin Xu, Yiyan Qi, Jian Guo

Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in processing and generating content across multiple data modalities, including text, images, audio, and video. However, a significant drawback of MLLMs is their reliance on static training data, leading to outdated information and limited contextual awareness. This static nature hampers their ability to provide accurate, up-to-date responses, particularly in dynamic or rapidly evolving contexts. Integrating Multimodal Retrieval-augmented Generation (Multimodal RAG) offers a promising solution, but the system would inevitably encounter the multi-granularity noisy correspondence (MNC) problem, which involves two types of noise: coarse-grained (query-caption) and fine-grained (query-image). This noise hinders accurate retrieval and generation. In this work, we propose \textbf{RagLLaVA}, a novel framework with knowledge-enhanced reranking and noise-injected training, to address these limitations. We instruction-tune the MLLM with a simple yet effective instruction template to induce its ranking ability and serve it as a reranker to precisely filter the top-k retrieved images. For generation, we inject visual noise during training at the data and token levels to enhance the generator's robustness. Extensive experiments are conducted on the subsets of two datasets that require retrieving and reasoning over images to answer a given query. Our results demonstrate the superiority of RagLLaVA in retrieving accurately and generating robustly.
Code and models are available at https://github.com/IDEA-FinAI/RagLLaVA.

Submitted 31 July, 2024; originally announced July 2024.

arXiv:2407.20551 [pdf, ps, other]

Observation of $D^0\to b_1(1235)^- e^+ν_e$ and evidence for $D^+\to b_1(1235)^0 e^+ν_e$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (647 additional authors not shown)

Abstract: By analyzing a data sample of $e^+e^-$ collisions with center-of-mass energy $\sqrt{s}=3.773$ GeV, corresponding to an integrated luminosity of $7.9~\rm {fb}^{-1}$ collected with the BESIII detector operating at the BEPCII collider, we study semileptonic decays of the $D^{0(+)}$ mesons into the axial-vector meson $b_1(1235)$ via the decay $b_1(1235)\to ωπ$. The decay $D^0\to b_1(1235)^-e^{+}ν_{e}$ is observed with a significance of 5.2$σ$ after considering systematic uncertainty, while evidence for the decay $D^+\to b_1(1235)^0 e^+ν_e$ is obtained with a 3.1$σ$ significance. The product branching fractions are determined to be ${\mathcal B}(D^0\to b_{1}(1235)^-e^{+}ν_{e})\times {\mathcal B} (b_1(1235)^-\to ωπ^-) = (0.72\pm0.18^{+0.06}_{-0.08})\times10^{-4}$ and ${\mathcal B}(D^+\to b_{1}(1235)^0e^{+}ν_{e})\times {\mathcal B} (b_1(1235)^0~\to ωπ^0) = (1.16\pm0.44\pm0.16)\times10^{-4}$, where the first uncertainties are statistical and the second systematic. The ratio of their partial decay widths is determined to be $\frac{Γ(D^0\to b_{1}(1235)^-e^{+}ν_{e})}{2Γ(D^+\to b_{1}(1235)^0e^{+}ν_{e})}=0.78\pm0.19^{+0.04}_{-0.05}$, which is consistent with unity, predicted by isospin invariance, within uncertainties.

Submitted 30 July, 2024; originally announced July 2024.

Comments: 9 pages, 2 figures

arXiv:2407.20080 [pdf, other]

UniTTA: Unified Benchmark and Versatile Framework Towards Realistic Test-Time Adaptation

Authors: Chaoqun Du, Yulin Wang, Jiayi Guo, Yizeng Han, Jie Zhou, Gao Huang

Abstract: Test-Time Adaptation (TTA) aims to adapt pre-trained models to the target domain during testing. In reality, this adaptability can be influenced by multiple factors. Researchers have identified various challenging scenarios and developed diverse methods to address these challenges, such as dealing with continual domain shifts, mixed domains, and temporally correlated or imbalanced class distributions. Despite these efforts, a unified and comprehensive benchmark has yet to be established. To this end, we propose a Unified Test-Time Adaptation (UniTTA) benchmark, which is comprehensive and widely applicable. Each scenario within the benchmark is fully described by a Markov state transition matrix for sampling from the original dataset. The UniTTA benchmark considers both domain and class as two independent dimensions of data and addresses various combinations of imbalance/balance and i.i.d./non-i.i.d./continual conditions, covering a total of $ (2 \times 3)^2 = 36 $ scenarios. It establishes a comprehensive evaluation benchmark for realistic TTA and provides a guideline for practitioners to select the most suitable TTA method. Alongside this benchmark, we propose a versatile UniTTA framework, which includes a Balanced Domain Normalization (BDN) layer and a COrrelated Feature Adaptation (COFA) method--designed to mitigate distribution gaps in domain and class, respectively. Extensive experiments demonstrate that our UniTTA framework excels within the UniTTA benchmark and achieves state-of-the-art performance on average.
Our code is available at \url{https://github.com/LeapLabTHU/UniTTA}.

Submitted 29 July, 2024; originally announced July 2024.

arXiv:2407.20009 [pdf, ps, other]

Measurement of the $\boldsymbol{e^{+}e^{-}\to K^+K^-ψ(2S)}$ Cross Section at Center-of-Mass Energies from 4.699 to 4.951 GeV and Search for $\boldsymbol{Z_{cs}^{\pm}}$ in the $\boldsymbol{Z_{cs}^\pm\to K^\pmψ(2S)}$ Decay

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (646 additional authors not shown)

Abstract: We perform the first investigation of the process $e^{+}e^{-}\to K^+K^-ψ(2S)$ and report its Born cross sections over a range of center-of-mass energies from 4.699 to 4.951~GeV. The measurements are carried out using several partial reconstruction techniques using data samples collected by the BESIII detector with a total integrated luminosity of 2.5~fb$^{-1}$. We search for new tetraquark candidates $Z_{cs}^\pm$ in the decays $Z_{cs}^\pm\to K^\pmψ(2S)$. No significant $Z_{cs}^\pm$ signals are observed.

Submitted 29 July, 2024; originally announced July 2024.

Comments: 9 pages, 4 figures

arXiv:2407.19768 [pdf, other]

Efficient Face Super-Resolution via Wavelet-based Feature Enhancement Network

Authors: Wenjie Li, Heng Guo, Xuannan Liu, Kongming Liang, Jiani Hu, Zhanyu Ma, Jun Guo

Abstract: Face super-resolution aims to reconstruct a high-resolution face image from a low-resolution face image. Previous methods typically employ an encoder-decoder structure to extract facial structural features, where the direct downsampling inevitably introduces distortions, especially to high-frequency features such as edges. To address this issue, we propose a wavelet-based feature enhancement network, which mitigates feature distortion by losslessly decomposing the input feature into high and low-frequency components using the wavelet transform and processing them separately. To improve the efficiency of facial feature extraction, a full domain Transformer is further proposed to enhance local, regional, and global facial features. Such designs allow our method to perform better without stacking many modules as previous methods did. Experiments show that our method effectively balances performance, model size, and speed. Code link: https://github.com/PRIS-CV/WFEN.

Submitted 30 July, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

arXiv:2407.19421 [pdf, other]

Improved physics-informed neural network in mitigating gradient related failures

Authors: Pancheng Niu, Yongming Chen, Jun Guo, Yuqian Zhou, Minfu Feng, Yanchao Shi

Abstract: Physics-informed neural networks (PINNs) integrate fundamental physical principles with advanced data-driven techniques, driving significant advancements in scientific computing. However, PINNs face persistent challenges with stiffness in gradient flow, which limits their predictive capabilities. This paper presents an improved PINN (I-PINN) to mitigate gradient-related failures. The core of I-PINN is to combine the respective strengths of neural networks with an improved architecture and adaptive weights containing upper bounds. The capability to enhance accuracy by at least one order of magnitude and accelerate convergence, without introducing extra computational complexity relative to the baseline model, is achieved by I-PINN. Numerical experiments with a variety of benchmarks illustrate the improved accuracy and generalization of I-PINN. The supporting data and code are accessible at https://github.com/PanChengN/I-PINN.git, enabling broader research engagement.

Submitted 28 July, 2024; originally announced July 2024.

Comments: Elsevier-LaTeX v1.2, 26 pages with 12 figures

MSC Class: 35Q68; 35Q90 ACM Class: G.4

arXiv:2407.19360 [pdf]

Ultralow-loss spiral resonators for precise LiDAR

Authors: Osama Terra, Warren Jin, Hussein Kotb, Joel Guo, John E. Bowers

Abstract: Swept laser interferometry is an extremely powerful solution embedded in several recent technologies such as absolute distance measurement, light detection and ranging, optical frequency domain reflectometry, optical coherence tomography, microresonator characterization, and gas spectroscopy. Nonlinearity in the optical frequency sweeping of tunable lasers is a fatal drawback in gaining the expected outcome from these technologies. Here, we introduce an onchip, millimeter scale, 7 m spiral resonator that is made of ultralow loss silicon nitride to act as a frequency ruler for correction of the tunable lasers sweeping nonlinearities. The sharp 2 MHz frequency lines of the 85 M high-quality resonator and the narrow spaced 25.57 MHz frequency ticks of the 7 m spiral allow unprecedented precise nonlinearity correction on an integrated photonics platform. Accurate measurements of the rulers frequency spacing, linewidth, and temperature and wavelength sensitivities of the frequency ticks are performed here to demonstrate the quality of the frequency ruler. In addition, the spiral resonator is implemented in an FMCW LiDAR experiment to demonstrate a potential application of the proposed onchip frequency ruler.

Submitted 27 July, 2024; originally announced July 2024.

Comments: 12 pages

arXiv:2407.18929 [pdf, other]

THEA-Code: an Autoencoder-Based IDS-correcting Code for DNA Storage

Authors: Alan J. X. Guo, Mengyi Wei, Yufan Dai, Yali Wei, Pengchen Zhang

Abstract: The insertion, deletion, substitution (IDS) correcting code has garnered increased attention due to significant advancements in DNA storage that emerged recently. Despite this, the pursuit of optimal solutions in IDS-correcting codes remains an open challenge, drawing interest from both theoretical and engineering perspectives. This work introduces a pioneering approach named THEA-code. The proposed method follows a heuristic idea of employing an end-to-end autoencoder for the integrated encoding and decoding processes. To address the challenges associated with deploying an autoencoder as an IDS-correcting code, we propose innovative techniques, including the differentiable IDS channel, the entropy constraint on the codeword, and the auxiliary reconstruction of the source sequence. These strategies contribute to the successful convergence of the autoencoder, resulting in a deep learning-based IDS-correcting code with commendable performance. Notably, THEA-Code represents the first instance of a deep learning-based code that is independent of conventional coding frameworks in the IDS-correcting domain. Comprehensive experiments, including an ablation study, provide a detailed analysis and affirm the effectiveness of THEA-Code.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.18556 [pdf, other]

Look Globally and Reason: Two-stage Path Reasoning over Sparse Knowledge Graphs

Authors: Saiping Guan, Jiyao Wei, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

Abstract: Sparse Knowledge Graphs (KGs), frequently encountered in real-world applications, contain fewer facts in the form of (head entity, relation, tail entity) compared to more populated KGs. The sparse KG completion task, which reasons answers for given queries in the form of (head entity, relation, ?) for sparse KGs, is particularly challenging due to the necessity of reasoning missing facts based on limited facts. Path-based models, known for excellent explainability, are often employed for this task. However, existing path-based models typically rely on external models to fill in missing facts and subsequently perform path reasoning. This approach introduces unexplainable factors or necessitates meticulous rule design. In light of this, this paper proposes an alternative approach by looking inward instead of seeking external assistance. We introduce a two-stage path reasoning model called LoGRe (Look Globally and Reason) over sparse KGs. LoGRe constructs a relation-path reasoning schema by globally analyzing the training data to alleviate the sparseness problem. Based on this schema, LoGRe then aggregates paths to reason out answers. Experimental results on five benchmark sparse KG datasets demonstrate the effectiveness of the proposed LoGRe model.

Submitted 26 July, 2024; originally announced July 2024.

Comments: Accepted to CIKM 2024

arXiv:2407.18137 [pdf, other]

XS-VID: An Extremely Small Video Object Detection Dataset

Authors: Jiahao Guo, Ziyang Xu, Lianjun Wu, Fei Gao, Wenyu Liu, Xinggang Wang

Abstract: Small Video Object Detection (SVOD) is a crucial subfield in modern computer vision, essential for early object discovery and detection. However, existing SVOD datasets are scarce and suffer from issues such as insufficiently small objects, limited object categories, and lack of scene diversity, leading to unitary application scenarios for corresponding methods. To address this gap, we develop the XS-VID dataset, which comprises aerial data from various periods and scenes, and annotates eight major object categories. To further evaluate existing methods for detecting extremely small objects, XS-VID extensively collects three types of objects with smaller pixel areas: extremely small (\textit{es}, $0\sim12^2$), relatively small (\textit{rs}, $12^2\sim20^2$), and generally small (\textit{gs}, $20^2\sim32^2$). XS-VID offers unprecedented breadth and depth in covering and quantifying minuscule objects, significantly enriching the scene and object diversity in the dataset. Extensive validations on XS-VID and the publicly available VisDrone2019VID dataset show that existing methods struggle with small object detection and significantly underperform compared to general object detectors. Leveraging the strengths of previous methods and addressing their weaknesses, we propose YOLOFT, which enhances local feature associations and integrates temporal motion features, significantly improving the accuracy and stability of SVOD. Our datasets and benchmarks are available at \url{https://gjhhust.github.io/XS-VID/}.

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.17800 [pdf, other]

Design of a LYSO Crystal Electromagnetic Calorimeter for DarkSHINE Experiment

Authors: Zhiyu Zhao, Qibin Liu, Jiyuan Chen, Jing Chen, Junfeng Chen, Xiang Chen, Changbo Fu, Jun Guo, Kim Siang Khaw, Liang Li, Shu Li, Danning Liu, Kun Liu, Siyuan Song, Tong Sun, Jiannan Tang, Yufeng Wang, Zhen Wang, Weihao Wu, Haijun Yang, Yuming Lin, Rui Yuan, Yulei Zhang, Yunlong Zhang, Baihong Zhou, et al. (2 additional authors not shown)

Abstract: This paper presents the design and optimization of a LYSO crystal-based electromagnetic calorimeter (ECAL) for the DarkSHINE experiment, which aims to search for dark photon as potential dark force mediator. The ECAL design has been meticulously evaluated through comprehensive simulations, focusing on optimizing dimensions, material choices, and placement within the detector array to enhance sensitivity in search for dark photon signatures while balancing cost and performance. The concluded ECAL design, comprising 2.5$\times$2.5$\times$4 cm$^3$ LYSO crystals arranged in a 52.5$\times$52.5$\times$44 cm$^3$ structure, ensures high energy resolution and effective energy containment. The study also explored the energy distribution across different ECAL regions and established a dynamic range for energy measurements, with a 4 GeV limit per crystal deemed sufficient. Additionally, the radiation tolerance of ECAL components was assessed, confirming the sustainability of LYSO crystals and radiation-resistant silicon sensors.

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.17723 [pdf, other]

Your Graph Recommender is Provably a Single-view Graph Contrastive Learning

Authors: Wenjie Yang, Shengzhong Zhang, Jiaxing Guo, Zengfeng Huang

Abstract: Graph recommender (GR) is a type of graph neural network (GNNs) encoder that is customized for extracting information from the user-item interaction graph. Due to its strong performance on the recommendation task, GR has gained significant attention recently. Graph contrastive learning (GCL) is also a popular research direction that aims to learn, often unsupervised, GNNs with certain contrastive objectives. As a general graph representation learning method, GCLs have been widely adopted with the supervised recommendation loss for joint training of GRs. Despite the intersection of GR and GCL research, theoretical understanding of the relationship between the two fields is surprisingly sparse. This vacancy inevitably leads to inefficient scientific research. In this paper, we aim to bridge the gap between the field of GR and GCL from the perspective of encoders and loss functions. With mild assumptions, we theoretically show an astonishing fact that graph recommender is equivalent to a commonly-used single-view graph contrastive model. Specifically, we find that (1) the classic encoder in GR is essentially a linear graph convolutional network with one-hot inputs, and (2) the loss function in GR is well bounded by a single-view GCL loss with certain hyperparameters. The first observation enables us to explain crucial designs of GR models, e.g., the removal of self-loop and nonlinearity. And the second finding can easily prompt many cross-field research directions.
We empirically show a remarkable result that the recommendation loss and the GCL loss can be used interchangeably. The fact that we can train GR models solely with the GCL loss is particularly insightful, since before this work, GCLs were typically viewed as unsupervised methods that need fine-tuning. We also discuss some potential future works inspired by our theory.

Submitted 24 July, 2024; originally announced July 2024.

arXiv:2407.17184 [pdf, other]

Search for $η_{c}(2S)\to K^+ K^- η^{\prime}$ decay

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (639 additional authors not shown)

Abstract: Using $(2.712\pm0.014)\times10^{9}$ $ψ(3686)$ events collected with the BESIII detector operating at the BEPCII, we find an evidence of the $η_{c}(2S)\to K^+ K^- η^{\prime}$ decay with a statistical significance of 3.1$σ$. Its decay branching fraction is measured to be $(12.24\pm4.60(\mathrm{stat.})\pm2.37(\mathrm{syst.})\pm4.68(\mathrm{extr.}))\times 10^{-4}$, where the first uncertainty is statistical, the second is systematic, and the third uncertainty is from the branching fraction of the $ψ(3686)\toγη_{c}(2S)$ decay. The upper limit on the product branching fraction $B[ψ(3686)\toγη_{c}(2S)] \times$ $B[η_{c}(2S)\to K^+ K^- η^{\prime}]$ is set to be $1.14 \times 10^{-6}$ at $90\%$ confidence level. In addition, the branching fractions of $χ_{c1}\to K^+ K^- η^{\prime}$ and $χ_{c2}\to K^+ K^- η^{\prime}$ are updated to be $(8.47\pm0.09(\mathrm{stat.})\pm0.47(\mathrm{syst.}))\times 10^{-4}$ and $(1.53\pm0.04(\mathrm{stat.})\pm0.08(\mathrm{syst.}))\times 10^{-4}$, respectively. The precision is improved by twofold.

Submitted 24 July, 2024; originally announced July 2024.

arXiv:2407.16924 [pdf]

Real-space topology-engineering of skyrmionic spin textures in a van der Waals ferromagnet Fe3GaTe2

Authors: Shuo Mi, Jianfeng Guo, Guojing Hu, Guangcheng Wang, Songyang Li, Zizhao Gong, Shuaizhao Jin, Rui Xu, Fei Pang, Wei Ji, Weiqiang Yu, Xiaolei Wang, Xueyun Wang, Haitao Yang, Zhihai Cheng

Abstract: Realizing magnetic skyrmions in two-dimensional (2D) van der Waals (vdW) ferromagnets offers unparalleled prospects for future spintronic applications. The room-temperature ferromagnet Fe3GaTe2 provides an ideal platform for tailoring these magnetic solitons. Here, skyrmions of distinct topological charges are artificially introduced and spatially engineered using magnetic force microscopy (MFM). The skyrmion lattice is realized by specific field-cooling process, and can be further controllably erased and painted via delicate manipulation of tip stray field. The skyrmion lattice with opposite topological charges (S = +1 or -1) can be tailored at the target regions to form topological skyrmion junctions (TSJs) with specific configurations. The delicate interplay of TSJs and spin-polarized device current were finally investigated via the in-situ transport measurements, alongside the topological stability of TSJs. Our results demonstrate that Fe3GaTe2 not only serves as a potential building block for room-temperature skyrmion-based spintronic devices, but also presents promising prospects for Fe3GaTe2-based heterostructures with the engineered topological spin textures.

Submitted 23 July, 2024; originally announced July 2024.

arXiv:2407.16664 [pdf, other]

Towards scalable efficient on-device ASR with transfer learning

Authors: Laxmi Pandey, Ke Li, Jinxi Guo, Debjyoti Paul, Arthur Guo, Jay Mahadeokar, Xuedong Zhang

Abstract: Multilingual pretraining for transfer learning significantly boosts the robustness of low-resource monolingual ASR models. This study systematically investigates three main aspects: (a) the impact of transfer learning on model performance during initial training or fine-tuning, (b) the influence of transfer learning across dataset domains and languages, and (c) the effect on rare-word recognition compared to non-rare words. Our finding suggests that RNNT-loss pretraining, followed by monolingual fine-tuning with Minimum Word Error Rate (MinWER) loss, consistently reduces Word Error Rates (WER) across languages like Italian and French. WER Reductions (WERR) reach 36.2% and 42.8% compared to monolingual baselines for MLS and in-house datasets. Out-of-domain pretraining leads to 28% higher WERR than in-domain pretraining. Both rare and non-rare words benefit, with rare words showing greater improvements with out-of-domain pretraining, and non-rare words with in-domain pretraining.

Submitted 23 July, 2024; originally announced July 2024.

arXiv:2407.16273 [pdf, other]

Backdoor Attacks against Hybrid Classical-Quantum Neural Networks

Authors: Ji Guo, Wenbo Jiang, Rui Zhang, Wenshu Fan, Jiachen Li, Guoming Lu

Abstract: Hybrid Quantum Neural Networks (HQNNs) represent a promising advancement in Quantum Machine Learning (QML), yet their security has been rarely explored. In this paper, we present the first systematic study of backdoor attacks on HQNNs. We begin by proposing an attack framework and providing a theoretical analysis of the generalization bounds and minimum perturbation requirements for backdoor attacks on HQNNs. Next, we employ two classic backdoor attack methods on HQNNs and Convolutional Neural Networks (CNNs) to further investigate the robustness of HQNNs. Our experimental results demonstrate that HQNNs are more robust than CNNs, requiring more significant image modifications for successful attacks. Additionally, we introduce the Qcolor backdoor, which utilizes color shifts as triggers and employs the Non-dominated Sorting Genetic Algorithm II (NSGA-II) to optimize hyperparameters. Through extensive experiments, we demonstrate the effectiveness, stealthiness, and robustness of the Qcolor backdoor.

Submitted 23 July, 2024; originally announced July 2024.

arXiv:2407.16154 [pdf, other]

DDK: Distilling Domain Knowledge for Efficient Large Language Models

Authors: Jiaheng Liu, Chenchen Zhang, Jinyang Guo, Yuanxing Zhang, Haoran Que, Ken Deng, Zhiqi Bai, Jie Liu, Ge Zhang, Jiakai Wang, Yanan Wu, Congnan Liu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng

Abstract: Despite the advanced intelligence abilities of large language models (LLMs) in various applications, they still face significant computational and storage demands. Knowledge Distillation (KD) has emerged as an effective strategy to improve the performance of a smaller LLM (i.e., the student model) by transferring knowledge from a high-performing LLM (i.e., the teacher model). Prevailing techniques in LLM distillation typically use a black-box model API to generate high-quality pretrained and aligned datasets, or utilize white-box distillation by altering the loss function to better transfer knowledge from the teacher LLM. However, these methods ignore the knowledge differences between the student and teacher LLMs across domains. This results in excessive focus on domains with minimal performance gaps and insufficient attention to domains with large gaps, reducing overall performance. In this paper, we introduce a new LLM distillation framework called DDK, which dynamically adjusts the composition of the distillation dataset in a smooth manner according to the domain performance differences between the teacher and student models, making the distillation process more stable and effective. Extensive evaluations show that DDK significantly improves the performance of student models, outperforming both continuously pretrained baselines and existing knowledge distillation methods by a large margin.

Submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.16134 [pdf, other]

Diffusion Transformer Captures Spatial-Temporal Dependencies: A Theory for Gaussian Process Data

Authors: Hengyu Fu, Zehao Dou, Jiawei Guo, Mengdi Wang, Minshuo Chen

Abstract: Diffusion Transformer, the backbone of Sora for video generation, successfully scales the capacity of diffusion models, pioneering new avenues for high-fidelity sequential data generation. Unlike static data such as images, sequential data consists of consecutive data frames indexed by time, exhibiting rich spatial and temporal dependencies. These dependencies represent the underlying dynamic model and are critical to validate the generated data. In this paper, we make the first theoretical step towards bridging diffusion transformers for capturing spatial-temporal dependencies. Specifically, we establish score approximation and distribution estimation guarantees of diffusion transformers for learning Gaussian process data with covariance functions of various decay patterns. We highlight how the spatial-temporal dependencies are captured and affect learning efficiency. Our study proposes a novel transformer approximation theory, where the transformer acts to unroll an algorithm. We support our theoretical results by numerical experiments, providing strong evidence that spatial-temporal dependencies are captured within attention layers, aligning with our approximation theory.

Submitted 22 July, 2024; originally announced July 2024.

Comments: 52 pages, 8 figures

arXiv:2407.15199 [pdf, other]

Multiple Object Detection and Tracking in Panoramic Videos for Cycling Safety Analysis

Authors: Jingwei Guo, Meihui Wang, Ilya Ilyankou, Natchapon Jongwiriyanurak, Xiaowei Gao, Nicola Christie, James Haworth

Abstract: Panoramic cycling videos can record 360° views around the cyclists. Thus, it is essential to conduct automatic road user analysis on them using computer vision models to provide data for studies on cycling safety. However, the features of panoramic data such as severe distortions, large number of small objects and boundary continuity have brought great challenges to the existing CV models, including poor performance and evaluation methods that are no longer applicable. In addition, due to the lack of data with annotations, it is not easy to re-train the models. In response to these problems, the project proposed and implemented a three-step methodology: (1) improve the prediction performance of the pre-trained object detection models on panoramic data by projecting the original image into 4 perspective sub-images; (2) introduce supports for boundary continuity and category information into DeepSORT, a commonly used multiple object tracking model, and set an improved detection model as its detector; (3) using the tracking results, develop an application for detecting the overtaking behaviour of the surrounding vehicles. Evaluated on the panoramic cycling dataset built by the project, the proposed methodology improves the average precision of YOLO v5m6 and Faster RCNN-FPN under any input resolution setting. In addition, it raises MOTA and IDF1 of DeepSORT by 7.6% and 9.7% respectively. When detecting the overtakes in the test videos, it achieves the F-score of 0.88. The code is available on GitHub at github.com/cuppp1998/360_object_tracking to ensure the reproducibility and further improvements of results.

Submitted 21 July, 2024; originally announced July 2024.
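
For readers unfamiliar with the tracking metrics named in the abstract above, the following is a minimal sketch of the standard MOTA and F-score definitions. It is not taken from the paper's repository; the function names and the counts used in the usage note are hypothetical and purely illustrative.

```python
def mota(false_negatives, false_positives, id_switches, num_ground_truth):
    """Multiple Object Tracking Accuracy: 1 - (FN + FP + IDSW) / GT.

    All counts are summed over every frame of the sequence.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


def f_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall (F1)."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

As a usage illustration with made-up counts (not the paper's data), `f_score(22, 3, 3)` gives precision and recall of 0.88 each, and therefore an F-score of 0.88.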

arXiv:2407.12270 [pdf, other]

Observation of $Λ_c^+ \to Λa_0(980)^+$ and Evidence for $Σ(1380)^+$ in $Λ_c^+ \to Λπ^+ η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (638 additional authors not shown)

Abstract: Based on $6.1~\mathrm{fb}^{-1}$ of $e^+e^-$ annihilation data collected at center-of-mass energies from 4.600 GeV to 4.843 GeV with the BESIII detector at the BEPCII collider, a partial wave analysis of $Λ_c^+\toΛπ^+η$ is performed, and branching fractions and decay asymmetry parameters of intermediate processes are determined. The process $Λ_c^+\toΛa_0(980)^+$ is observed for the first time, and evidence for the pentaquark candidate $Σ(1380)^+$ decaying into $Λπ^+$ is found with statistical significance larger than $3σ$. The branching fraction product $\mathcal{B}(Λ_{c}^{+} \to Λa_0(980)^+) \; \mathcal{B}( a_0(980)^+ \to π^{+}η)$ is determined to be $(1.05 \pm 0.16_{\mathrm{stat}} \pm 0.05_{\mathrm{syst}} \pm 0.07_{\mathrm{ext}})\%$, which is larger than theoretical calculations by $1 - 2$ orders of magnitude. Here the third (external) systematic is from $\mathcal{B}(Λ_{c}^{+} \to Λπ^+ η)$. Finally, we precisely obtain the absolute branching fraction $\mathcal{B}(Λ_{c}^{+} \to Λπ^+ η) = (1.94 \pm 0.07_{\mathrm{stat}} \pm 0.11_{\mathrm{syst}})\%$.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 16 pages, 8 figures

arXiv:2407.12186 [pdf, other]

Directly Optimizing for Synthesizability in Generative Molecular Design using Retrosynthesis Models

Authors: Jeff Guo, Philippe Schwaller

Abstract: Synthesizability in generative molecular design remains a pressing challenge. Existing methods to assess synthesizability span heuristics-based methods, retrosynthesis models, and synthesizability-constrained molecular generation. The latter has become increasingly prevalent and proceeds by defining a set of permitted actions a model can take when generating molecules, such that all generations are anchored in "synthetically-feasible" chemical transformations. To date, retrosynthesis models have been mostly used as a post-hoc filtering tool as their inference cost remains prohibitive to use directly in an optimization loop. In this work, we show that with a sufficiently sample-efficient generative model, it is straightforward to directly optimize for synthesizability using retrosynthesis models in goal-directed generation. Under a heavily-constrained computational budget, our model can generate molecules satisfying a multi-parameter drug discovery optimization task while being synthesizable, as deemed by the retrosynthesis model.

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.11727 [pdf, ps, other]

Measurement of the branching fraction of $D^+_s\to \ell^+ν_\ell$ via $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (634 additional authors not shown)

Abstract: Based on $10.64~\mathrm{fb}^{-1}$ of $e^+e^-$ collision data taken at center-of-mass energies between 4.237 and 4.699 GeV with the BESIII detector, we study the leptonic $D^+_s$ decays using the $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$ process. The branching fractions of $D_s^+\to\ell^+ν_{\ell}\,(\ell=μ,τ)$ are measured to be $\mathcal{B}(D_s^+\toμ^+ν_μ)=(0.547\pm0.026_{\rm stat}\pm0.016_{\rm syst})\%$ and $\mathcal{B}(D_s^+\toτ^+ν_τ)=(5.60\pm0.16_{\rm stat}\pm0.20_{\rm syst})\%$, respectively. The product of the decay constant and Cabibbo-Kobayashi-Maskawa matrix element $|V_{cs}|$ is determined to be $f_{D_s^+}|V_{cs}|=(246.5\pm5.9_{\rm stat}\pm3.6_{\rm syst}\pm0.5_{\rm input})_{μν}~\mathrm{MeV}$ and $f_{D_s^+}|V_{cs}|=(252.7\pm3.6_{\rm stat}\pm4.5_{\rm syst}\pm0.6_{\rm input})_{τν}~\mathrm{MeV}$, respectively. Taking the value of $|V_{cs}|$ from a global fit in the Standard Model, we obtain ${f_{D^+_s}}=(252.8\pm6.0_{\rm stat}\pm3.7_{\rm syst}\pm0.6_{\rm input})_{μν}$ MeV and ${f_{D^+_s}}=(259.2\pm3.6_{\rm stat}\pm4.5_{\rm syst}\pm0.6_{\rm input})_{τν}$ MeV, respectively. Conversely, taking the value for $f_{D_s^+}$ from the latest lattice quantum chromodynamics calculation, we obtain $|V_{cs}| =(0.986\pm0.023_{\rm stat}\pm0.014_{\rm syst}\pm0.003_{\rm input})_{μν}$ and $|V_{cs}| = (1.011\pm0.014_{\rm stat}\pm0.018_{\rm syst}\pm0.003_{\rm input})_{τν}$, respectively.

Submitted 18 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

Comments: 27 pages, 13 figures
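
The numbers quoted in the abstract above can be cross-checked with simple arithmetic. The sketch below assumes only that the leptonic branching fraction scales as $(f_{D_s^+}|V_{cs}|)^2$, so that, to first order, the relative uncertainty on $f_{D_s^+}|V_{cs}|$ is half the relative uncertainty on the branching fraction. The implied values of $|V_{cs}|$ and $f_{D_s^+}$ computed here are back-of-the-envelope ratios of the quoted central values, not inputs stated in the abstract, and all names in the code are hypothetical.

```python
# Central values and uncertainties copied from the abstract above.
# bf values are in percent; f_vcs and f_ds are in MeV.
CHANNELS = {
    "mu_nu": {"bf": 0.547, "bf_stat": 0.026, "bf_syst": 0.016,
              "f_vcs": 246.5, "f_ds": 252.8, "v_cs": 0.986},
    "tau_nu": {"bf": 5.60, "bf_stat": 0.16, "bf_syst": 0.20,
               "f_vcs": 252.7, "f_ds": 259.2, "v_cs": 1.011},
}


def propagate(bf, bf_err, f_vcs):
    """First-order propagation: B ~ (f|V_cs|)^2 implies
    sigma(f|V_cs|) = 0.5 * (sigma_B / B) * f|V_cs|."""
    return 0.5 * (bf_err / bf) * f_vcs


def cross_check(ch):
    """Recompute uncertainties and the inputs implied by the quoted ratios."""
    return {
        "stat": propagate(ch["bf"], ch["bf_stat"], ch["f_vcs"]),
        "syst": propagate(ch["bf"], ch["bf_syst"], ch["f_vcs"]),
        # Ratio of the product to f_Ds gives the |V_cs| value that was used.
        "implied_v_cs": ch["f_vcs"] / ch["f_ds"],
        # Ratio of the product to |V_cs| gives the f_Ds value that was used.
        "implied_f_ds": ch["f_vcs"] / ch["v_cs"],
    }


if __name__ == "__main__":
    for name, ch in CHANNELS.items():
        result = cross_check(ch)
        print(name, {key: round(value, 3) for key, value in result.items()})
```

Under these assumptions, the propagated statistical and systematic uncertainties land close to the quoted ones for both channels, and the two channels imply the same external inputs, which is what one would expect if a single global-fit $|V_{cs}|$ and a single lattice $f_{D_s^+}$ were used.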

arXiv:2407.11585 [pdf, other]

QVD: Post-training Quantization for Video Diffusion Models

Authors: Shilong Tian, Hong Chen, Chengtao Lv, Yu Liu, Jinyang Guo, Xianglong Liu, Shengxi Li, Hao Yang, Tao Xie

Abstract: Recently, video diffusion models (VDMs) have garnered significant attention due to their notable advancements in generating coherent and realistic video content. However, processing multiple frame features concurrently, coupled with the considerable model size, results in high latency and extensive memory consumption, hindering their broader application. Post-training quantization (PTQ) is an effective technique to reduce memory footprint and improve computational efficiency. Unlike image diffusion, we observe that the temporal features, which are integrated into all frame features, exhibit pronounced skewness. Furthermore, we investigate significant inter-channel disparities and asymmetries in the activation of video diffusion models, resulting in low coverage of quantization levels by individual channels and increasing the challenge of quantization. To address these issues, we introduce the first PTQ strategy tailored for video diffusion models, dubbed QVD. Specifically, we propose the High Temporal Discriminability Quantization (HTDQ) method, designed for temporal features, which retains the high discriminability of quantized features, providing precise temporal guidance for all video frames. In addition, we present the Scattered Channel Range Integration (SCRI) method which aims to improve the coverage of quantization levels across individual channels. Experimental validations across various models, datasets, and bit-width settings demonstrate the effectiveness of our QVD in terms of diverse metrics. In particular, we achieve near-lossless performance on W8A8, outperforming the current methods by 205.12 in FVD.

Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

Comments: accepted by ACMMM2024
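
The "low coverage of quantization levels by individual channels" mentioned in the abstract above can be illustrated with a generic min-max quantizer. This is a minimal sketch of plain uniform 8-bit quantization, not the HTDQ or SCRI methods from the paper; the function names and the synthetic channel values are hypothetical.

```python
def quantize(values, lo, hi, bits=8):
    """Uniform asymmetric quantization of values to integers in [0, 2**bits - 1]."""
    max_level = 2 ** bits - 1
    scale = (hi - lo) / max_level if hi > lo else 1.0
    return [min(max_level, max(0, round((v - lo) / scale))) for v in values]


def level_coverage(channel, lo, hi, bits=8):
    """Fraction of the integer grid spanned by one channel after quantization."""
    q = quantize(channel, lo, hi, bits)
    return (max(q) - min(q) + 1) / 2 ** bits


# Two synthetic activation channels with very different dynamic ranges.
narrow = [-0.1 + 0.2 * i / 99 for i in range(100)]
wide = [-10.0 + 20.0 * i / 99 for i in range(100)]

# Per-tensor range: both channels share one [lo, hi] taken over all values.
shared_lo = min(narrow + wide)
shared_hi = max(narrow + wide)
per_tensor = level_coverage(narrow, shared_lo, shared_hi)

# Per-channel range: the narrow channel gets its own [lo, hi].
per_channel = level_coverage(narrow, min(narrow), max(narrow))

if __name__ == "__main__":
    print("narrow channel, shared range:", per_tensor)
    print("narrow channel, own range:", per_channel)
```

With a shared range, the narrow channel occupies only a handful of the 256 available levels, whereas with its own range it spans the full grid. This is the kind of inter-channel disparity that makes activation quantization harder.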

arXiv:2407.11504 [pdf, other]

Bootstrapped Pre-training with Dynamic Identifier Prediction for Generative Retrieval

Authors: Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

Abstract: Generative retrieval uses differentiable search indexes to directly generate relevant document identifiers in response to a query. Recent studies have highlighted the potential of a strong generative retrieval model, trained with carefully crafted pre-training tasks, to enhance downstream retrieval tasks via fine-tuning. However, the full power of pre-training for generative retrieval remains underexploited due to its reliance on pre-defined static document identifiers, which may not align with evolving model parameters. In this work, we introduce BootRet, a bootstrapped pre-training method for generative retrieval that dynamically adjusts document identifiers during pre-training to accommodate the continuing memorization of the corpus. BootRet involves three key training phases: (i) initial identifier generation, (ii) pre-training via corpus indexing and relevance prediction tasks, and (iii) bootstrapping for identifier updates. To facilitate the pre-training phase, we further introduce noisy documents and pseudo-queries, generated by large language models, to resemble semantic connections in both indexing and retrieval tasks. Experimental results demonstrate that BootRet significantly outperforms existing pre-training generative retrieval baselines and performs well even in zero-shot settings.

Submitted 16 July, 2024; originally announced July 2024.

Comments: Accepted by ACL Findings 2024

arXiv:2407.11431 [pdf]

MRIo3DS-Net: A Mutually Reinforcing Images to 3D Surface RNN-like framework for model-adaptation indoor 3D reconstruction

Authors: Chang Li, Jiao Guo, Yufei Zhao, Yongjun Zhang

Abstract: This paper is the first to propose an end-to-end, recurrent neural network (RNN)-like framework of mutually reinforcing images to 3D surface for model-adaptation indoor 3D reconstruction, where multi-view dense matching and point cloud surface optimization are mutually reinforced by an RNN-like structure rather than being treated as separate issues. The characteristics are as follows: In the multi-view dense matching module (MVDMM), the model-adaptation strategy is used to fine-tune and optimize a Transformer-based multi-view dense matching DNN, so that it has better image features for matching and stronger detail expression capabilities; In the point cloud surface optimization module (PCSOM), the 3D surface reconstruction network based on a 3D implicit field is optimized by using the model-adaptation strategy, which solves the problem of point cloud surface optimization without knowing the normal vector of the 3D surface. To improve and finely reconstruct 3D surfaces from point clouds, a smooth loss is proposed and added to this module; The MRIo3DS-Net is an RNN-like framework, which utilizes the finely optimized 3D surface obtained by PCSOM to recursively reinforce the differentiable warping for optimizing MVDMM. This refinement leads to better dense matching results, and better dense matching results lead to better 3D surface results, recursively and mutually. Hence, the model-adaptation strategy can better coordinate the differences between the two network modules, so that they complement each other to achieve a better effect; To accelerate the transfer learning and training convergence from source domain to target domain, a multi-task loss function based on Bayesian uncertainty is used to adaptively adjust the weights between the loss functions of the two networks, MVDMM and PCSOM; In this multi-task cascade network framework, any module can be replaced by a state-of-the-art network to achieve better 3D reconstruction results.

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.10805 [pdf, other]

Think-on-Graph 2.0: Deep and Interpretable Large Language Model Reasoning with Knowledge Graph-guided Retrieval

Authors: Shengjie Ma, Chengjin Xu, Xuhui Jiang, Muzhi Li, Huaren Qu, Jian Guo

Abstract: Retrieval-augmented generation (RAG) has significantly advanced large language models (LLMs) by enabling dynamic information retrieval to mitigate knowledge gaps and hallucinations in generated content. However, these systems often falter with complex reasoning and consistency across diverse queries. In this work, we present Think-on-Graph 2.0, an enhanced RAG framework that aligns questions with the knowledge graph and uses it as a navigational tool, which deepens and refines the RAG paradigm for information collection and integration. The KG-guided navigation fosters deep and long-range associations to uphold logical consistency and optimize the scope of retrieval for precision and interoperability. In conjunction, factual consistency can be better ensured through semantic similarity guided by precise directives. ToG 2.0 not only improves the accuracy and reliability of LLMs' responses but also demonstrates the potential of hybrid structured knowledge systems to significantly advance LLM reasoning, aligning it closer to human-like performance. We conducted extensive experiments on four public datasets to demonstrate the advantages of our method compared to the baseline.

Submitted 6 August, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.10576 [pdf, ps, other]

Vector spaces over finite commutative rings

Authors: Jun Guo, Junli Liu, Qiuli Xu

Abstract: Vector spaces over finite fields and Anzahl formulas of subspaces were studied by Wan (Geometry of Classical Groups over Finite Fields, Science Press, 2002). As a generalization, we study vector spaces and singular linear spaces over commutative rings and obtain some Anzahl formulas of subspaces. Moreover, we discuss arcs and caps by using these formulas.

Submitted 15 July, 2024; originally announced July 2024.

Comments: 20 pages

arXiv:2407.10570 [pdf]

Multiple Peg-in-Hole Assembly of Tightly Coupled Multi-manipulator Using Learning-based Visual Servo

Authors: Jiawei Zhang, Chengchao Bai, Jifeng Guo

Abstract: Multiple peg-in-hole assembly is one of the fundamental tasks in robotic assembly. In the multiple peg-in-hole task for large-sized parts, it is challenging for a single manipulator to simultaneously align multiple distant pegs and holes, necessitating tightly coupled multi-manipulator systems. For such Multi-manipulator Multiple Peg-in-Hole (MMPiH) tasks, we propose a collaborative visual servo control framework that uses only the monocular in-hand cameras of each manipulator to reduce positioning errors. Initially, we train a state classification neural network and a positioning neural network. The former is used to divide the states of peg and hole in the image into three categories: obscured, separated and overlapped, while the latter determines the position of the peg and hole in the image. Based on these findings, we propose a method to integrate the visual features of multiple manipulators using virtual forces, which can naturally combine with the cooperative controller of the multi-manipulator system. To generalize our approach to holes of different appearances, we varied the appearance of the holes during the dataset generation process. The results confirm that by considering the appearance of the holes, classification accuracy and positioning precision can be improved. Finally, the results show that our method achieves an 85% success rate in dual-manipulator dual peg-in-hole tasks with a clearance of 0.2 mm.

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.09979 [pdf, other]

PFPs: Prompt-guided Flexible Pathological Segmentation for Diverse Potential Outcomes Using Large Vision and Language Models

Authors: Can Cui, Ruining Deng, Junlin Guo, Quan Liu, Tianyuan Yao, Haichun Yang, Yuankai Huo

Abstract: The Vision Foundation Model has recently gained attention in medical image analysis. Its zero-shot learning capabilities accelerate AI deployment and enhance the generalizability of clinical applications. However, segmenting pathological images presents a special focus on the flexibility of segmentation targets. For instance, a single click on a Whole Slide Image (WSI) could signify a cell, a functional unit, or layers, adding layers of complexity to the segmentation tasks. Current models primarily predict potential outcomes but lack the flexibility needed for physician input. In this paper, we explore the potential of enhancing segmentation model flexibility by introducing various task prompts through a Large Language Model (LLM) alongside traditional task tokens. Our contribution is four-fold: (1) we construct a computationally efficient pipeline that uses finetuned language prompts to guide flexible multi-class segmentation; (2) we compare segmentation performance with fixed prompts against free-text; (3) we design a multi-task kidney pathology segmentation dataset and the corresponding various free-text prompts; and (4) we evaluate our approach on the kidney pathology dataset, assessing its capacity to handle new cases during inference.

Submitted 13 July, 2024; originally announced July 2024.

arXiv:2407.09880 [pdf, other]

doi 10.1021/acs.nanolett.4c01612

Inferior interfacial superconductivity in 1 UC FeSe/SrVO$_3$/SrTiO$_3$ with screened interfacial electron-phonon coupling

Authors: Nan Guo, Xiaoyang Chen, Tianlun Yu, Yu Fan, Qinghua Zhang, Minyinan Lei, Xiaofeng Xu, Xuetao Zhu, Jiandong Guo, Lin Gu, Haichao Xu, Rui Peng, Donglai Feng

Abstract: Monolayer FeSe/TiO$_x$ and FeSe/FeO$_x$ interfaces exhibit significant superconductivity enhancement compared to bulk FeSe, with interfacial electron-phonon coupling (EPC) playing a crucial role. However, the reduced dimensionality in monolayer FeSe, which may drive superconducting fluctuations, complicates the understanding of the enhancement mechanisms. Here we construct a new superconducting interface: monolayer FeSe/SrVO$_3$/SrTiO$_3$, in which the itinerant electrons of highly metallic SrVO$_3$ films can screen all the high-energy Fuchs-Kliewer phonons, including those of SrTiO$_3$, making it the first FeSe/oxide system with screened interfacial EPC while maintaining the monolayer FeSe thickness. Despite comparable doping levels, the heavily electron-doped monolayer FeSe/SrVO$_3$ exhibits a lower pairing temperature ($T_\mathrm{g}$ $\sim$ 48 K) than FeSe/SrTiO$_3$ and FeSe/LaFeO$_3$. Our findings disentangle the contributions of interfacial EPC from dimensionality on enhancing $T_\mathrm{g}$ in FeSe/oxide interfaces, underscoring the importance of interfacial EPC in $T_\mathrm{g}$ enhancement. This FeSe/VO$_x$ interface also provides a platform for studying the interfacial superconductivity.

Submitted 13 July, 2024; originally announced July 2024.

Comments: Published in Nano Letters, 11 pages, 4 figures, 1 table

arXiv:2407.09457 [pdf, other]

How coronal mass ejections are influenced by the morphology and toroidal flux of their source magnetic flux ropes?

Authors: J. H. Guo, L. Linan, S. Poedts, Y. Guo, B. Schmieder, A. Lani, Y. W. Ni, M. Brchnelova, B. Perri, T. Baratashvili, S. T. Li, P. F. Chen

Abstract: Coronal mass ejections (CMEs) stand as intense eruptions of magnetized plasma from the Sun, playing a pivotal role in driving significant changes in the heliospheric environment. Deducing the properties of CMEs from their progenitors in solar source regions is crucial for space weather forecasting. The primary objective of this paper is to establish a connection between CMEs and their progenitors in solar source regions, enabling us to infer the magnetic structures of CMEs before their full development. To this end, we create a dataset comprising a magnetic flux rope series with varying projection shapes, sizes and toroidal fluxes, using the Regularized Biot-Savart Laws (RBSL). Thereafter, we simulate the propagation of these flux ropes from the solar surface to a distance of 25$R_{\odot}$ with our global coronal MHD model, which is named COCONUT. Our parametric survey reveals significant impacts of source flux ropes on the consequent CMEs. We find that the projection shape can influence the magnetic structures of CMEs at 20$R_{\odot}$, albeit with minimal impacts on the propagation speed. However, these impacts diminish as source flux ropes become fat. In terms of toroidal flux, our simulation results demonstrate a pronounced correlation with the propagation speed of CMEs, as well as with the success of the eruption. This work builds a bridge between the CMEs in the outer corona and their progenitors in solar source regions. Our parametric survey suggests that the projection shape, cross-section radius and toroidal flux of source flux ropes are crucial parameters in predicting magnetic structures and propagation speed of CMEs, providing valuable insights for space weather prediction.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 11 pages, 10 figures, accepted for publication by A&A

arXiv:2407.08500 [pdf, other]

Latent Conditional Diffusion-based Data Augmentation for Continuous-Time Dynamic Graph Model

Authors: Yuxing Tian, Yiyan Qi, Aiwen Jiang, Qi Huang, Jian Guo

Abstract: Continuous-Time Dynamic Graph (CTDG) precisely models evolving real-world relationships, drawing heightened interest in dynamic graph learning across academia and industry. However, existing CTDG models encounter challenges stemming from noise and limited historical data. Graph Data Augmentation (GDA) emerges as a critical solution, yet current approaches primarily focus on static graphs and struggle to effectively address the dynamics inherent in CTDGs. Moreover, these methods often demand substantial domain expertise for parameter tuning and lack theoretical guarantees for augmentation efficacy. To address these issues, we propose Conda, a novel latent diffusion-based GDA method tailored for CTDGs. Conda features a sandwich-like architecture, incorporating a Variational Auto-Encoder (VAE) and a conditional diffusion model, aimed at generating enhanced historical neighbor embeddings for target nodes. Unlike conventional diffusion models trained on entire graphs via pre-training, Conda requires historical neighbor sequence embeddings of target nodes for training, thus facilitating more targeted augmentation. We integrate Conda into the CTDG model and adopt an alternating training strategy to optimize performance. Extensive experimentation across six widely used real-world datasets showcases the consistent performance improvement of our approach, particularly in scenarios with limited historical data.

Submitted 20 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

Comments: Accepted by KDD 2024

arXiv:2407.07651 [pdf, other]

Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$

Authors: M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (645 additional authors not shown)

Abstract: The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946 GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6 GeV with a width of 50 MeV is observed for the first time with a statistical significance of $15σ$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75 GeV in both processes.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07520 [pdf, other]

IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection

Authors: Mingjin Zhang, Yuchun Wang, Jie Guo, Yunsong Li, Xinbo Gao, Jing Zhang

Abstract: The recent Segment Anything Model (SAM) is a significant advancement in natural image segmentation, exhibiting potent zero-shot performance suitable for various downstream image segmentation tasks. However, directly utilizing the pretrained SAM for the Infrared Small Target Detection (IRSTD) task falls short of satisfactory performance due to a notable domain gap between natural and infrared images. Unlike a visible light camera, a thermal imager reveals an object's temperature distribution by capturing infrared radiation. Small targets often show a subtle temperature transition at the object's boundaries. To address this issue, we propose the IRSAM model for IRSTD, which improves SAM's encoder-decoder architecture to learn better feature representation of infrared small objects. Specifically, we design a Perona-Malik diffusion (PMD)-based block and incorporate it into multiple levels of SAM's encoder to help it capture essential structural features while suppressing noise. Additionally, we devise a Granularity-Aware Decoder (GAD) to fuse the multi-granularity feature from the encoder to capture structural information that may be lost in long-distance modeling. Extensive experiments on the public datasets, including NUAA-SIRST, NUDT-SIRST, and IRSTD-1K, validate the design choice of IRSAM and its significant superiority over representative state-of-the-art methods. The source code is available at: github.com/IPIC-Lab/IRSAM.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 18 pages, 8 figures, to be published in ECCV2024

arXiv:2407.07356 [pdf, other]

Video In-context Learning

Authors: Wentao Zhang, Junliang Guo, Tianyu He, Li Zhao, Linli Xu, Jiang Bian

Abstract: In-context learning for vision data has been underexplored compared with that in natural language. Previous works studied image in-context learning, urging models to generate a single image guided by demonstrations. In this paper, we propose and study video in-context learning, where the model starts from an existing video clip and generates diverse potential future sequences, each semantically guided by the prompted video demonstrations. To achieve this, we provide a clear definition of the task, and train an autoregressive Transformer on video datasets. We thoroughly analyze the effect of different datasets and represent frames as discrete tokens, and then model them by next token predictions. We design various evaluation metrics, including both objective and subjective measures, to demonstrate the visual quality and semantic accuracy of generation results. Our model follows the scaling law and generates high-quality video clips that accurately align with the semantic guidance provided by in-context examples.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.06992 [pdf, other]

Robust Neural Information Retrieval: An Adversarial and Out-of-distribution Perspective

Authors: Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

Abstract: Recent advances in neural information retrieval (IR) models have significantly enhanced their effectiveness over various IR tasks. The robustness of these models, essential for ensuring their reliability in practice, has also garnered significant attention. With a wide array of research on robust IR being proposed, we believe it is the opportune moment to consolidate the current status, glean insights from existing methodologies, and lay the groundwork for future development. We view the robustness of IR to be a multifaceted concept, emphasizing its necessity against adversarial attacks, out-of-distribution (OOD) scenarios and performance variance. With a focus on adversarial and OOD robustness, we dissect robustness solutions for dense retrieval models (DRMs) and neural ranking models (NRMs), respectively, recognizing them as pivotal components of the neural IR pipeline. We provide an in-depth discussion of existing methods, datasets, and evaluation metrics, shedding light on challenges and future directions in the era of large language models. To the best of our knowledge, this is the first comprehensive survey on the robustness of neural IR models, and we will also be giving our first tutorial presentation at SIGIR 2024 \url{https://sigir2024-robust-information-retrieval.github.io}. Along with the organization of existing work, we introduce a Benchmark for robust IR (BestIR), a heterogeneous evaluation benchmark for robust neural information retrieval, which is publicly available at \url{https://github.com/Davion-Liu/BestIR}. We hope that this study provides useful clues for future research on the robustness of IR models and helps to develop trustworthy search engines \url{https://github.com/Davion-Liu/Awesome-Robustness-in-Information-Retrieval}.

Submitted 16 August, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

Comments: Survey paper

arXiv:2407.06529 [pdf]

Advanced Financial Fraud Detection Using GNN-CL Model

Authors: Yu Cheng, Junjie Guo, Shiqing Long, You Wu, Mengfang Sun, Rong Zhang

Abstract: The innovative GNN-CL model proposed in this paper marks a breakthrough in the field of financial fraud detection by synergistically combining the advantages of graph neural networks (GNN), convolutional neural networks (CNN) and long short-term memory (LSTM) networks. This convergence enables multifaceted analysis of complex transaction patterns, improving detection accuracy and resilience against complex fraudulent activities. A key novelty of this paper is the use of multilayer perceptrons (MLPs) to estimate node similarity, effectively filtering out neighborhood noise that can lead to false positives. This intelligent purification mechanism ensures that only the most relevant information is considered, thereby improving the model's understanding of the network structure. Feature weakening often plagues graph-based models due to the dilution of key signals. To further address the challenge of feature weakening, GNN-CL adopts reinforcement learning strategies. By dynamically adjusting the weights assigned to central nodes, it reinforces the importance of these influential entities to retain important clues of fraud even in less informative data. Experimental evaluations on Yelp datasets highlight the superior performance of GNN-CL compared to existing methods.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05905 [pdf, other]

Deep Learning-based CSI Feedback in Wi-Fi Systems

Authors: Fan Qi, Jiajia Guo, Yiming Cui, Xiangyi Li, Chao-Kai Wen, Shi Jin

Abstract: In Wi-Fi systems, channel state information (CSI) plays a crucial role in enabling access points to execute beamforming operations. However, the feedback overhead associated with CSI significantly hampers the throughput improvements. Recent advancements in deep learning (DL) have transformed the approach to CSI feedback in cellular systems. Drawing inspiration from the successes witnessed in the realm of mobile communications, this paper introduces a DL-based CSI feedback framework, named EFNet, tailored for Wi-Fi systems. The proposed framework leverages an autoencoder to achieve precise feedback with minimal overhead. The process involves the station utilizing the encoder to compress and quantize a series of matrices into codeword bit streams, which are then fed back to the access point. Subsequently, the decoder installed at the AP reconstructs beamforming matrices from these bit streams. We implement the EFNet system using standard Wi-Fi equipment operating in the 2.4 GHz band. Experimental findings in an office environment reveal a remarkable 80.77% reduction in feedback overhead compared to the 802.11ac standard, alongside a significant boost in net throughput of up to 30.72%.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05666 [pdf, other]

Enhancing Neural Radiance Fields with Depth and Normal Completion Priors from Sparse Views

Authors: Jiawei Guo, HungChyun Chou, Ning Ding

Abstract: Neural Radiance Fields (NeRF) are an advanced technology that creates highly realistic images by learning about scenes through a neural network model. However, NeRF often encounters issues when there are not enough images to work with, leading to problems in accurately rendering views. The main issue is that NeRF lacks sufficient structural details to guide the rendering process accurately. To address this, we propose the Depth and Normal Dense Completion Priors for NeRF (CP\_NeRF) framework. This framework enhances view rendering by adding depth and normal dense completion priors to the NeRF optimization process. Before optimizing NeRF, we obtain sparse depth maps using the Structure from Motion (SfM) technique used to get camera poses. Based on the sparse depth maps and a normal estimator, we generate sparse normal maps for training a normal completion prior with precise standard deviations. During optimization, we apply depth and normal completion priors to transform sparse data into dense depth and normal maps with their standard deviations. We use these dense maps to guide ray sampling, assist distance sampling and construct a normal loss function for better training accuracy. To improve the rendering of NeRF's normal outputs, we incorporate an optical centre position embedder that helps synthesize more accurate normals through volume rendering. Additionally, we employ a normal patch matching technique to choose accurate rendered normal maps, ensuring more precise supervision for the model. Our method is superior to leading techniques in rendering detailed indoor scenes, even with limited input views.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05558 [pdf]

Hidden Convexity-Based Distributed Operation of Integrated Electricity-Gas Systems

Authors: Rong-Peng Liu, Yue Song, Junhong Liu, Xiaozhe Wang, Jinpeng Guo, Yunhe Hou

Abstract: We propose a hidden convexity-based method to address distributed optimal energy flow (OEF) problems for transmission-level integrated electricity-gas systems. First, we develop a node-wise decoupling method to decompose an OEF problem into multiple OEF subproblems. Then, we propose a hidden convexity-based method to equivalently reformulate nonconvex OEF subproblems as semi-definite programs. This method differs from any approximation and convexification methods that may incur infeasible solutions. Since all OEF subproblems are originally convex or equivalently convexified, we adopt an ADMM to solve the hidden convexity-based distributed OEF problem with convergence analysis. Test results validate the effectiveness of the proposed method, especially in handling a large number of agents.

Submitted 7 July, 2024; originally announced July 2024.

Comments: 7 pages

arXiv:2407.05376 [pdf, other]

Rethinking Closed-loop Planning Framework for Imitation-based Model Integrating Prediction and Planning

Authors: Jiayu Guo, Mingyue Feng, Pengfei Zhu, Chengjun Li, Jian Pu

Abstract: In recent years, the integration of prediction and planning through neural networks has received substantial attention. Despite extensive studies on it, there is a noticeable gap in understanding the operation of such models within a closed-loop planning setting. To bridge this gap, we propose a novel closed-loop planning framework compatible with neural networks engaged in joint prediction and planning. The framework contains two running modes, namely planning and safety monitoring, wherein the neural network performs Motion Prediction and Planning (MPP) and Conditional Motion Prediction (CMP) correspondingly without altering architecture. We evaluate the efficacy of our framework using the nuPlan dataset and its simulator, conducting closed-loop experiments across diverse scenarios. The results demonstrate that the proposed framework ensures the feasibility and local stability of the planning process while maintaining safety with CMP safety monitoring. Compared to other learning-based methods, our approach achieves substantial improvement.

Submitted 7 July, 2024; originally announced July 2024.

Comments: 7 pages, 5 figures

arXiv:2407.05005 [pdf, other]

Personalized Federated Domain-Incremental Learning based on Adaptive Knowledge Matching

Authors: Yichen Li, Wenchao Xu, Haozhao Wang, Ruixuan Li, Yining Qi, Jingcai Guo

Abstract: This paper focuses on Federated Domain-Incremental Learning (FDIL), in which each client continues to learn incremental tasks whose domains shift from one another. We propose a novel adaptive knowledge matching-based personalized FDIL approach (pFedDIL) which allows each client to alternatively utilize an appropriate incremental task learning strategy based on the correlation with the knowledge from previous tasks. More specifically, when a new task arrives, each client first calculates its local correlations with previous tasks. Then, the client can choose to adopt a new initial model or a previous model with similar knowledge to train the new task and simultaneously migrate knowledge from previous tasks based on these correlations. Furthermore, to identify the correlations between the new task and previous tasks for each client, we separately employ an auxiliary classifier for each target classification model and propose sharing partial parameters between the target classification model and the auxiliary classifier to condense model parameters. We conduct extensive experiments on several datasets, and the results demonstrate that pFedDIL outperforms state-of-the-art methods by up to 14.35\% in terms of average accuracy of all tasks.

Submitted 18 July, 2024; v1 submitted 6 July, 2024; originally announced July 2024.

arXiv:2407.03316 [pdf, other]

An Upper Limit on the Photoproduction Cross Section of the Spin-Exotic $π_1(1600)$

Authors: F. Afzal, C. S. Akondi, M. Albrecht, M. Amaryan, S. Arrigo, V. Arroyave, A. Asaturyan, A. Austregesilo, Z. Baldwin, F. Barbosa, J. Barlow, E. Barriga, R. Barsotti, D. Barton, V. Baturin, V. V. Berdnikov, T. Black, W. Boeglin, M. Boer, W. J. Briscoe, T. Britton, S. Cao, E. Chudakov, G. Chung, P. L. Cole, et al. (124 additional authors not shown)

Abstract: The spin-exotic hybrid meson $π_{1}(1600)$ is predicted to have a large decay rate to the $ωππ$ final state. Using 76.6~pb$^{-1}$ of data collected with the GlueX detector, we measure the cross sections for the reactions $γp \to ωπ^+ π^- p$, $γp \to ωπ^0 π^0 p$, and $γp\toωπ^-π^0Δ^{++}$ in the range $E_γ=$ 8-10 GeV. Using isospin conservation, we set the first upper limits on the photoproduction cross sections of the $π^{0}_{1}(1600)$ and $π^{-}_{1}(1600)$. We combine these limits with lattice calculations of decay widths and find that photoproduction of $η'π$ is the most sensitive two-body system to search for the $π_1(1600)$.