Search | arXiv e-print repository

Fundus2Video: Cross-Modal Angiography Video Generation from Static Fundus Photography with Clinical Knowledge Guidance

Authors: Weiyi Zhang, Siyu Huang, Jiancheng Yang, Ruoyu Chen, Zongyuan Ge, Yingfeng Zheng, Danli Shi, Mingguang He

Abstract: Fundus Fluorescein Angiography (FFA) is a critical tool for assessing retinal vascular dynamics and aiding in the diagnosis of eye diseases. However, its invasive nature and lower accessibility compared to Color Fundus (CF) images pose significant challenges. Current CF to FFA translation methods are limited to static generation. In this work, we pioneer dynamic FFA video generation from static CF images. We introduce an autoregressive GAN for smooth, memory-saving frame-by-frame FFA synthesis. To enhance the focus on dynamic lesion changes in FFA regions, we design a knowledge mask based on clinical experience. Leveraging this mask, our approach integrates innovative knowledge mask-guided techniques, including knowledge-boosted attention, knowledge-aware discriminators, and mask-enhanced patchNCE loss, aimed at refining generation in critical areas and addressing the pixel misalignment challenge. Our method achieves the best FVD of 1503.21 and PSNR of 11.81 compared to other common video generation approaches. Human assessment by an ophthalmologist confirms its high generation quality. Notably, our knowledge mask surpasses supervised lesion segmentation masks, offering a promising non-invasive alternative to traditional FFA for research and clinical applications. The code is available at https://github.com/Michi-3000/Fundus2Video.

Submitted 27 August, 2024; originally announced August 2024.

Comments: The paper has been accepted by Medical Image Computing and Computer Assisted Intervention Society (MICCAI) 2024

arXiv:2408.14955 [pdf, other]

De-excitations of highly excited $^{11}$B$^*$ and $^{15}$N$^*$ based on the GEMINI++ code

Authors: Yujie Niu, Wan-Lei Guo, Miao He, Jun Su

Abstract: Nuclear de-excitations associated with neutrino-nucleus interactions and nucleon decays are playing an increasingly significant role in neutrino experiments. We explore the GEMINI++ code and estimate its ability to account for the de-excitation processes of highly excited $^{11}$B$^*$ and $^{15}$N$^*$, which can be created in liquid scintillator and water Cherenkov detectors, respectively. It is found that GEMINI++ cannot describe the nuclear experimental data of $^{11}$B$^*$ and $^{15}$N$^*$ well. To improve its performance for de-excitations of light nuclei, we modify GEMINI++ and then develop the code GEMINI++4$ν$, which can give the best predictions compared with experimental measurements among some widely used statistical model codes.

Submitted 27 August, 2024; originally announced August 2024.

Comments: 7 pages, 4 figures, 2 tables

arXiv:2408.13401 [pdf, ps, other]

Relative train tracks and endperiodic graph maps

Authors: Yan Mary He, Chenxi Wu

Abstract: We study endperiodic maps of an infinite graph with finitely many ends. We prove that any such map is homotopic to an endperiodic relative train track map. Moreover, we show that the (largest) Perron-Frobenius eigenvalue of the transition matrix is a canonical quantity associated to the map.

Submitted 23 August, 2024; originally announced August 2024.

arXiv:2408.12910 [pdf, other]

What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance

Authors: Yilun Liu, Minggui He, Feiyu Yao, Yuhe Ji, Shimin Tao, Jingzhou Du, Duan Li, Jian Gao, Li Zhang, Hao Yang, Boxing Chen, Osamu Yoshie

Abstract: The emergence of text-to-image synthesis (TIS) models has significantly influenced digital image creation by producing high-quality visuals from written descriptions. Yet these models heavily rely on the quality and specificity of textual prompts, posing a challenge for novice users who may not be familiar with TIS-model-preferred prompt writing. Existing solutions relieve this via automatic model-preferred prompt generation from user queries. However, this single-turn manner suffers from limited user-centricity in terms of result interpretability and user interactivity. To address these issues, we propose DialPrompt, a multi-turn dialogue-based TIS prompt generation model that emphasises user-centricity. DialPrompt is designed to follow a multi-turn guidance workflow, where in each round of dialogue the model queries the user about their preferences on possible optimization dimensions before generating the final TIS prompt. To achieve this, we mined 15 essential dimensions for high-quality prompts from advanced users and curated a multi-turn dataset. Through training on this dataset, DialPrompt can improve interpretability by allowing users to understand the correlation between specific phrases and image attributes. Additionally, it enables greater user control and engagement in the prompt generation process, leading to more personalized and visually satisfying outputs. Experiments indicate that DialPrompt achieves a competitive result in the quality of synthesized images, outperforming existing prompt engineering approaches by 5.7%. Furthermore, in our user evaluation, DialPrompt outperforms existing approaches by 46.5% in user-centricity score and is rated 7.9/10 by 19 human reviewers.

Submitted 23 August, 2024; originally announced August 2024.

arXiv:2408.12725 [pdf, other]

DUNE Phase II: Scientific Opportunities, Detector Concepts, Technological Solutions

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, T. Alves, H. Amar, P. Amedo, J. Anderson, C. Andreopoulos, M. Andreotti, et al. (1347 additional authors not shown)

Abstract: The international collaboration designing and constructing the Deep Underground Neutrino Experiment (DUNE) at the Long-Baseline Neutrino Facility (LBNF) has developed a two-phase strategy toward the implementation of this leading-edge, large-scale science project. The 2023 report of the US Particle Physics Project Prioritization Panel (P5) reaffirmed this vision and strongly endorsed DUNE Phase I and Phase II, as did the European Strategy for Particle Physics. While the construction of DUNE Phase I is well underway, this White Paper focuses on DUNE Phase II planning. DUNE Phase II consists of a third and fourth far detector (FD) module, an upgraded near detector complex, and an enhanced 2.1 MW beam. The fourth FD module is conceived as a "Module of Opportunity", aimed at expanding the physics opportunities, in addition to supporting the core DUNE science program, with more advanced technologies. This document highlights the increased science opportunities offered by the DUNE Phase II near and far detectors, including long-baseline neutrino oscillation physics, neutrino astrophysics, and physics beyond the standard model. It describes the DUNE Phase II near and far detector technologies and detector design concepts that are currently under consideration. A summary of key R&D goals and prototyping phases needed to realize the Phase II detector technical designs is also provided. DUNE's Phase II detectors, along with the increased beam power, will complete the full scope of DUNE, enabling a multi-decadal program of groundbreaking science with neutrinos.

Submitted 22 August, 2024; originally announced August 2024.

Report number: FERMILAB-TM-2833-LBNF

arXiv:2408.11787 [pdf, other]

NuSegDG: Integration of Heterogeneous Space and Gaussian Kernel for Domain-Generalized Nuclei Segmentation

Authors: Zhenye Lou, Qing Xu, Zekun Jiang, Xiangjian He, Zhen Chen, Yi Wang, Chenxin Li, Maggie M. He, Wenting Duan

Abstract: Domain-generalized nuclei segmentation refers to the generalizability of models to unseen domains based on knowledge learned from source domains and is challenged by various image conditions, cell types, and stain strategies. Recently, the Segment Anything Model (SAM) has achieved great success in universal image segmentation by interactive prompt modes (e.g., point and box). Despite its strengths, the original SAM presents limited adaptation to medical images. Moreover, SAM requires manual bounding box prompts for each object to produce satisfactory segmentation masks, so it is laborious in nuclei segmentation scenarios. To address these limitations, we propose a domain-generalizable framework for nuclei image segmentation, abbreviated to NuSegDG. Specifically, we first devise a Heterogeneous Space Adapter (HS-Adapter) to learn multi-dimensional feature representations of different nuclei domains by injecting a small number of trainable parameters into the image encoder of SAM. To alleviate the labor-intensive requirement of manual prompts, we introduce a Gaussian-Kernel Prompt Encoder (GKP-Encoder) to generate density maps driven by a single point, which guides segmentation predictions by mixing position prompts and semantic prompts. Furthermore, we present a Two-Stage Mask Decoder (TSM-Decoder) to effectively convert semantic masks to instance maps without the manual demand for morphological shape refinement. Based on our experimental evaluations, the proposed NuSegDG demonstrates state-of-the-art performance in nuclei instance segmentation, exhibiting superior domain generalization capabilities. The source code is available at https://github.com/xq141839/NuSegDG.

Submitted 24 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

Comments: Under Review

arXiv:2408.10636 [pdf]

UWF-RI2FA: Generating Multi-frame Ultrawide-field Fluorescein Angiography from Ultrawide-field Retinal Imaging Improves Diabetic Retinopathy Stratification

Authors: Ruoyu Chen, Kezheng Xu, Kangyan Zheng, Weiyi Zhang, Yan Lu, Danli Shi, Mingguang He

Abstract: Ultrawide-field fluorescein angiography (UWF-FA) facilitates diabetic retinopathy (DR) detection by providing a clear visualization of peripheral retinal lesions. However, the intravenous dye injection, with its potential risks, hampers its application. We aim to acquire dye-free UWF-FA images from noninvasive UWF retinal imaging (UWF-RI) using generative artificial intelligence (GenAI) and evaluate its effectiveness in DR screening. A total of 18,321 UWF-FA images of different phases were registered with corresponding UWF-RI images and fed into a generative adversarial network (GAN)-based model for training. The quality of generated UWF-FA images was evaluated through quantitative metrics and human evaluation. The DeepDRiD dataset was used to externally assess the contribution of generated UWF-FA images to DR classification, using area under the receiver operating characteristic curve (AUROC) as the outcome metric. The generated early, mid, and late phase UWF-FA images achieved high authenticity, with multi-scale similarity scores ranging from 0.70 to 0.91 and qualitative visual scores ranging from 1.64 to 1.98 (1=real UWF-FA quality). In fifty randomly selected images, 56% to 76% of the generated images were difficult to distinguish from real images in the Turing test. Moreover, adding these generated UWF-FA images for DR classification significantly increased the AUROC from 0.869 to 0.904 compared to the baseline model using UWF-RI images (P < .001). The model successfully generates realistic multi-frame UWF-FA images for enhancing DR stratification without intravenous dye injection.

Submitted 27 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

Comments: 22 pages, 2 figures

arXiv:2408.09671 [pdf, other]

GANPrompt: Enhancing Robustness in LLM-Based Recommendations with GAN-Enhanced Diversity Prompts

Authors: Xinyu Li, Chuang Zhao, Hongke Zhao, Likang Wu, Ming HE

Abstract: In recent years, LLM has demonstrated remarkable proficiency in comprehending and generating natural language, with a growing prevalence in the domain of recommender systems. However, LLM continues to face a significant challenge in that it is highly susceptible to the influence of prompt words. This inconsistency in response to minor alterations in prompt input may compromise the accuracy and resilience of recommendation models. To address this issue, this paper proposes GANPrompt, a multi-dimensional large language model prompt diversity framework based on Generative Adversarial Networks (GANs). The framework enhances the model's adaptability and stability to diverse prompts by integrating GAN generation techniques with the deep semantic understanding capabilities of LLMs. GANPrompt first trains a generator capable of producing diverse prompts by analysing multidimensional user behavioural data. These diverse prompts are then used to train the LLM to improve its performance in the face of unseen prompts. Furthermore, to ensure a high degree of diversity and relevance of the prompts, this study introduces a mathematical theory-based diversity constraint mechanism that optimises the generated prompts to ensure that they are not only superficially distinct, but also semantically cover a wide range of user intentions. Through extensive experiments on multiple datasets, we demonstrate the effectiveness of the proposed framework, especially in improving the adaptability and robustness of recommender systems in complex and dynamic environments. The experimental results demonstrate that GANPrompt yields substantial enhancements in accuracy and robustness relative to existing state-of-the-art methodologies.

Submitted 18 August, 2024; originally announced August 2024.

arXiv:2408.07301 [pdf]

Imaginary Poynting momentum driven particle rotation by cylindrically polarized Gaussian beams

Authors: Xue Yun, Yansheng Liang, Linquan Guo, Minru He, Tianyu Zhao, Shaowei Wang, Ming Lei

Abstract: Imaginary Poynting momentum (IPM) provides a new degree of freedom for particle manipulation. However, the application of IPM in experiments has been largely unexplored. Here, we demonstrate IPM-driven particle rotation by cylindrically polarized Gaussian beams with no spin or orbital angular momentum. Theoretical analysis and experimental measurements demonstrate that gold microparticles will be rotated in the azimuthal direction while confined in the radial direction. We achieved controllable rotation of the particle by tuning the cylindrical polarization state. Interestingly, the transfer of IPM to a gold particle is demonstrated to be competitive with that of spin angular momentum. These findings hold promise for light-matter interactions and particle manipulation.

Submitted 14 August, 2024; originally announced August 2024.

Comments: 10 pages, 6 figures

MSC Class: 78A10 Physical optics

arXiv:2408.01599 [pdf]

Strongly interacting Hofstadter states in magic-angle twisted bilayer graphene

Authors: Minhao He, Xiaoyu Wang, Jiaqi Cai, Jonah Herzog-Arbeitman, Takashi Taniguchi, Kenji Watanabe, Ady Stern, B. Andrei Bernevig, Matthew Yankowitz, Oskar Vafek, Xiaodong Xu

Abstract: Magic-angle twisted bilayer graphene (MATBG) hosts a multitude of strongly correlated states at partial fillings of its flat bands. In a magnetic field, these flat bands further evolve into a unique Hofstadter spectrum renormalized by strong Coulomb interactions. Here, we study the interacting Hofstadter states spontaneously formed within the topological magnetic subbands of an ultraclean MATBG device, notably including symmetry-broken Chern insulator (SBCI) states and fractional quantum Hall (FQH) states. The observed SBCI states form a cascade with their Chern numbers mimicking the main sequence correlated Chern insulators. The FQH states in MATBG form in a Jain sequence; however, they disappear at high magnetic field, distinct from conventional FQH states which strengthen with increasing magnetic field. We reveal a unique magnetic field-driven phase transition from composite fermion phases to a dissipative Fermi liquid. Our theoretical analysis of the magnetic subbands hosting FQH states predicts non-uniform quantum geometric properties far from the lowest Landau level. This points towards a more natural interpretation of these FQH states as in-field fractional Chern insulators of the magnetic subbands.

Submitted 2 August, 2024; originally announced August 2024.

arXiv:2408.00582 [pdf, other]

First Measurement of the Total Inelastic Cross-Section of Positively-Charged Kaons on Argon at Energies Between 5.0 and 7.5 GeV

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, T. Alves, H. Amar, P. Amedo, J. Anderson, C. Andreopoulos, M. Andreotti, et al. (1341 additional authors not shown)

Abstract: ProtoDUNE Single-Phase (ProtoDUNE-SP) is a 770-ton liquid argon time projection chamber that operated in a hadron test beam at the CERN Neutrino Platform in 2018. We present a measurement of the total inelastic cross section of charged kaons on argon as a function of kaon energy using 6 and 7 GeV/$c$ beam momentum settings. The flux-weighted average of the extracted inelastic cross section at each beam momentum setting was measured to be 380$\pm$26 mbarns for the 6 GeV/$c$ setting and 379$\pm$35 mbarns for the 7 GeV/$c$ setting.

Submitted 1 August, 2024; originally announced August 2024.

Report number: CERN-EP-2024-211, FERMILAB-PUB-24-0216-V

arXiv:2407.21333 [pdf, other]

Chat2Layout: Interactive 3D Furniture Layout with a Multimodal LLM

Authors: Can Wang, Hongliang Zhong, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao

Abstract: Automatic furniture layout is long desired for convenient interior design. Leveraging the remarkable visual reasoning capabilities of multimodal large language models (MLLMs), recent methods address layout generation in a static manner, lacking the feedback-driven refinement essential for interactive user engagement. We introduce Chat2Layout, a novel interactive furniture layout generation system that extends the functionality of MLLMs into the realm of interactive layout design. To achieve this, we establish a unified vision-question paradigm for in-context learning, enabling seamless communication with MLLMs to steer their behavior without altering model weights. Within this framework, we present a novel training-free visual prompting mechanism. This involves a visual-text prompting technique that assists MLLMs in reasoning about plausible layout plans, followed by an Offline-to-Online search (O2O-Search) method, which automatically identifies the minimal set of informative references to provide exemplars for visual-text prompting. By employing an agent system with MLLMs as the core controller, we enable bidirectional interaction. The agent not only comprehends the 3D environment and user requirements through linguistic and visual perception but also plans tasks and reasons about actions to generate and arrange furniture within the virtual space. Furthermore, the agent iteratively updates based on visual feedback from execution results. Experimental results demonstrate that our approach facilitates language-interactive generation and arrangement for diverse and complex 3D furniture.

Submitted 31 July, 2024; originally announced July 2024.

Comments: Main paper with supplemental materials

arXiv:2407.18460 [pdf, other]

Large Nernst Effect in a layered metallic antiferromagnet EuAl$_2$Si$_2$

Authors: Kunya Yang, Wei Xia, Xinrun Mi, Yiyue Zhang, Long Zhang, Aifeng Wang, Yisheng Chai, Xiaoyuan Zhou, Yanfeng Guo, Mingquan He

Abstract: The large Nernst effect is advantageous for developing transverse Nernst thermoelectric generators or Ettingshausen coolers within a single component, avoiding the complexity of electron- and hole-modules in longitudinal Seebeck thermoelectric devices. We report a large Nernst signal reaching 130 $\mu$V/K at 8 K and 13 T in the layered metallic antiferromagnet EuAl$_2$Si$_2$. Notably, this large transverse Nernst thermopower is two orders of magnitude greater than its longitudinal counterpart. The Nernst coefficient peaks around 4 K and 8 K at 3 T and 13 T, respectively. At similar temperatures, both the Hall coefficient and the Seebeck signal change sign. Additionally, nearly compensated electron- and hole-like carriers with high mobility ($\sim$ 4000 cm$^2$/Vs at 4 K) are revealed from the magnetoconductivity. These findings suggest that the large Nernst effect and vanishing Seebeck thermopower in EuAl$_2$Si$_2$ are due to the compensated electron- and hole-like bands, along with the high mobility of the Weyl band near the Fermi level. Our results underscore the importance of band compensation and topological fermiology in achieving large Nernst thermopower and exploring potential Nernst thermoelectric applications at low temperatures.

Submitted 25 July, 2024; originally announced July 2024.

Comments: 13 pages, 3 figures

arXiv:2407.18441 [pdf, ps, other]

Pressure metrics in geometry and dynamics

Authors: Yan Mary He, Homin Lee, Insung Park

Abstract: In this article, we first provide a survey of pressure metrics on various deformation spaces in geometry, topology, and dynamics. Then we discuss pressure metrics and their degeneracy loci on the space of quasi-Blaschke products.

Submitted 29 July, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

Comments: 19 pages

MSC Class: 37F10; 37F30; 32G15

arXiv:2407.18043 [pdf, other]

YOCO: You Only Calibrate Once for Accurate Extrinsic Parameter in LiDAR-Camera Systems

Authors: Tianle Zeng, Dengke He, Feifan Yan, Meixi He

Abstract: In a multi-sensor fusion system composed of cameras and LiDAR, precise extrinsic calibration contributes to the system's long-term stability and accurate perception of the environment. However, methods based on extracting and registering corresponding points still face challenges in terms of automation and precision. This paper proposes a novel fully automatic extrinsic calibration method for LiDAR-camera systems that circumvents the need for corresponding point registration. In our approach, a novel algorithm to extract the required LiDAR correspondence points is proposed. This method can effectively filter out irrelevant points by computing the orientation of plane point clouds and extracting points by applying distance- and density-based thresholds. We avoid the need for corresponding point registration by introducing extrinsic parameters between the LiDAR and camera into the projection of extracted points and constructing co-planar constraints. These parameters are then optimized to solve for the extrinsics. We validated our method across multiple sets of LiDAR-camera systems. In synthetic experiments, our method demonstrates superior performance compared to current calibration techniques. Real-world data experiments further confirm the precision and robustness of the proposed algorithm, with average rotation and translation calibration errors between LiDAR and camera of less than 0.05 degrees and 0.015 m, respectively. This method enables automatic and accurate extrinsic calibration in a single step, emphasizing the potential of calibration algorithms beyond using corresponding point registration to enhance the automation and precision of LiDAR-camera system calibration.

Submitted 25 July, 2024; originally announced July 2024.

Comments: IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT

Journal ref: IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT 2024

arXiv:2407.17267 [pdf, other]

M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis

Authors: Junyu Li, Ye Zhang, Wen Shu, Xiaobing Feng, Yingchun Wang, Pengju Yan, Xiaolin Li, Chulin Sha, Min He

Abstract: Multiple instance learning (MIL) has been successfully applied to whole slide image (WSI) analysis in computational pathology, enabling a wide range of prediction tasks from tumor subtyping to inferring genetic mutations and multi-omics biomarkers. However, existing MIL methods predominantly focus on single-task learning, resulting in not only overall low efficiency but also the neglect of inter-task relatedness. To address these issues, we proposed an adapted architecture of Multi-gate Mixture-of-experts with Multi-proxy for Multiple instance learning (M4), and applied this framework for simultaneous prediction of multiple genetic mutations from WSIs. The proposed M4 model has two main innovations: (1) utilizing a mixture of experts with multiple gating strategies for multi-genetic mutation prediction on a single pathological slide; (2) constructing multi-proxy expert network and gate network for comprehensive and effective modeling of pathological image information. Our model achieved significant improvements across five tested TCGA datasets in comparison to current state-of-the-art single-task methods. The code is available at: https://github.com/Bigyehahaha/M4.

Submitted 24 July, 2024; originally announced July 2024.

Comments: 25 pages, 5 figures

arXiv:2407.15926 [pdf, other]

Thermalization and hotspot formation around small primordial black holes

Authors: Minxi He, Kazunori Kohri, Kyohei Mukaida, Masaki Yamada

Abstract: We quantitatively analyze a basic question: what is the stationary solution of the background plasma temperature profile around a black hole (BH)? One may naively expect that the temperature profile continuously decreases from the Hawking temperature at the surface of the BH towards an outer region. We show analytically and numerically that this is not the case because local thermal equilibrium cannot be maintained near the surface of the BH and also because the high-energy particles emitted from Hawking radiation cannot be instantaneously thermalized into the background plasma. The temperature profile has a plateau within a finite distance from the BH, and even the overall amplitude of background temperature at a distance far away from the BH is significantly suppressed compared with the naive expectation. The main reason for these counterintuitive results is that the BH is so small that particles of Hawking radiation travel far away within the typical time scale of interactions.

Submitted 22 July, 2024; originally announced July 2024.

Comments: 24 pages, 9 figures

Report number: KEK-TH-2639, TU-1238, CTPU-PTC-24-22, KEK-Cosmo-0351, KEK-QUP-2024-0018

arXiv:2407.14459 [pdf, other]

doi 10.1145/3637528.3671849

PolyFormer: Scalable Node-wise Filters via Polynomial Graph Transformer

Authors: Jiahong Ma, Mingguo He, Zhewei Wei

Abstract: Spectral Graph Neural Networks have demonstrated superior performance in graph representation learning. However, many current methods focus on employing shared polynomial coefficients for all nodes, i.e., learning node-unified filters, which limits the filters' flexibility for node-level tasks. The recent DSF attempts to overcome this limitation by learning node-wise coefficients based on positional encoding. However, the initialization and updating process of the positional encoding are burdensome, hindering scalability on large-scale graphs. In this work, we propose a scalable node-wise filter, PolyAttn. Leveraging the attention mechanism, PolyAttn can directly learn node-wise filters in an efficient manner, offering powerful representation capabilities. Building on PolyAttn, we introduce the whole model, named PolyFormer. Through the lens of Graph Transformer models, PolyFormer, which calculates attention scores within nodes, shows great scalability. Moreover, the model captures spectral information, enhancing expressiveness while maintaining efficiency. With these advantages, PolyFormer offers a desirable balance between scalability and expressiveness for node-level tasks. Extensive experiments demonstrate that our proposed methods excel at learning arbitrary node-wise filters, showing superior performance on both homophilic and heterophilic graphs, and handling graphs containing up to 100 million nodes. The code is available at https://github.com/air029/PolyFormer.

Submitted 19 July, 2024; originally announced July 2024.

Comments: ACM SIGKDD 2024

arXiv:2407.14153 [pdf, other]

ESP-MedSAM: Efficient Self-Prompting SAM for Universal Domain-Generalized Medical Image Segmentation

Authors: Qing Xu, Jiaxuan Li, Xiangjian He, Ziyu Liu, Zhen Chen, Wenting Duan, Chenxin Li, Maggie M. He, Fiseha B. Tesema, Wooi P. Cheah, Yi Wang, Rong Qu, Jonathan M. Garibaldi

Abstract: The universality of deep neural networks across different modalities and their generalization capabilities to unseen domains play an essential role in medical image segmentation. The recent Segment Anything Model (SAM) has demonstrated its potential in both settings. However, the huge computational costs, the demand for manual annotations as prompts, and the conflict-prone decoding process of SAM degrade its generalizability and applicability in clinical scenarios. To address these issues, we propose an efficient self-prompting SAM for universal domain-generalized medical image segmentation, named ESP-MedSAM. Specifically, we first devise the Multi-Modal Decoupled Knowledge Distillation (MMDKD) strategy to construct a lightweight semi-parameter sharing image encoder that produces discriminative visual features for diverse modalities. Further, we introduce the Self-Patch Prompt Generator (SPPG) to automatically generate high-quality dense prompt embeddings for guiding segmentation decoding. Finally, we design the Query-Decoupled Modality Decoder (QDMD) that leverages a one-to-one strategy to provide an independent decoding channel for every modality. Extensive experiments indicate that ESP-MedSAM outperforms state-of-the-art methods in diverse medical imaging segmentation tasks, displaying superior modality universality and generalization capabilities. Notably, ESP-MedSAM uses only 4.5% of the parameters of SAM-H. The source code is available at https://github.com/xq141839/ESP-MedSAM.

Submitted 17 August, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

Comments: Under Review

arXiv:2407.10339 [pdf, other]

Supernova Pointing Capabilities of DUNE

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, B. Aimard, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, T. Alves, H. Amar, P. Amedo, J. Anderson, D. A. Andrade , et al. (1340 additional authors not shown)

Abstract: The determination of the direction of a stellar core collapse via its neutrino emission is crucial for the identification of the progenitor for a multimessenger follow-up. A highly effective method of reconstructing supernova directions within the Deep Underground Neutrino Experiment (DUNE) is introduced. The supernova neutrino pointing resolution is studied by simulating and reconstructing electron-neutrino charged-current absorption on $^{40}$Ar and elastic scattering of neutrinos on electrons. Procedures to reconstruct individual interactions, including a newly developed technique called "brems flipping", as well as the burst direction from an ensemble of interactions are described. Performance of the burst direction reconstruction is evaluated for supernovae happening at a distance of 10 kpc for a specific supernova burst flux model. The pointing resolution is found to be 3.4 degrees at 68% coverage for a perfect interaction-channel classification and a fiducial mass of 40 kton, and 6.6 degrees for a 10 kton fiducial mass. Assuming a 4% rate of charged-current interactions being misidentified as elastic scattering, DUNE's burst pointing resolution is found to be 4.3 degrees (8.7 degrees) at 68% coverage.

Submitted 14 July, 2024; originally announced July 2024.

Comments: 25 pages, 16 figures

Report number: FERMILAB-PUB-24-0319-LBNF

arXiv:2407.08150 [pdf, other]

Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding

Authors: Minghui Wu, Chenxu Zhao, Anyang Su, Donglin Di, Tianyu Fu, Da An, Min He, Ya Gao, Meng Ma, Kun Yan, Ping Wang

Abstract: Understanding of video creativity and content often varies among individuals, with differences in focal points and cognitive levels across different ages, experiences, and genders. There is currently a lack of research in this area, and most existing benchmarks suffer from several drawbacks: 1) a limited number of modalities and answers with restrictive length; 2) the content and scenarios within the videos are excessively monotonous, transmitting allegories and emotions that are overly simplistic. To bridge the gap to real-world applications, we introduce a large-scale Subjective Response Indicators for Advertisement Videos dataset, namely SRI-ADV. Specifically, we collected real changes in Electroencephalographic (EEG) and eye-tracking regions from different demographics while they viewed identical video content. Utilizing this multi-modal dataset, we developed tasks and protocols to analyze and evaluate the extent of cognitive understanding of video content among different users. Along with the dataset, we designed a Hypergraph Multi-modal Large Language Model (HMLLM) to explore the associations among different demographics, video elements, EEG, and eye-tracking indicators. HMLLM could bridge semantic gaps across rich modalities and integrate information beyond different modalities to perform logical reasoning. Extensive experimental evaluations on SRI-ADV and other additional video-based generative performance benchmarks demonstrate the effectiveness of our method. The codes and dataset will be released at https://github.com/suay1113/HMLLM.

Submitted 16 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: Accepted by ACM MULTIMEDIA 2024

arXiv:2407.07053 [pdf, other]

Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

Authors: Wenqi Zhang, Zhenglin Cheng, Yuanyu He, Mengna Wang, Yongliang Shen, Zeqi Tan, Guiyang Hou, Mingqian He, Yanna Ma, Weiming Lu, Yueting Zhuang

Abstract: Although most current large multimodal models (LMMs) can already understand photos of natural scenes and portraits, their understanding of abstract images, e.g., charts, maps, or layouts, and visual reasoning capabilities remains quite rudimentary. They often struggle with simple daily tasks, such as reading time from a clock, understanding a flowchart, or planning a route using a road map. In light of this, we design a multi-modal self-instruct, utilizing large language models and their code capabilities to synthesize massive abstract images and visual reasoning instructions across daily scenarios. Our strategy effortlessly creates a multimodal benchmark with 11,193 instructions for eight visual scenarios: charts, tables, simulated maps, dashboards, flowcharts, relation graphs, floor plans, and visual puzzles. This benchmark, constructed with simple lines and geometric elements, exposes the shortcomings of most advanced LMMs like Claude-3.5-Sonnet and GPT-4o in abstract image understanding, spatial relations reasoning, and visual element induction. Besides, to verify the quality of our synthetic data, we fine-tune an LMM using 62,476 synthetic chart, table and road map instructions. The results demonstrate improved chart understanding and map navigation performance, and also demonstrate potential benefits for other visual reasoning tasks. Our code is available at: https://github.com/zwq2018/Multi-modal-Self-instruct.

Submitted 8 August, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

Comments: code: https://github.com/zwq2018/Multi-modal-Self-instruct dataset: https://huggingface.co/datasets/zwq2018/Multi-modal-Self-instruct Leaderboard: https://multi-modal-self-instruct.github.io/

arXiv:2407.05234 [pdf, ps, other]

Statistical Production of $B_c$ Mesons in Heavy-Ion Collisions at the LHC Energy

Authors: Shouxing Zhao, Min He

Abstract: The recombination production of $B_c$ mesons in heavy-ion collisions at the LHC energy is facilitated by the abundant and highly thermalized charm ($c$) quarks transported in the deconfined medium created. We study the production of $B_c$ mesons via $c$ and bottom ($b$) quark recombination in a statistical fashion by placing $B_c$ in the position of a member of the family of open $b$ hadrons, which allows us to make quantitative predictions for the modifications of the production fraction ($f_c$) of $B_c$ mesons and its relative production to $B$ mesons in $\sqrt{s_{\rm NN}}=5.02$ TeV Pb-Pb collisions with respect to proton-proton ($pp$) collisions at the same energy. The statistical production yield of $B_c$ mesons is converted into the transverse momentum ($p_T$) distribution with the shape computed from resonance recombination using the $c$- and $b$-quark phase space distributions that have been simulated via Langevin diffusion and constrained by open $c$- and $b$-hadron observables. Supplemented with the component fragmented from $b$-quark spectrum that dominates at high $p_T$, the total $p_T$ spectrum of $B_c$ mesons is obtained and converted into the $p_T$ dependent nuclear modification factor ($R_{\rm AA}$). Both $f_c$ and the integrated $R_{\rm AA}$ exhibit a $\sim5$-fold enhancement in central Pb-Pb collisions relative to the $pp$ reference. Comparison with data measured by the CMS experiment shows decent agreement within theoretical and experimental uncertainties.

Submitted 6 July, 2024; originally announced July 2024.

Comments: 10 pages, 4 figures

arXiv:2407.03913 [pdf, other]

MobileExperts: A Dynamic Tool-Enabled Agent Team in Mobile Devices

Authors: Jiayi Zhang, Chuang Zhao, Yihan Zhao, Zhaoyang Yu, Ming He, Jianping Fan

Abstract: The attainment of autonomous operations in mobile computing devices has consistently been a goal of human pursuit. With the development of Large Language Models (LLMs) and Visual Language Models (VLMs), this aspiration is progressively turning into reality. While contemporary research has explored automation of simple tasks on mobile devices via VLMs, there remains significant room for improvement in handling complex tasks and reducing high reasoning costs. In this paper, we introduce MobileExperts, which for the first time introduces tool formulation and multi-agent collaboration to address the aforementioned challenges. More specifically, MobileExperts dynamically assembles teams based on the alignment of agent portraits with the human requirements. Following this, each agent embarks on an independent exploration phase, formulating its tools to evolve into an expert. Lastly, we develop a dual-layer planning mechanism to establish coordinated collaboration among experts. To validate our effectiveness, we design a new benchmark of hierarchical intelligence levels, offering insights into an algorithm's capability to address tasks across a spectrum of complexity. Experimental results demonstrate that MobileExperts performs better on all intelligence levels and achieves a ~22% reduction in reasoning costs, thus verifying the superiority of our design.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.01903 [pdf, other]

Text-Aware Diffusion for Policy Learning

Authors: Calvin Luo, Mandy He, Zilai Zeng, Chen Sun

Abstract: Training an agent to achieve particular goals or perform desired behaviors is often accomplished through reinforcement learning, especially in the absence of expert demonstrations. However, supporting novel goals or behaviors through reinforcement learning requires the ad-hoc design of appropriate reward functions, which quickly becomes intractable. To address this challenge, we propose Text-Aware Diffusion for Policy Learning (TADPoLe), which uses a pretrained, frozen text-conditioned diffusion model to compute dense zero-shot reward signals for text-aligned policy learning. We hypothesize that large-scale pretrained generative models encode rich priors that can supervise a policy to behave not only in a text-aligned manner, but also in alignment with a notion of naturalness summarized from internet-scale training data. In our experiments, we demonstrate that TADPoLe is able to learn policies for novel goal-achievement and continuous locomotion behaviors specified by natural language, in both Humanoid and Dog environments. The behaviors are learned zero-shot without ground-truth rewards or expert demonstrations, and are qualitatively more natural according to human evaluation. We further show that TADPoLe performs competitively when applied to robotic manipulation tasks in the Meta-World environment.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.00390 [pdf, other]

Advancing Process Verification for Large Language Models via Tree-Based Preference Learning

Authors: Mingqian He, Yongliang Shen, Wenqi Zhang, Zeqi Tan, Weiming Lu

Abstract: Large Language Models (LLMs) have demonstrated remarkable potential in handling complex reasoning tasks by generating step-by-step rationales. Some methods have proven effective in boosting accuracy by introducing extra verifiers to assess these paths. However, existing verifiers, typically trained on binary-labeled reasoning paths, fail to fully utilize the relative merits of intermediate steps, thereby limiting the effectiveness of the feedback provided. To overcome this limitation, we propose Tree-based Preference Learning Verifier (Tree-PLV), a novel approach that constructs reasoning trees via a best-first search algorithm and collects step-level paired data for preference training. Compared to traditional binary classification, step-level preferences more finely capture the nuances between reasoning steps, allowing for a more precise evaluation of the complete reasoning path. We empirically evaluate Tree-PLV across a range of arithmetic and commonsense reasoning tasks, where it significantly outperforms existing benchmarks. For instance, Tree-PLV achieved substantial performance gains over the Mistral-7B self-consistency baseline on GSM8K (67.55% to 82.79%), MATH (17.00% to 26.80%), CSQA (68.14% to 72.97%), and StrategyQA (82.86% to 83.25%). Additionally, our study explores the appropriate granularity for applying preference learning, revealing that step-level guidance provides feedback that better aligns with the evaluation of the reasoning process.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2406.17555 [pdf, ps, other]

A response to commenter Ke Lan's comment on our paper published in Nature Communications (2023)14:5782 by J. Yan et al

Authors: Ji Yan, Jiwei Li, X. T. He, Lifeng Wang, Yaohua Chen, Feng Wang, Xiaoying Han, Kaiqiang Pan, Juxi Liang, Yulong Li, Zanyang Guan, Xiangming Liu, Xingsen Che, Zhongjing Chen, Xing Zhang, Yan Xu, Bin Li, Minging He, Hongbo Cai, Liang. Hao, Zhanjun Liu, Chunyang Zheng, Zhensheng Dai, Zhengfeng Fan, Bin Qiao , et al. (4 additional authors not shown)

Abstract: A response to commenter Ke Lan's comment on our paper published in Nature Communications (2023)14:5782 by J. Yan et al

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.17475 [pdf, other]

doi 10.1145/3637528.3671786

Performative Debias with Fair-exposure Optimization Driven by Strategic Agents in Recommender Systems

Authors: Zhichen Xiang, Hongke Zhao, Chuang Zhao, Ming He, Jianping Fan

Abstract: Data bias, e.g., popularity bias, impairs the dynamics of two-sided markets within recommender systems. This overshadows the less visible but potentially intriguing long-tail items that could capture user interest. Despite the abundance of research surrounding this issue, it still poses challenges and remains a hot topic in academic circles. Along this line, in this paper, we developed a re-ranking approach in dynamic settings with fair-exposure optimization driven by strategic agents. Designed for the producer side, the execution of agents assumes content creators can modify item features based on strategic incentives to maximize their exposure. This iterative process entails an end-to-end optimization, employing differentiable ranking operators that simultaneously target accuracy and fairness. Joint objectives ensure the performance of recommendations while enhancing the visibility of tail items. We also leveraged the performative nature of predictions to illustrate how strategic learning influences content creators to shift towards fairness efficiently, thereby incentivizing features of tail items. Through comprehensive experiments on both public and industrial datasets, we have substantiated the effectiveness and dominance of the proposed method, especially in unveiling the potential of tail items.

Submitted 25 June, 2024; originally announced June 2024.

Comments: SIGKDD 2024 accepted paper

arXiv:2406.16494 [pdf, other]

Cross-domain Transfer of Valence Preferences via a Meta-optimization Approach

Authors: Chuang Zhao, Hongke Zhao, Ming He, Xiaomeng Li, Jianping Fan

Abstract: Cross-domain recommendation offers a potential avenue for alleviating data sparsity and cold-start problems. Embedding and mapping, as a classic cross-domain research genre, aims to identify a common mapping function to perform representation transformation between two domains. Nevertheless, previous coarse-grained preference representations, non-personalized mapping functions, and excessive reliance on overlapping users limit their performance, especially in scenarios where overlapping users are sparse. To address the aforementioned challenges, we propose a novel cross-domain approach, namely CVPM. CVPM formalizes cross-domain interest transfer as a hybrid architecture of parametric meta-learning and self-supervised learning, which not only transfers user preferences at a finer level, but also enables signal enhancement with the knowledge of non-overlapping users. Specifically, with deep insights into user preferences and valence preference theory, we believe that there exists a significant difference between users' positive preferences and negative behaviors, and thus employ differentiated encoders to learn their distributions. In particular, we further utilize the pre-trained model and item popularity to sample pseudo-interaction items to ensure the integrity of both distributions. To guarantee the personalization of preference transfer, we treat each user's mapping as two parts, the common transformation and the personalized bias, where the network used to generate the personalized bias is output by a meta-learner. Furthermore, in addition to the supervised loss for overlapping users, we design contrastive tasks for non-overlapping users at both the group and individual levels to avoid model skew and enhance the semantics of representations. Exhaustive data analysis and extensive experimental results demonstrate the effectiveness and advancement of our proposed framework.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16251 [pdf, other]

Probing critical spin fluctuations with a composite magnetoelectric method: A case study on a Kitaev spin liquid candidate Na$_3$Co$_2$SbO$_6$

Authors: Xinrun Mi, Xintong Li, Long Zhang, Aifeng Wang, Yuan Li, Yisheng Chai, Mingquan He

Abstract: In correlated quantum materials, divergent critical fluctuations near the quantum critical point are often closely associated with exotic quantum phases of matter, such as unconventional superconductivity and quantum spin liquids. Here we present a simple yet highly sensitive composite magnetoelectric (ME) method for detecting the critical spin fluctuations in quantum magnets. The ME signal is proportional to the magnetostriction coefficient, which directly probes the product of magnetization and spin-spin correlation. As a demonstration, the composite ME method is applied to a Kitaev quantum spin liquid candidate Na$_3$Co$_2$SbO$_6$, which shows signs of magnetic field-induced quantum criticality. Notably, the ME signal prominently diverges at the magnetic field-induced tricritical points, particularly at a tricritical point that lies in close proximity to a zero-temperature quantum critical point. A crucial aspect of these tricritical points is their tunability through the modification of the in-plane magnetic field's direction. The direction of the magnetic field can thus serve as a handy yet important tuning parameter, alongside pressure and chemical doping, for searching quantum critical points in quantum magnets with pronounced magnetic anisotropy.

Submitted 23 June, 2024; originally announced June 2024.

Comments: 6 pages, 4 figures

arXiv:2406.15504 [pdf, other]

Dr.E Bridges Graphs with Large Language Models through Words

Authors: Zipeng Liu, Likang Wu, Ming He, Zhong Guan, Hongke Zhao, Nan Feng

Abstract: Significant efforts have been dedicated to integrating the powerful Large Language Models (LLMs) with diverse modalities, particularly focusing on the fusion of language, vision and audio data. However, the graph-structured data, which is inherently rich in structural and domain-specific knowledge, has not yet been gracefully adapted to LLMs. Existing methods either describe the graph with raw text, suffering the loss of graph structural information, or feed Graph Neural Network (GNN) embeddings into LLMs at the cost of losing explainable prompt semantics. To bridge this gap, we introduce an end-to-end modality-aligning framework for LLM-graph alignment: Dual-Residual Vector Quantized-Variational AutoEncoder, namely Dr.E. Our approach is purposefully designed to facilitate token-level alignment with LLMs, enabling an effective translation of the intrinsic 'language' of graphs into comprehensible natural language. We also make LLMs' structural understanding of graphs more robust by incorporating multiple views of the central nodes based on their surrounding nodes at various distances. Our experimental evaluations on standard graph tasks demonstrate competitive performance against other state-of-the-art (SOTA) approaches. Additionally, our framework ensures a degree of visual interpretability, efficiency, and robustness, marking a promising endeavor to achieve token-level alignment between LLMs and GNNs. Our code is available at: https://anonymous.4open.science/r/dre-817.

Submitted 27 August, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13250 [pdf, other]

LangTopo: Aligning Language Descriptions of Graphs with Tokenized Topological Modeling

Authors: Zhong Guan, Hongke Zhao, Likang Wu, Ming He, Jianpin Fan

Abstract: Recently, large language models (LLMs) have been widely researched in the field of graph machine learning due to their outstanding abilities in language comprehension and learning. However, the significant gap between natural language tasks and topological structure modeling poses a nonnegligible challenge. Specifically, since natural language descriptions are not sufficient for LLMs to understand and process graph-structured data, fine-tuned LLMs perform even worse than some traditional GNN models on graph tasks, lacking inherent modeling capabilities for graph structures. Existing research overly emphasizes LLMs' understanding of semantic information captured by external models, while inadequately exploring graph topological structure modeling, thereby overlooking the genuine capabilities that LLMs lack. Consequently, in this paper, we introduce a new framework, LangTopo, which aligns graph structure modeling with natural language understanding at the token level. LangTopo quantifies the graph structure modeling capabilities of GNNs and LLMs by constructing a codebook for the graph modality and performs consistency maximization. This process aligns the text description of LLM with the topological modeling of GNN, allowing LLM to learn the ability of GNN to capture graph structures, enabling LLM to handle graph-structured data independently. We demonstrate the effectiveness of our proposed method on multiple datasets.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13235 [pdf, other]

Enhancing Collaborative Semantics of Language Model-Driven Recommendations via Graph-Aware Learning

Authors: Zhong Guan, Likang Wu, Hongke Zhao, Ming He, Jianpin Fan

Abstract: Large Language Models (LLMs) are increasingly prominent in the recommendation systems domain. Existing studies usually utilize in-context learning or supervised fine-tuning on task-specific data to align LLMs with recommendation tasks. However, the substantial bias in semantic spaces between language processing tasks and recommendation tasks poses a nonnegligible challenge. Specifically, without an adequate ability to capture collaborative information, existing modeling paradigms struggle to capture behavior patterns within community groups, leading to LLMs' ineffectiveness in discerning implicit interaction semantics in recommendation scenarios. To address this, we consider enhancing the learning capability of language model-driven recommendation models for structured data, specifically by utilizing interaction graphs rich in collaborative semantics. We propose Graph-Aware Learning for Language Model-Driven Recommendations (GAL-Rec). GAL-Rec enhances the understanding of user-item collaborative semantics by imitating the intent of Graph Neural Networks (GNNs) to aggregate multi-hop information, thereby fully exploiting the substantial learning capacity of LLMs to independently address the complex graphs in the recommendation system. Sufficient experimental results on three real-world datasets demonstrate that GAL-Rec significantly enhances the comprehension of collaborative semantics, and improves recommendation performance.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 10 pages

arXiv:2406.10988 [pdf, other]

Quantum coupon collector with mixed-state encoding

Authors: Jing-Peng Zhang, Min-Quan He, Dan-Bo Zhang

Abstract: The coupon collector is a prototypical model for evaluating the number of samples for identifying a set. By superposing all elements in the set as a pure quantum state, a quantum version of the coupon collector aims to learn the state, which is shown to reduce the sample complexity. Here we propose a quantum coupon collector by encoding the set into a mixed state, where the information of missing elements is labelled with Pauli strings. Remarkably, the encoded mixed state has no quantum entangled state and is easy to prepare. With such mixed-state encoding, it can be efficient to learn the set by performing Bell measurements on two copies and then extracting the missing element by solving a series of equations obtained from the measurements. Our protocol further reduces the sample complexity from $O(n)$ in the case of pure-state encoding to $O(\log n)$ when the missing element is one, where $n$ is the number of elements in the set. The mixed-state encoding scheme provides a new avenue for quantum learning and enlarges the realm for exploring quantum advantages.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.10638 [pdf, other]

Seeing Clearly, Answering Incorrectly: A Multimodal Robustness Benchmark for Evaluating MLLMs on Leading Questions

Authors: Yexin Liu, Zhengyang Liang, Yueze Wang, Muyang He, Jian Li, Bo Zhao

Abstract: Multimodal Large Language Models (MLLMs) have exhibited impressive capabilities in visual understanding and reasoning, providing seemingly reasonable answers, such as image descriptions. This has spurred extensive research on the evaluation of MLLMs. Most evaluation benchmarks assume that incorrect answers indicate a lack of understanding of the visual content. However, our findings reveal that, in many cases, MLLMs answer questions incorrectly despite correctly understanding the visual content. This suggests that incorrect answers do not necessarily imply a lack of comprehension but may instead result from lacking robustness to leading questions. To comprehensively measure MLLMs' understanding capability and robustness to leading questions, we introduce a MultiModal Robustness benchmark (MMR). MMR contains paired positive and negative questions across 12 categories, meticulously annotated by humans. We evaluate 18 leading MLLMs on the MMR benchmark, revealing that MLLMs suffer from fragility to leading questions despite understanding the visual content. To enhance MLLMs' understanding capability and robustness, we further present a training set with paired positive and negative visual question-answer samples. Experiments verify that MLLMs' robustness can be significantly enhanced by tuning on this new training set. The benchmark, training set, and code can be found at https://github.com/BAAI-DCAI/Multimodal-Robustness-Benchmark.

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.09755 [pdf, other]

Mix Q-learning for Lane Changing: A Collaborative Decision-Making Method in Multi-Agent Deep Reinforcement Learning

Authors: Xiaojun Bi, Mingjie He, Yiwen Sun

Abstract: Lane-changing decisions, which are crucial for autonomous vehicle path planning, face practical challenges due to rule-based constraints and limited data. Deep reinforcement learning has become a major research focus due to its advantages in data acquisition and interpretability. However, current models often overlook collaboration, which not only impacts overall traffic efficiency but also hinders the vehicle's own normal driving in the long run. To address this issue, this paper proposes a method named Mix Q-learning for Lane Changing (MQLC) that integrates a hybrid value Q network, taking into account both collective and individual benefits for the greater good. At the collective level, our method coordinates the individual Q and global Q networks by utilizing global information. This enables agents to effectively balance their individual interests with the collective benefit. At the individual level, we integrated a deep learning-based intent recognition module into our observation and enhanced the decision network. These changes provide agents with richer decision information and more accurate feature extraction for improved lane-changing decisions. This strategy enables the multi-agent system to learn and formulate optimal decision-making strategies effectively. In extensive experiments, our MQLC model outperforms other state-of-the-art multi-agent decision-making methods, achieving significantly safer and faster lane-changing decisions.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.07546 [pdf, other]

Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?

Authors: Xingyu Fu, Muyu He, Yujie Lu, William Yang Wang, Dan Roth

Abstract: We present a novel task and benchmark for evaluating the ability of text-to-image (T2I) generation models to produce images that align with commonsense in real life, which we call Commonsense-T2I. Given two adversarial text prompts containing an identical set of action words with minor differences, such as "a lightbulb without electricity" vs. "a lightbulb with electricity", we evaluate whether T2I models can conduct visual-commonsense reasoning, e.g. produce images that fit "the lightbulb is unlit" vs. "the lightbulb is lit" correspondingly. Commonsense-T2I presents an adversarial challenge, providing pairwise text prompts along with expected outputs. The dataset is carefully hand-curated by experts and annotated with fine-grained labels, such as commonsense type and likelihood of the expected outputs, to assist in analyzing model behavior. We benchmark a variety of state-of-the-art (sota) T2I models and surprisingly find that there is still a large gap between image synthesis and real-life photos: even the DALL-E 3 model could only achieve 48.92% on Commonsense-T2I, and the stable diffusion XL model only achieves 24.92% accuracy. Our experiments show that GPT-enriched prompts cannot solve this challenge, and we include a detailed analysis about possible reasons for such deficiency. We aim for Commonsense-T2I to serve as a high-quality evaluation benchmark for T2I commonsense checking, fostering advancements in real-life image generation.

Submitted 12 August, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: COLM 2024, Project Url: https://zeyofu.github.io/CommonsenseT2I/

arXiv:2406.05848 [pdf, other]

Nonlinear Interactions of Planetary-Scale Waves in Mesospheric Winds Observed at 52°N Latitude and Two Longitudes

Authors: Maosheng He, Jeffrey M. Forbes, Gunter Stober, Christoph Jacobi, Guozhu Li, Libo Liu, Jiyao Xu

Abstract: Nine years of mesospheric wind data from two meteor radars at 52°N latitude were analyzed to investigate planetary waves (PWs) and tides by estimating their zonal wavenumber through longitudinal phase differences. Our results reveal that PW normal modes (NMs) primarily drive multi-day oscillations, showing seasonal variability and statistical associations with Sudden Stratospheric Warming (SSW) events. Specifically, a significant 6-day NM emerges in April, followed by predominant 4- and 2-day NMs until June, with peaks of 2-, 4-, and 6-day NMs spanning July to October. Furthermore, our study provides the first observational verification of frequency and zonal wavenumber of over ten secondary waves from nonlinear interactions among planetary-scale waves. One notable finding is the prevalence of non-migrating components in winter 24-hour and summer 8-hour tides, attributed to these nonlinear interactions. Our findings underscore the diverse nonlinear dynamics of planetary-scale waves, triggering a variety of periodic oscillations.

Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.04180 [pdf, other]

Cogenesis by a sliding pNGB with symmetry non-restoration

Authors: Eung Jin Chun, Suruj Jyoti Das, Minxi He, Tae Hyun Jung, Jin Sun

Abstract: We show that a pseudo-Nambu-Goldstone boson (pNGB) with an initial misalignment angle can drive successful spontaneous baryogenesis, and become a good dark matter candidate if the corresponding global symmetry is non-restored at high temperatures. Considering a dimension-five explicit breaking operator, we find that the pNGB starts its motion with a sliding across rapidly decreasing potential barriers during which the baryon asymmetry is generated and frozen, and later it oscillates as dark matter. It is predicted that the pNGB mass and decay constant are around $5\,{\rm eV}$ and $3\times10^6\,{\rm GeV}$, respectively, while the radial mode has a light mass $O(10)\,{\rm MeV}$ and a small mixing $O(10^{-4})$ with the Higgs boson. Applied to the Majoron in the type-I seesaw model, the heaviest right-handed neutrino is required to be as light as $100\,{\rm GeV}$. These predictions can be tested at kaon experiments, heavy neutral lepton searches, LHC, and future colliders.

Submitted 21 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: 5 pages, 2 figures with supplemental material, v2: discussion on the isocurvature perturbation constraint and references added

Report number: CTPU-PTC-24-16

arXiv:2406.01993 [pdf]

Choroidal Vessel Segmentation on Indocyanine Green Angiography Images via Human-in-the-Loop Labeling

Authors: Ruoyu Chen, Ziwei Zhao, Mayinuer Yusufu, Xianwen Shang, Danli Shi, Mingguang He

Abstract: The human-in-the-loop (HITL) strategy has recently been introduced into the field of medical image processing. Indocyanine green angiography (ICGA) stands as a well-established examination for visualizing choroidal vasculature and detecting chorioretinal diseases. However, the intricate nature of choroidal vascular networks makes large-scale manual segmentation of ICGA images challenging. Thus, the study aims to develop a high-precision choroidal vessel segmentation model with limited labor using the HITL framework. We utilized a multi-source ICGA dataset, including 55-degree view and ultra-widefield ICGA (UWF-ICGA) images, for model development. The choroidal vessel network was pre-segmented by a pre-trained vessel segmentation model, and then manually modified by two ophthalmologists. Choroidal vascular diameter, density, complexity, tortuosity, and branching angle were automatically quantified based on the segmentation. We conducted four cycles of HITL. One hundred and fifty 55-degree view ICGA images were used for the first three cycles (50 images per cycle), and twenty UWF-ICGA images for the last cycle. The average time needed to manually correct a pre-segmented ICGA image per cycle was reduced from 20 minutes to 1 minute. High segmentation accuracy was achieved on both 55-degree view ICGA and UWF-ICGA images. Additionally, the multi-dimensional choroidal vascular parameters were significantly associated with various chorioretinal diseases. Our study not only demonstrated the feasibility of the HITL strategy in improving segmentation performance with reduced manual labeling, but also innovatively introduced several risk predictors for choroidal abnormalities.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 25 pages, 4 figures

arXiv:2406.01435 [pdf, other]

Learning Analysis of Kernel Ridgeless Regression with Asymmetric Kernel Learning

Authors: Fan He, Mingzhen He, Lei Shi, Xiaolin Huang, Johan A. K. Suykens

Abstract: Ridgeless regression has garnered attention among researchers, particularly in light of the "Benign Overfitting" phenomenon, where models interpolating noisy samples demonstrate robust generalization. However, kernel ridgeless regression does not always perform well due to the lack of flexibility. This paper enhances kernel ridgeless regression with Locally-Adaptive-Bandwidths (LAB) RBF kernels, incorporating kernel learning techniques to improve performance in both experiments and theory. For the first time, we demonstrate that functions learned from LAB RBF kernels belong to an integral space of Reproducing Kernel Hilbert Spaces (RKHSs). Despite the absence of explicit regularization in the proposed model, its optimization is equivalent to solving an $\ell_0$-regularized problem in the integral space of RKHSs, elucidating the origin of its generalization ability. Taking an approximation analysis viewpoint, we introduce an $\ell_q$-norm analysis technique (with $0<q<1$) to derive the learning rate for the proposed model under mild conditions. This result deepens our theoretical understanding, explaining that our algorithm's robust approximation ability arises from the large capacity of the integral space of RKHSs, while its generalization ability is ensured by sparsity, controlled by the number of support vectors. Experimental results on both synthetic and real datasets validate our theoretical conclusions.

Submitted 3 June, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2310.05236

arXiv:2406.01007 [pdf, other]

Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay

Authors: Daya Bay collaboration, F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, J. Cheng, Y. -C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng, et al. (177 additional authors not shown)

Abstract: This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive region, the relative $\overline{\nu}_{e}$ rates and energy spectra variation among the near and far detectors gives $\sin^2 2\theta_{13} = 0.0759_{-0.0049}^{+0.0050}$ and $\Delta m^2_{32} = (2.72^{+0.14}_{-0.15})\times10^{-3}$ eV$^2$ assuming the normal neutrino mass ordering, and $\Delta m^2_{32} = (-2.83^{+0.15}_{-0.14})\times10^{-3}$ eV$^2$ for the inverted neutrino mass ordering. This estimate of $\sin^2 2\theta_{13}$ is consistent with and essentially independent from the one obtained using the capture-on-gadolinium sample at Daya Bay. The combination of these two results yields $\sin^2 2\theta_{13} = 0.0833\pm0.0022$, which represents an 8% relative improvement in precision regarding the Daya Bay full 3158-day capture-on-gadolinium result.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.20659 [pdf]

Realization of a cold atom gyroscope in space

Authors: Jinting Li, Xi Chen, Danfang Zhang, Wenzhang Wang, Yang Zhou, Meng He, Jie Fang, Lin Zhou, Chuan He, Junjie Jiang, Huanyao Sun, Qunfeng Chen, Lei Qin, Xiao Li, Yibo Wang, Xiaowei Zhang, Jiaqi Zhong, Runbing Li, Meizhen An, Long Zhang, Shuquan Wang, Zongfeng Li, Jin Wang, Mingsheng Zhan

Abstract: High-precision gyroscopes in space are important for sophisticated scientific experiments and deep space navigation. Microgravity in space provides an ideal condition for operation of a cold atom gyroscope. To demonstrate this advantage, an atom interferometer (AI) was launched and installed in the China Space Station in 2022. Here reported is a realization of the cold atom gyroscope with this AI. By applying point source interferometry, spatial fringes are obtained and acceleration and rotation are extracted. The angles of the Raman lasers are precisely calibrated to avoid measurement error, and other systematic errors are also considered for the rotation measurement. The evaluated rotation measurement is $(-115.64 \pm 1.71)\times10^{-5}$ rad/s in space, and an acceleration measurement resolution of $1.03\times10^{-6}$ m/s$^2$ is also obtained for a single image. This study conducts the first AI-based gyroscope in space and paves the way for future space-based AI experiments.

Submitted 31 May, 2024; originally announced May 2024.

Comments: 12 pages, 5 figures

arXiv:2405.20621 [pdf, other]

A critical comparison of the implementation of granular pressure gradient term in Euler-Euler simulation of gas-solid flows

Authors: Yige Liu, Mingming He, Jianhua Chen, Wen Li, Bidan Zhao, Ji Xu, Junwu Wang

Abstract: Numerical solution of the Euler-Euler model using different in-house, open source and commercial software can generate significantly different results, even when the governing equations and the initial and boundary conditions are exactly the same. Unfortunately, the underlying reasons have not been identified yet. In this article, three methods for calculating the granular pressure gradient term are presented for the two-fluid model of gas-solid flows and implemented implicitly or explicitly into the solver in OpenFOAM: Method I assumes that the granular pressure gradient is equal to the elastic modulus multiplied by the solid concentration gradient; Method II directly calculates the gradient using a difference scheme; Method III, which is proposed in this work, calculates the gradient as the sum of two partial derivatives: one related to the solid volume fraction and the other related to the granular energy. Obviously, only Methods II and III are consistent with the kinetic theory of granular flow. It was found that the difference between all methods is small for bubbling fluidization. For circulating fluidization, however, both Methods II and III are capable of capturing non-uniform structures and producing superior results over Method I. The contradictory conclusions drawn from the simulation of different fluidization regimes are due to the different contribution of the term related to the granular energy gradient. The present study concludes that the implementation method of the granular pressure gradient may have a significant impact on hydrodynamics and is probably a key factor contributing to the observed differences between different simulation software.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.17792 [pdf, other]

JUNO Sensitivity to Invisible Decay Modes of Neutrons

Authors: JUNO Collaboration, Angel Abusleme, Thomas Adam, Kai Adamowicz, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, Marco Beretta, Antonio Bergnoli, Daniel Bick, et al. (635 additional authors not shown)

Abstract: We explore the decay of bound neutrons into invisible particles (e.g., $n\rightarrow 3 \nu$ or $nn \rightarrow 2 \nu$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $ n \rightarrow { inv} $ and $ nn \rightarrow { inv} $. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation modes of the excited residual nuclei can produce a time- and space-correlated triple coincidence signal in the JUNO detector. Based on a full Monte Carlo simulation informed with the latest available data, we estimate all backgrounds, including inverse beta decay events of the reactor antineutrino $\bar{\nu}_e$, natural radioactivity, cosmogenic isotopes and neutral current interactions of atmospheric neutrinos. Pulse shape discrimination and multivariate analysis techniques are employed to further suppress backgrounds. With two years of exposure, JUNO is expected to give an order of magnitude improvement compared to the current best limits. After 10 years of data taking, the JUNO expected sensitivities at a 90% confidence level are $\tau/B( n \rightarrow { inv} ) > 5.0 \times 10^{31} \, {\rm yr}$ and $\tau/B( nn \rightarrow { inv} ) > 1.4 \times 10^{32} \, {\rm yr}$.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 28 pages, 7 figures, 4 tables

arXiv:2405.11338 [pdf]

EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging

Authors: Danli Shi, Weiyi Zhang, Xiaolan Chen, Yexin Liu, Jiancheng Yang, Siyu Huang, Yih Chung Tham, Yingfeng Zheng, Mingguang He

Abstract: Artificial intelligence (AI) is vital in ophthalmology, tackling tasks like diagnosis, classification, and visual question answering (VQA). However, existing AI models in this domain often require extensive annotation and are task-specific, limiting their clinical utility. While recent developments have brought about foundation models for ophthalmology, they are limited by the need to train separate weights for each imaging modality, preventing a comprehensive representation of multi-modal features. This highlights the need for versatile foundation models capable of handling various tasks and modalities in ophthalmology. To address this gap, we present EyeFound, a multimodal foundation model for ophthalmic images. Unlike existing models, EyeFound learns generalizable representations from unlabeled multimodal retinal images, enabling efficient model adaptation across multiple applications. Trained on 2.78 million images from 227 hospitals across 11 ophthalmic modalities, EyeFound facilitates generalist representations and diverse multimodal downstream tasks, even for detecting challenging rare diseases. It outperforms previous work RETFound in diagnosing eye diseases, predicting systemic disease incidents, and zero-shot multimodal VQA. EyeFound provides a generalizable solution to improve model performance and lessen the annotation burden on experts, facilitating widespread clinical AI applications for retinal imaging.

Submitted 21 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

Comments: 21 pages, 2 figures, 4 tables

arXiv:2405.11236 [pdf, other]

TriLoRA: Integrating SVD for Advanced Style Personalization in Text-to-Image Generation

Authors: Chengcheng Feng, Mu He, Qiuyu Tian, Haojie Yin, Xiaofang Zhao, Hongwei Tang, Xingqiang Wei

Abstract: As deep learning technology continues to advance, image generation models, especially models like Stable Diffusion, are finding increasingly widespread application in visual arts creation. However, these models often face challenges such as overfitting, lack of stability in generated results, and difficulties in accurately capturing the features desired by creators during the fine-tuning process. In response to these challenges, we propose an innovative method that integrates Singular Value Decomposition (SVD) into the Low-Rank Adaptation (LoRA) parameter update strategy, aimed at enhancing the fine-tuning efficiency and output quality of image generation models. By incorporating SVD within the LoRA framework, our method not only effectively reduces the risk of overfitting but also enhances the stability of model outputs, and captures subtle, creator-desired feature adjustments more accurately. We evaluated our method on multiple datasets, and the results show that, compared to traditional fine-tuning methods, our approach significantly improves the model's generalization ability and creative flexibility while maintaining the quality of generation. Moreover, this method maintains LoRA's excellent performance under resource-constrained conditions, allowing for significant improvements in image generation quality without sacrificing the original efficiency and resource advantages.

Submitted 13 June, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.10739 [pdf, other]

Efficient Multimodal Large Language Models: A Survey

Authors: Yizhang Jin, Jian Li, Yexin Liu, Tianjun Gu, Kai Wu, Zhengkai Jiang, Muyang He, Bo Zhao, Xin Tan, Zhenye Gan, Yabiao Wang, Chengjie Wang, Lizhuang Ma

Abstract: In the past year, Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding and reasoning. However, the extensive model size and high training and inference costs have hindered the widespread application of MLLMs in academia and industry. Thus, studying efficient and lightweight MLLMs has enormous potential, especially in edge computing scenarios. In this survey, we provide a comprehensive and systematic review of the current state of efficient MLLMs. Specifically, we summarize the timeline of representative efficient MLLMs, research state of efficient structures and strategies, and the applications. Finally, we discuss the limitations of current efficient MLLM research and promising future directions. Please refer to our GitHub repository for more details: https://github.com/lijiannuist/Efficient-Multimodal-LLMs-Survey.

Submitted 9 August, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.10676 [pdf, other]

Identifying L-H transition in HL-2A through deep learning

Authors: Meihuizi He, Songfen Liu, Fan Xia, Zongyu Yang, Wulyu Zhong

Abstract: During the operation of tokamak devices, addressing the thermal load issues caused by Edge Localized Modes (ELMs) eruption is crucial. Ideally, mitigation and suppression measures for ELMs should be promptly initiated as soon as the first low-to-high confinement (L-H) transition occurs, which necessitates the real-time monitoring and accurate identification of the L-H transition process. Motivated by this, and by recent deep learning boom, we propose a deep learning-based L-H transition identification algorithm on HL-2A tokamak. In this work, we have constructed a neural network comprising layers of Residual Long Short-Term Memory (LSTM) and Temporal Convolutional Network (TCN). Unlike previous work based on recognition for ELMs by slice, this method implements recognition on L-H transition process before the first ELMs crash. Therefore the mitigation techniques can be triggered in time to suppress the initial ELMs bursts. In order to further explain the effectiveness of the algorithm, we developed a series of evaluation indicators by shots, and the results show that this algorithm can provide necessary reference for the mitigation and suppression system.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.09059 [pdf, other]

Task-adaptive Q-Face

Authors: Haomiao Sun, Mingjie He, Shiguang Shan, Hu Han, Xilin Chen

Abstract: Although face analysis has achieved remarkable improvements in the past few years, designing a multi-task face analysis model is still challenging. Most face analysis tasks are studied as separate problems and do not benefit from the synergy among related tasks. In this work, we propose a novel task-adaptive multi-task face analysis method named as Q-Face, which simultaneously performs multiple face analysis tasks with a unified model. We fuse the features from multiple layers of a large-scale pre-trained model so that the whole model can use both local and global facial information to support multiple tasks. Furthermore, we design a task-adaptive module that performs cross-attention between a set of query vectors and the fused multi-stage features and finally adaptively extracts desired features for each face analysis task. Extensive experiments show that our method can perform multiple tasks simultaneously and achieves state-of-the-art performance on face expression recognition, action unit detection, face attribute analysis, age estimation, and face pose estimation. Compared to conventional methods, our method opens up new possibilities for multi-task face analysis and shows the potential for both accuracy and efficiency.

Submitted 14 May, 2024; originally announced May 2024.

Comments: Ever submitted to ECCV2024

Showing 1–50 of 714 results for author: He, M