Search | arXiv e-print repository

A hidden AGN powering bright [O III] nebulae in a protocluster core at $z=4.5$ revealed by JWST

Authors: M. Solimano, J. González-López, M. Aravena, B. Alcalde Pampliega, R. J. Assef, M. Béthermin, M. Boquien, S. Bovino, C. M. Casey, P. Cassata, E. da Cunha, R. L. Davies, I. De Looze, X. Ding, T. Díaz-Santos, A. L. Faisst, A. Ferrara, D. B. Fisher, N. M. Förster-Schreiber, S. Fujimoto, M. Ginolfi, C. Gruppioni, L. Guaita, N. Hathi, R. Herrera-Camus, et al. (26 additional authors not shown)

Abstract: We present new JWST/NIRSpec IFU observations of the J1000+0234 system at $z=4.54$, the dense core of a galaxy protocluster hosting a massive, dusty star-forming galaxy (DSFG) with a low-luminosity radio counterpart. The new data reveal two extended, high equivalent width (EW$_0 > 1000$ Å) nebulae on either side of the DSFG disk along its minor axis (namely O3-N and O3-S). On one hand, O3-N's spectrum shows a prominent FWHM $\sim1300$ km s$^{-1}$ broad and blueshifted component, suggesting an outflow origin. On the other hand, O3-S stretches over parsec and has a velocity gradient that spans $800$ km s$^{-1}$ but no evidence of a broad component. Both sources, however, seem to be powered at least partially by an active galactic nucleus (AGN), so we classify them as extended emission-line regions (EELRs). The strongest evidence comes from the detection of the high-ionization [Ne V] $\lambda3427$ line toward O3-N, which, paired with the non-detection of hard X-rays, implies an obscuring column density above the Compton-thick regime. In O3-S, the [Ne V] line is not detected, but we measure a He II well above the expectation for star formation. We interpret this as O3-S being externally irradiated by the AGN, akin to the famous Hanny's Voorwerp object in the local Universe. In addition, more classical line ratio diagnostics (e.g. [O III]/H$β$ vs [N II]/H$α$) put the DSFG itself in the AGN region of the diagrams, making it the most probable host of the AGN. These results showcase the ability of JWST to unveil highly obscured AGN at high redshifts.

Submitted 17 July, 2024; originally announced July 2024.

Comments: 5 pages, 4 figures plus 5 appendices (incl. 3 extra figures and one table). Submitted to A&A on July 17th 2024

arXiv:2407.12888 [pdf]

Explainable Biomedical Hypothesis Generation via Retrieval Augmented Generation enabled Large Language Models

Authors: Alexander R. Pelletier, Joseph Ramirez, Irsyad Adam, Simha Sankar, Yu Yan, Ding Wang, Dylan Steinecke, Wei Wang, Peipei Ping

Abstract: The vast amount of biomedical information available today presents a significant challenge for investigators seeking to digest, process, and understand these findings effectively. Large Language Models (LLMs) have emerged as powerful tools to navigate this complex and challenging data landscape. However, LLMs may lead to hallucinatory responses, making Retrieval Augmented Generation (RAG) crucial for achieving accurate information. In this protocol, we present RUGGED (Retrieval Under Graph-Guided Explainable disease Distinction), a comprehensive workflow designed to support investigators with knowledge integration and hypothesis generation, identifying validated paths forward. Relevant biomedical information from publications and knowledge bases is reviewed, integrated, and extracted via text-mining association analysis and explainable graph prediction models on disease nodes, forecasting potential links among drugs and diseases. These analyses, along with biomedical texts, are integrated into a framework that facilitates user-directed mechanism elucidation as well as hypothesis exploration through RAG-enabled LLMs. A clinical use-case demonstrates RUGGED's ability to evaluate and recommend therapeutics for Arrhythmogenic Cardiomyopathy (ACM) and Dilated Cardiomyopathy (DCM), analyzing prescribed drugs for molecular interactions and unexplored uses. The platform minimizes LLM hallucinations, offers actionable insights, and improves the investigation of novel therapeutics.

Submitted 17 July, 2024; originally announced July 2024.

arXiv:2407.12867 [pdf, other]

Swift-BAT GUANO follow-up of gravitational-wave triggers in the third LIGO-Virgo-KAGRA observing run

Authors: Gayathri Raman, Samuele Ronchini, James Delaunay, Aaron Tohuvavohu, Jamie A. Kennea, Tyler Parsotan, Elena Ambrosi, Maria Grazia Bernardini, Sergio Campana, Giancarlo Cusumano, Antonino D'Ai, Paolo D'Avanzo, Valerio D'Elia, Massimiliano De Pasquale, Simone Dichiara, Phil Evans, Dieter Hartmann, Paul Kuin, Andrea Melandri, Paul O'Brien, Julian P. Osborne, Kim Page, David M. Palmer, Boris Sbarufatti, Gianpiero Tagliaferri, et al. (1797 additional authors not shown)

Abstract: We present results from a search for X-ray/gamma-ray counterparts of gravitational-wave (GW) candidates from the third observing run (O3) of the LIGO-Virgo-KAGRA (LVK) network using the Swift Burst Alert Telescope (Swift-BAT). The search includes 636 GW candidates received in low latency, 86 of which have been confirmed by the offline analysis and included in the third cumulative Gravitational-Wave Transient Catalog (GWTC-3). Targeted searches were carried out on the entire GW sample using the maximum-likelihood NITRATES pipeline on the BAT data made available via the GUANO infrastructure. We do not detect any significant electromagnetic emission that is temporally and spatially coincident with any of the GW candidates. We report flux upper limits in the 15-350 keV band as a function of sky position for all the catalog candidates. For GW candidates where the Swift-BAT false alarm rate is less than 10$^{-3}$ Hz, we compute the GW-BAT joint false alarm rate. Finally, the derived Swift-BAT upper limits are used to infer constraints on the putative electromagnetic emission associated with binary black hole mergers.

Submitted 13 July, 2024; originally announced July 2024.

Comments: 50 pages, 10 figures, 4 tables

arXiv:2407.12842 [pdf, other]

MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production

Authors: Jian Ma, Wenguan Wang, Yi Yang, Feng Zheng

Abstract: Sign language understanding has made significant strides; however, there is still no viable solution for generating sign sequences directly from entire spoken content, e.g., text or speech. In this paper, we propose a unified framework for continuous sign language production, easing communication between sign and non-sign language users. In particular, a sequence diffusion model, utilizing embeddings extracted from text or speech, is crafted to generate sign predictions step by step. Moreover, by creating a joint embedding space for text, audio, and sign, we bind these modalities and leverage the semantic consistency among them to provide informative feedback for the model training. This embedding-consistency learning strategy minimizes the reliance on sign triplets and ensures continuous model refinement, even with a missing audio modality. Experiments on How2Sign and PHOENIX14T datasets demonstrate that our model achieves competitive performance in sign language production.

Submitted 4 July, 2024; originally announced July 2024.

Comments: Accepted to ACL 2024 Findings; Project Page: https://hechang25.github.io/MS2SL

arXiv:2407.12797 [pdf, other]

CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines

Authors: Wenbo Sun, Jiaqi Wang, Qiming Guo, Ziyu Li, Wenlu Wang, Rihan Hai

Abstract: Online Large Language Model (LLM) services such as ChatGPT and Claude 3 have transformed business operations and academic research by effortlessly enabling new opportunities. However, due to data-sharing restrictions, sectors such as healthcare and finance prefer to deploy local LLM applications using costly hardware resources. This scenario requires a balance between the effectiveness advantages of LLMs and significant financial burdens. Additionally, the rapid evolution of models increases the frequency and redundancy of benchmarking efforts. Existing benchmarking toolkits, which typically focus on effectiveness, often overlook economic considerations, making their findings less applicable to practical scenarios. To address these challenges, we introduce CEBench, an open-source toolkit specifically designed for multi-objective benchmarking that focuses on the critical trade-offs between expenditure and effectiveness required for LLM deployments. CEBench allows for easy modifications through configuration files, enabling stakeholders to effectively assess and optimize these trade-offs. This strategic capability supports crucial decision-making processes aimed at maximizing effectiveness while minimizing cost impacts. By streamlining the evaluation process and emphasizing cost-effectiveness, CEBench seeks to facilitate the development of economically viable AI solutions across various industries and research fields. The code and demonstration are available at https://github.com/amademicnoboday12/CEBench.

Submitted 20 June, 2024; originally announced July 2024.

arXiv:2407.12475 [pdf, other]

Amplitude analysis of $B^+ \to ψ(2S) K^+ π^+ π^-$ decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1092 additional authors not shown)

Abstract: The first full amplitude analysis of $B^+ \to ψ(2S) K^+ π^+ π^-$ decays is performed using proton-proton collision data corresponding to an integrated luminosity of $9\,\text{fb}^{-1}$ recorded with the LHCb detector. The rich $K^+ π^+ π^-$ spectrum is studied and the branching fractions of the resonant substructure associated with the prominent $K_1(1270)^+$ contribution are measured. The data cannot be described by conventional strange and charmonium resonances only. An amplitude model with 53 components is developed comprising 11 hidden-charm exotic hadrons. New production mechanisms for charged charmonium-like states are observed. Significant resonant activity with spin-parity $J^P = 1^+$ in the $ψ(2S) π^+$ system is confirmed and a multi-pole structure is demonstrated. The spectral decomposition of the $ψ(2S) π^+ π^-$ invariant-mass structure, dominated by $X^0 \to ψ(2S) ρ(770)^0$ decays, broadly resembles the $J/ψφ$ spectrum observed in $B^+ \to J/ψφK^+$ decays. Exotic $ψ(2S) K^+ π^-$ resonances are observed for the first time.

Submitted 17 July, 2024; originally announced July 2024.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2024-014.html (LHCb public pages)

Report number: LHCb-PAPER-2024-014, CERN-EP-2024-177

arXiv:2407.12334 [pdf, other]

Cabin: Confining Untrusted Programs within Confidential VMs

Authors: Benshan Mei, Saisai Xia, Wenhao Wang, Dongdai Lin

Abstract: Confidential computing safeguards sensitive computations from untrusted clouds, with Confidential Virtual Machines (CVMs) providing a secure environment for the guest OS. However, CVMs often come with large and vulnerable operating system kernels, making them susceptible to attacks exploiting kernel weaknesses. The imprecise control over the read/write access in the page table has allowed attackers to exploit vulnerabilities. The lack of a security hierarchy leads to insufficient separation between untrusted applications and the guest OS, making the kernel susceptible to direct threats from untrusted programs. This study proposes Cabin, an isolated execution framework within the guest VM utilizing the latest AMD SEV-SNP technology. Cabin confines untrusted processes to the user space of a lower virtual machine privilege level (VMPL) by introducing a proxy-kernel between the confined processes and the guest OS. Furthermore, we propose execution protection mechanisms based on fine-grained control of VMPL privilege for vulnerable programs and the proxy-kernel to minimize the attack surface. We introduce an asynchronous forwarding mechanism and anonymous memory management to reduce the performance impact. The evaluation results show that the Cabin framework incurs a modest overhead (5% on average) on Nbench and WolfSSL benchmarks.

Submitted 17 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

Comments: ICICS 2024

arXiv:2407.12270 [pdf, other]

Observation of $Λ_c^+ \to Λa_0(980)^+$ and Evidence for $Σ(1380)^+$ in $Λ_c^+ \to Λπ^+ η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (638 additional authors not shown)

Abstract: Based on $6.1~\mathrm{fb}^{-1}$ of $e^+e^-$ annihilation data collected at center-of-mass energies from 4.600 GeV to 4.843 GeV with the BESIII detector at the BEPCII collider, a partial wave analysis of $Λ_c^+\toΛπ^+η$ is performed, and branching fractions and decay asymmetry parameters of intermediate processes are determined. The process $Λ_c^+\toΛa_0(980)^+$ is observed for the first time, and evidence for the pentaquark candidate $Σ(1380)^+$ decaying into $Λπ^+$ is found with statistical significance larger than $3σ$. The branching fraction product $\mathcal{B}(Λ_{c}^{+} \to Λa_0(980)^+) \; \mathcal{B}( a_0(980)^+ \to π^{+}η)$ is determined to be $(1.05 \pm 0.16_{\mathrm{stat}} \pm 0.05_{\mathrm{syst}} \pm 0.07_{\mathrm{ext}})\%$, which is larger than theoretical calculations by $1 - 2$ orders of magnitude. Here the third (external) systematic is from $\mathcal{B}(Λ_{c}^{+} \to Λπ^+ η)$. Finally, we precisely obtain the absolute branching fraction $\mathcal{B}(Λ_{c}^{+} \to Λπ^+ η) = (1.94 \pm 0.07_{\mathrm{stat}} \pm 0.11_{\mathrm{syst}})\%$.
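
As a rough consistency check on the quoted external uncertainty, the sketch below propagates the relative uncertainty of $\mathcal{B}(Λ_c^+ \to Λπ^+η)$ onto the branching fraction product. It assumes, and this is not stated in the abstract, that the statistical and systematic uncertainties of the absolute branching fraction are combined in quadrature; all variable names are illustrative.

```python
import math

# Central values and uncertainties quoted in the abstract, in percent.
B_PRODUCT = 1.05       # B(Lc+ -> Lambda a0(980)+) * B(a0(980)+ -> pi+ eta)
B_TOTAL = 1.94         # B(Lc+ -> Lambda pi+ eta)
B_TOTAL_STAT = 0.07
B_TOTAL_SYST = 0.11

# Assumption (not from the abstract): add stat and syst in quadrature.
b_total_err = math.hypot(B_TOTAL_STAT, B_TOTAL_SYST)

# Relative uncertainty of the normalisation, scaled onto the product.
rel_err = b_total_err / B_TOTAL
external_err = B_PRODUCT * rel_err

# Naive ratio of the a0(980)+ product to the full three-body rate.
a0_share = B_PRODUCT / B_TOTAL

print(f"propagated external uncertainty ~ {external_err:.2f} %")
print(f"a0(980)+ product / total ~ {a0_share:.0%}")
```

Under that assumption the propagated value rounds to 0.07%, in line with the quoted external term, and the $a_0(980)^+$ product corresponds to roughly half of the $Λπ^+η$ rate. The ratio is only indicative, since it ignores interference between partial waves.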

Submitted 16 July, 2024; originally announced July 2024.

Comments: 16 pages, 8 figures

arXiv:2407.12180 [pdf, other]

A UAV-assisted Wireless Localization Challenge on AERPAW

Authors: Paul Kudyba, Jaya Sravani Mandapaka, Weijie Wang, Logan McCorkendale, Zachary McCorkendale, Mathias Kidane, Haijian Sun, Eric Adams, Kamesh Namuduri, Fraida Fund, Mihail Sichitiu, Ozgur Ozdemir

Abstract: As wireless researchers are tasked to enable wireless communication as infrastructure in more dynamic aerial settings, there is a growing need for large-scale experimental platforms that provide realistic, reproducible, and reliable experimental validation. To bridge the research-to-implementation gap, the Aerial Experimentation and Research Platform for Advanced Wireless (AERPAW) offers open-source tools, reference experiments, and hardware to facilitate and evaluate the development of wireless research in controlled digital twin environments and live testbed flights. The inaugural AERPAW Challenge, "Find a Rover," was issued to spark collaborative efforts and test the platform's capabilities. The task involved localizing a narrowband wireless signal, with teams given ten minutes to find the "rover" within a twenty-acre area. By engaging in this exercise, researchers can validate the platform's value as a tool for innovation in wireless communications research within aerial robotics. This paper recounts the methods and experiences of the top three teams in rapidly locating a wireless signal by automating the control of an aerial drone in a realistic testbed scenario.

Submitted 16 July, 2024; originally announced July 2024.

Comments: submitted to IEEE magazine paper

arXiv:2407.11787 [pdf, other]

Delayed luminescence and thermoluminescence in laboratory-grown diamonds

Authors: Jiahui Zhao, Ben L. Green, Ben G. Breeze, Hengxin Yuan, Troy Ardon, Wuyi Wang, Mark E. Newton

Abstract: The blue-green phosphorescence/thermoluminescence is most commonly observed in diamonds following excitation at or above the indirect band gap and has been explained by a substitutional nitrogen-boron donor-acceptor pair recombination model. Orange and red phosphorescence have also been frequently observed in lab-grown near-colourless high-pressure high-temperature diamonds following optical excitation, and their luminescence mechanisms are shown to be different from that of the blue-green phosphorescence. The physics of the orange and red luminescence and phosphorescence bands, including the optical-excitation dependency (UV-NIR), temperature dependency (20 - 573 K), and related charge transfer process, is investigated by a combination of self-built time-resolved imaging/spectroscopic techniques. In this paper, an alternative model for long-lived phosphorescence based on charge trapping is proposed to explain the orange phosphorescence/thermoluminescence band. Additionally, the red phosphorescence band is attributed to a point defect that possibly has a three-level phosphorescence system.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 11 pages, 9 figures

arXiv:2407.11745 [pdf, other]

Universal Sound Separation with Self-Supervised Audio Masked Autoencoder

Authors: Junqi Zhao, Xubo Liu, Jinzheng Zhao, Yi Yuan, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

Abstract: Universal sound separation (USS) is a task of separating mixtures of arbitrary sound sources. Typically, universal separation models are trained from scratch in a supervised manner, using labeled data. Self-supervised learning (SSL) is an emerging deep learning approach that leverages unlabeled data to obtain task-agnostic representations, which can benefit many downstream tasks. In this paper, we propose integrating a self-supervised pre-trained model, namely the audio masked autoencoder (A-MAE), into a universal sound separation system to enhance its separation performance. We employ two strategies to utilize SSL embeddings: freezing or updating the parameters of A-MAE during fine-tuning. The SSL embeddings are concatenated with the short-time Fourier transform (STFT) to serve as input features for the separation model. We evaluate our methods on the AudioSet dataset, and the experimental results indicate that the proposed methods successfully enhance the separation performance of a state-of-the-art ResUNet-based USS model.

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.11727 [pdf, ps, other]

Measurement of the branching fraction of $D^+_s\to \ell^+ν_\ell$ via $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (634 additional authors not shown)

Abstract: Based on $10.64~\mathrm{fb}^{-1}$ of $e^+e^-$ collision data taken at center-of-mass energies between 4.237 and 4.699 GeV with the BESIII detector, we study the leptonic $D^+_s$ decays using the $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$ process. The branching fractions of $D_s^+\to\ell^+ν_{\ell}\,(\ell=μ,τ)$ are measured to be $\mathcal{B}(D_s^+\toμ^+ν_μ)=(0.547\pm0.026_{\rm stat}\pm0.016_{\rm syst})\%$ and $\mathcal{B}(D_s^+\toτ^+ν_τ)=(5.60\pm0.16_{\rm stat}\pm0.20_{\rm syst})\%$, respectively. The product of the decay constant and Cabibbo-Kobayashi-Maskawa matrix element $|V_{cs}|$ is determined to be $f_{D_s^+}|V_{cs}|=(246.5\pm5.9_{\rm stat}\pm3.6_{\rm syst}\pm0.5_{\rm input})_{μν}~\mathrm{MeV}$ and $f_{D_s^+}|V_{cs}|=(252.7\pm3.6_{\rm stat}\pm4.5_{\rm syst}\pm0.6_{\rm input})_{τν}~\mathrm{MeV}$, respectively. Taking the value of $|V_{cs}|$ from a global fit in the Standard Model, we obtain ${f_{D^+_s}}=(252.8\pm6.0_{\rm stat}\pm3.7_{\rm syst}\pm0.6_{\rm input})_{μν}$ MeV and ${f_{D^+_s}}=(259.2\pm3.6_{\rm stat}\pm4.5_{\rm syst}\pm0.6_{\rm input})_{τν}$ MeV, respectively. Conversely, taking the value for $f_{D_s^+}$ from the latest lattice quantum chromodynamics calculation, we obtain $|V_{cs}| =(0.986\pm0.023_{\rm stat}\pm0.014_{\rm syst}\pm0.003_{\rm input})_{μν}$ and $|V_{cs}| = (1.011\pm0.014_{\rm stat}\pm0.018_{\rm syst}\pm0.003_{\rm input})_{τν}$, respectively.
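
The three sets of numbers in this abstract are tied together by the product $f_{D_s^+}|V_{cs}|$: dividing it by one factor gives the other. The sketch below uses only the quoted central values to back out the external inputs that the two extractions imply. The implied inputs are a back-of-the-envelope inference and are not quoted in the abstract; the dictionary names are illustrative.

```python
# Central values quoted in the abstract, keyed by decay channel.
F_TIMES_VCS = {"mu nu": 246.5, "tau nu": 252.7}  # f_Ds * |Vcs| in MeV
F_DS = {"mu nu": 252.8, "tau nu": 259.2}         # f_Ds in MeV, |Vcs| from global fit
V_CS = {"mu nu": 0.986, "tau nu": 1.011}         # |Vcs|, f_Ds from lattice QCD

# |Vcs| input implied by the f_Ds extraction: (f_Ds |Vcs|) / f_Ds.
implied_vcs = {ch: F_TIMES_VCS[ch] / F_DS[ch] for ch in F_TIMES_VCS}

# Lattice f_Ds input implied by the |Vcs| extraction: (f_Ds |Vcs|) / |Vcs|.
implied_fds = {ch: F_TIMES_VCS[ch] / V_CS[ch] for ch in F_TIMES_VCS}

for ch in F_TIMES_VCS:
    print(
        f"{ch}: implied |Vcs| input = {implied_vcs[ch]:.3f}, "
        f"implied f_Ds input = {implied_fds[ch]:.1f} MeV"
    )
```

Both channels point to the same inputs, $|V_{cs}| \approx 0.975$ and $f_{D_s^+} \approx 250$ MeV, which is what one expects if a single global-fit value and a single lattice value were used for the $μν$ and $τν$ results.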

Submitted 18 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

Comments: 27 pages, 13 figures

arXiv:2407.11721 [pdf, other]

Inferring the mass content of galaxy clusters with satellite kinematics and Jeans Anisotropic modeling

Authors: Rui Shi, Wenting Wang, Zhaozhou Li, Ling Zhu, Alexander Smith, Shaun Cole, Hongyu Gao, Xiaokai Chen, Qingyang Li, Jiaxin Han

Abstract: Satellite galaxies can be used to indicate the dynamical mass of galaxy groups and clusters. In this study, we apply the axis-symmetric Jeans Anisotropic Multi-Gaussian Expansion (JAM) modeling to satellite galaxies in 28 galaxy clusters selected from the TNG300-1 simulation with halo mass of $\log_{10}M_{200}/M_\odot>14.3$. If using true bound satellites as tracers, the best constrained total mass within the half-mass radius of satellites, $M(<r_\mathrm{half})$, and the virial mass, $M_{200}$, have average biases of -0.01 and $0.03$ dex, with average scatters of 0.11 dex and 0.15 dex. If selecting companions in redshift space with line-of-sight depth of 2,000 km/s, the biases are -0.06 and $0.01$ dex, while the scatters are 0.12 and 0.18 dex for $M(<r_\mathrm{half})$ and $M_{200}$. By comparing the best-fitting and actual density profiles, we find $\sim$29% of best-fitting density profiles show very good agreement with the truth, $\sim$32% display over or under estimates at most of the radial range with biased $M(<r_\mathrm{half})$, and 39% show under/over estimates in central regions and over/under estimates in the outskirts, with good constraints on $M(<r_\mathrm{half})$, yet most of the best constraints are still consistent with the true profiles within 1-$σ$ statistical uncertainties for the three circumstances. Using a mock DESI Bright Galaxy Survey catalog with the effect of fiber incompleteness, we find DESI fiber assignments and the choice of flux limits barely modify the velocity dispersion profiles and are thus unlikely to affect the dynamical modeling outcomes. Our results show that with current and future deep spectroscopic surveys, JAM can be a powerful tool to constrain the underlying density profiles of individual massive galaxy clusters.

Submitted 16 July, 2024; originally announced July 2024.

Comments: accepted by ApJ

arXiv:2407.11610 [pdf, other]

MergeNet: Explicit Mesh Reconstruction from Sparse Point Clouds via Edge Prediction

Authors: Weimin Wang, Yingxu Deng, Zezeng Li, Yu Liu, Na Lei

Abstract: This paper introduces a novel method for reconstructing meshes from sparse point clouds by predicting edge connection. Existing implicit methods usually produce superior smooth and watertight meshes due to the isosurface extraction algorithms (e.g., Marching Cubes). However, these methods become memory and computationally intensive with increasing resolution. Explicit methods are more efficient by directly forming the face from points. Nevertheless, the challenge of selecting appropriate faces from enormous candidates often leads to undesirable faces and holes. Moreover, the reconstruction performance of both approaches tends to degrade when the point cloud gets sparse. To this end, we propose MEsh Reconstruction via edGE (MergeNet), which converts mesh reconstruction into local connectivity prediction problems. Specifically, MergeNet learns to extract the features of candidate edges and regress their distances to the underlying surface. Consequently, the predicted distance is utilized to filter out edges that lie on surfaces. Finally, the meshes are reconstructed by refining the triangulations formed by these edges. Extensive experiments on synthetic and real-scanned datasets demonstrate the superiority of MergeNet to SoTA explicit methods.

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.11474 [pdf, other]

Search for the rare $Λ_c^+ \to p μ^+ μ^-$ decay

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1062 additional authors not shown)

Abstract: A search for the nonresonant $Λ_c^+ \to p μ^+ μ^-$ decay is performed using proton-proton collision data recorded at a centre-of-mass energy of 13 TeV by the LHCb experiment, corresponding to an integrated luminosity of 5.4 fb$^{-1}$. No evidence for the decay is found in the dimuon invariant-mass regions where the expected contribution of resonances is subdominant. The upper limit on the branching fraction of the $Λ_c^+ \to p μ^+ μ^-$ decay is determined to be $2.9~(3.2) \times 10^{-8}$ at 90% (95%) confidence level. The branching fractions in the dimuon invariant-mass regions dominated by the $η$, $ρ$ and $ω$ resonances are also determined.

Submitted 16 July, 2024; originally announced July 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2024-005.html (LHCb public pages)

Report number: LHCb-PAPER-2024-005, CERN-EP-2024-158

arXiv:2407.11410 [pdf, other]

High-energy neutrino emission from tidal disruption event outflow-cloud interactions

Authors: Hanji Wu, Kai Wang, Wei Wang

Abstract: Tidal disruption events (TDEs), characterized by their luminous transients and high-velocity outflows, have emerged as plausible sources of high-energy neutrinos contributing to the diffuse neutrino flux. In this study, we calculate the contribution of TDEs to the diffuse neutrino flux by employing the outflow-cloud model within the TDE framework. Our analysis indicates that the contribution of TDEs becomes negligible when the redshift $Z$ exceeds 2. Employing a set of fiducial values, which includes outflow energy $E_{\rm kin}=10^{51}$ erg, a proton spectrum cutoff energy $E_{\rm p,max}=100$ PeV, a volume TDE rate $\dot{N}=8 \times 10^{-7}\ \rm Mpc^{-3}\ year^{-1}$, covering fraction of clouds $C_V=0.1$, energy conversion efficiency in the shock $η=0.1$, and a proton spectrum index $Γ=-1.7$, we find that TDEs can account for approximately 80\% of the contribution at energies around 0.3 PeV. Additionally, TDEs still contribute around 18\% to the IceCube data below 0.1 PeV and the total contribution is $\sim 24^{+2}_{-15}\%$. In addition, we also discuss the potential influence of various parameter values on the results in detail. With the IceCube data, we impose constraints on the combination of the physical parameters, i.e., $C_{f}=\dot{N}E_{\rm kin}C_{\rm v}η$. Future observations or theoretical considerations would fix some physical parameters, which will help to constrain some individual parameters of TDEs.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 12 pages, 10 figures, accepted for publication in PRD

arXiv:2407.10892 [pdf, other]

First Measurement of Solar $^8$B Neutrino Flux through Coherent Elastic Neutrino-Nucleus Scattering in PandaX-4T

Authors: PandaX Collaboration, Zihao Bo, Wei Chen, Xun Chen, Yunhua Chen, Zhaokan Cheng, Xiangyi Cui, Yingjie Fan, Deqing Fang, Zhixing Gao, Lisheng Geng, Karl Giboni, Xunan Guo, Xuyuan Guo, Zichao Guo, Chencheng Han, Ke Han, Changda He, Jinrong He, Di Huang, Houqi Huang, Junting Huang, Ruquan Hou, Yu Hou, Xiangdong Ji , et al. (77 additional authors not shown)

Abstract: The PandaX-4T liquid xenon detector at the China Jinping Underground Laboratory is used to measure the solar $^8$B neutrino flux by detecting neutrinos through coherent scattering with xenon nuclei. Data samples requiring the coincidence of scintillation and ionization signals (paired), as well as unpaired ionization-only signals (US2), are selected with an energy threshold of approximately 1.1 keV (0.33 keV) nuclear recoil energy. Combining the commissioning run and the first science run of PandaX-4T, total exposures of 1.25 and 1.04 tonne$\cdot$year are collected for the paired and US2 data, respectively. After unblinding, 3 and 332 events are observed with an expectation of 2.8$\pm$0.5 and 251$\pm$32 background events, for the paired and US2 data, respectively. A combined analysis yields a best-fit $^8$B neutrino signal of 3.5 (75) events from the paired (US2) data sample, with $\sim$37\% uncertainty, and the background-only hypothesis is disfavored at 2.64$σ$ significance. This gives a solar $^8$B neutrino flux of ($8.4\pm3.1$)$\times$10$^6$ cm$^{-2}$s$^{-1}$, consistent with the standard solar model prediction. This is the first indication of solar $^8$B neutrino ``fog'' in a dark matter direct detection experiment.

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.10540 [pdf, other]

Sudden polarization angle jumps of the repeating fast radio burst FRB 20201124A

Authors: J. R. Niu, W. Y. Wang, J. C. Jiang, Y. Qu, D. J. Zhou, W. W. Zhu, K. J. Lee, J. L. Han, B. Zhang, D. Li, S. Cao, Z. Y. Fang, Y. Feng, Q. Y. Fu, P. Jiang, W. C. Jing, J. Li, Y. Li, R. Luo, L. Q. Meng, C. C. Miao, X. L. Miao, C. H. Niu, Y. C. Pan, B. J. Wang , et al. (19 additional authors not shown)

Abstract: We report the first detection of polarization angle (PA) orthogonal jumps, a phenomenon previously only observed from radio pulsars, from a fast radio burst (FRB) source FRB 20201124A. We find three cases of orthogonal jumps in over two thousand bursts, all resembling those observed in pulsar single pulses. We propose that the jumps are due to the superposition of two orthogonal emission modes that could only be produced in a highly magnetized plasma, and they are caused by the line of sight sweeping across a rotating magnetosphere. The shortest jump timescale is of the order of one millisecond, which hints that the emission modes come from regions smaller than the light cylinder of most pulsars or magnetars. This discovery provides convincing evidence that FRB emission originates from the complex magnetosphere of a magnetar, suggesting an FRB emission mechanism that is analogous to radio pulsars despite a huge luminosity difference between the two types of objects.

Submitted 15 July, 2024; originally announced July 2024.

Comments: 10 pages, 5 figures, submitted to APJL

arXiv:2407.10404 [pdf, ps, other]

On the higher-rank Askey-Wilson algebras

Authors: Wanxia Wang, Shilin Yang

Abstract: In this paper, a new algebra ${\mathcal A}(n)$, which is generated by an upper triangular generating matrix with triple relations, is introduced. It is shown that there exists an isomorphism between the algebra ${\mathcal A}(n)$ and the higher Askey-Wilson algebra ${\mathfrak{aw}}(n)$ introduced by Crampé, Frappat et al. Furthermore, we establish a series of automorphisms of ${\mathcal A}(n),$ which satisfy braid group relations and coincide with those in ${\mathfrak{aw}}(n).$

Submitted 14 July, 2024; originally announced July 2024.

Comments: 36 pages

MSC Class: 16T10; 33D45; 81R12

arXiv:2407.10373 [pdf, other]

Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion

Authors: Jian Ma, Wenguan Wang, Yi Yang, Feng Zheng

Abstract: Visual acoustic matching (VAM) is pivotal for enhancing the immersive experience, and the task of dereverberation is effective in improving audio intelligibility. Existing methods treat each task independently, overlooking the inherent reciprocity between them. Moreover, these methods depend on paired training data, which is challenging to acquire, impeding the utilization of extensive unpaired data. In this paper, we introduce MVSD, a mutual learning framework based on diffusion models. MVSD considers the two tasks symmetrically, exploiting the reciprocal relationship to facilitate learning from inverse tasks and overcome data scarcity. Furthermore, we employ the diffusion model as foundational conditional converters to circumvent the training instability and over-smoothing drawbacks of conventional GAN architectures. Specifically, MVSD employs two converters: one for VAM called reverberator and one for dereverberation called dereverberator. The dereverberator judges whether the reverberation audio generated by reverberator sounds like being in the conditional visual scenario, and vice versa. By forming a closed loop, these two converters can generate informative feedback signals to optimize the inverse tasks, even with easily acquired one-way unpaired data. Extensive experiments on two standard benchmarks, i.e., SoundSpaces-Speech and Acoustic AVSpeech, exhibit that our framework can improve the performance of the reverberator and dereverberator and better match specified visual scenarios.

Submitted 14 July, 2024; originally announced July 2024.

Comments: ECCV 2024; Project page: https://hechang25.github.io/MVSD

arXiv:2407.10324 [pdf, other]

Stability and dynamics of massive vortices in two-component Bose-Einstein condensates

Authors: J. D'Ambroise, W. Wang, C. Ticknor, R. Carretero-González, P. G. Kevrekidis

Abstract: The study of structures involving vortices in one component and bright solitary waves in another has a time-honored history in two-component atomic Bose-Einstein condensates. In the present work, we revisit this topic extending considerations well past the near-integrable regime of nearly equal scattering lengths. Instead, we focus on stationary states and spectral stability of such structures for large values of the inter-component interaction coefficient. We find that the state can manifest dynamical instabilities for suitable parameter values. We also explore a phenomenological, yet quantitatively accurate upon suitable tuning, particle model which, in line also with earlier works, offers the potential of accurately following the associated stability and dynamical features. Finally, we probe the dynamics of the unstable vortex-bright structure, observing an unprecedented, to our knowledge, instability scenario in which the oscillatory instability leads to a patch of vorticity that harbors and eventually ejects multiple vortex-bright structures.

Submitted 14 July, 2024; originally announced July 2024.

arXiv:2407.10271 [pdf, other]

Building holographic code from the boundary

Authors: Wei Wang

Abstract: Holographic quantum error-correcting code, the quantum-information structure hypothesized for the AdS/CFT correspondence, has been attracting increasing attention in new directions interrelating the studies of quantum gravity and quantum simulation. In this work, we initiate a novel approach for building holographic code that can be generally applied in potentially broad and interdisciplinary contexts. Our approach takes an "opposite" route to the conventional paradigm that is based on bulk tensor-networks. As illustrated in an exact model, we start from scalable descriptions of boundary qudits which can guide succinct quantum-circuit simulations, and rigorously show how the bulk qudits and the encoding structure emerge from boundary entanglement patterns. By investigating the entanglement patterns, we systematically unfold the hypothetical structure for bulk reconstruction and the details of the Ryu-Takayanagi formula in the formalism of operator-algebra quantum error correction, demonstrating desired properties that are not yet proved in the established models. Our work might offer a fresh perspective for the study of holographic code.

Submitted 14 July, 2024; originally announced July 2024.

arXiv:2407.10213 [pdf]

Spatio-temporal breather dynamics in microcomb soliton crystals

Authors: Futai Hu, Abhinav Kumar Vinod, Wenting Wang, Hsiao-Hsuan Chin, James F. McMillan, Ziyu Zhan, Yuan Meng, Mali Gong, Chee Wei Wong

Abstract: Solitons, the distinct balance between nonlinearity and dispersion, provide a route toward ultrafast electromagnetic pulse shaping, high-harmonic generation, real-time image processing, and RF photonic communications. Here we newly explore and observe the spatio-temporal breather dynamics of optical soliton crystals in frequency microcombs, examining spatial breathers, chaos transitions, and dynamical deterministic switching in nonlinear measurements and theory. To understand the breather solitons, we describe their dynamical routes and two example transitional maps of the ensemble spatial breathers, with and without chaos initiation. We elucidate the physical mechanisms of the breather dynamics in the soliton crystal microcombs, in the interaction plane limit cycles and in the domain-wall understanding with parity symmetry breaking from third order dispersion. We present maps of the accessible nonlinear regions, the breather frequency dependences on third order dispersion and avoided mode crossing strengths, and the transition between the collective breather spatiotemporal states. Our range of measurements matches well with our first-principles theory and nonlinear modeling. To image these soliton ensembles and their breathers, we further constructed panoramic temporal imaging for simultaneous fast and slow axis two dimensional mapping of the breathers. In the phase differential sampling, we present two dimensional evolution maps of soliton crystal breathers, including with defects, in both stable breathers and breathers with drift.
Our fundamental studies contribute to the understanding of nonlinear dynamics in soliton crystal complexes, their spatiotemporal dependences, and their stability-existence zones.

Submitted 14 July, 2024; originally announced July 2024.

arXiv:2407.10200 [pdf, other]

Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data

Authors: Tuo Feng, Wenguan Wang, Ruijie Quan, Yi Yang

Abstract: Current 3D self-supervised learning methods of 3D scenes face a data desert issue, resulting from the time-consuming and expensive collecting process of 3D scene data. Conversely, 3D shape datasets are easier to collect. Despite this, existing pre-training strategies on shape data offer limited potential for 3D scene understanding due to significant disparities in point quantities. To tackle these challenges, we propose Shape2Scene (S2S), a novel method that learns representations of large-scale 3D scenes from 3D shape data. We first design multiscale and high-resolution backbones for shape and scene level 3D tasks, i.e., MH-P (point-based) and MH-V (voxel-based). MH-P/V establishes direct paths to high-resolution features that capture deep semantic information across multiple scales. This pivotal nature makes them suitable for a wide range of 3D downstream tasks that tightly rely on high-resolution features. We then employ a Shape-to-Scene strategy (S2SS) to amalgamate points from various shapes, creating a random pseudo scene (comprising multiple objects) for training data, mitigating disparities between shapes and scenes. Finally, a point-point contrastive loss (PPC) is applied for the pre-training of MH-P/V. In PPC, the inherent correspondence (i.e., point pairs) is naturally obtained in S2SS. Extensive experiments have demonstrated the transferability of 3D representations learned by MH-P/V across shape-level and scene-level 3D tasks.
MH-P achieves notable performance on well-known point cloud datasets (93.8% OA on ScanObjectNN and 87.6% instance mIoU on ShapeNetPart). MH-V also achieves promising performance in 3D semantic segmentation and 3D object detection.

Submitted 14 July, 2024; originally announced July 2024.

Comments: ECCV 2024; Project page: https://github.com/FengZicai/S2S

arXiv:2407.10167 [pdf, other]

Key-Point-Driven Mathematical Reasoning Distillation of Large Language Model

Authors: Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang

Abstract: Large Language Models (LLMs) have demonstrated exceptional proficiency in mathematical reasoning tasks due to their extensive parameter counts and training on vast datasets. Despite these capabilities, deploying LLMs is hindered by their computational demands. Distilling LLM mathematical reasoning into Smaller Language Models (SLMs) has emerged as a solution to this challenge, although these smaller models often suffer from errors in calculation and semantic understanding. Prior work has proposed Program-of-Thought Distillation (PoTD) to avoid calculation error. To further address semantic understanding errors, we propose Key-Point-Driven Mathematical Reasoning Distillation (KPDD). KPDD enhances the reasoning performance of SLMs by breaking down the problem-solving process into three stages: Core Question Extraction, Problem-Solving Information Extraction, and Step-by-Step Solution. This method is further divided into KPDD-CoT, which generates Chain-of-Thought rationales, and KPDD-PoT, which creates Program-of-Thought rationales. The experiment results show that KPDD-CoT significantly improves reasoning abilities, while KPDD-PoT achieves state-of-the-art performance in mathematical reasoning tasks. Our approach effectively mitigates misunderstanding errors, advancing the deployment of efficient and capable SLMs.

Submitted 14 July, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2401.11864

arXiv:2407.10132 [pdf, other]

Optimal Kernel Choice for Score Function-based Causal Discovery

Authors: Wenjie Wang, Biwei Huang, Feng Liu, Xinge You, Tongliang Liu, Kun Zhang, Mingming Gong

Abstract: Score-based methods have demonstrated their effectiveness in discovering causal relationships by scoring different causal structures based on their goodness of fit to the data. Recently, Huang et al. proposed a generalized score function that can handle general data distributions and causal relationships by modeling the relations in reproducing kernel Hilbert space (RKHS). The selection of an appropriate kernel within this score function is crucial for accurately characterizing causal relationships and ensuring precise causal discovery. However, the current method involves manual heuristic selection of kernel parameters, making the process tedious and less likely to ensure optimality. In this paper, we propose a kernel selection method within the generalized score function that automatically selects the optimal kernel that best fits the data. Specifically, we model the generative process of the variables involved in each step of the causal graph search procedure as a mixture of independent noise variables. Based on this model, we derive an automatic kernel selection method by maximizing the marginal likelihood of the variables involved in each search step. We conduct experiments on both synthetic data and real-world benchmarks, and the results demonstrate that our proposed method outperforms heuristic kernel selection methods.

Submitted 14 July, 2024; originally announced July 2024.

Comments: Accepted by ICML2024

arXiv:2407.10119 [pdf, ps, other]

Affine and cyclotomic Schur categories

Authors: Linliang Song, Weiqiang Wang

Abstract: Using the affine web category introduced in a prequel as a building block, we formulate a diagrammatic $\Bbbk$-linear monoidal category, the affine Schur category, for any commutative ring $\Bbbk$. We then formulate diagrammatic categories, the cyclotomic Schur categories, with arbitrary parameters at positive integral levels. Integral bases consisting of elementary diagrams are obtained for affine and cyclotomic Schur categories. A second diagrammatic basis, called a double SST basis, for any such cyclotomic Schur category is also established, leading to a conjectural higher level RSK correspondence. We show that the endomorphism algebras with the double SST bases are isomorphic to degenerate cyclotomic Schur algebras with their cellular bases, providing a first diagrammatic presentation of the latter. The presentations for the affine and cyclotomic Schur categories are much simplified when $\Bbbk$ is a field of characteristic zero.

Submitted 14 July, 2024; originally announced July 2024.

Comments: 50 pages, many figures

arXiv:2407.10109 [pdf]

Hardware-Efficient and Reliable Coherent DSCM Systems Enabled by Single-Pilot-Tone-Based Polarization Demultiplexing

Authors: Wei Wang, Dongdong Zou, Weihao Ni, Fan Li

Abstract: Recently, coherent digital subcarrier multiplexing (DSCM) technology has become an attractive solution for next-generation ultra-high-speed datacenter interconnects (DCIs). To meet the requirements of low-cost and low-power consumption in DCI applications, a comprehensive simplification of the coherent DSCM system has been investigated. The pilot-tone-based polarization demultiplexing (PT-PDM) technique, known for its low-power consumption and ultra-fast polarization tracking capabilities, has emerged as a compelling alternative to the power-hungry N-tap adaptive multiple-input multiple-output (MIMO) equalizer. However, the effectiveness of this PT-PDM technique is extremely vulnerable to the receiver-side XY-skew (Rx-XY-skew), which is revealed in this paper for the first time. Then, a pilot-tone-enabled modified Godard phase detector (PT-MGPD) scheme is proposed to realize Rx-XY-skew estimation, serving as the prerequisite for the successful implementation of the PT-PDM and simplification of the adaptive equalizer. Both the simulation and experiment are conducted to evaluate the accuracy of the proposed PT-MGPD scheme. The results prove it can achieve accurate estimation with an error of less than 0.3 ps. Besides, a low-complexity, high-spectral-efficiency, and ultra-fast polarization demultiplexing method based on a single pilot tone (SPT) is proposed for the DSCM system in this work.
Based on the proposed PT-MGPD and SPT schemes, the conventional N-tap MIMO equalizer serving each subcarrier can be successfully pruned into two polarization-independent single-input single-output equalizers, and there is no performance penalty even if the polarization rotation speed reaches 10 Mrad/s. According to the results, the proposed schemes provide a hardware-efficient and reliable coherent DSCM solution for next-generation ultra-high-speed DCIs.

Submitted 14 July, 2024; originally announced July 2024.

arXiv:2407.09792 [pdf, other]

Language-Augmented Symbolic Planner for Open-World Task Planning

Authors: Guanqi Chen, Lei Yang, Ruixing Jia, Zhe Hu, Yizhou Chen, Wei Zhang, Wenping Wang, Jia Pan

Abstract: Enabling robotic agents to perform complex long-horizon tasks has been a long-standing goal in robotics and artificial intelligence (AI). Despite the potential shown by large language models (LLMs), their planning capabilities remain limited to short-horizon tasks and they are unable to replace the symbolic planning approach. Symbolic planners, on the other hand, may encounter execution errors due to their common assumption of complete domain knowledge which is hard to manually prepare for an open-world setting. In this paper, we introduce a Language-Augmented Symbolic Planner (LASP) that integrates pre-trained LLMs to enable conventional symbolic planners to operate in an open-world environment where only incomplete knowledge of action preconditions, objects, and properties is initially available. In case of execution errors, LASP can utilize the LLM to diagnose the cause of the error based on the observation and interact with the environment to incrementally build up its knowledge base necessary for accomplishing the given tasks. Experiments demonstrate that LASP is proficient in solving planning problems in the open-world setting, performing well even in situations where there are multiple gaps in the knowledge.

Submitted 13 July, 2024; originally announced July 2024.

Comments: Accepted by Robotics: Science and Systems (RSS) 2024

arXiv:2407.09693 [pdf, other]

A Mathematical Framework, a Taxonomy of Modeling Paradigms, and a Suite of Learning Techniques for Neural-Symbolic Systems

Authors: Charles Dickens, Connor Pryor, Changyu Gao, Alon Albalak, Eriq Augustine, William Wang, Stephen Wright, Lise Getoor

Abstract: The field of Neural-Symbolic (NeSy) systems is growing rapidly. Proposed approaches show great promise in achieving symbiotic unions of neural and symbolic methods. However, each NeSy system differs in fundamental ways. There is a pressing need for a unifying theory to illuminate the commonalities and differences in approaches and enable further progress. In this paper, we introduce Neural-Symbolic Energy-Based Models (NeSy-EBMs), a unifying mathematical framework for discriminative and generative modeling with probabilistic and non-probabilistic NeSy approaches. We utilize NeSy-EBMs to develop a taxonomy of modeling paradigms focusing on a system's neural-symbolic interface and reasoning capabilities. Additionally, we introduce a suite of learning techniques for NeSy-EBMs. Importantly, NeSy-EBMs allow the derivation of general expressions for gradients of prominent learning losses, and we provide four learning approaches that leverage methods from multiple domains, including bilevel and stochastic policy optimization. Finally, we present Neural Probabilistic Soft Logic (NeuPSL), an open-source NeSy-EBM library designed for scalability and expressivity, facilitating real-world application of NeSy systems. Through extensive empirical analysis across multiple datasets, we demonstrate the practical advantages of NeSy-EBMs in various tasks, including image classification, graph node labeling, autonomous vehicle situation awareness, and question answering.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09648 [pdf, other]

3x2: 3D Object Part Segmentation by 2D Semantic Correspondences

Authors: Anh Thai, Weiyao Wang, Hao Tang, Stefan Stojanov, Matt Feiszli, James M. Rehg

Abstract: 3D object part segmentation is essential in computer vision applications. While substantial progress has been made in 2D object part segmentation, the 3D counterpart has received less attention, in part due to the scarcity of annotated 3D datasets, which are expensive to collect. In this work, we propose to leverage a few annotated 3D shapes or richly annotated 2D datasets to perform 3D object part segmentation. We present our novel approach, termed 3-By-2 that achieves SOTA performance on different benchmarks with various granularity levels. By using features from pretrained foundation models and exploiting semantic and geometric correspondences, we are able to overcome the challenges of limited 3D annotations. Our approach leverages available 2D labels, enabling effective 3D object part segmentation. Our method 3-By-2 can accommodate various part taxonomies and granularities, demonstrating interesting part label transfer ability across different object categories. Project website: https://ngailapdi.github.io/projects/3by2/.

Submitted 12 July, 2024; originally announced July 2024.

Comments: Accepted to ECCV 2024

arXiv:2407.09121 [pdf, other]

Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training

Authors: Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Jiahao Xu, Tian Liang, Pinjia He, Zhaopeng Tu

Abstract: This study addresses a critical gap in safety tuning practices for Large Language Models (LLMs) by identifying and tackling a refusal position bias within safety tuning data, which compromises the models' ability to appropriately refuse generating unsafe content. We introduce a novel approach, Decoupled Refusal Training (DeRTa), designed to empower LLMs to refuse compliance to harmful prompts at any response position, significantly enhancing their safety capabilities. DeRTa incorporates two novel components: (1) Maximum Likelihood Estimation (MLE) with Harmful Response Prefix, which trains models to recognize and avoid unsafe content by appending a segment of harmful response to the beginning of a safe response, and (2) Reinforced Transition Optimization (RTO), which equips models with the ability to transition from potential harm to safety refusal consistently throughout the harmful response sequence. Our empirical evaluation, conducted using LLaMA3 and Mistral model families across six attack scenarios, demonstrates that our method not only improves model safety without compromising performance but also surpasses well-known models such as GPT-4 in defending against attacks. Importantly, our approach successfully defends against recent advanced attack methods (e.g., CodeAttack) that have jailbroken GPT-4 and LLaMA3-70B-Instruct. Our code and data can be found at https://github.com/RobustNLP/DeRTa.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.08785 [pdf, ps, other]

A kinetic Nash inequality and precise boundary behavior of the kinetic Fokker-Planck equation

Authors: Christopher Henderson, Giacomo Lucertini, Weinan Wang

Abstract: In this paper, we prove a kinetic Nash type inequality and adapt it to a new functional inequality for functions in a kinetic Sobolev space with absorbing boundary conditions on the half-space. As an application, we address the boundary behavior of the kinetic Fokker-Planck equations in the half-space. Our main result is the sharp regularity of the solution at the absorbing boundary and grazing set.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 35 pages, 2 figures

MSC Class: 35Q84; 35K65; 26D10; 35A23

arXiv:2407.08183 [pdf, other]

The white-light superflares from cool stars in GWAC triggers

Authors: Guang-Wei Li, Liang Wang, Hai-Long Yuan, Li-Ping Xin, Jing Wang, Chao Wu, Hua-Li Li, Hasitieer Haerken, Wei-Hua Wang, Hong-Bo Cai, Xu-Hui Han, Yang Xu, Lei Huang, Xiao-Meng Lu, Jian-Ying Bai, Xiang-Yu Wang, Zi-Gao Dai, En-Wei Liang, Jian-Yan Wei

Abstract: M-type stars are the ones that flare most frequently, but how large their maximum flare energy can be is still unknown. We present 163 flares from 162 individual M2 through L1-type stars that triggered the GWAC, with flare energies ranging from $10^{32.2}$ to $10^{36.4}$ erg. The flare amplitudes range from $\triangle G = 0.84$ to $\sim 10$ mag. Flare energy increases with stellar surface temperature ($T_{\rm eff}$) but both $\triangle G$ and equivalent duration $\log_{10}(ED)$ seem to be independent of $T_{\rm eff}$. Combining periods detected from light curves of TESS and K2, spectra from LAMOST, SDSS and the 2.16 m Telescope, and the Gaia DR3 data, we found that these GWAC flare stars are young. For the stars that have spectra, we found that these stars are in or very near to the saturation region, and $\log_{10}(L_{\rm Hα}/L_{\rm bol})$ is lower for M7-L1 stars than for M2-M6 stars. We also studied the relation between GWAC flare bolometric energy $E_{\rm bol}$ and stellar hemispherical area $S$, and found that $\log_{10}E_{\rm bol}$ (in erg) increases with increasing $S$ (in cm$^2$), and the maximum flare energy $\log_{10}E_{\rm bol, max} \geqslant \log_{10}S + 14.25$. For M7-L1 stars, there seem to be other factors limiting their maximum flare energies in addition to stellar hemispherical area.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 18 pages, 11 figures, 4 tables

arXiv:2407.08133 [pdf, other]

Nonverbal Interaction Detection

Authors: Jianan Wei, Tianfei Zhou, Yi Yang, Wenguan Wang

Abstract: This work addresses a new challenge of understanding human nonverbal interaction in social contexts. Nonverbal signals pervade virtually every communicative act. Our gestures, facial expressions, postures, gaze, even physical appearance all convey messages, without anything being said. Despite their critical role in social life, nonverbal signals receive very limited attention as compared to the linguistic counterparts, and existing solutions typically examine nonverbal cues in isolation. Our study marks the first systematic effort to enhance the interpretation of multifaceted nonverbal signals. First, we contribute a novel large-scale dataset, called NVI, which is meticulously annotated to include bounding boxes for humans and corresponding social groups, along with 22 atomic-level nonverbal behaviors under five broad interaction types. Second, we establish a new task NVI-DET for nonverbal interaction detection, which is formalized as identifying triplets in the form <individual, group, interaction> from images. Third, we propose a nonverbal interaction detection hypergraph (NVI-DEHR), a new approach that explicitly models high-order nonverbal interactions using hypergraphs. Central to the model is a dual multi-scale hypergraph that adeptly addresses individual-to-individual and group-to-group correlations across varying scales, facilitating interactional feature learning and eventually improving interaction prediction. Extensive experiments on NVI show that NVI-DEHR improves various baselines significantly in NVI-DET. It also exhibits leading performance on HOI-DET, confirming its versatility in supporting related tasks and strong generalization ability. We hope that our study will offer the community new avenues to explore nonverbal signals in more depth.

Submitted 14 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: ECCV 2024; Project page: https://github.com/weijianan1/NVI

arXiv:2407.08127 [pdf, other]

Prediction Exposes Your Face: Black-box Model Inversion via Prediction Alignment

Authors: Yufan Liu, Wanqian Zhang, Dayan Wu, Zheng Lin, Jingzi Gu, Weiping Wang

Abstract: Model inversion (MI) attack reconstructs the private training data of a target model given its output, posing a significant threat to deep learning models and data privacy. On one hand, most existing MI methods focus on searching for latent codes to represent the target identity, yet this iterative optimization-based scheme consumes a huge number of queries to the target model, making it unrealistic, especially in the black-box scenario. On the other hand, some training-based methods launch an attack through a single forward inference, but fail to directly learn high-level mappings from prediction vectors to images. Addressing these limitations, we propose a novel Prediction-to-Image (P2I) method for black-box MI attack. Specifically, we introduce the Prediction Alignment Encoder to map the target model's output prediction into the latent code of StyleGAN. In this way, prediction vector space can be well aligned with the more disentangled latent space, thus establishing a connection between prediction vectors and the semantic facial features. During the attack phase, we further design the Aligned Ensemble Attack scheme to integrate complementary facial attributes of target identity for better reconstruction. Experimental results show that our method outperforms other SOTAs, e.g., compared with RLB-MI, our method improves attack accuracy by 8.5% and reduces query numbers by 99% on dataset CelebA.

Submitted 10 July, 2024; originally announced July 2024.

Comments: Accepted by ECCV 2024

arXiv:2407.07924 [pdf, other]

Solving General Natural-Language-Description Optimization Problems with Large Language Models

Authors: Jihai Zhang, Wei Wang, Siyan Guo, Li Wang, Fangquan Lin, Cheng Yang, Wotao Yin

Abstract: Optimization problems seek to find the best solution to an objective under a set of constraints, and have been widely investigated in real-world applications. Modeling and solving optimization problems in a specific domain typically require a combination of domain knowledge, mathematical skills, and programming ability, making it difficult for general users and even domain professionals. In this paper, we propose a novel framework called OptLLM that augments LLMs with external solvers. Specifically, OptLLM accepts user queries in natural language, converts them into mathematical formulations and programming codes, and calls the solvers to calculate the results for decision-making. In addition, OptLLM supports multi-round dialogues to gradually refine the modeling and solving of optimization problems. To illustrate the effectiveness of OptLLM, we provide tutorials on three typical optimization applications and conduct experiments on both prompt-based GPT models and a fine-tuned Qwen model using a large-scale self-developed optimization dataset. Experimental results show that OptLLM works with various LLMs, and the fine-tuned model achieves an accuracy boost compared to the prompt-based models. Some features of OptLLM framework have been available for trial since June 2023 (https://opt.alibabacloud.com/chat or https://opt.aliyun.com/chat).

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.07651 [pdf, other]

Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$

Authors: M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (645 additional authors not shown)

Abstract: The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15σ$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07504 [pdf, other]

Pan-cancer Histopathology WSI Pre-training with Position-aware Masked Autoencoder

Authors: Kun Wu, Zhiguo Jiang, Kunming Tang, Jun Shi, Fengying Xie, Wei Wang, Haibo Wu, Yushan Zheng

Abstract: Large-scale pre-training models have promoted the development of histopathology image analysis. However, existing self-supervised methods for histopathology images focus on learning patch features, while there is still a lack of available pre-training models for WSI-level feature learning. In this paper, we propose a novel self-supervised learning framework for pan-cancer WSI-level representation pre-training with the designed position-aware masked autoencoder (PAMA). Meanwhile, we propose the position-aware cross-attention (PACA) module with a kernel reorientation (KRO) strategy and an anchor dropout (AD) mechanism. The KRO strategy can capture the complete semantic structure and eliminate ambiguity in WSIs, and the AD contributes to enhancing the robustness and generalization of the model. We evaluated our method on 6 large-scale datasets from multiple organs for pan-cancer classification tasks. The results have demonstrated the effectiveness of PAMA in generalized and discriminative WSI representation learning and pan-cancer WSI pre-training. The proposed method was also compared with 7 WSI analysis methods. The experimental results have indicated that our proposed PAMA is superior to the state-of-the-art methods. The code and checkpoints are available at https://github.com/WkEEn/PAMA.

Submitted 15 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07487 [pdf, other]

Review-LLM: Harnessing Large Language Models for Personalized Review Generation

Authors: Qiyao Peng, Hongtao Liu, Hongyan Xu, Qing Yang, Minglai Shao, Wenjun Wang

Abstract: Product review generation is an important task in recommender systems, which could provide explanation and persuasiveness for the recommendation. Recently, Large Language Models (LLMs, e.g., ChatGPT) have shown superior text modeling and generating ability, which could be applied in review generation. However, directly applying the LLMs for generating reviews might be troubled by the "polite" phenomenon of the LLMs and could not generate personalized reviews (e.g., negative reviews). In this paper, we propose Review-LLM that customizes LLMs for personalized review generation. Firstly, we construct the prompt input by aggregating user historical behaviors, which include corresponding item titles and reviews. This enables the LLMs to capture user interest features and review writing style. Secondly, we incorporate ratings as indicators of satisfaction into the prompt, which could further improve the model's understanding of user preferences and the sentiment tendency control of generated reviews. Finally, we feed the prompt text into LLMs, and use Supervised Fine-Tuning (SFT) to make the model generate personalized reviews for the given user and target item. Experimental results on the real-world dataset show that our fine-tuned model could achieve better review generation performance than existing closed-source LLMs.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07433 [pdf, other]

Controllable Navigation Instruction Generation with Chain of Thought Prompting

Authors: Xianghao Kong, Jinyu Chen, Wenguan Wang, Hang Su, Xiaolin Hu, Yi Yang, Si Liu

Abstract: Instruction generation is a vital and multidisciplinary research area with broad applications. Existing instruction generation models are limited to generating instructions in a single style from a particular dataset, and the style and content of generated instructions cannot be controlled. Moreover, most existing instruction generation methods also disregard the spatial modeling of the navigation environment. Leveraging the capabilities of Large Language Models (LLMs), we propose C-Instructor, which utilizes the chain-of-thought-style prompt for style-controllable and content-controllable instruction generation. Firstly, we propose a Chain of Thought with Landmarks (CoTL) mechanism, which guides the LLM to identify key landmarks and then generate complete instructions. CoTL renders generated instructions more accessible to follow and offers greater controllability over the manipulation of landmark objects. Furthermore, we present a Spatial Topology Modeling Task to facilitate the understanding of the spatial structure of the environment. Finally, we introduce a Style-Mixed Training policy, harnessing the prior knowledge of LLMs to enable style control for instruction generation based on different prompts within a single model instance. Extensive experiments demonstrate that instructions generated by C-Instructor outperform those generated by previous methods in text metrics, navigation guidance evaluation, and user studies.

Submitted 16 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: ECCV 2024

arXiv:2407.07056 [pdf, other]

CAPformer: Compression-Aware Pre-trained Transformer for Low-Light Image Enhancement

Authors: Wei Wang, Zhi Jin

Abstract: Low-Light Image Enhancement (LLIE) has advanced with the surge in phone photography demand, yet many existing methods neglect compression, a crucial concern for resource-constrained phone photography. Most LLIE methods overlook this, hindering their effectiveness. In this study, we investigate the effects of JPEG compression on low-light images and reveal substantial information loss caused by JPEG due to widespread low pixel values in dark areas. Hence, we propose the Compression-Aware Pre-trained Transformer (CAPformer), employing a novel pre-training strategy to learn lossless information from uncompressed low-light images. Additionally, the proposed Brightness-Guided Self-Attention (BGSA) mechanism enhances rational information gathering. Experiments demonstrate the superiority of our approach in mitigating compression effects on LLIE, showcasing its potential for improving LLIE in resource-constrained scenarios.

Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06953 [pdf, other]

SP-Chain: Boosting Intra-Shard and Cross-Shard Security and Performance in Blockchain Sharding

Authors: Mingzhe Li, You Lin, Wei Wang, Jin Zhang

Abstract: A promising way to overcome the scalability limitations of the current blockchain is to use sharding, which splits transaction processing among multiple, smaller groups of nodes. A well-performing blockchain sharding system requires both high performance and high security from both intra- and cross-shard perspectives. However, existing protocols either have issues protecting security or trade off a great deal of performance for security. In this paper, we propose SP-Chain, a blockchain sharding system with enhanced Security and Performance from both intra- and cross-shard perspectives. For the intra-shard aspect, we design a two-phase concurrent voting scheme to provide high system throughput and low transaction confirmation latency. Moreover, we propose an efficient unbiased leader rotation scheme to ensure high performance under malicious behavior. For the cross-shard aspect, a proof-assisted efficient cross-shard transaction processing mechanism is proposed to guard the cross-shard transactions with low overhead. We implement SP-Chain based on Harmony, and evaluate its performance via large-scale deployment. Extensive evaluations suggest that SP-Chain can process more than 10,000 tx/sec under malicious behaviors with a confirmation latency of 7.6s in a network of 4,000 nodes.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06943 [pdf, other]

A Starter's Kit for Concentric Tube Robots

Authors: Kalina Bonofiglio, Wenpeng Wang, Ethan R. Wilke, Adri Rajaraman, Loris Fichera

Abstract: Concentric Tube Robots (CTRs) have garnered significant interest within the surgical robotics community because of their flexibility, dexterity, and ease of miniaturization. However, mastering the unique kinematics and design principles of CTRs can be challenging for newcomers to the field. In this paper, we present an educational kit aimed at lowering the barriers to entry into concentric tube robot research. Our goal is to provide accessible learning resources for CTRs, bridging the knowledge gap between traditional robotic arms and these specialized devices. The proposed kit includes (1) An open-source design and assembly instructions for an economical (cost of materials $\approx$ 700 USD) modular CTR; (2) A set of self-study materials to learn the basics of CTR modeling and control, including automatically-graded assignments. To evaluate the effectiveness of our educational kit, we conducted a human subjects study involving first-year graduate students in engineering. Over a four-week period, participants -- none of whom had any prior knowledge of concentric tube robots -- successfully built their first CTR using the provided materials, implemented the robot's kinematics in MATLAB, and conducted a tip-tracking experiment with an optical tracking device. Our findings suggest that the proposed kit facilitates learning and hands-on experience with CTRs, and furthermore, it has the potential to help early-stage graduate students get rapidly started with CTR research. By disseminating these resources, we hope to broaden participation in concentric tube robot research to a wider and more diverse group of researchers.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06865 [pdf, ps, other]

Affine $\imath$quantum groups and Steinberg varieties of type C

Authors: Changjian Su, Weiqiang Wang

Abstract: We provide a geometric realization of the quasi-split affine $\imath$quantum group of type AIII$_{2n-1}^{(τ)}$ in terms of equivariant K-groups of non-connected Steinberg varieties of type C. This uses a new Drinfeld type presentation of this affine $\imath$quantum group which admits very nontrivial Serre relations. We then construct à la Springer a family of finite-dimensional standard modules and irreducible modules of this $\imath$quantum group, and provide a composition multiplicity formula of the standard modules.

Submitted 13 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

Comments: References updated

arXiv:2407.06844 [pdf, other]

Dynamic Correlation Learning and Regularization for Multi-Label Confidence Calibration

Authors: Tianshui Chen, Weihang Wang, Tao Pu, Jinghui Qin, Zhijing Yang, Jie Liu, Liang Lin

Abstract: Modern visual recognition models often display overconfidence due to their reliance on complex deep neural networks and one-hot target supervision, resulting in unreliable confidence scores that necessitate calibration. While current confidence calibration techniques primarily address single-label scenarios, there is a lack of focus on more practical and generalizable multi-label contexts. This paper introduces the Multi-Label Confidence Calibration (MLCC) task, aiming to provide well-calibrated confidence scores in multi-label scenarios. Unlike single-label images, multi-label images contain multiple objects, leading to semantic confusion and further unreliability in confidence scores. Existing single-label calibration methods, based on label smoothing, fail to account for category correlations, which are crucial for addressing semantic confusion, thereby yielding sub-optimal performance. To overcome these limitations, we propose the Dynamic Correlation Learning and Regularization (DCLR) algorithm, which leverages multi-grained semantic correlations to better model semantic confusion for adaptive regularization. DCLR learns dynamic instance-level and prototype-level similarities specific to each category, using these to measure semantic correlations across different categories. With this understanding, we construct adaptive label vectors that assign higher values to categories with strong correlations, thereby facilitating more effective regularization. We establish an evaluation benchmark, re-implementing several advanced confidence calibration algorithms and applying them to leading multi-label recognition (MLR) models for fair comparison. Through extensive experiments, we demonstrate the superior performance of DCLR over existing methods in providing reliable confidence scores in multi-label scenarios.

Submitted 9 July, 2024; originally announced July 2024.

Comments: submitted to TIP

arXiv:2407.06540 [pdf, other]

General and Task-Oriented Video Segmentation

Authors: Mu Chen, Liulei Li, Wenguan Wang, Ruijie Quan, Yi Yang

Abstract: We present GvSeg, a general video segmentation framework for addressing four different video segmentation tasks (i.e., instance, semantic, panoptic, and exemplar-guided) while maintaining an identical architectural design. Currently, there is a trend towards developing general video segmentation solutions that can be applied across multiple tasks. This streamlines research endeavors and simplifies deployment. However, such a highly homogenized framework in current design, where each element maintains uniformity, could overlook the inherent diversity among different tasks and lead to suboptimal performance. To tackle this, GvSeg: i) provides a holistic disentanglement and modeling for segment targets, thoroughly examining them from the perspective of appearance, position, and shape, and on this basis, ii) reformulates the query initialization, matching and sampling strategies in alignment with the task-specific requirement. These architecture-agnostic innovations empower GvSeg to effectively address each unique task by accommodating the specific properties that characterize them. Extensive experiments on seven gold-standard benchmark datasets demonstrate that GvSeg surpasses all existing specialized/general solutions by a significant margin on four different video segmentation tasks.

Submitted 9 July, 2024; originally announced July 2024.

Comments: ECCV 2024; Project page: https://github.com/kagawa588/GvSeg

arXiv:2407.06426

DebUnc: Mitigating Hallucinations in Large Language Model Agent Communication with Uncertainty Estimations

Authors: Luke Yoffe, Alfonso Amayuelas, William Yang Wang

Abstract: To enhance Large Language Model (LLM) capabilities, multi-agent debates have been introduced, where multiple LLMs discuss solutions to a problem over several rounds of debate. However, LLMs often produce incorrect responses that appear deceptively confident, which can mislead other agents. This is partly because agents do not express their confidence levels during standard debates. To address this, we introduce DebUnc, a multi-agent debate framework that uses uncertainty metrics to assess agent confidence levels. We adapted the LLM attention mechanism to adjust token weights based on confidence levels and also explored using textual prompts to convey confidence. Our evaluations across various benchmarks show that attention-based methods are particularly effective, and that as uncertainty metrics evolve, performance will continue to increase. The code is available at https://github.com/lukeyoffe/debunc

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05984

MBA-Net: SAM-driven Bidirectional Aggregation Network for Ovarian Tumor Segmentation

Authors: Yifan Gao, Wei Xia, Wenkui Wang, Xin Gao

Abstract: Accurate segmentation of ovarian tumors from medical images is crucial for early diagnosis, treatment planning, and patient management. However, the diverse morphological characteristics and heterogeneous appearances of ovarian tumors pose significant challenges to automated segmentation methods. In this paper, we propose MBA-Net, a novel architecture that integrates the powerful segmentation capabilities of the Segment Anything Model (SAM) with domain-specific knowledge for accurate and robust ovarian tumor segmentation. MBA-Net employs a hybrid encoder architecture, where the encoder consists of a prior branch, which inherits the SAM encoder to capture robust segmentation priors, and a domain branch, specifically designed to extract domain-specific features. The bidirectional flow of information between the two branches is facilitated by the robust feature injection network (RFIN) and the domain knowledge integration network (DKIN), enabling MBA-Net to leverage the complementary strengths of both branches. We extensively evaluate MBA-Net on the public multi-modality ovarian tumor ultrasound dataset and the in-house multi-site ovarian tumor MRI dataset. Our proposed method consistently outperforms state-of-the-art segmentation approaches. Moreover, MBA-Net demonstrates superior generalization capability across different imaging modalities and clinical sites.

Submitted 8 July, 2024; originally announced July 2024.

Comments: MICCAI 2024

arXiv:2407.05862

Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning

Authors: Bin Ren, Guofeng Mei, Danda Pani Paudel, Weijie Wang, Yawei Li, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Nicu Sebe

Abstract: Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones. However, in 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant. This raises the question: Can we take the best of both worlds? To answer this question, we first empirically validate that integrating MAE-based point cloud pre-training with the standard contrastive learning paradigm, even with meticulous design, can lead to a decrease in performance. To address this limitation, we reintroduce CL into the MAE-based point cloud pre-training paradigm by leveraging the inherent contrastive properties of MAE. Specifically, rather than relying on extensive data augmentation as commonly used in the image domain, we randomly mask the input tokens twice to generate contrastive input pairs. Subsequently, a weight-sharing encoder and two identically structured decoders are utilized to perform masked token reconstruction. Additionally, we propose that for an input token masked by both masks simultaneously, the reconstructed features should be as similar as possible. This naturally establishes an explicit contrastive constraint within the generative MAE-based pre-training paradigm, resulting in our proposed method, Point-CMAE. Consequently, Point-CMAE effectively enhances the representation quality and transfer performance compared to its MAE counterpart.
Experimental evaluations across various downstream applications, including classification, part segmentation, and few-shot learning, demonstrate the efficacy of our framework in surpassing state-of-the-art techniques under standard ViTs and single-modal settings. The source code and trained models are available at: https://github.com/Amazingren/Point-CMAE.

Submitted 8 July, 2024; originally announced July 2024.

Comments: Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning

Showing 1–50 of 7,305 results for author: Wang, W