
Hadronic cross section measurements with the DAMPE space mission using 20GeV-10TeV cosmic-ray protons and $^4$He

Authors: F. Alemanno, Q. An, P. Azzarello, F. C. T. Barbato, P. Bernardini, X. J. Bi, I. Cagnoli, M. S. Cai, E. Casilli, E. Catanzani, J. Chang, D. Y. Chen, J. L. Chen, Z. F. Chen, P. Coppin, M. Y. Cui, T. S. Cui, Y. X. Cui, H. T. Dai, A. De Benedittis, I. De Mitri, F. de Palma, A. Di Giovanni, Q. Ding, T. K. Dong, et al. (126 additional authors not shown)

Abstract: Precise direct cosmic-ray (CR) measurements provide an important probe of the energetic particle sources in our Galaxy and of the interstellar environment through which these particles propagate. Uncertainties in hadronic models, ion-nucleon cross sections in particular, are currently the limiting factor towards obtaining more accurate CR ion flux measurements with calorimetric space-based experiments. We present an energy-dependent measurement of the inelastic cross section of protons and helium-4 nuclei (alpha particles) on a Bi$_4$Ge$_3$O$_{12}$ target, using 88 months of data collected by the DAMPE space mission. The kinetic energy per nucleon of the measurement points ranges from 18 GeV to 9 TeV for protons, and from 5 GeV/n to 3 TeV/n for helium-4 nuclei. Our results lead to a significant improvement of the CR flux normalisation. In the case of helium-4, these results correspond to the first cross section measurements on a heavy target material at energies above 10 GeV/n.
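To see why an inelastic cross section feeds directly into a calorimetric flux normalisation, the simplest picture is the textbook attenuation law: the probability that a particle crosses a target of thickness x without an inelastic interaction is exp(-n σ x), where n is the number density of target molecules. The sketch below applies it to BGO using its commonly quoted density (7.13 g/cm³) and the molar mass of Bi4Ge3O12. The cross-section value is a hypothetical placeholder, not a DAMPE result, and the real analysis is far more involved than this single formula.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mol
BGO_DENSITY = 7.13        # g/cm^3, commonly quoted density of Bi4Ge3O12
BGO_MOLAR_MASS = 4 * 208.98 + 3 * 72.63 + 12 * 15.999  # g/mol (Bi4 Ge3 O12)


def molecule_density(density_g_cm3: float, molar_mass_g_mol: float) -> float:
    """Number of target molecules per cm^3."""
    return density_g_cm3 / molar_mass_g_mol * AVOGADRO


def survival_probability(sigma_mb: float, thickness_cm: float) -> float:
    """Probability of crossing BGO of this thickness with no inelastic interaction.

    sigma_mb is an inelastic cross section per Bi4Ge3O12 molecule in millibarn
    (1 mb = 1e-27 cm^2). It is an input here, not a measured DAMPE value.
    """
    n = molecule_density(BGO_DENSITY, BGO_MOLAR_MASS)
    return math.exp(-n * sigma_mb * 1e-27 * thickness_cm)


if __name__ == "__main__":
    sigma_mb = 10_000.0  # hypothetical placeholder cross section
    n = molecule_density(BGO_DENSITY, BGO_MOLAR_MASS)
    interaction_length = 1.0 / (n * sigma_mb * 1e-27)  # cm; survival is 1/e here
    print(f"{n:.3e} molecules/cm^3, interaction length {interaction_length:.1f} cm")
    for x in (1.0, 10.0, 30.0):
        print(f"{x:5.1f} cm: survival {survival_probability(sigma_mb, x):.3f}")
```

Because the survival fraction depends exponentially on n σ x, any uncertainty on σ propagates into the estimated fraction of primaries that reach a given depth without interacting, which is one reason cross-section uncertainties limit flux accuracy.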

Submitted 30 August, 2024; originally announced August 2024.

Comments: 17 pages, submitted to PRD

arXiv:2408.15813

DQFormer: Towards Unified LiDAR Panoptic Segmentation with Decoupled Queries

Authors: Yu Yang, Jianbiao Mei, Liang Liu, Siliang Du, Yilin Xiao, Jongwon Ra, Yong Liu, Xiao Xu, Huifeng Wu

Abstract: LiDAR panoptic segmentation, which jointly performs instance and semantic segmentation for things and stuff classes, plays a fundamental role in LiDAR perception tasks. While most existing methods explicitly separate these two segmentation tasks and utilize different branches (i.e., semantic and instance branches), some recent methods have embraced the query-based paradigm to unify LiDAR panoptic segmentation. However, the distinct spatial distributions and inherent characteristics of objects (things) and their surroundings (stuff) in 3D scenes lead to challenges, including the mutual competition between things and stuff and the ambiguity between classification and segmentation. In this paper, we propose decoupling things/stuff queries according to their intrinsic properties for individual decoding, and disentangling classification from segmentation to mitigate ambiguity. To this end, we propose a novel framework dubbed DQFormer to implement semantic and instance segmentation in a unified workflow. Specifically, we design a decoupled query generator that proposes informative queries with semantics by localizing things/stuff positions and fusing multi-level BEV embeddings. Moreover, a query-oriented mask decoder is introduced to decode the corresponding segmentation masks by performing masked cross-attention between queries and mask embeddings. Finally, the decoded masks are combined with the semantics of the queries to produce panoptic results. Extensive experiments on the nuScenes and SemanticKITTI datasets demonstrate the superiority of our DQFormer framework.
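Masked cross-attention, as popularised by Mask2Former, restricts each query to the positions inside its current mask by setting the other attention scores to -inf before the softmax. The single-head NumPy sketch below illustrates that mechanism only; the shapes, the absence of learned projections, and the empty-mask fallback are assumptions, not details of DQFormer's decoder.

```python
import numpy as np


def masked_cross_attention(queries, keys, values, mask):
    """Single-head masked cross-attention (Mask2Former-style sketch).

    queries: (Q, d) query embeddings
    keys, values: (N, d) per-position mask embeddings (e.g. points or BEV cells)
    mask: (Q, N) boolean, True where a query may attend
    Returns (Q, d) updated queries and the (Q, N) attention weights.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)          # (Q, N) scaled dot products
    mask = mask.copy()
    mask[~mask.any(axis=1)] = True                  # empty mask: attend everywhere
    scores = np.where(mask, scores, -np.inf)        # block positions outside the mask
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)                        # exp(-inf) == 0
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values, weights


rng = np.random.default_rng(0)
q = rng.normal(size=(2, 4))
kv = rng.normal(size=(5, 4))
m = np.array([[True, True, False, False, False],
              [False, False, False, False, False]])
out, w = masked_cross_attention(q, kv, kv, m)
print(w.round(3))  # row 0 has zero weight outside its first two positions
```

Restricting attention this way lets each query refine features only within the region it currently claims, which is the usual motivation for masked rather than global cross-attention.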

Submitted 28 August, 2024; originally announced August 2024.

Comments: 13 pages, 10 figures

arXiv:2408.14688

Lowering threshold of NaI(Tl) scintillator to 0.7 keV in the COSINE-100 experiment

Authors: G. H. Yu, N. Carlin, J. Y. Cho, J. J. Choi, S. Choi, A. C. Ezeribe, L. E. França, C. Ha, I. S. Hahn, S. J. Hollick, E. J. Jeon, H. W. Joo, W. G. Kang, M. Kauer, B. H. Kim, H. J. Kim, J. Kim, K. W. Kim, S. H. Kim, S. K. Kim, W. K. Kim, Y. D. Kim, Y. H. Kim, Y. J. Ko, D. H. Lee, et al. (34 additional authors not shown)

Abstract: COSINE-100 is a direct dark matter search experiment whose primary goal is to test the annual modulation signal observed by DAMA/LIBRA, using the same target material, NaI(Tl). In previous analyses, we achieved the same 1 keV energy threshold used in the DAMA/LIBRA analysis that reported an annual modulation signal with 11.6$\sigma$ significance. In this article, we report an improved analysis that lowers the threshold to 0.7 keV, thanks to a Multi-Layer Perceptron network and a new likelihood parameter based on waveforms in the frequency domain. The lower threshold would enable a better comparison of COSINE-100 with new DAMA results that use a 0.75 keV threshold, and would help account for differences in quenching factors. Furthermore, the lower threshold can enhance COSINE-100's sensitivity in sub-GeV dark matter searches.
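The annual modulation signal being tested is commonly modelled as R(t) = R0 + Sm cos(2π(t - t0)/T), with T = 1 year and t0 ≈ 152.5 days (about June 2), the phase expected in the standard halo model. With T and t0 held fixed the model is linear in R0 and Sm, so a plain least-squares fit recovers them. The sketch below uses synthetic data with hypothetical rates and noise; it is not the COSINE-100 analysis, which uses a full likelihood with background models.

```python
import numpy as np

PERIOD_DAYS = 365.25
PHASE_DAYS = 152.5  # ~June 2, the peak date expected in the standard halo model


def fit_modulation(t_days, rate, period=PERIOD_DAYS, phase=PHASE_DAYS):
    """Fit R(t) = R0 + Sm * cos(2*pi*(t - t0)/T) with T and t0 held fixed.

    With period and phase fixed the model is linear in (R0, Sm),
    so ordinary linear least squares suffices. Returns (R0, Sm).
    """
    t = np.asarray(t_days, dtype=float)
    design = np.column_stack([np.ones_like(t),
                              np.cos(2 * np.pi * (t - phase) / period)])
    (r0, sm), *_ = np.linalg.lstsq(design, np.asarray(rate, dtype=float), rcond=None)
    return r0, sm


# Synthetic daily rates (hypothetical units: counts/day/kg/keV) over three years.
rng = np.random.default_rng(0)
t = np.arange(0.0, 3 * 365.0)
rate = 2.0 + 0.01 * np.cos(2 * np.pi * (t - PHASE_DAYS) / PERIOD_DAYS)
rate += rng.normal(0.0, 0.005, size=t.size)
print(fit_modulation(t, rate))
```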

Submitted 26 August, 2024; originally announced August 2024.

arXiv:2408.10803

doi 10.3847/1538-3881/ad4463

Estimating the Atmospheric Parameters of Early-type Stars from the Chinese Space Station Telescope (CSST) Slitless Spectra Survey

Authors: JiaRui Rao, HaiLiang Chen, JianPing Xiong, LuQian Wang, YanJun Guo, JiaJia Li, Chao Liu, ZhanWen Han, XueFei Chen

Abstract: The measurement of atmospheric parameters is fundamental for scientific research using stellar spectra. The Chinese Space Station Telescope (CSST), scheduled to be launched in 2024, will provide researchers with hundreds of millions of slitless stellar spectra during a 10 yr survey, and machine learning processes such large amounts of data far more efficiently than manual processing. Here we studied the stellar parameters of early-type stars (effective temperature Teff above 15,000 K) based on the design indicators of the CSST slitless spectra and the machine learning algorithm Stellar LAbel Machine. We used the Potsdam Wolf-Rayet (POWR) synthetic spectra library for cross validation. We then tested the reliability of the machine learning results using the Next Generation Spectral Library (NGSL) of Hubble Space Telescope observations. As the test set, we used POWR spectra including the effects of interstellar extinction ($A_V$ = 0, 0.5, 1, 1.5, 2 mag) and radial velocity (RV = -50, -30, 0, 30, 50 km s$^{-1}$). When RV = 0 km s$^{-1}$ and $A_V$ = 0 mag, the average value and standard deviation for the 3 wavelength ranges (2550-4050 Å, R = 287; 4050-6300 Å, R = 232; 6300-10000 Å, R = 207) are -66 K, 550 K, and 356 K for Teff, and 0.004 c.g.s, -0.024 c.g.s, and 0.01 c.g.s for log g. When the observed NGSL data are used as the test sample, the deviation of Teff is less than 5%, and the deviation of log g is less than 11%. We also tested the influence of spectral shifts on parameter accuracy: for shifts of 5 Å and 10 Å, the deviations of Teff are 3.6% and 4.3%, respectively, and those of log g are 4.2% and 5.1%. These results demonstrate that we can obtain relatively accurate stellar parameters for a population of early-type stars with the CSST slitless spectra and a machine-learning method.
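To put the radial-velocity grid and the 5 Å and 10 Å shifts in perspective, one can compare the non-relativistic Doppler shift Δλ = λv/c with the width of one resolution element, λ/R, for the three quoted resolving powers. The sketch evaluates both at each band's midpoint, a choice made only for illustration.

```python
C_KM_S = 299_792.458  # speed of light, km/s

# Wavelength ranges (Angstrom) and resolving powers R quoted in the abstract.
BANDS = [((2550, 4050), 287), ((4050, 6300), 232), ((6300, 10000), 207)]


def doppler_shift(wavelength_aa: float, rv_km_s: float) -> float:
    """Non-relativistic Doppler shift in Angstrom: d_lambda = lambda * v / c."""
    return wavelength_aa * rv_km_s / C_KM_S


def resolution_element(wavelength_aa: float, resolving_power: float) -> float:
    """Width of one resolution element in Angstrom: d_lambda = lambda / R."""
    return wavelength_aa / resolving_power


for (lo, hi), r in BANDS:
    centre = 0.5 * (lo + hi)  # band midpoint, chosen only for illustration
    print(f"{lo}-{hi} A: 50 km/s shift = {doppler_shift(centre, 50):.2f} A, "
          f"resolution element = {resolution_element(centre, r):.1f} A")
```

In the middle band, for example, the largest velocity in the test grid moves a line by under 1 Å while one resolution element spans about 22 Å, so the 5 Å and 10 Å shifts examined are several times larger than any RV effect in the test set.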

Submitted 20 August, 2024; originally announced August 2024.

Journal ref: The Astronomical Journal, 168:20 (17pp), 2024 July

arXiv:2408.09806

Improved background modeling for dark matter search with COSINE-100

Authors: G. H. Yu, N. Carlin, J. Y. Cho, J. J. Choi, S. Choi, A. C. Ezeribe, L. E. Franca, C. Ha, I. S. Hahn, S. J. Hollick, E. J. Jeon, H. W. Joo, W. G. Kang, M. Kauer, B. H. Kim, H. J. Kim, J. Kim, K. W. Kim, S. H. Kim, S. K. Kim, W. K. Kim, Y. D. Kim, Y. H. Kim, Y. J. Ko, D. H. Lee, et al. (33 additional authors not shown)

Abstract: COSINE-100 aims to conclusively test the claimed dark matter annual modulation signal detected by the DAMA/LIBRA collaboration. DAMA/LIBRA has released updated analysis results after lowering the energy threshold to 0.75 keV through various upgrades, and the collaboration has consistently claimed to observe the annual modulation. In COSINE-100, lowering the energy threshold is crucial for a direct comparison with DAMA/LIBRA; it also enhances the sensitivity of the search for low-mass dark matter, enabling COSINE-100 to explore this region. A precise, quantitative understanding of the background spectrum across all energy ranges is therefore essential. This study expands the background modeling to the 0.7-4000 keV range using 2.82 years of COSINE-100 data, improving the model so that it accurately describes the background spectrum across all energy ranges. The assessment of the background spectrum accounts for the nonproportionality of NaI(Tl) crystals at both low and high energies and for the characteristic X-rays produced when external backgrounds interact with materials such as copper. Additionally, constraints on the fit parameters obtained from the alpha spectrum modeling fit are integrated into this model. These improvements are detailed in the paper.
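Background models of this kind typically describe the measured energy spectrum as a sum of simulated component spectra whose normalisations (activities) are fitted to the data. The sketch below shows only that fitting step, using Poisson maximum-likelihood multiplicative (ML-EM) updates, a generic technique chosen here because it keeps activities non-negative; it is not necessarily the fitting method COSINE-100 uses. The flat-plus-two-lines templates are hypothetical shapes, not real isotope spectra.

```python
import numpy as np


def fit_background(templates, spectrum, n_iter=2000):
    """Poisson maximum-likelihood activities via ML-EM multiplicative updates.

    templates: (n_bins, n_components) expected counts per unit activity
    spectrum:  (n_bins,) observed counts
    Returns one non-negative activity per component.
    """
    T = np.asarray(templates, dtype=float)
    d = np.asarray(spectrum, dtype=float)
    norm = T.sum(axis=0)                         # template integral per component
    a = np.full(T.shape[1], d.sum() / T.sum())   # positive starting point
    for _ in range(n_iter):
        model = T @ a
        ratio = np.divide(d, model, out=np.zeros_like(d), where=model > 0)
        a *= (T.T @ ratio) / norm                # multiplicative update keeps a >= 0
    return a


# Synthetic example: a flat continuum plus two Gaussian lines (hypothetical shapes).
bins = np.arange(100)


def line(centre, width=3.0):
    return np.exp(-0.5 * ((bins - centre) / width) ** 2)


templates = np.column_stack([np.ones(100), line(20), line(60)])
true_activity = np.array([5.0, 30.0, 12.0])
print(fit_background(templates, templates @ true_activity))
```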

Submitted 19 August, 2024; originally announced August 2024.

arXiv:2407.20875

Localized stem structures in quasi-resonant two-soliton solutions for the asymmetric Nizhnik-Novikov-Veselov system

Authors: Feng Yuan, Jiguang Rao, Jingsong He, Yi Cheng

Abstract: Elastic collisions of solitons generally produce a finite phase shift. When the phase shift is finite but large, the two vertices of the (2+1)-dimensional 2-soliton are significantly separated, and a local structure forms that connects the two V-shaped solitons. We define this local structure as the stem structure. This study systematically investigates the localized stem structures between two solitons in the (2+1)-dimensional asymmetric Nizhnik-Novikov-Veselov system. These stem structures, arising from quasi-resonant collisions between the solitons, exhibit distinct spatial locality and temporal invariance. We explore two scenarios: one characterized by weakly quasi-resonant collisions (i.e. $a_{12}\approx 0$), and the other by strongly quasi-resonant collisions (i.e. $a_{12}\approx +\infty$). Through mathematical analysis, we extract comprehensive insights into the trajectories, amplitudes, and velocities of the soliton arms. Furthermore, we discuss the characteristics of the stem structures, including their length and extreme points. Our findings shed new light on the interaction between solitons in the (2+1)-dimensional asymmetric Nizhnik-Novikov-Veselov system.
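For orientation, two-soliton solutions obtained with Hirota's bilinear method are usually built from a tau function of the generic form below. This form is stated as an assumption; the dispersion relations and the explicit expression for $a_{12}$ specific to the asymmetric Nizhnik-Novikov-Veselov system are given in the paper and are not reproduced here. Holding $\eta_1$ fixed and letting $e^{\eta_2}$ go to either limit shows how $a_{12}$ becomes a phase shift $\Delta_{12}$ of soliton 1.

```latex
\begin{aligned}
\tau &= 1 + e^{\eta_1} + e^{\eta_2} + a_{12}\,e^{\eta_1+\eta_2},
  \qquad \eta_i = k_i x + p_i y + \omega_i t + \eta_i^{(0)},\\
e^{\eta_2} \to 0:\quad \tau &\to 1 + e^{\eta_1},\\
e^{\eta_2} \to \infty:\quad \tau &\sim e^{\eta_2}\left(1 + e^{\eta_1 + \Delta_{12}}\right),
  \qquad \Delta_{12} = \ln a_{12},\\
a_{12} \to 0^{+} &\;\Rightarrow\; \Delta_{12} \to -\infty,
  \qquad a_{12} \to +\infty \;\Rightarrow\; \Delta_{12} \to +\infty .
\end{aligned}
```

Assuming the physical field is obtained from second derivatives of $\ln\tau$ (so the prefactor $e^{\eta_2}$, whose logarithm is linear in $x$, $y$, $t$, drops out), soliton 1 is shifted by $\Delta_{12}$ in its phase variable across the collision, and $|\Delta_{12}|$ grows without bound in both quasi-resonant limits, consistent with the long stems described above.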

Submitted 30 July, 2024; originally announced July 2024.

Comments: 14 pages, 6 figures; accepted by Journal of Mathematical Physics (July 2024)

arXiv:2407.04925

RAMO: Retrieval-Augmented Generation for Enhancing MOOCs Recommendations

Authors: Jiarui Rao, Jionghao Lin

Abstract: Massive Open Online Courses (MOOCs) have significantly enhanced educational accessibility by offering a wide variety of courses and breaking down traditional barriers related to geography, finance, and time. However, students often face difficulties navigating the vast selection of courses, especially when exploring new fields of study. Driven by this challenge, researchers have been exploring course recommender systems that offer tailored guidance aligned with individual learning preferences and career aspirations. These systems face particular challenges in addressing the "cold start" problem for new users. Recent advancements in recommender systems suggest integrating large language models (LLMs) into the recommendation process to enhance personalized recommendations and address the "cold start" problem. Motivated by these advancements, our study introduces RAMO (Retrieval-Augmented Generation for MOOCs), a system specifically designed to overcome the "cold start" challenges of traditional course recommender systems. The RAMO system leverages the capabilities of LLMs, along with contextual understanding facilitated by Retrieval-Augmented Generation (RAG), to provide course recommendations through a conversational interface, aiming to enhance the e-learning experience.
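The retrieval half of a RAG recommender can be sketched as: score every course description against the learner's request, keep the top k, and place them in the prompt sent to the LLM. In the minimal sketch below, bag-of-words cosine similarity stands in for a learned embedding model, and the catalog, prompt wording, and function names are hypothetical; the abstract does not describe RAMO's actual retriever or prompt, and the LLM call itself is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lower-case bag of words; a stand-in for a learned text embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, courses: dict[str, str], k: int = 2) -> list[str]:
    """Titles of the k courses whose descriptions best match the query."""
    q = tokenize(query)
    ranked = sorted(courses, key=lambda t: cosine(q, tokenize(courses[t])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, courses: dict[str, str], k: int = 2) -> str:
    """Augment the learner's request with the retrieved course descriptions."""
    context = "\n".join(f"- {t}: {courses[t]}" for t in retrieve(query, courses, k))
    return (f"Recommend courses for this learner.\nRequest: {query}\n"
            f"Candidate courses:\n{context}\nAnswer:")


catalog = {  # hypothetical catalog
    "Intro to Machine Learning": "supervised learning regression classification python",
    "Medieval History": "europe kingdoms feudal society",
    "Deep Learning": "neural networks machine learning python gpu",
}
print(build_prompt("I want to learn machine learning with python", catalog))
```

Because retrieval depends only on the request text, a brand-new user with no interaction history still gets grounded candidates, which is how RAG-style systems sidestep the cold-start problem described above.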

Submitted 5 July, 2024; originally announced July 2024.

Comments: 7 pages; this paper underwent a rigorous review process and was officially accepted on May 31, 2024, for presentation at the Educational Data Mining 2024 Workshop: Leveraging Large Language Models for Next Generation Educational Technologies

arXiv:2407.03449

A Tutorial on Fluid Antenna System for 6G Networks: Encompassing Communication Theory, Optimization Methods and Hardware Designs

Authors: Wee Kiat New, Kai-Kit Wong, Hao Xu, Chao Wang, Farshad Rostami Ghadi, Jichen Zhang, Junhui Rao, Ross Murch, Pablo Ramírez-Espinosa, David Morales-Jimenez, Chan-Byoung Chae, Kin-Fai Tong

Abstract: The advent of sixth-generation (6G) networks presents another round of revolution in the mobile communication landscape, promising an immersive experience, robust reliability, minimal latency, extreme connectivity, ubiquitous coverage, and capabilities beyond communication, including intelligence and sensing. To achieve these ambitious goals, 6G networks will need to incorporate state-of-the-art technologies. One technology that has garnered rising interest is the fluid antenna system (FAS), which refers to any software-controllable fluidic, conductive, or dielectric structure capable of dynamically changing its shape and position to reconfigure essential radio-frequency (RF) characteristics. Compared to traditional antenna systems (TASs) with fixed-position radiating elements, the core idea of FAS revolves around the unique flexibility of reconfiguring the radiating elements within a given space. One recent driver of FAS is the recognition of its position flexibility as a new degree of freedom (DoF) to harness diversity and multiplexing gains. In this paper, we provide a comprehensive tutorial covering channel modeling, signal processing and estimation methods, information-theoretic insights, new multiple access techniques, and hardware designs. Moreover, we delineate the challenges of FAS and explore the potential of using FAS to improve the performance of other contemporary technologies. By providing insights and guidance, this tutorial paper aims to inspire researchers to explore new horizons and fully unleash the potential of FAS.
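The diversity gain from position flexibility can be illustrated by switching among N closely spaced ports and picking the strongest one. The sketch below assumes Rayleigh fading with Jakes' spatial correlation J0(2πd/λ) between ports and evenly spaced ports over the aperture; both are modelling assumptions chosen for illustration, not the tutorial's specific channel model. J0 is evaluated with Bessel's integral so that only NumPy is needed.

```python
import numpy as np


def bessel_j0(x, m=200):
    """J0(x) = (1/pi) * integral_0^pi cos(x sin t) dt, via the midpoint rule."""
    t = (np.arange(m) + 0.5) * np.pi / m
    return np.cos(np.multiply.outer(x, np.sin(t))).mean(axis=-1)


def fas_channels(n_ports, aperture_wavelengths, n_trials, rng):
    """Correlated Rayleigh fading at n_ports evenly spread over a linear aperture.

    Assumes Jakes' model: correlation J0(2*pi*d/lambda) between ports d apart.
    Returns a complex array of shape (n_trials, n_ports).
    """
    pos = np.linspace(0.0, aperture_wavelengths, n_ports)  # positions in wavelengths
    corr = bessel_j0(2 * np.pi * np.abs(pos[:, None] - pos[None, :]))
    w, v = np.linalg.eigh(corr)
    root = v * np.sqrt(np.clip(w, 0.0, None))              # corr ~= root @ root.T
    g = (rng.normal(size=(n_trials, n_ports))
         + 1j * rng.normal(size=(n_trials, n_ports))) / np.sqrt(2)
    return g @ root.T


rng = np.random.default_rng(1)
gain = np.abs(fas_channels(12, 0.5, 20_000, rng)) ** 2
print("mean gain, fixed port:", gain[:, 0].mean())
print("mean gain, best of 12:", gain.max(axis=1).mean())  # FAS switches to the best port
```

The eigendecomposition (rather than a Cholesky factor) is used because densely packed ports make the correlation matrix nearly singular, and clipping tiny negative eigenvalues keeps the factorisation stable.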

Submitted 3 July, 2024; originally announced July 2024.

Comments: 50 pages, 45 figures, 5 tables. Submitted for potential publication

arXiv:2407.00137

Understanding and Modeling the Dynamics of Storm-time Atmospheric Neutral Density using Random Forests

Authors: Kyle R. Murphy, Alexa J. Halford, Vivian Liu, Jeffery Klenzing, Jonathon Smith, Katherine Garcia-Sage, Joshua Pettit, I. Jonathan Rae

Abstract: Atmospheric neutral density is a crucial component for accurately predicting and tracking the motion of satellites. During periods of elevated solar and geomagnetic activity, atmospheric neutral density becomes highly variable and dynamic. This variability makes neutral density difficult to model accurately, and the resulting errors propagate from neutral density models through to orbit propagation models. In this paper we investigate the dynamics of neutral density during geomagnetic storms. We use a combination of solar and geomagnetic variables to develop three Random Forest machine learning models of neutral density. These models are based on (1) slow solar indices, (2) high-cadence solar irradiance, and (3) combined high-cadence solar irradiance and geomagnetic indices. Each model is validated on an out-of-sample dataset using residual analysis and typical metrics. During quiet times, all three models perform well; however, during geomagnetic storms, the combined high-cadence solar irradiance/geomagnetic model performs significantly better than the models based solely on solar activity. The combined model captures an additional 10% of the variability in density and has an error up to six times smaller than the solar models during geomagnetic storms. Overall, this work demonstrates the importance of including geomagnetic activity in the modeling of atmospheric density and serves as a proof of concept for using machine learning algorithms to model, and in the future forecast, atmospheric density for operational use.
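The model comparison above rests on two kinds of numbers: the fraction of density variability a model explains and the size of its errors, evaluated separately on storm-time samples. A minimal sketch of those metrics (coefficient of determination R² and RMSE) on a boolean subset follows; the arrays and the storm flag are hypothetical inputs, and the Random Forest models themselves are not shown.

```python
import numpy as np


def r_squared(observed, predicted):
    """Fraction of the variance in `observed` explained by `predicted`."""
    observed = np.asarray(observed, dtype=float)
    residual = observed - np.asarray(predicted, dtype=float)
    return 1.0 - np.sum(residual ** 2) / np.sum((observed - observed.mean()) ** 2)


def rmse(observed, predicted):
    """Root-mean-square error, in the units of `observed`."""
    diff = np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def compare_on_subset(observed, models, subset):
    """Score each model on a boolean subset (e.g. hypothetical storm-time flags).

    models: mapping of model name -> predictions aligned with `observed`.
    Returns {name: (R^2, RMSE)}.
    """
    obs = np.asarray(observed, dtype=float)[subset]
    return {name: (r_squared(obs, np.asarray(p)[subset]),
                   rmse(obs, np.asarray(p)[subset]))
            for name, p in models.items()}
```

Scoring the quiet-time and storm-time subsets separately is what exposes a gap like the one reported above, which an all-data average could hide.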

Submitted 28 June, 2024; originally announced July 2024.

Comments: Submitted for publication to Space Weather

arXiv:2406.18530

MatchTime: Towards Automatic Soccer Game Commentary Generation

Authors: Jiayuan Rao, Haoning Wu, Chang Liu, Yanfeng Wang, Weidi Xie

Abstract: Soccer is a globally popular sport with a vast audience; in this paper, we consider constructing an automatic soccer game commentary model to improve the audience's viewing experience. In general, we make the following contributions: First, observing the prevalent video-text misalignment in existing datasets, we manually annotate timestamps for 49 matches, establishing a more robust benchmark for soccer game commentary generation, termed SN-Caption-test-align; Second, we propose a multi-modal temporal alignment pipeline to automatically correct and filter the existing dataset at scale, creating a higher-quality soccer game commentary dataset for training, denoted as MatchTime; Third, based on our curated dataset, we train an automatic commentary generation model, named MatchVoice. Extensive experiments and ablation studies have demonstrated the effectiveness of our alignment pipeline, and training the model on the curated datasets achieves state-of-the-art performance for commentary generation, showing that better alignment can lead to significant performance improvements in downstream tasks.

Submitted 26 June, 2024; originally announced June 2024.

Comments: Technical Report; Project Page: https://haoningwu3639.github.io/MatchTime/

arXiv:2406.16227

VICatMix: variational Bayesian clustering and variable selection for discrete biomedical data

Authors: Paul D. W. Kirk, Jackie Rao

Abstract: Effective clustering of biomedical data is crucial in precision medicine, enabling accurate stratification of patients or samples. However, the growing availability of high-dimensional categorical data, including 'omics data, necessitates computationally efficient clustering algorithms. We present VICatMix, a variational Bayesian finite mixture model designed for the clustering of categorical data. The use of variational inference (VI) in its training allows the model to outperform competitors in terms of efficiency while maintaining high accuracy. VICatMix also performs variable selection, enhancing its performance on high-dimensional, noisy data. The proposed model incorporates summarisation and model averaging to mitigate poor local optima in VI, allowing improved estimation of the true number of clusters simultaneously with feature saliency. We demonstrate the performance of VICatMix with both simulated and real-world data, including applications to datasets from The Cancer Genome Atlas (TCGA), showing its use in cancer subtyping and driver gene discovery. We demonstrate VICatMix's utility in integrative cluster analysis with different 'omics datasets, enabling the discovery of novel subtypes. Availability: VICatMix is freely available as an R package, incorporating C++ for faster computation, at https://github.com/j-ackierao/VICatMix.
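The core quantity in any finite mixture of categorical distributions is each sample's posterior cluster membership (its responsibility), computed from the mixing weights and the per-cluster, per-variable category probabilities. The NumPy sketch below computes these with point estimates, as in an EM E-step; it is not VICatMix's variational update, which would use expectations under the variational posterior instead, and variable selection is not shown. VICatMix itself is an R package, so this Python code is only illustrative.

```python
import numpy as np


def responsibilities(X, weights, probs):
    """Posterior cluster memberships for a mixture of categorical distributions.

    X:       (n, p) integer category codes in 0..C-1
    weights: (K,) mixing proportions
    probs:   (K, p, C) category probabilities per cluster and variable
    Returns an (n, K) array whose rows sum to 1.
    """
    n, p = X.shape
    # probs[:, arange(p), X] has shape (K, n, p): probs[k, j, X[i, j]]
    log_lik = np.log(probs[:, np.arange(p), X]).sum(axis=-1).T   # (n, K)
    log_post = log_lik + np.log(weights)
    log_post -= log_post.max(axis=1, keepdims=True)              # stabilise exp
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


X = np.array([[0, 1], [1, 0]])
probs = np.array([[[0.9, 0.1], [0.1, 0.9]],   # cluster 0 favours the pattern (0, 1)
                  [[0.1, 0.9], [0.9, 0.1]]])  # cluster 1 favours the pattern (1, 0)
print(responsibilities(X, np.array([0.5, 0.5]), probs).round(3))
```

Working in log space and subtracting the row maximum avoids underflow when many variables are multiplied together, which matters for the high-dimensional 'omics data the abstract targets.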

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.05499

A Pixel-based Reconfigurable Antenna Design for Fluid Antenna Systems

Authors: Jichen Zhang, Junhui Rao, Zhaoyang Ming, Zan Li, Chi-Yuk Chiu, Kai-Kit Wong, Kin-Fai Tong, Ross Murch

Abstract: Fluid Antenna Systems (FASs) have recently been proposed for enhancing the performance of wireless communication. Previous antenna designs meeting the requirements of FAS have been based on mechanically movable or liquid antennas and therefore have limited reconfiguration speeds. In this paper, we propose a design for a pixel-based reconfigurable antenna (PRA) that meets the requirements of FAS, including the required switching speed. It can provide 12 FAS ports across half a wavelength and consists of an E-slot patch antenna and an upper reconfigurable pixel layer with 6 RF switches. Simulation and experimental results from a prototype operating at 2.5 GHz demonstrate that the design can meet the requirements of FAS, including port correlation with matched impedance.
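A quick arithmetic check of the port density: at 2.5 GHz the free-space wavelength is λ = c/f ≈ 120 mm, so half a wavelength is about 60 mm. Assuming the 12 ports are evenly spaced with one port at each end of that span (an assumption; the actual port layout is in the paper), adjacent ports are about 5.5 mm apart, roughly λ/22.

```python
C = 299_792_458.0  # speed of light, m/s


def port_spacing_mm(freq_hz: float, aperture_wavelengths: float, n_ports: int) -> float:
    """Adjacent-port spacing, assuming evenly spaced ports at both aperture ends."""
    wavelength_mm = C / freq_hz * 1e3
    return aperture_wavelengths * wavelength_mm / (n_ports - 1)


wavelength_mm = C / 2.5e9 * 1e3
spacing = port_spacing_mm(2.5e9, 0.5, 12)
print(f"wavelength {wavelength_mm:.1f} mm, port spacing {spacing:.2f} mm "
      f"(lambda/{wavelength_mm / spacing:.0f})")
```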

Submitted 14 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

Comments: 13 pages, 16 figures, Submitted to IEEE Transactions on Antennas and Propagation

arXiv:2406.02975

A Shared-Aperture Dual-Band sub-6 GHz and mmWave Reconfigurable Intelligent Surface With Independent Operation

Authors: Junhui Rao, Yujie Zhang, Shiwen Tang, Zan Li, Zhaoyang Ming, Jichen Zhang, Chi Yuk Chiu, Ross Murch

Abstract: A novel dual-band reconfigurable intelligent surface (DBI-RIS) design that combines the functionalities of millimeter-wave (mmWave) and sub-6 GHz bands within a single aperture is proposed. This design aims to bridge the gap between current single-band reconfigurable intelligent surfaces (RISs) and wireless systems utilizing sub-6 GHz and mmWave bands that require RISs with independently reconfigurable dual-band operation. The mmWave element is realized by a double-layer patch antenna loaded with 1-bit phase shifters, providing two reconfigurable states. An 8x8 mmWave element array is selectively interconnected using three RF switches to form a reconfigurable sub-6 GHz element at 3.5 GHz. A suspended electromagnetic band gap (EBG) structure is proposed to suppress surface waves and ensure sufficient geometric space for the phase shifter and control networks in the mmWave element. A low-cost planar spiral inductor (PSI) is carefully optimized to connect mmWave elements, enabling the sub-6 GHz function without affecting mmWave operation. Finally, prototypes of the DBI-RIS are fabricated, and experimental verification is conducted using two separate measurement testbeds. The fabricated sub-6 GHz RIS successfully achieves beam steering within the range of -35 to 35 degrees for a DBI-RIS with 4x4 sub-6 GHz elements, while the mmWave RIS demonstrates beam steering between -30 and 30 degrees for a DBI-RIS with 8x8 mmWave elements, both in good agreement with simulation results.
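With 1-bit elements, beam steering is commonly done by computing the continuous phase profile that would steer the reflected beam, then assigning each element whichever of its two states (0 or π) is closer. The sketch below does this for a single row under normal incidence; the element count, half-wavelength spacing, and 20-degree target are hypothetical parameters, and this is a generic textbook approach rather than the control scheme used for the DBI-RIS prototype.

```python
import numpy as np


def one_bit_states(n_elements, spacing_wl, steer_deg):
    """1-bit RIS states (0 -> phase 0, 1 -> phase pi) for normal incidence.

    The ideal reflection phase steering to `steer_deg` is -k*x*sin(theta);
    each element takes whichever available phase (0 or pi) is closer.
    """
    x = np.arange(n_elements) * spacing_wl                   # positions in wavelengths
    ideal = np.mod(-2 * np.pi * x * np.sin(np.radians(steer_deg)), 2 * np.pi)
    return ((ideal > np.pi / 2) & (ideal <= 3 * np.pi / 2)).astype(int)


def array_factor(states, spacing_wl, angles_deg):
    """Normalised |array factor| of the reflected field for normal incidence."""
    x = np.arange(len(states)) * spacing_wl
    element_phase = np.pi * np.asarray(states)
    k_sin = 2 * np.pi * np.sin(np.radians(angles_deg))[:, None]
    af = np.exp(1j * (element_phase + k_sin * x)).sum(axis=1)
    return np.abs(af) / len(states)


states = one_bit_states(8, 0.5, 20.0)
print(states)
print(array_factor(states, 0.5, [-20.0, 0.0, 20.0]).round(2))
```

Because 1-bit element weights are real (+1 or -1), the array factor is symmetric in angle, so under normal incidence a mirror beam appears at the negative of the target angle; this is a well-known limitation of 1-bit surfaces.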

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.13314

doi 10.1088/1674-4527/ad5d8b

Simulation Study on Constraining GW Propagation Speed by GW and GRB Joint Observation on Binary Neutron Star Mergers

Authors: Jin-Hui Rao, Shu-Xu Yi, Lian Tao, Qing-Wen Tang

Abstract: Theories of modified gravity suggest that the propagation speed of gravitational waves (GWs), $v_g$, may deviate from the speed of light $c$. A constraint can be placed on the difference between $c$ and $v_g$ with a simple method that uses the arrival time delay between GW and electromagnetic (EM) waves simultaneously emitted from a burst event. We simulated the joint observation of GW and short gamma-ray burst (sGRB) signals from binary neutron star (BNS) merger events in different observation campaigns, involving advanced LIGO (aLIGO) at design sensitivity and the Einstein Telescope (ET), each jointly detecting with Fermi/GBM. As a result, the relative precision of the constraint on $v_g$ can reach $\sim 10^{-17}$ (aLIGO) and $\sim 10^{-18}$ (ET), which are one and two orders of magnitude better than that from GW170817, respectively. We further obtain a bound on the graviton mass, $m_g \leq 7.1(3.2)\times 10^{-20}\,$eV with aLIGO (ET). Applying the Standard-Model Extension (SME) test framework, the constraint on $v_g$ allows us to study Lorentz violation in the nondispersive, nonbirefringent limit of the gravitational sector. We obtain constraints on the dimensionless isotropic coefficient $\bar{s}_{00}^{(4)}$ at mass dimension $d = 4$: $-1\times 10^{-15}< \bar{s}_{00}^{(4)}<9\times 10^{-17}$ for aLIGO and $-4\times 10^{-16}< \bar{s}_{00}^{(4)}<8\times 10^{-18}$ for ET.
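The "simple method" can be written as a one-line estimate: for a nearby source at distance D and an observed arrival-time delay Δt, the fractional speed difference is roughly |v_g - c|/c ≈ cΔt/D. The back-of-envelope sketch below uses the commonly quoted GW170817 values (a delay of about 1.7 s and a distance of about 40 Mpc); it ignores any intrinsic emission delay and cosmological corrections, so it reproduces only the order of magnitude, not the published bound.

```python
MPC_M = 3.0856775814913673e22  # metres per megaparsec
C_M_S = 299_792_458.0          # speed of light, m/s


def fractional_speed_bound(delay_s: float, distance_mpc: float) -> float:
    """|v_g - c| / c implied by an arrival delay from a nearby source.

    Small-difference, low-redshift approximation: delta_v / c ~ c * delay / D.
    Ignores any intrinsic delay between GW and gamma-ray emission.
    """
    return C_M_S * delay_s / (distance_mpc * MPC_M)


print(f"{fractional_speed_bound(1.74, 40.0):.2e}")  # GW170817-like numbers
```

The same formula shows why more distant joint detections tighten the constraint: for a fixed timing delay, the bound scales as 1/D.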

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.13021

IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues

Authors: Diji Yang, Jinmeng Rao, Kezhen Chen, Xiaoyuan Guo, Yawen Zhang, Jie Yang, Yi Zhang

Abstract: Although Retrieval-Augmented Generation (RAG) paradigms can use external knowledge to enhance and ground the outputs of Large Language Models (LLMs), mitigating generative hallucinations and static knowledge base problems, they still suffer from limited flexibility in adopting Information Retrieval (IR) systems with varying capabilities, constrained interpretability during the multi-round retrieval process, and a lack of end-to-end optimization. To address these challenges, we propose a novel LLM-centric approach, IM-RAG, that integrates IR systems with LLMs to support multi-round RAG through learning Inner Monologues (IM, i.e., the human inner voice that narrates one's thoughts). During the IM process, the LLM serves as the core reasoning model (i.e., Reasoner), either proposing queries to collect more information via the Retriever or providing a final answer based on the conversational context. We also introduce a Refiner that improves the outputs from the Retriever, effectively bridging the gap between the Reasoner and IR modules with varying capabilities and fostering multi-round communication. The entire IM process is optimized via Reinforcement Learning (RL), where a Progress Tracker is incorporated to provide mid-step rewards, and the answer prediction is further optimized separately via Supervised Fine-Tuning (SFT). We conduct extensive experiments with the HotPotQA dataset, a popular benchmark for retrieval-based, multi-step question answering. The results show that our approach achieves state-of-the-art (SOTA) performance while providing high flexibility in integrating IR modules as well as strong interpretability exhibited in the learned inner monologues.

Submitted 15 May, 2024; originally announced May 2024.

Comments: Proceedings of the 47th International ACM SIGIR 2024
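The multi-round interaction among the Reasoner, Retriever and Refiner described in the IM-RAG abstract can be pictured as a simple control loop. The sketch below is illustrative only and is not the authors' implementation: the `Turn` record, the rule-based `toy_reasoner`, the keyword-overlap `toy_retriever`, the top-1 `toy_refiner`, the three-sentence `CORPUS` and the `max_rounds` budget are all hypothetical stand-ins for the learned LLM and real IR modules, and the RL training with the Progress Tracker is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Turn:
    """One retrieval round: the query issued and the refined evidence returned."""
    query: str
    evidence: List[str]


# Hypothetical corpus standing in for a real IR system.
CORPUS = ["Alice wrote Book A.", "Alice was born in Paris.", "Bob wrote Book B."]


def toy_retriever(query: str, k: int = 3) -> List[str]:
    """Rank documents by word overlap with the query (hypothetical Retriever)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.lower().rstrip(".").split())), doc) for doc in CORPUS]
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps corpus order on ties
    return [doc for score, doc in scored[:k] if score > 0]


def toy_refiner(query: str, docs: List[str]) -> List[str]:
    """Keep only the top-ranked document (hypothetical Refiner)."""
    return docs[:1]


def toy_reasoner(question: str, history: List[Turn], force_answer: bool = False) -> Tuple[str, str]:
    """Hard-coded two-hop policy standing in for the learned LLM Reasoner."""
    if not force_answer and not history:
        return "query", "who wrote book a"
    if not force_answer and len(history) == 1:
        author = history[0].evidence[0].split()[0]  # e.g. "Alice"
        return "query", f"where was {author.lower()} born"
    last = history[-1].evidence[0] if history and history[-1].evidence else "unknown"
    return "answer", last.rstrip(".").split()[-1]


def im_rag_loop(
    question: str,
    reasoner: Callable[..., Tuple[str, str]],
    retriever: Callable[[str], List[str]],
    refiner: Callable[[str, List[str]], List[str]],
    max_rounds: int = 4,
) -> Tuple[str, List[Turn]]:
    """Alternate between issuing queries and answering until the Reasoner answers."""
    history: List[Turn] = []
    for _ in range(max_rounds):
        action, text = reasoner(question, history)
        if action == "answer":
            return text, history
        history.append(Turn(text, refiner(text, retriever(text))))
    # Round budget exhausted: force a final answer from the evidence gathered so far.
    _, text = reasoner(question, history, force_answer=True)
    return text, history


answer, trace = im_rag_loop("Where was the author of Book A born?", toy_reasoner, toy_retriever, toy_refiner)
print(answer, [turn.query for turn in trace])
```

The returned `trace` plays the role of the human-readable inner monologue: each query and its refined evidence can be inspected after the fact.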

arXiv:2404.18413 [pdf, other]

3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset

Authors: Xinyu Ma, Xuebo Liu, Derek F. Wong, Jun Rao, Bei Li, Liang Ding, Lidia S. Chao, Dacheng Tao, Min Zhang

Abstract: Multimodal machine translation (MMT) is a challenging task that seeks to improve translation quality by incorporating visual information. However, recent studies have indicated that the visual information provided by existing MMT datasets is insufficient, causing models to disregard it and overestimate their capabilities. This issue presents a significant obstacle to the development of MMT research. This paper presents a novel solution to this issue by introducing 3AM, an ambiguity-aware MMT dataset comprising 26,000 parallel sentence pairs in English and Chinese, each with corresponding images. Our dataset is specifically designed to include more ambiguity and a greater variety of both captions and images than other MMT datasets. We utilize a word sense disambiguation model to select ambiguous data from vision-and-language datasets, resulting in a more challenging dataset. We further benchmark several state-of-the-art MMT models on our proposed dataset. Experimental results show that MMT models trained on our dataset exhibit a greater ability to exploit visual information than those trained on other MMT datasets. Our work provides a valuable resource for researchers in the field of multimodal learning and encourages further exploration in this area. The data, code and scripts are freely available at https://github.com/MaxyLee/3AM.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.07503 [pdf, ps, other]

Best Practices and Lessons Learned on Synthetic Data

Authors: Ruibo Liu, Jerry Wei, Fangyu Liu, Chenglei Si, Yanzhe Zhang, Jinmeng Rao, Steven Zheng, Daiyi Peng, Diyi Yang, Denny Zhou, Andrew M. Dai

Abstract: The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challenges, and future directions. We present empirical evidence from prior art to demonstrate its effectiveness and highlight the importance of ensuring its factuality, fidelity, and unbiasedness. We emphasize the need for responsible use of synthetic data to build more powerful, inclusive, and trustworthy language models.

Submitted 10 August, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

Comments: In COLM 2024

arXiv:2403.10504 [pdf, other]

ATOM: Asynchronous Training of Massive Models for Deep Learning in a Decentralized Environment

Authors: Xiaofeng Wu, Jia Rao, Wei Chen

Abstract: The advent of the Transformer architecture has propelled the growth of natural language processing (NLP) models, leading to remarkable achievements in numerous NLP tasks. Yet, the absence of specialized hardware like expansive GPU memory and high-speed interconnects poses challenges for training large-scale models. This makes it daunting for many users to experiment with pre-training and fine-tuning large language models (LLMs). In this study, we introduce ATOM, a resilient distributed training framework designed for asynchronous training of vast models in a decentralized setting using cost-effective hardware, including consumer-grade GPUs and Ethernet. Unlike conventional model partitioning methods that distribute sub-models across GPUs, ATOM aims to accommodate a complete LLM on one host (peer) through seamless model swapping and concurrently trains multiple copies across various peers to optimize training throughput. Through static analysis, ATOM identifies the best model partitioning strategy and flawlessly merges model execution with swapping. Key benefits of ATOM include avoiding the central point of failure found in pipeline parallelism methods, and superior performance and scalability compared to closely-integrated pipeline parallelism in slower networks. Our experiments using different GPT-3 model configurations reveal that, in scenarios with suboptimal network connections, ATOM can enhance training efficiency by up to $20\times$ compared with state-of-the-art decentralized pipeline parallelism approaches.

Submitted 15 March, 2024; originally announced March 2024.
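The ATOM abstract combines two ideas: splitting a model that does not fit in GPU memory into chunks that are swapped in one at a time, and overlapping swapping with execution. A minimal sketch of both, under loud assumptions: `partition_layers` is a hypothetical greedy packer of consecutive layers under a memory budget (not ATOM's static analysis), and `step_time` assumes the next chunk can be prefetched while the current one computes. All per-layer sizes and timings are made-up illustrative numbers.

```python
from typing import List, Sequence


def partition_layers(layer_mem_gb: Sequence[float], budget_gb: float) -> List[List[int]]:
    """Greedily pack consecutive layers into chunks that each fit within budget_gb."""
    chunks: List[List[int]] = []
    current: List[int] = []
    used = 0.0
    for idx, mem in enumerate(layer_mem_gb):
        if mem > budget_gb:
            raise ValueError(f"layer {idx} needs {mem} GB, more than the {budget_gb} GB budget")
        if current and used + mem > budget_gb:
            chunks.append(current)  # close the chunk before it would overflow
            current, used = [], 0.0
        current.append(idx)
        used += mem
    if current:
        chunks.append(current)
    return chunks


def step_time(load_s: Sequence[float], compute_s: Sequence[float], overlap: bool = True) -> float:
    """Time for one pass over all chunks, with or without prefetching the next chunk."""
    if not overlap:
        return sum(load_s) + sum(compute_s)  # load, compute, load, compute, ...
    total = load_s[0]  # the first chunk must be resident before anything runs
    for i, compute in enumerate(compute_s):
        nxt = load_s[i + 1] if i + 1 < len(load_s) else 0.0
        total += max(compute, nxt)  # compute chunk i while loading chunk i + 1
    return total


chunks = partition_layers([2, 2, 2, 2, 2], budget_gb=5)
print(chunks)
print(step_time([1, 1, 1], [2, 2, 2]), step_time([1, 1, 1], [2, 2, 2], overlap=False))
```

With compute longer than loading, as in the example, the swap cost is almost entirely hidden (7 time units instead of 9), which is the intuition behind merging execution with swapping.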

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1110 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k).
Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks, achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier: when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.15275 [pdf, ps, other]

doi 10.1007/s10686-024-09924-0

Simulation Studies for the First Pathfinder of the CATCH Space Mission

Authors: Yiming Huang, Juan Zhang, Lian Tao, Zhengwei Li, Donghua Zhao, Qian-Qing Yin, Xiangyang Wen, Jingyu Xiao, Chen Zhang, Shuang-Nan Zhang, Shaolin Xiong, Qingcui Bu, Jirong Cang, Dezhi Cao, Wen Chen, Siran Ding, Min Gao, Yang Gao, Shujin Hou, Liping Jia, Ge Jin, Dalin Li, Jinsong Li, Panping Li, Yajun Li , et al. (20 additional authors not shown)

Abstract: The Chasing All Transients Constellation Hunters (CATCH) space mission is an intelligent constellation consisting of 126 micro-satellites in three types (A, B, and C), designed for X-ray observation with the objective of studying the dynamic universe. Currently, we are actively developing the first Pathfinder (CATCH-1) for the CATCH mission, specifically for type-A satellites. CATCH-1 is equipped with Micro Pore Optics (MPO) and a 4-pixel Silicon Drift Detector (SDD) array. To assess its scientific performance, including the effective area of the optical system, on-orbit background, and telescope sensitivity, we employ the Monte Carlo software Geant4 for simulation in this study. The MPO optics exhibit an effective area of $41$ cm$^2$ at the focal spot for 1 keV X-rays, while the entire telescope system achieves an effective area of $29$ cm$^2$ at 1 keV when taking into account the SDD detector's detection efficiency. The primary contribution to the background is found to be from the Cosmic X-ray Background. Assuming a 625 km orbit with an inclination of $29^\circ$, the total background for CATCH-1 is estimated to be $8.13\times10^{-2}$ counts s$^{-1}$ in the energy range of 0.5--4 keV. Based on the background within the central detector and assuming a Crab-like source spectrum, the estimated ideal sensitivity could achieve $1.9\times10^{-12}$ erg cm$^{-2}$ s$^{-1}$ for an exposure of 10$^4$ s in the energy band of 0.5--4 keV.
Furthermore, after simulating the background caused by low-energy charged particles near the geomagnetic equator, we have determined that there is no need to install a magnetic deflector.

Submitted 23 February, 2024; originally announced February 2024.
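To put the CATCH-1 background rate in perspective, the arithmetic below converts it into expected counts for the quoted $10^4$ s exposure. This is only indicative: it uses the total background rather than the central-detector background on which the quoted sensitivity is based, and the $5\sqrt{B}$ detection threshold is a common Gaussian approximation assumed here, not necessarily the criterion used in the paper.

```latex
B = R_{\mathrm{bkg}}\, t
  = \left(8.13\times10^{-2}\ \mathrm{counts\,s^{-1}}\right)\left(10^{4}\ \mathrm{s}\right)
  = 813\ \mathrm{counts},
\qquad
S_{5\sigma} \approx 5\sqrt{B} \approx 5 \times 28.5 \approx 143\ \mathrm{counts}.
```

Turning such a source-count threshold into a flux limit additionally requires the effective area (e.g. the $29$ cm$^2$ at 1 keV quoted above) folded through the assumed Crab-like spectrum over 0.5--4 keV.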

arXiv:2402.14538 [pdf, other]

Interference Produces False-Positive Pricing Experiments

Authors: Lars Roemheld, Justin Rao

Abstract: It is standard practice in online retail to run pricing experiments by randomizing at the article-level, i.e. by changing prices of different products to identify treatment effects. Due to customers' cross-price substitution behavior, such experiments suffer from interference bias: the observed difference between treatment groups in the experiment is typically significantly larger than the global effect that could be expected after a roll-out decision of the tested pricing policy. We show in simulations that such bias can be as large as 100%, and report experimental data implying bias of similar magnitude. Finally, we discuss approaches for de-biased pricing experiments, suggesting observational methods as a potentially attractive alternative to clustering.

Submitted 22 February, 2024; originally announced February 2024.
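The interference mechanism in the pricing abstract can be reproduced with a deliberately crude toy model, sketched below under stated assumptions: every article has the same base demand, a price increase removes a fixed fraction `own_drop` of a treated article's demand, and a hypothetical share `switch_share` of those lost units moves to control articles (which are now relatively cheaper). After a full roll-out relative prices are unchanged, so only the non-switching share is truly lost. This is not the authors' simulation, and the 200% bias produced by the default parameters is purely a consequence of the chosen numbers.

```python
from typing import Tuple


def article_level_experiment(
    base_demand: float = 100.0,
    own_drop: float = 0.2,
    switch_share: float = 0.5,
    n_treated: int = 50,
    n_control: int = 50,
) -> Tuple[float, float, float]:
    """Toy article-level price test with cross-price substitution (switch_share < 1).

    Returns (observed per-article effect, global roll-out effect, relative bias).
    """
    lost = own_drop * base_demand  # units each treated article loses
    treated = base_demand - lost
    # Part of the lost demand spills over to the now relatively cheaper control articles.
    control = base_demand + switch_share * lost * n_treated / n_control
    observed = control - treated
    # After a full roll-out relative prices are unchanged: within-assortment switching
    # cancels out and only customers who leave the assortment are lost.
    global_effect = (1.0 - switch_share) * lost
    return observed, global_effect, observed / global_effect - 1.0


observed, global_effect, bias = article_level_experiment()
print(observed, global_effect, bias)
```

Because the spillover both depresses the treated arm and inflates the control arm, the treatment-control gap counts substituted demand twice, whereas the roll-out effect counts it not at all.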

arXiv:2402.08562 [pdf, other]

Higher Layers Need More LoRA Experts

Authors: Chongyang Gao, Kezhen Chen, Jinmeng Rao, Baochen Sun, Ruibo Liu, Daiyi Peng, Yawen Zhang, Xiaoyuan Guo, Jie Yang, VS Subrahmanian

Abstract: Parameter-efficient tuning (PEFT) techniques like low-rank adaptation (LoRA) offer training efficiency on Large Language Models, but their impact on model performance remains limited. Recent efforts integrate LoRA and Mixture-of-Experts (MoE) to improve the performance of PEFT methods. Despite promising results, research on improving the efficiency of LoRA with MoE is still in its early stages. Recent studies have shown that experts in the MoE architecture have different strengths and also exhibit some redundancy. Does this statement also apply to parameter-efficient MoE? In this paper, we introduce a novel parameter-efficient MoE method, MoE-LoRA with Layer-wise Expert Allocation (MoLA), for Transformer-based models, where each model layer has the flexibility to employ a varying number of LoRA experts. We investigate several architectures with varying layer-wise expert configurations. Experiments on six well-known NLP and commonsense QA benchmarks demonstrate that MoLA achieves equal or superior performance compared to all baselines. We find that allocating more LoRA experts to higher layers further enhances the effectiveness of models with a fixed total number of experts. With far fewer parameters, this allocation strategy outperforms the setting with the same number of experts in every layer. This work can be widely used as a plug-and-play parameter-efficient tuning approach for various applications. The code is available at https://github.com/GCYZSL/MoLA.

Submitted 13 February, 2024; originally announced February 2024.

Comments: The code is available at https://github.com/GCYZSL/MoLA
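The MoLA idea, a varying number of LoRA experts per layer with a router selecting among them, can be sketched as follows. This is a minimal NumPy illustration under assumptions, not code from the MoLA repository: `layerwise_allocation` splits layers into equal contiguous groups, the 32-layer model and the (2, 4, 6, 8) versus uniform 5 allocation are hypothetical examples chosen so both use 160 experts in total, and `moe_lora_forward` assumes a plain top-k softmax router over the experts of one projection.

```python
from typing import List, Sequence

import numpy as np


def layerwise_allocation(n_layers: int, group_counts: Sequence[int]) -> List[int]:
    """Split layers into equal contiguous groups; each group gets a fixed expert count."""
    if n_layers % len(group_counts):
        raise ValueError("n_layers must be divisible by the number of groups")
    per_group = n_layers // len(group_counts)
    return [count for count in group_counts for _ in range(per_group)]


def moe_lora_forward(x, W, A, B, router, top_k=2):
    """One MoE-LoRA projection: frozen W plus a gated mix of the top-k LoRA experts.

    Shapes: x (d_in,), W (d_out, d_in), A (E, r, d_in), B (E, d_out, r), router (E, d_in).
    """
    logits = router @ x
    top = np.argsort(logits)[-top_k:]  # indices of the k highest-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()  # softmax over the selected experts only
    y = W @ x
    for gate, e in zip(gates, top):
        y = y + gate * (B[e] @ (A[e] @ x))  # low-rank update B_e A_e x
    return y


upper_heavy = layerwise_allocation(32, [2, 4, 6, 8])  # more experts in higher layers
uniform = layerwise_allocation(32, [5])
print(sum(upper_heavy), sum(uniform))
```

Since LoRA's `B` matrices are commonly zero-initialised, a freshly initialised layer reproduces the frozen projection `W @ x` exactly, whatever the router chooses.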

arXiv:2402.07427 [pdf]

Insights into Spatio-temporal dynamics during shock -- droplet flame interaction

Authors: Gautham Vadlamudi, Akhil Aravind, Saini Jatin Rao, Saptarshi Basu

Abstract: The study comprehensively investigates the response of a combusting droplet during its interaction with a high-speed transient flow that is imposed by a coaxially propagating blast wave. The blast wave is generated using a specially designed miniature shock generation apparatus based on the wire-explosion technique, which facilitates a wide range of shock Mach numbers (1.03 < Ms < 1.8). The experiments are performed in two configurations: open-field blast wave and focused blast wave. The charging voltage and the configuration determine the shock Mach number (Ms) and flow characteristics. The flame is found to exhibit two major response patterns: partial extinction followed by re-ignition, and full extinction. Simultaneously, the droplet also interacts with the flow imposed by the blast wave, exhibiting different modes of response ranging from pure deformation to Rayleigh-Taylor piercing bag breakup and shear-induced stripping. The Kelvin-Helmholtz (KH) instability is exhibited along the windward side interface of the droplet during the interaction with the blast wave decay profile, and it is aggravated when the induced flow interaction ensues. Increasing the Mach number (Ms > 1.1) makes the droplet flame more vulnerable to extinction. However, the flame exhibits stretching and shedding, followed by re-ignition, at lower Mach numbers (Ms < 1.06). In all cases, the flame base lifts off in response to the imposed flow, and the advection of the flame base interacting with the flame tip results in flame extinction.
The entire interaction occurs in two stages: 1) interaction with the blast wave and the decaying velocity profile associated with it, and 2) interaction with the induced flow behind the blast wave as a result of the entrainment (delayed response). The criteria for partial and complete extinction of the flame have been postulated and are in good agreement with the experiments.

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.04710 [pdf, other]

Incorporating Retrieval-based Causal Learning with Information Bottlenecks for Interpretable Graph Neural Networks

Authors: Jiahua Rao, Jiancong Xie, Hanjing Lin, Shuangjia Zheng, Zhen Wang, Yuedong Yang

Abstract: Graph Neural Networks (GNNs) have gained considerable traction for their capability to effectively process topological data, yet their interpretability remains a critical concern. Current interpretation methods are dominated by post-hoc explanations to provide a transparent and intuitive understanding of GNNs. However, they have limited performance in interpreting complicated subgraphs and can't utilize the explanation to advance GNN predictions. On the other hand, transparent GNN models are proposed to capture critical subgraphs. While such methods could improve GNN predictions, they usually don't perform well on explanations. Thus, a new strategy is needed to better couple GNN explanation and prediction. In this study, we have developed a novel interpretable causal GNN framework that incorporates retrieval-based causal learning with Graph Information Bottleneck (GIB) theory. The framework could semi-parametrically retrieve crucial subgraphs detected by GIB and compress the explanatory subgraphs via a causal module. The framework was demonstrated to consistently outperform state-of-the-art methods, and to achieve 32.71% higher precision on real-world explanation scenarios with diverse explanation types. More importantly, the learned explanations were also shown to improve GNN prediction performance.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2401.13154 [pdf, other]

Nomad: Non-Exclusive Memory Tiering via Transactional Page Migration

Authors: Lingfeng Xiang, Zhen Lin, Weishu Deng, Hui Lu, Jia Rao, Yifan Yuan, Ren Wang

Abstract: With the advent of byte-addressable memory devices, such as CXL memory, persistent memory, and storage-class memory, tiered memory systems have become a reality. Page migration is the de facto method within operating systems for managing tiered memory. It aims to bring hot data into fast memory whenever possible to optimize the performance of data accesses, while using slow memory to accommodate data spilled from fast memory. While the existing research has demonstrated the effectiveness of various optimizations on page migration, it falls short of addressing a fundamental question: Is exclusive memory tiering, in which a page is either present in fast memory or slow memory, but not both simultaneously, the optimal strategy for tiered memory management? We demonstrate that page migration-based exclusive memory tiering suffers significant performance degradation when fast memory is under pressure. In this paper, we propose non-exclusive memory tiering, a page management strategy that retains a copy of pages recently promoted from slow memory to fast memory to mitigate memory thrashing. To enable non-exclusive memory tiering, we develop Nomad, a new page management mechanism for Linux that features transactional page migration and page shadowing. Nomad helps take page migration off the critical path of program execution and makes migration completely asynchronous.
Evaluations with carefully crafted micro-benchmarks and real-world applications show that Nomad is able to achieve up to 6x performance improvement over the state-of-the-art transparent page placement (TPP) approach in Linux when under memory pressure. We also compare Nomad with a recently proposed hardware-assisted, access sampling-based page migration approach and demonstrate Nomad's strengths and potential weaknesses in various scenarios.

Submitted 17 June, 2024; v1 submitted 23 January, 2024; originally announced January 2024.
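The two Nomad mechanisms named in the abstract, transactional page migration and page shadowing, can be illustrated with a small user-space model. This is a hypothetical sketch, not the Linux implementation: a per-page `version` counter stands in for the hardware dirty bit, the `during_copy` hook simulates an application write racing with the copy, and the dictionaries `fast`, `slow` and `shadow` stand in for the two memory tiers and the retained shadow copies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Page:
    data: bytearray
    version: int = 0  # bumped on every write; stands in for a hardware dirty bit

    def write(self, offset: int, value: int) -> None:
        self.data[offset] = value
        self.version += 1


@dataclass
class TieredMemory:
    fast: Dict[int, Page] = field(default_factory=dict)
    slow: Dict[int, Page] = field(default_factory=dict)
    shadow: Dict[int, Page] = field(default_factory=dict)  # slow copies kept after promotion


def promote(mem: TieredMemory, pfn: int, during_copy: Optional[Callable[[Page], None]] = None) -> bool:
    """Transactionally copy page pfn to fast memory; abort if it is written mid-copy."""
    page = mem.slow[pfn]
    start = page.version
    copy = bytearray(page.data)  # copy while the page stays mapped and writable
    if during_copy is not None:
        during_copy(page)  # simulates a concurrent application write
    if page.version != start:
        return False  # abort: the copy is stale, keep serving from slow memory
    mem.fast[pfn] = Page(copy, start)
    mem.shadow[pfn] = mem.slow.pop(pfn)  # non-exclusive: the slow copy is retained
    return True


def demote(mem: TieredMemory, pfn: int) -> str:
    """Move pfn back to slow memory, reusing its shadow if the fast copy is still clean."""
    page = mem.fast.pop(pfn)
    shadow = mem.shadow.pop(pfn, None)
    if shadow is not None and shadow.version == page.version:
        mem.slow[pfn] = shadow  # unchanged since promotion: no data copy needed
        return "remapped"
    mem.slow[pfn] = Page(bytearray(page.data), page.version)
    return "copied"
```

The payoff of shadowing shows up in `demote`: a page that was only read while in fast memory goes back to slow memory without copying any data, which is what makes repeated promotion and demotion under memory pressure cheaper.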

arXiv:2401.12068 [pdf, other]

Resource-constrained stereo singing voice cancellation

Authors: Clara Borrelli, James Rae, Dogac Basaran, Matt McVicar, Mehrez Souden, Matthias Mauch

Abstract: We study the problem of stereo singing voice cancellation, a subtask of music source separation, whose goal is to estimate an instrumental background from a stereo mix. We explore how to achieve performance similar to large state-of-the-art source separation networks starting from a small, efficient model for real-time speech separation. Such a model is useful when memory and compute are limited and singing voice processing has to run with limited look-ahead. In practice, this is realised by adapting an existing mono model to handle stereo input. Improvements in quality are obtained by tuning model parameters and expanding the training set. Moreover, we highlight the benefits a stereo model brings by introducing a new metric which detects attenuation inconsistencies between channels. Our approach is evaluated using objective offline metrics and a large-scale MUSHRA trial, confirming the effectiveness of our techniques in stringent listening tests.

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.07462 [pdf, other]

doi 10.1140/epjc/s10052-024-12770-1

Nonproportionality of NaI(Tl) Scintillation Detector for Dark Matter Search Experiments

Authors: S. M. Lee, G. Adhikari, N. Carlin, J. Y. Cho, J. J. Choi, S. Choi, A. C. Ezeribe, L. E. França, C. Ha, I. S. Hahn, S. J. Hollick, E. J. Jeon, H. W. Joo, W. G. Kang, M. Kauer, B. H. Kim, H. J. Kim, J. Kim, K. W. Kim, S. H. Kim, S. K. Kim, S. W. Kim, W. K. Kim, Y. D. Kim, Y. H. Kim , et al. (37 additional authors not shown)

Abstract: We present a comprehensive study of the nonproportionality of NaI(Tl) scintillation detectors within the context of dark matter search experiments. Our investigation, which integrates COSINE-100 data with supplementary $γ$ spectroscopy, measures light yields across diverse energy levels from full-energy $γ$ peaks produced by the decays of various isotopes. These $γ$ peaks of interest were produced by decays supported by both long and short-lived isotopes. Analyzing peaks from decays supported only by short-lived isotopes presented a unique challenge due to their limited statistics and overlapping energies, which was overcome by long-term data collection and a time-dependent analysis. A key achievement is the direct measurement of the 0.87 keV light yield, resulting from the cascade following electron capture decay of $^{22}$Na from internal contamination. This measurement, previously accessible only indirectly, deepens our understanding of NaI(Tl) scintillator behavior in the region of interest for dark matter searches. This study holds substantial implications for background modeling and the interpretation of dark matter signals in NaI(Tl) experiments.

Submitted 10 May, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

Comments: 12 pages, 7 figures

Journal ref: Eur. Phys. J. C 84 (2024) 484
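Nonproportionality is usually expressed as the light yield per unit deposited energy at each peak, normalised to the same quantity at a reference energy; a perfectly proportional scintillator gives 1 everywhere. The sketch below shows that normalisation under stated assumptions: `relative_light_yield` is a hypothetical helper, the choice of reference energy is left to the caller (the reference used in the paper is not stated in the abstract), and the peak positions in the example are made-up numbers, not COSINE-100 data.

```python
from typing import Dict


def relative_light_yield(light: Dict[float, float], ref_energy_kev: float) -> Dict[float, float]:
    """Light per keV at each energy, normalised to the light per keV at ref_energy_kev.

    `light` maps a deposited energy (keV) to the measured light for that peak, e.g. a
    fitted peak position in photoelectrons. A perfectly proportional scintillator
    returns 1.0 at every energy; departures from 1.0 quantify nonproportionality.
    """
    ref_per_kev = light[ref_energy_kev] / ref_energy_kev
    return {energy: (value / energy) / ref_per_kev for energy, value in light.items()}


# Hypothetical peak positions (photoelectrons), not measured data.
peaks = {10.0: 160.0, 100.0: 1500.0}
print(relative_light_yield(peaks, ref_energy_kev=100.0))
```

In this made-up example the 10 keV peak yields 16 photoelectrons per keV against 15 at the reference, a relative light yield of 16/15 (about 1.067); a low-energy point such as the 0.87 keV line enters the curve in exactly the same way.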

arXiv:2401.07032 [pdf, other]

Interaction of Vortex Ring with Perforated V-Wall

Authors: Siddhant Jain, Saini Jatin Rao, Saptarshi Basu

Abstract: Experiments are performed to investigate the interaction of a vortex ring (Reynolds number based on circulation, Re_Γ = 11500) with perforated surfaces (open area ratios φ1 = 0.24 and φ2 = 0.44) at different included angles (θ = 60°–180°). The phenomenon is characterized using techniques such as Planar Laser-Induced Fluorescence (PLIF) imaging and Particle Image Velocimetry (PIV). Lagrangian analysis using finite-time Lyapunov exponents (FTLE) and the Γ2 vortex identification method is used to understand the flow physics. Early observations reveal the growth of induced mushroom structures through the holes as a consequence of placing the perforated surface in the path of the vortex ring. These structures, along with the Kelvin-Helmholtz (K-H) instability, impart the initial instability to the emerging jets. We discern a sequential emergence of the vortex ring in the form of jets at lower θ values that diminishes at higher values. Except for the θ = 150° cases, where the flows from the two halves start to interact, resulting in a divergence in the circulation ratio, a reformed vortex ring is formed far downstream in all cases. A detailed discussion of the downstream vorticity dynamics is provided using vorticity contours, the time-series variation of circulation and the FTLE fields. By varying θ, we present a more generalised study of vortex rings interacting with perforated surfaces, which finds application in multiple domains including flow control, manipulation and vortical cleaning.

Submitted 13 January, 2024; originally announced January 2024.
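The Γ2 criterion mentioned in the vortex-ring abstract is a standard, Galilean-invariant vortex identification method for gridded velocity fields such as PIV data: at each point P it averages, over a window of neighbours M, the sine of the angle between the separation vector PM and the velocity of M relative to the local mean, and |Γ2| > 2/π marks a vortex core. The NumPy sketch below is a generic implementation, not the authors' processing code; the window half-width `half_window` is an assumed parameter, and P itself is excluded from the average.

```python
import numpy as np


def gamma2(x, y, u, v, half_window=2):
    """Gamma_2 vortex criterion on a regular 2D grid (x, y, u, v share one shape).

    Border points without a full window are returned as NaN. Values lie in [-1, 1];
    |Gamma_2| > 2/pi indicates a vortex core, and the sign gives the rotation sense.
    """
    ny, nx = u.shape
    n = half_window
    out = np.full((ny, nx), np.nan)
    for j in range(n, ny - n):
        for i in range(n, nx - n):
            win = (slice(j - n, j + n + 1), slice(i - n, i + n + 1))
            dx, dy = x[win] - x[j, i], y[win] - y[j, i]
            du, dv = u[win] - u[win].mean(), v[win] - v[win].mean()
            cross = dx * dv - dy * du  # z-component of PM x (U_M - mean U)
            norm = np.hypot(dx, dy) * np.hypot(du, dv)
            valid = norm > 0  # drops P itself and points moving with the mean flow
            out[j, i] = np.mean(cross[valid] / norm[valid])
    return out


xs = np.linspace(-1.0, 1.0, 21)
x, y = np.meshgrid(xs, xs)
g = gamma2(x, y, -y, x)  # counter-clockwise solid-body rotation
print(np.nanmin(g), np.nanmax(g))
```

Because the local mean velocity is subtracted, adding a uniform convection velocity to the field leaves Γ2 unchanged, which matters for a vortex ring that is translating through the measurement window.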

arXiv:2401.07027 [pdf, other]

Dynamics of Soap Bubble Inflation

Authors: Saini Jatin Rao, Siddhant Jain, Saptarshi Basu

Abstract: Bubbles have always captivated our curiosity with their aesthetics and complexities alike. While the act of blowing bubbles is familiar to everyone, the underlying physics of these fleeting spheres often eludes reasoning. In this letter, we discuss the dynamics of inflating a soap bubble using controlled airflow through a film-coated nozzle. We assess and predict the rate of inflation by varying the source pressure. Visualising the previously unexplored internal flow reveals that air enters the bubble as a round jet that emerges from the nozzle opening and impinges on the expanding concave bubble interface to form a toroidal vortex. Several scaling laws of the associated vortical flow spanning the entire bubble and the vortex core are reported. The observed dynamics of this bubble-confined vortex ring formation indicate universality in certain aspects when compared to free laminar vortex rings.

Submitted 8 February, 2024; v1 submitted 13 January, 2024; originally announced January 2024.

arXiv:2312.12130 [pdf, other]

Cumulants and ordering of their ratios in 2D Potts models: Lessons for QCD?

Authors: Rajiv V. Gavai, Bedangadas Mohanty, Jaydev Singh Rao, Swati Saha

Abstract: Theoretical considerations suggest an ordering of the ratios of net-baryon number fluctuations in the vicinity of the transition from the low-temperature hadronic phase to the high-temperature quark-gluon plasma phase at small values of the baryon chemical potential, $μ_B$, in the QCD phase diagram. The ordering hierarchy is $\frac{χ_6}{χ_2} < \frac{χ_5}{χ_1} < \frac{χ_4}{χ_2} < \frac{χ_3}{χ_1}$, where $χ_n$ is the $n^\mathrm{th}$ order cumulant of net-baryon number fluctuation. The STAR experiment observed this hierarchy in the ordering of cumulant ratios of net-proton number (a proxy of net-baryon number) for a range of colliding energies. These inequalities can be tested in spin models by taking the corresponding order parameters in the model as an analog of baryon density. We employed two different models: the two-state and three-state Potts models in two dimensions, which undergo a transition from an ordered phase to a disordered phase at their respective critical temperatures. Simulations were performed on square lattices of different sizes using the Wolff algorithm. The cumulants of total magnetization are obtained up to the sixth order in both of these models in a temperature range near their corresponding critical temperatures. With increasing lattice size, the heights of the peaks (depths of the dips) of the higher-order cumulants appear to grow with the order of the cumulants.
Except in a narrow range above the critical temperature of the three-state Potts model, neither the complete inequality nor its complete reverse is satisfied in the temperature ranges simulated.

Submitted 19 December, 2023; originally announced December 2023.
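The cumulant ratios in the hierarchy can be estimated from Monte Carlo samples of the order parameter. The Python sketch below is an illustration only: it assumes one sample of the total magnetization (or whichever order parameter stands in for baryon number) per configuration and uses the standard moment-to-cumulant relations. The helper names `cumulants` and `hierarchy_holds` are hypothetical, and the sketch omits the error analysis a real study would need.

```python
import numpy as np


def cumulants(samples):
    """Return the first six cumulants [k1, ..., k6] of a 1-D sample."""
    x = np.asarray(samples, dtype=float)
    k1 = x.mean()
    d = x - k1
    # Central moments mu_2 ... mu_6.
    m2, m3, m4, m5, m6 = (np.mean(d**n) for n in range(2, 7))
    k4 = m4 - 3 * m2**2
    k5 = m5 - 10 * m3 * m2
    k6 = m6 - 15 * m4 * m2 - 10 * m3**2 + 30 * m2**3
    # For n = 2 and n = 3 the cumulant equals the central moment.
    return [k1, m2, m3, k4, k5, k6]


def hierarchy_holds(k):
    """Check chi6/chi2 < chi5/chi1 < chi4/chi2 < chi3/chi1 for cumulants k."""
    k1, k2, k3, k4, k5, k6 = k
    ratios = [k6 / k2, k5 / k1, k4 / k2, k3 / k1]
    return all(a < b for a, b in zip(ratios, ratios[1:]))
```

As a sanity check, a sample such as `[0, 1] * 1000` should reproduce the known Bernoulli(1/2) cumulants `[0.5, 0.25, 0, -0.125, 0, 0.25]` up to floating-point error.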

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks, notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.05752 [pdf, other]

Camera-based 3D Semantic Scene Completion with Sparse Guidance Network

Authors: Jianbiao Mei, Yu Yang, Mengmeng Wang, Junyu Zhu, Xiangrui Zhao, Jongwon Ra, Laijian Li, Yong Liu

Abstract: Semantic scene completion (SSC) aims to predict the semantic occupancy of each voxel in the entire 3D scene from limited observations, which is an emerging and critical task for autonomous driving. Recently, many studies have turned to camera-based SSC solutions due to the richer visual cues and cost-effectiveness of cameras. However, existing methods usually rely on sophisticated and heavy 3D models to directly process the lifted 3D features that are not discriminative enough for clear segmentation boundaries. In this paper, we adopt the dense-sparse-dense design and propose an end-to-end camera-based SSC framework, termed SGN, to diffuse semantics from the semantic- and occupancy-aware seed voxels to the whole scene based on geometry prior and occupancy information. By designing hybrid guidance (sparse semantic and geometry guidance) and effective voxel aggregation for spatial occupancy and geometry priors, we enhance the feature separation between different categories and expedite the convergence of semantic diffusion. Extensive experimental results on the SemanticKITTI dataset demonstrate the superiority of our SGN over existing state-of-the-art methods.

Submitted 9 December, 2023; originally announced December 2023.

arXiv:2312.01151 [pdf]

doi 10.5281/zenodo.8286277

Here Is Not There: Measuring Entailment-Based Trajectory Similarity for Location-Privacy Protection and Beyond

Authors: Zilong Liu, Krzysztof Janowicz, Kitty Currier, Meilin Shi, Jinmeng Rao, Song Gao, Ling Cai, Anita Graser

Abstract: While the paths humans take play out in social as well as physical space, measures to describe and compare their trajectories are carried out in abstract, typically Euclidean, space. When these measures are applied to trajectories of actual individuals in an application area, alterations that are inconsequential in abstract space may suddenly become problematic once overlaid with geographic reality. In this work, we present a different view on trajectory similarity by introducing a measure that utilizes logical entailment. This is an inferential perspective that considers facts as triple statements deduced from the social and environmental context in which the travel takes place, and their practical implications. We suggest a formalization of entailment-based trajectory similarity, measured as the overlapping proportion of facts, which are spatial relation statements in our case study. With the proposed measure, we evaluate LSTM-TrajGAN, a privacy-preserving trajectory-generation model. The entailment-based model evaluation reveals potential consequences of disregarding the rich structure of geographic space (e.g., miscalculated insurance risk due to regional shifts in our toy example). Our work highlights the advantage of applying logical entailment to trajectory-similarity reasoning for location-privacy protection and beyond.

Submitted 2 December, 2023; originally announced December 2023.
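The abstract defines similarity as the overlapping proportion of entailed facts. One plausible reading, assumed here rather than taken from the paper, is intersection over union (the Jaccard index) of two sets of fact triples. The triples and region names below are hypothetical; note how a small shift of point `p2` across a county boundary, negligible in Euclidean terms, changes the fact set.

```python
def entailment_similarity(facts_a, facts_b):
    """Share of facts common to both trajectories (intersection over union).

    Each fact is a (subject, predicate, object) triple, e.g. a spatial
    relation between a trajectory point and a region.
    """
    a, b = set(facts_a), set(facts_b)
    if not a and not b:
        return 1.0  # two empty fact sets are treated as identical
    return len(a & b) / len(a | b)


original = {("p1", "within", "CountyX"), ("p2", "within", "CountyX")}
synthetic = {("p1", "within", "CountyX"), ("p2", "within", "CountyY")}
print(entailment_similarity(original, synthetic))  # prints 0.3333333333333333
```

Other normalizations (for example, dividing by the size of the original trajectory's fact set only) would also fit the phrase "overlapping proportion"; the choice matters when the two trajectories entail very different numbers of facts.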

arXiv:2311.05010 [pdf, other]

doi 10.1016/j.astropartphys.2024.102945

Alpha backgrounds in NaI(Tl) crystals of COSINE-100

Authors: G. Adhikari, N. Carlin, D. F. F. S. Cavalcante, J. Y. Cho, J. J. Choi, S. Choi, A. C. Ezeribe, L. E. Franca, C. Ha, I. S. Hahn, S. J. Hollick, E. J. Jeon, H. W. Joo, W. G. Kang, M. Kauer, B. H. Kim, H. J. Kim, J. Kim, K. W. Kim, S. H. Kim, S. K. Kim, S. W. Kim, W. K. Kim, Y. D. Kim, Y. H. Kim , et al. (38 additional authors not shown)

Abstract: COSINE-100 is a dark matter direct detection experiment with 106 kg of NaI(Tl) as the target material. $^{210}$Pb and its daughter isotopes are a dominant background in the WIMP region of interest and are detected via beta decay and alpha decay. Analysis of the alpha channel complements the background model as observed in the beta/gamma channel. We present measurements of the quenching factors, Monte Carlo simulation results, and the activity quantification of the alpha decay components of the COSINE-100 NaI(Tl) crystals. The data strongly indicate that the alpha decays probabilistically undergo one of two possible quenching factors, although this requires further investigation. The fitted results are consistent with independent measurements and improve the overall understanding of the COSINE-100 backgrounds. Furthermore, the half-life of $^{216}$Po has been measured to be 143.4 ± 1.2 ms, which is consistent with and more precise than recent measurements.

Submitted 30 January, 2024; v1 submitted 8 November, 2023; originally announced November 2023.
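The quoted $^{216}$Po half-life comes from timing alpha decays. As a sketch of the underlying statistics only, and not the collaboration's fit (which must also handle backgrounds, the finite coincidence window and detector effects), the delays between the $^{220}$Rn alpha decay that produces $^{216}$Po and the subsequent $^{216}$Po alpha decay are exponentially distributed. For an ideal sample, the maximum-likelihood mean lifetime is the sample mean, and the half-life follows from $T_{1/2} = \tau \ln 2$. The synthetic data below are generated, not measured.

```python
import math

import numpy as np


def half_life_from_delays(delays):
    """Estimate a half-life from exponentially distributed decay delays.

    Assumes an ideal, background-free and untruncated exponential sample:
    the maximum-likelihood mean lifetime is the sample mean, and
    T_1/2 = tau * ln 2.
    """
    tau = float(np.mean(delays))
    return tau * math.log(2)


# Synthetic check: draw delays for an assumed half-life of 143.4 ms.
rng = np.random.default_rng(0)
true_half_life_ms = 143.4
delays_ms = rng.exponential(true_half_life_ms / math.log(2), size=200_000)
print(f"{half_life_from_delays(delays_ms):.1f} ms")
```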

arXiv:2310.13248 [pdf, other]

doi 10.1145/3615886.3627742

FLEE-GNN: A Federated Learning System for Edge-Enhanced Graph Neural Network in Analyzing Geospatial Resilience of Multicommodity Food Flows

Authors: Yuxiao Qu, Jinmeng Rao, Song Gao, Qianheng Zhang, Wei-Lun Chao, Yu Su, Michelle Miller, Alfonso Morales, Patrick Huber

Abstract: Understanding and measuring the resilience of food supply networks is a global imperative to tackle increasing food insecurity. However, the complexity of these networks, with their multidimensional interactions and decisions, presents significant challenges. This paper proposes FLEE-GNN, a novel Federated Learning System for Edge-Enhanced Graph Neural Network, designed to overcome these challenges and enhance the analysis of the geospatial resilience of multicommodity food flow networks, a type of spatial network. FLEE-GNN addresses the limitations of current methodologies, such as entropy-based methods, in terms of generalizability, scalability, and data privacy. It combines the robustness and adaptability of graph neural networks with the privacy-conscious and decentralized aspects of federated learning for food supply network resilience analysis across geographical regions. This paper also discusses FLEE-GNN's innovative data generation techniques, experimental designs, and future directions for improvement. The results show the advancements of this approach to quantifying the resilience of multicommodity food flow networks, contributing to efforts towards ensuring global food security using AI methods. The developed FLEE-GNN has the potential to be applied in other spatial networks with spatially heterogeneous sub-network distributions.

Submitted 19 October, 2023; originally announced October 2023.

Comments: 10 pages, 5 figures

ACM Class: I.2

Journal ref: ACM SIGSPATIAL GeoAI 2023
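The abstract does not say which aggregation rule FLEE-GNN uses. As an assumption for illustration, the sketch below shows Federated Averaging (FedAvg), a widely used scheme in which each region trains locally and shares only model parameters, which a server averages in proportion to local data size; the raw food-flow data never leave the region. Client names and sizes are hypothetical.

```python
import numpy as np


def federated_average(client_params, client_sizes):
    """Weighted average of per-client parameter lists (FedAvg-style).

    client_params: one list of numpy arrays (one array per layer) per client.
    client_sizes: number of local training samples per client.
    """
    total = float(sum(client_sizes))
    n_layers = len(client_params[0])
    averaged = []
    for layer in range(n_layers):
        # Each client's layer is weighted by its share of the training data.
        weighted = sum(
            (size / total) * params[layer]
            for params, size in zip(client_params, client_sizes)
        )
        averaged.append(weighted)
    return averaged


# Two hypothetical regional clients sharing a one-layer model.
region_a = [np.array([1.0, 1.0])]
region_b = [np.array([3.0, 3.0])]
print(federated_average([region_a, region_b], client_sizes=[1, 3]))
```

Weighting by sample count keeps regions with more local data from being drowned out, which matters when sub-networks are spatially heterogeneous.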

arXiv:2310.05286 [pdf, other]

Generalizable Error Modeling for Search Relevance Data Annotation Tasks

Authors: Heinrich Peters, Alireza Hashemi, James Rae

Abstract: Human data annotation is critical in shaping the quality of machine learning (ML) and artificial intelligence (AI) systems. One significant challenge in this context is posed by annotation errors, as their effects can degrade the performance of ML models. This paper presents a predictive error model trained to detect potential errors in search relevance annotation tasks for three industry-scale ML applications (music streaming, video streaming, and mobile apps) and assesses its potential to enhance the quality and efficiency of the data annotation process. Drawing on real-world data from an extensive search relevance annotation program, we illustrate that errors can be predicted with moderate model performance (AUC=0.65-0.75) and that model performance generalizes well across applications (i.e., a global, task-agnostic model performs on par with task-specific models). We present model explainability analyses to identify which types of features are the main drivers of predictive performance. Additionally, we demonstrate the usefulness of the model in the context of auditing, where prioritizing tasks with high predicted error probabilities considerably increases the number of corrected annotation errors (e.g., 40% efficiency gains for the music streaming application). These results underscore that automated error detection models can yield considerable improvements in the efficiency and quality of data annotation processes. Thus, our findings reveal critical insights into effective error management in the data annotation process, thereby contributing to the broader field of human-in-the-loop ML.

Submitted 8 October, 2023; originally announced October 2023.
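Two quantities in this abstract, AUC and audit efficiency, can be illustrated in a few lines. The sketch below is not the authors' pipeline: it assumes a model already outputs an error probability per annotation task, computes the ROC AUC through its pairwise-ranking interpretation, and counts how many true errors an auditor finds when reviewing the highest-scored tasks first. The data are made up.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random error outranks a random non-error."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            # Ties count as half a win, as in the Mann-Whitney formulation.
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def errors_caught(scores, labels, budget):
    """Number of true errors found when auditing the `budget` highest-scored tasks."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:budget])


# Hypothetical predicted error probabilities and true error flags.
scores = [0.9, 0.2, 0.7, 0.1, 0.4, 0.3]
labels = [1, 0, 0, 0, 1, 0]
print(roc_auc(scores, labels), errors_caught(scores, labels, budget=2))
```

Comparing `errors_caught` against the expected number of errors in a random sample of the same size gives one simple way to express an audit efficiency gain.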

arXiv:2310.00413 [pdf, other]

SSIF: Learning Continuous Image Representation for Spatial-Spectral Super-Resolution

Authors: Gengchen Mai, Ni Lao, Weiwei Sun, Yuchi Ma, Jiaming Song, Chenlin Meng, Hongxu Ma, Jinmeng Rao, Ziyuan Li, Stefano Ermon

Abstract: Existing digital sensors capture images at fixed spatial and spectral resolutions (e.g., RGB, multispectral, and hyperspectral images), and each combination requires bespoke machine learning models. Neural Implicit Functions partially overcome the spatial resolution challenge by representing an image in a resolution-independent way. However, they still operate at fixed, pre-defined spectral resolutions. To address this challenge, we propose Spatial-Spectral Implicit Function (SSIF), a neural implicit model that represents an image as a function of both continuous pixel coordinates in the spatial domain and continuous wavelengths in the spectral domain. We empirically demonstrate the effectiveness of SSIF on two challenging spatio-spectral super-resolution benchmarks. We observe that SSIF consistently outperforms state-of-the-art baselines even when the baselines are allowed to train separate models at each spectral resolution. We show that SSIF generalizes well to both unseen spatial resolutions and spectral resolutions. Moreover, SSIF can generate high-resolution images that improve the performance of downstream tasks (e.g., land use classification) by 1.7%-7%.

Submitted 30 September, 2023; originally announced October 2023.

MSC Class: 68T07; 68T45 ACM Class: I.4.10; I.2.10; I.4.6
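The key idea is that one function answers queries at any spatial location and any wavelength, so the same representation can be rendered to different band layouts. The sketch below assumes that a band value is the uniform average of the continuous spectrum over the band's wavelength interval, and it replaces the trained network with a hypothetical analytic function `toy_image`; SSIF's actual spectral encoding and band integration may differ.

```python
import numpy as np


def render_bands(image_fn, x, y, band_edges_nm, samples_per_band=32):
    """Query a continuous image function f(x, y, wavelength) for arbitrary bands.

    Each band value is the uniform average of f over the band's wavelength
    interval, approximated with evenly spaced samples.
    """
    values = []
    for lo, hi in band_edges_nm:
        wavelengths = np.linspace(lo, hi, samples_per_band)
        values.append(np.mean(image_fn(x, y, wavelengths)))
    return np.array(values)


def toy_image(x, y, wavelengths):
    # Hypothetical stand-in for a trained implicit network: a smooth spectrum
    # whose slope depends on the continuous pixel coordinates.
    return 0.5 + (x - y) * (wavelengths - 550.0) / 1000.0


rgb_like = [(400, 500), (500, 600), (600, 700)]
fine_bands = [(400 + 10 * i, 410 + 10 * i) for i in range(30)]
print(render_bands(toy_image, 0.25, 0.75, rgb_like))
print(render_bands(toy_image, 0.25, 0.75, fine_bands).shape)
```

The same query point yields a 3-band or a 30-band spectrum without retraining, which is the property the abstract contrasts with fixed-resolution models.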

arXiv:2309.17319 [pdf]

doi 10.1145/3589132.3625611

Building Privacy-Preserving and Secure Geospatial Artificial Intelligence Foundation Models

Authors: Jinmeng Rao, Song Gao, Gengchen Mai, Krzysztof Janowicz

Abstract: In recent years we have seen substantial advances in foundation models for artificial intelligence, including language, vision, and multimodal models. Recent studies have highlighted the potential of using foundation models in geospatial artificial intelligence, known as GeoAI Foundation Models, for geographic question answering, remote sensing image understanding, map generation, and location-based services, among others. However, the development and application of GeoAI foundation models can pose serious privacy and security risks, which have not been fully discussed or addressed to date. This paper introduces the potential privacy and security risks throughout the lifecycle of GeoAI foundation models and proposes a comprehensive blueprint for research directions and preventative and control strategies. Through this vision paper, we hope to draw the attention of researchers and policymakers in geospatial domains to these privacy and security risks inherent in GeoAI foundation models and advocate for the development of privacy-preserving and secure GeoAI foundation models.

Submitted 12 October, 2023; v1 submitted 29 September, 2023; originally announced September 2023.

Comments: 1 figure

ACM Class: I.2.0

Journal ref: ACM SIGSPATIAL 2023

arXiv:2309.12916 [pdf, other]

Meso-scale size effects of material heterogeneities on crack propagation in brittle solids: Perspectives from phase-field simulations

Authors: Liuchi Li, Jack Rao, Todd Hufnagel, KT Ramesh

Abstract: Brittle solids are often toughened by adding a second-phase material. This practice often results in composites with material heterogeneities on the meso scale: large compared to the scale of the process zone but small compared to that of the application. The specific configuration (both geometrical and mechanical) of this mesoscale heterogeneity is generally recognized as important in determining crack propagation and, subsequently, the (effective) toughness of the composite. Here, we systematically investigate how dynamic crack propagation is affected by mesoscale heterogeneities taking the form of an array of inclusions. Using a variational phase-field approach, we compute the apparent crack speed and fracture energy dissipation rate to compare crack propagation under Mode-I loading across different configurations of these inclusions. For a fixed volume fraction of inclusions, matching the inclusion size to the K-dominance zone size gives the best toughening outcome. Conversely, when the volume fraction of inclusions is varied, a lower volume fraction configuration can lead to a better toughening outcome if and only if the inclusion size approaches the size of the K-dominance zone from above. Since the size of the K-dominance zone can be estimated a priori given an understanding of the application scenario and material availability, we can, in principle, exploit this estimation to design a material's mesoscale heterogeneity that optimally balances the tradeoff between strength and toughness. This paves the way for realizing functional (meta-)materials against crack propagation in extreme environments.

Submitted 19 February, 2024; v1 submitted 22 September, 2023; originally announced September 2023.
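The abstract does not state which phase-field formulation is used. For orientation only, a widely used quasi-static variational form (the AT2 regularization of Griffith fracture) is shown below, where $\mathbf{u}$ is the displacement, $d \in [0,1]$ the phase (damage) field, $\psi$ the elastic strain energy density, $G_c$ the fracture energy and $\ell$ the regularization length. Material heterogeneity such as an inclusion array would enter through a spatially varying $G_c(\mathbf{x})$ and elastic moduli, and a dynamic study would add kinetic energy to this functional.

```latex
\mathcal{E}(\mathbf{u}, d) =
  \int_{\Omega} (1 - d)^2 \, \psi\big(\boldsymbol{\varepsilon}(\mathbf{u})\big) \, \mathrm{d}\Omega
  + \int_{\Omega} G_c \left( \frac{d^2}{2\ell} + \frac{\ell}{2} \, \lvert \nabla d \rvert^2 \right) \mathrm{d}\Omega
```

The second integral approximates the dissipated fracture energy, so its time derivative is one natural way to define a fracture energy dissipation rate.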

arXiv:2309.11587 [pdf, other]

CATS: Conditional Adversarial Trajectory Synthesis for Privacy-Preserving Trajectory Data Publication Using Deep Learning Approaches

Authors: Jinmeng Rao, Song Gao, Sijia Zhu

Abstract: The prevalence of ubiquitous location-aware devices and mobile Internet enables us to collect massive individual-level trajectory datasets from users. Such trajectory big data bring new opportunities to human mobility research but also raise public concerns with regard to location privacy. In this work, we present Conditional Adversarial Trajectory Synthesis (CATS), a deep-learning-based GeoAI methodological framework for privacy-preserving trajectory data generation and publication. CATS applies K-anonymity to the underlying spatiotemporal distributions of human movements, which provides a strong distributional-level privacy guarantee. By leveraging conditional adversarial training on K-anonymized human mobility matrices, trajectory global context learning using an attention-based mechanism, and recurrent bipartite graph matching of adjacent trajectory points, CATS is able to reconstruct trajectory topology from conditionally sampled locations and generate high-quality individual-level synthetic trajectory data, which can serve as supplements or alternatives to raw data for privacy-preserving trajectory data publication. The experimental results on over 90k GPS trajectories show that our method performs better in privacy preservation, spatiotemporal characteristic preservation, and downstream utility compared with baseline methods, which brings new insights into privacy-preserving human mobility research using generative AI techniques and explores data ethics issues in GIScience.

Submitted 20 September, 2023; originally announced September 2023.

Comments: 9 figures, 4 figures

ACM Class: I.2

Journal ref: International Journal of Geographical Information Science; 2023
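CATS applies K-anonymity to spatiotemporal distributions rather than to individual records and then samples locations conditionally. A minimal sketch of that idea is below. It assumes the simplest suppression rule, in which grid cells whose counts (taken here as distinct individuals) fall below K are zeroed out, followed by sampling proportional to the remaining counts. The actual CATS anonymization and its conditional adversarial generator are not reproduced here, and the grid is hypothetical.

```python
import numpy as np


def k_anonymize_counts(counts, k):
    """Suppress cells visited by fewer than k individuals (set them to 0)."""
    counts = np.asarray(counts)
    return np.where(counts >= k, counts, 0)


def sample_locations(counts, n, rng):
    """Draw n (row, col) cell indices with probability proportional to counts."""
    flat = np.asarray(counts, dtype=float).ravel()
    probs = flat / flat.sum()
    idx = rng.choice(flat.size, size=n, p=probs)
    return np.unravel_index(idx, np.shape(counts))


# Hypothetical 2x3 grid of visitor counts for one time slot.
visits = [[12, 2, 7],
          [0, 5, 1]]
safe = k_anonymize_counts(visits, k=5)
rows, cols = sample_locations(safe, n=10, rng=np.random.default_rng(42))
print(safe)
```

Because sampling draws only from the suppressed distribution, a location seen by fewer than K individuals can never appear in the synthetic output.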

arXiv:2309.04041 [pdf, other]

Evaluation and Enhancement of Semantic Grounding in Large Vision-Language Models

Authors: Jiaying Lu, Jinmeng Rao, Kezhen Chen, Xiaoyuan Guo, Yawen Zhang, Baochen Sun, Carl Yang, Jie Yang

Abstract: Large Vision-Language Models (LVLMs) offer remarkable benefits for a variety of vision-language tasks. However, a challenge hindering their application in real-world scenarios, particularly regarding safety, robustness, and reliability, is their constrained semantic grounding ability, which pertains to connecting language to the physical-world entities or concepts referenced in images. Therefore, a crucial need arises for a comprehensive study to assess the semantic grounding ability of widely used LVLMs. Despite the significance, sufficient investigation in this direction is currently lacking. Our work bridges this gap by designing a pipeline for generating large-scale evaluation datasets covering fine-grained semantic information, such as color, number, material, etc., along with a thorough assessment of seven popular LVLMs' semantic grounding ability. Results highlight prevalent misgrounding across various aspects and degrees. To address this issue, we propose a data-centric enhancement method that aims to improve LVLMs' semantic grounding ability through multimodal instruction tuning on fine-grained conversations. Experiments on enhanced LVLMs demonstrate notable improvements in addressing misgrounding issues.

Submitted 12 January, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

Comments: This paper has been accepted to the AAAI'24 Workshop on Responsible Language Models (ReLM 2024)

arXiv:2308.12898 [pdf, other]

Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining?

Authors: Fei Wang, Liang Ding, Jun Rao, Ye Liu, Li Shen, Changxing Ding

Abstract: The multimedia community has shown a significant interest in perceiving and representing the physical world with multimodal pretrained neural network models, and among them, vision-language pretraining (VLP) is currently the most captivating topic. However, there have been few endeavors dedicated to exploring 1) whether essential linguistic knowledge (e.g., semantics and syntax) can be extracted during VLP, and 2) how such linguistic knowledge impacts or enhances multimodal alignment. In response, here we aim to elucidate the impact of comprehensive linguistic knowledge, including semantic expression and syntactic structure, on multimodal alignment. Specifically, we design and release SNARE, the first large-scale multimodal alignment probing benchmark, to detect the vital linguistic components, e.g., lexical, semantic, and syntax knowledge, containing four tasks: Semantic structure, Negation logic, Attribute ownership, and Relationship composition. Based on our proposed probing benchmarks, our holistic analyses of five advanced VLP models illustrate that the VLP model: i) shows insensitivity towards complex syntax structures and relies on content words for sentence comprehension; ii) demonstrates limited comprehension of combinations between sentences and negations; iii) faces challenges in determining the presence of actions or spatial relationships within visual information and struggles with verifying the correctness of triple combinations. We make our benchmark and code available at https://github.com/WangFei-2019/SNARE/.

Submitted 25 August, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: [TL;DR] we design and release the SNARE, the first large-scale multimodal alignment probing benchmark for current vision-language pretrained models
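Probing benchmarks of this kind typically pair each image with a correct caption and a minimally perturbed one, then check which the model prefers. The sketch below assumes that protocol (SNARE's exact scoring may differ) and uses a deliberately naive, hypothetical bag-of-words scorer to show how relying on content words, one of the reported failure modes, makes negation probes fail.

```python
def probing_accuracy(score, examples):
    """Share of examples where the true caption outscores its perturbed twin.

    score(image, caption) -> float is any image-text matching score;
    examples are (image, true_caption, perturbed_caption) triples.
    """
    correct = sum(
        1 for image, good, bad in examples if score(image, good) > score(image, bad)
    )
    return correct / len(examples)


def bag_of_words_score(image_tags, caption):
    # Hypothetical scorer that only counts caption words found among the image
    # tags, so word order and function words such as "not" never matter.
    return len(image_tags & set(caption.split()))


examples = [
    ({"dog", "grass"}, "dog on grass", "cat on grass"),      # object swap
    ({"dog", "grass"}, "dog on grass", "dog not on grass"),  # negation
]
print(probing_accuracy(bag_of_words_score, examples))
```

The scorer separates the object swap but ties on the negation pair, so the strict comparison counts it as a failure.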

arXiv:2308.09970 [pdf, other]

Tackling Vision Language Tasks Through Learning Inner Monologues

Authors: Diji Yang, Kezhen Chen, Jinmeng Rao, Xiaoyuan Guo, Yawen Zhang, Jie Yang, Yi Zhang

Abstract: Visual language tasks require AI models to comprehend and reason with both visual and textual content. Driven by the power of Large Language Models (LLMs), two prominent methods have emerged: (1) the hybrid integration between LLMs and Vision-Language Models (VLMs), where visual inputs are first converted into language descriptions by VLMs, serving as inputs for LLMs to generate final answer(s); (2) visual feature alignment in language space, where visual inputs are encoded as embeddings and projected to LLMs' language space via further supervised fine-tuning. The first approach provides light training costs and interpretability but is hard to optimize in an end-to-end fashion. The second approach presents decent performance, but feature alignment usually requires large amounts of training data and lacks interpretability. To tackle this dilemma, we propose a novel approach, Inner Monologue Multi-Modal Optimization (IMMO), to solve complex vision language problems by simulating inner monologue processes, a cognitive process in which an individual engages in silent verbal communication with themselves. We enable LLMs and VLMs to interact through natural language conversation and propose to use a two-stage training process to learn how to do the inner monologue (self-asking questions and answering questions). IMMO is evaluated on two popular tasks, and the results suggest that, by emulating the cognitive phenomenon of internal dialogue, our approach can enhance reasoning and explanation abilities, contributing to the more effective fusion of vision and language models. More importantly, instead of using predefined human-crafted monologues, IMMO learns this process within the deep learning models, promising wider applicability to many different AI problems beyond vision language tasks.

Submitted 19 August, 2023; originally announced August 2023.
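The interaction pattern in this abstract can be sketched as a loop in which a language model asks itself sub-questions and a vision-language model answers them in natural language. This shows only the inference-time control flow under assumed interfaces. The functions `ask`, `observe` and `decide` are hypothetical stubs, and IMMO's two-stage training of this behavior is not shown.

```python
from typing import Callable, List, Optional, Tuple

QA = Tuple[str, str]


def inner_monologue(
    question: str,
    ask: Callable[[str, List[QA]], Optional[str]],
    observe: Callable[[str], str],
    decide: Callable[[str, List[QA]], str],
    max_turns: int = 5,
) -> Tuple[str, List[QA]]:
    """Alternate self-asked sub-questions (reasoner) and visual answers (observer)."""
    history: List[QA] = []
    for _ in range(max_turns):
        sub_question = ask(question, history)
        if sub_question is None:  # the reasoner has enough information
            break
        history.append((sub_question, observe(sub_question)))
    return decide(question, history), history


# Hypothetical stubs standing in for an LLM reasoner and a VLM observer.
def ask(question, history):
    return "What color is the car?" if not history else None


def observe(sub_question):
    return "The car is red."


def decide(question, history):
    return "red" if history and "red" in history[-1][1] else "unknown"


answer, trace = inner_monologue("Is the car red or blue?", ask, observe, decide)
print(answer, trace)
```

Keeping the exchange in natural language is what gives the first family of methods its interpretability: the `trace` can be read directly.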

arXiv:2307.10678 [pdf, other]

Depth from Defocus Technique: A Simple Calibration-Free Approach for Dispersion Size Measurement

Authors: Saini Jatin Rao, Shubham Sharma, Saptarshi Basu, Cameron Tropea

Abstract: Particle size measurement is crucial in various applications, be it sizing droplets in inkjet printing or respiratory events, tracking particulate ejection in hypersonic impacts, or detecting floating target markers in free surface flows. Characterising such systems requires extracting quantitative information like the size, position, velocity and number density of the dispersed particles, which is typically non-trivial. Existing methods like phase Doppler or digital holography offer precise estimates at the expense of complicated systems, demanding significant expertise. We present a novel volumetric measurement approach for estimating the size and position of dispersed spherical particles that utilises a unique 'Depth from Defocus' (DFD) technique with a single camera. The calibration-free sizing enables in-situ examination of hard-to-measure systems, including naturally occurring phenomena like pathogenic aerosols, pollen dispersion or raindrops. The efficacy of the technique is demonstrated for diverse sparse dispersions, including dots, glass beads, spray droplets, and pollen grains. The simple optical configuration and semi-autonomous calibration procedure make the method readily deployable and accessible, with a scope of applicability across vast research horizons.

Submitted 3 October, 2023; v1 submitted 20 July, 2023; originally announced July 2023.
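Depth from defocus rests on the fact that a particle's blur grows with its distance from the focal plane. The paper's calibration-free relation is not given in the abstract; as background only, the standard thin-lens geometric-optics expression for the blur-circle diameter is sketched below. A point in front of the focal plane and a point behind it can produce the same blur, an ambiguity any DFD method must resolve. All lens values are hypothetical.

```python
def blur_circle_diameter(aperture, focal_length, focus_dist, object_dist):
    """Thin-lens blur-circle diameter on the sensor (same length units throughout).

    c = A * |s - s_f| / s * f / (s_f - f), with aperture diameter A, focal
    length f, focused distance s_f and object distance s (both > f).
    """
    if min(focus_dist, object_dist) <= focal_length:
        raise ValueError("distances must exceed the focal length")
    return (aperture * abs(object_dist - focus_dist) / object_dist
            * focal_length / (focus_dist - focal_length))


# Hypothetical lens: 50 mm focal length, 10 mm aperture, focused at 1 m.
print(blur_circle_diameter(10.0, 50.0, 1000.0, 2000.0))  # diameter in mm
```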

arXiv:2307.09814 [pdf, other]

doi 10.1103/PhysRevD.108.092006

Search for inelastic WIMP-iodine scattering with COSINE-100

Authors: G. Adhikari, N. Carlin, J. J. Choi, S. Choi, A. C. Ezeribe, L. E. Franca, C. Ha, I. S. Hahn, S. J. Hollick, E. J. Jeon, J. H. Jo, H. W. Joo, W. G. Kang, M. Kauer, B. H. Kim, H. J. Kim, J. Kim, K. W. Kim, S. H. Kim, S. K. Kim, W. K. Kim, Y. D. Kim, Y. H. Kim, Y. J. Ko, D. H. Lee , et al. (34 additional authors not shown)

Abstract: We report the results of a search for inelastic scattering of weakly interacting massive particles (WIMPs) off $^{127}$I nuclei using NaI(Tl) crystals with a data exposure of 97.7 kg$\cdot$years from the COSINE-100 experiment. The signature of inelastic WIMP-$^{127}$I scattering is a nuclear recoil accompanied by a 57.6 keV $\gamma$-ray from the prompt deexcitation, producing a more energetic signal compared to the typical WIMP nuclear recoil signal. We found no evidence for this inelastic scattering signature and set a 90% confidence level upper limit on the WIMP-proton spin-dependent, inelastic scattering cross section of $1.2 \times 10^{-37} {\rm cm^{2}}$ at a WIMP mass of 500 ${\rm GeV/c^{2}}$.

Submitted 30 October, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

Comments: 8 pages, 5 figures. arXiv admin note: text overlap with arXiv:2104.03537

Journal ref: Phys. Rev. D 108, 092006 (2023)

arXiv:2306.14657

A Diversity Analysis of Safety Metrics Comparing Vehicle Performance in the Lead-Vehicle Interaction Regime

Authors: Harnarayan Singh, Bowen Weng, Sughosh J. Rao, Devin Elsasser

Abstract: Vehicle performance metrics analyze data sets consisting of a subject vehicle's interactions with other road users in a nominal driving environment and provide certain performance measures as outputs. To the best of the authors' knowledge, research on vehicle safety performance metrics dates back to at least 1967. To date, there is still no community-wide accepted metric or set of metrics for vehicle safety performance assessment and justification. This issue is further amplified by the growing interest in Advanced Driver Assistance Systems and Automated Driving Systems. In this paper, the authors perform a unified study that facilitates an improved community-wide understanding of vehicle performance metrics, using the lead-vehicle interaction operational design domain as a common basis for performance comparison. In particular, the authors study the diversity (including constructive formulation discrepancies and empirical performance differences) among 33 base metrics with up to 51 metric variants (with different choices of hyper-parameters) in the existing literature, published between 1967 and 2022. Two data sets are adopted for the empirical performance diversity analysis: vehicle trajectories from a normal highway driving environment, and relatively high-risk incidents with collisions and near-miss cases.
The analysis further implies that (i) the conceptual acceptance of a safety metric proposal can be problematic if the assumptions, conditions, and types of outcome assurance are not justified properly, and (ii) the empirical performance justification of an acceptable metric can also be problematic, as no dominant consensus is observed among metrics empirically.

Submitted 26 June, 2023; originally announced June 2023.

Comments: A modified version of this preprint has been accepted for publication as a regular paper in IEEE Transactions on Intelligent Transportation Systems

arXiv:2306.04907

Estimation of Poverty Measures for Small Areas Under a Two-Fold Nested Error Linear Regression Model: Comparison of Two Methods

Authors: Maryam Sohrabi, J. N. K. Rao

Abstract: Demand for reliable statistics at the local area (small area) level has greatly increased in recent years. Traditional area-specific estimators based on probability samples are not adequate because of the small, or even zero, sample size in a local area. As a result, methods based on models linking the areas are widely used. The World Bank has focused on estimating poverty measures, in particular poverty incidence and poverty gap, called FGT measures, using a simulated census method called ELL, based on a one-fold nested error model for a suitable transformation of the welfare variable. Modified ELL methods leading to significant gains in efficiency over ELL have also been proposed under the one-fold model. An advantage of ELL and modified ELL methods is that distributional assumptions on the random effects in the model are not needed. In this paper, we extend ELL and modified ELL to two-fold nested error models to estimate poverty indicators for areas (say, a state) and subareas (say, counties within a state). Our simulation results indicate that the modified ELL estimators lead to large efficiency gains over ELL at both the area and subarea levels. Further, the modified ELL method retaining both area and subarea estimated effects in the model (called MELL2) performs significantly better in terms of mean squared error (MSE) for sampled subareas than the modified ELL method retaining only the estimated area effect (called MELL1).

Submitted 7 June, 2023; originally announced June 2023.
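For reference, the FGT (Foster-Greer-Thorbecke) measures named in the abstract above are commonly defined as $FGT_\alpha = \frac{1}{N}\sum_{i=1}^{N} \left(\frac{z - y_i}{z}\right)^{\alpha} \mathbf{1}(y_i < z)$, where $z$ is the poverty line and $y_i$ the welfare of unit $i$; $\alpha = 0$ gives poverty incidence and $\alpha = 1$ the poverty gap. The sketch below computes this standard definition directly for one area's welfare values. It is only an illustration of the measures themselves, not the ELL, MELL1 or MELL2 estimators studied in the paper, which would apply such a measure to simulated census values generated under the nested error model; the function name `fgt` is hypothetical.

```python
from typing import Sequence


def fgt(welfare: Sequence[float], z: float, alpha: float) -> float:
    """Hypothetical helper: FGT_alpha poverty measure for poverty line z.

    alpha = 0 -> poverty incidence (share of units below z)
    alpha = 1 -> poverty gap (mean normalised shortfall below z)
    """
    if z <= 0:
        raise ValueError("poverty line z must be positive")
    if len(welfare) == 0:
        raise ValueError("welfare sample must be non-empty")
    total = 0.0
    for y in welfare:
        # Only units strictly below the poverty line contribute.
        if y < z:
            # (z - y) / z > 0 here, so alpha = 0 yields exactly 1.
            total += ((z - y) / z) ** alpha
    # Average over all units, poor and non-poor alike.
    return total / len(welfare)


# Usage: four units with a poverty line of 120.
sample = [50.0, 100.0, 150.0, 200.0]
print(fgt(sample, 120.0, 0))  # incidence: 2 of 4 units are poor
print(fgt(sample, 120.0, 1))  # gap: (70/120 + 20/120) / 4
```

Because each unit's contribution is a simple sum, the same function applies unchanged at the area or subarea level; only the set of welfare values passed in changes.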

arXiv:2306.04348

Non-Hermitian Topological Magnonics

Authors: Tao Yu, Ji Zou, Bowen Zeng, J. W. Rao, Ke Xia

Abstract: Dissipation in mechanics, optics, acoustics, and electronic circuits is nowadays recognized as not always detrimental; it can be exploited to achieve non-Hermitian topological phases or properties with functionalities for potential device applications. As elementary excitations of ordered magnetic moments in various magnetic materials, magnons are the information carriers in low-energy-consumption magnonic devices for reprogrammable logic, non-reciprocal communication, and non-volatile memory. Non-Hermitian topological magnonics deals with engineering dissipation and/or gain to realize non-Hermitian topological phases or properties in magnets that are not achievable in the conventional Hermitian scenario, with associated functionalities cross-fertilized with their electronic, acoustic, optical, and mechanical counterparts, such as giant enhancement of magnonic frequency combs, magnon amplification, (quantum) sensing of magnetic fields with unprecedented sensitivity, magnon accumulation, and perfect absorption of microwaves. In this review article, we address the unified approach to constructing magnonic non-Hermitian Hamiltonians, introduce the basic non-Hermitian topological physics, and provide a comprehensive overview of recent theoretical and experimental progress towards achieving distinct non-Hermitian topological phases or properties in magnonic devices, including exceptional points, exceptional nodal phases, the non-Hermitian magnonic SSH model, and the non-Hermitian skin effect.
We emphasize the non-Hermitian Hamiltonian approach based on the Lindbladian or self-energy of the magnonic subsystem, but also address physics beyond it, such as the crucial quantum jump effect in the quantum regime and non-Markovian dynamics. We provide a perspective on future opportunities and challenges before concluding the article.

Submitted 9 November, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: 101 pages, 35 figures

arXiv:2306.02120

Giant Enhancement of Magnonic Frequency Combs by Exceptional Points

Authors: Congyi Wang, Jinwei Rao, Zhijian Chen, Kaixin Zhao, Liaoxin Sun, Bimu Yao, Tao Yu, Yi-Pu Wang, Wei Lu

Abstract: With their unparalleled time-frequency accuracy, frequency combs have significantly advanced precision spectroscopy, ultra-sensitive detection, and atomic clocks. Traditional methods of creating photonic, phononic, and magnonic frequency combs hinge on material nonlinearities, which are often weak, necessitating high power densities to surpass their initiation thresholds and thereby limiting their applications. Here, we introduce a novel nonlinear process to efficiently generate magnonic frequency combs (MFCs) by exploiting exceptional points (EPs) in a coupled system comprising a pump-induced magnon mode and a Kittel mode. Even without a cavity, our method greatly improves the efficiency of nonlinear frequency conversion and achieves optimal MFCs at low pump power. Additionally, this nonlinear process enables excellent tunability of the EPs via the polarization and power of the pump, simplifying MFC generation and manipulation. Our work establishes a synergistic relationship between non-Hermitian physics and MFCs, which is advantageous for coherent/quantum information processing and ultra-sensitive detection.

Submitted 3 June, 2023; originally announced June 2023.

Comments: 7 pages, 4 figures

arXiv:2306.00322

doi 10.1103/PhysRevLett.131.201802

Search for Boosted Dark Matter in COSINE-100

Authors: G. Adhikari, N. Carlin, J. J. Choi, S. Choi, A. C. Ezeribe, L. E. Franca, C. Ha, I. S. Hahn, S. J. Hollick, E. J. Jeon, J. H. Jo, H. W. Joo, W. G. Kang, M. Kauer, B. H. Kim, H. J. Kim, J. Kim, K. W. Kim, S. H. Kim, S. K. Kim, W. K. Kim, Y. D. Kim, Y. H. Kim, Y. J. Ko, D. H. Lee, et al. (34 additional authors not shown)

Abstract: We search for energetic electron recoil signals induced by boosted dark matter (BDM) from the galactic center using the COSINE-100 array of NaI(Tl) crystal detectors at the Yangyang Underground Laboratory. The signal would be an excess of events with energies above 4 MeV over the well-understood background. Because no excess of events is observed in a 97.7 kg$\cdot$years exposure, we set limits on BDM interactions under a variety of hypotheses. Notably, we explored the dark photon parameter space, obtaining competitive limits compared to direct dark photon search experiments, particularly for dark photon masses below 4 MeV and considering the invisible decay mode. Furthermore, by comparing our results with a previous BDM search conducted by the Super-Kamiokande experiment, we found that the COSINE-100 detector has advantages in searching for low-mass dark matter. This analysis demonstrates the potential of the COSINE-100 detector to search for MeV electron recoil signals produced by dark sector particle interactions.

Submitted 30 October, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

Comments: 7 pages, 4 figures

Journal ref: Phys. Rev. Lett. 131, 201802 (2023)

Showing 1–50 of 245 results for author: Rae, J