Showing 101–150 of 17,264 results for author: Li, Y

  1. arXiv:2408.10926  [pdf, other]

    astro-ph.IM hep-ex hep-ph

    GRANDlib: A simulation pipeline for the Giant Radio Array for Neutrino Detection (GRAND)

    Authors: GRAND Collaboration, Rafael Alves Batista, Aurélien Benoit-Lévy, Teresa Bister, Martina Bohacova, Mauricio Bustamante, Washington Carvalho, Yiren Chen, LingMei Cheng, Simon Chiche, Jean-Marc Colley, Pablo Correa, Nicoleta Cucu Laurenciu, Zigao Dai, Rogerio M. de Almeida, Beatriz de Errico, Sijbrand de Jong, João R. T. de Mello Neto, Krijn D. de Vries, Valentin Decoene, Peter B. Denton, Bohao Duan, Kaikai Duan, Ralph Engel, William Erba, et al. (90 additional authors not shown)

    Abstract: The operation of upcoming ultra-high-energy cosmic-ray, gamma-ray, and neutrino radio-detection experiments, like the Giant Radio Array for Neutrino Detection (GRAND), poses significant computational challenges involving the production of numerous simulations of particle showers and their detection, and a high data throughput. GRANDlib is an open-source software tool designed to meet these challen… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 11 pages, 9 figures, plus appendices

  2. arXiv:2408.10924  [pdf, other]

    hep-ph nucl-th

    Unveiling the jet angular broadening with $γ-$jet in high-energy nuclear collisions

    Authors: Sa Wang, Yao Li, Jin-Wen Kang, Ben-Wei Zhang

    Abstract: Medium modification of jet substructure within the hot and dense nuclear matter has attracted enormous interest from the heavy-ion physics community in recent years. Measurements of inclusive jet show the angular narrowing in nucleus-nucleus collisions, while the recent CMS results of the photon-tagged jets ($γ-$jet) indicate hints of broadening. In this work, we conduct a theoretical study on the… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 8 pages, 5 figures

  3. arXiv:2408.10906  [pdf, other]

    cs.CV

    ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining

    Authors: Qi Ma, Yue Li, Bin Ren, Nicu Sebe, Ender Konukoglu, Theo Gevers, Luc Van Gool, Danda Pani Paudel

    Abstract: 3D Gaussian Splatting (3DGS) has become the de facto method of 3D representation in many vision tasks. This calls for the 3D understanding directly in this representation space. To facilitate the research in this direction, we first build a large-scale dataset of 3DGS using the commonly used ShapeNet and ModelNet datasets. Our dataset ShapeSplat consists of 65K objects from 87 unique categories, w… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  4. arXiv:2408.10870  [pdf]

    physics.chem-ph

    Revisiting the measurements and interpretations of DLVO forces

    Authors: Bo Feng, Xiantang Liu, Xinmin Liu, Yingli Li, Hang Li

    Abstract: The DLVO theory and electrical double layer (EDL) theory are the foundation of colloid and interface science. With the invention and development of surface forces apparatus (SFA) and atomic force microscope (AFM), the measurements and interpretations of DLVO forces (i.e., mainly measuring the EDL force (electrostatic force) FEDL and van der Waals force FvdW, and interpreting the potential ψ, charg… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 71 pages, 18 figures

  5. arXiv:2408.10852  [pdf, other]

    cs.SD eess.AS

    EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech

    Authors: Xin Qi, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Shuchen Shi, Yi Lu, Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Yukun Liu, Guanjun Li, Xuefei Liu, Yongwei Li

    Abstract: In the current era of Artificial Intelligence Generated Content (AIGC), a Low-Rank Adaptation (LoRA) method has emerged. It uses a plugin-based approach to learn new knowledge with lower parameter quantities and computational costs, and it can be plugged in and out based on the specific sub-tasks, offering high flexibility. However, the current application schemes primarily incorporate LoRA into t… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  6. arXiv:2408.10849  [pdf, other]

    cs.SD eess.AS

    A Noval Feature via Color Quantisation for Fake Audio Detection

    Authors: Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Yukun Liu, Guanjun Li, Xin Qi, Yi Lu, Xuefei Liu, Yongwei Li

    Abstract: In the field of deepfake detection, previous studies focus on using reconstruction or mask and prediction methods to train pre-trained models, which are then transferred to fake audio detection training where the encoder is used to extract features, such as wav2vec2.0 and Masked Auto Encoder. These methods have proven that using real audio for reconstruction pre-training can better help the model… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: accepted by ISCSLP2024

  7. arXiv:2408.10795  [pdf, other]

    cs.CL

    Adversarial Attack for Explanation Robustness of Rationalization Models

    Authors: Yuankai Zhang, Lingxiao Kong, Haozhao Wang, Ruixuan Li, Jun Wang, Yuhua Li, Wei Liu

    Abstract: Rationalization models, which select a subset of input text as rationale (crucial for humans to understand and trust predictions), have recently emerged as a prominent research area in eXplainable Artificial Intelligence. However, most previous studies mainly focus on improving the quality of the rationale, ignoring its robustness to malicious attack. Specifically, whether the rationalization mode… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  8. arXiv:2408.10738  [pdf, other]

    cs.CR

    PhishAgent: A Robust Multimodal Agent for Phishing Webpage Detection

    Authors: Tri Cao, Chengyu Huang, Yuexin Li, Huilin Wang, Amy He, Nay Oo, Bryan Hooi

    Abstract: Phishing attacks are a major threat to online security, exploiting user vulnerabilities to steal sensitive information. Various methods have been developed to counteract phishing, each with varying levels of accuracy, but they also encounter notable limitations. In this study, we introduce PhishAgent, a multimodal agent that combines a wide range of tools, integrating both online and offline knowl… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  9. arXiv:2408.10670  [pdf]

    cs.CV eess.IV

    A Noncontact Technique for Wave Measurement Based on Thermal Stereography and Deep Learning

    Authors: Deyu Li, Longfei Xiao, Handi Wei, Yan Li, Binghua Zhang

    Abstract: The accurate measurement of the wave field and its spatiotemporal evolution is essential in many hydrodynamic experiments and engineering applications. The binocular stereo imaging technique has been widely used to measure waves. However, the optical properties of indoor water surfaces, including transparency, specular reflection, and texture absence, pose challenges for image processing and stere… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  10. arXiv:2408.10658  [pdf, other]

    cs.RO

    Learning Instruction-Guided Manipulation Affordance via Large Models for Embodied Robotic Tasks

    Authors: Dayou Li, Chenkun Zhao, Shuo Yang, Lin Ma, Yibin Li, Wei Zhang

    Abstract: We study the task of language instruction-guided robotic manipulation, in which an embodied robot is supposed to manipulate the target objects based on the language instructions. In previous studies, the predicted manipulation regions of the target object typically do not change with specification from the language instructions, which means that the language perception and manipulation prediction… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Accepted to ICARM 2024

  11. arXiv:2408.10626  [pdf, ps, other]

    math.RT

    Cores and weights of multipartitions and blocks of Ariki-Koike algebras

    Authors: Yanbo Li, Kai Meng Tan

    Abstract: Let $e$ be an integer at least two. We define the $e$-core and the $e$-weight of a multipartition associated with a multicharge as the $e$-core and the $e$-weight of its image under the Uglov map. We do not place any restriction on the multicharge for these definitions. We show how these definitions lead to the definition of the $e$-core and the $e$-weight of a block of an Ariki-Koike algebra with… ▽ More

    Submitted 28 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

    Comments: 19 pages

    MSC Class: 20C08; 05E10

  12. arXiv:2408.10599  [pdf, other]

    hep-ex cs.CV

    Vision Calorimeter for Anti-neutron Reconstruction: A Baseline

    Authors: Hongtian Yu, Yangu Li, Mingrui Wu, Letian Shen, Yue Liu, Yunxuan Song, Qixiang Ye, Xiaorui Lyu, Yajun Mao, Yangheng Zheng, Yunfan Liu

    Abstract: In high-energy physics, anti-neutrons ($\bar{n}$) are fundamental particles that frequently appear as final-state particles, and the reconstruction of their kinematic properties provides an important probe for understanding the governing principles. However, this confronts significant challenges instrumentally with the electromagnetic calorimeter (EMC), a typical experimental sensor but recovering… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  13. arXiv:2408.10578  [pdf, other]

    cs.RO

    Where to Fetch: Extracting Visual Scene Representation from Large Pre-Trained Models for Robotic Goal Navigation

    Authors: Yu Li, Dayou Li, Chenkun Zhao, Ruifeng Wang, Ran Song, Wei Zhang

    Abstract: To complete a complex task where a robot navigates to a goal object and fetches it, the robot needs to have a good understanding of the instructions and the surrounding environment. Large pre-trained models have shown capabilities to interpret tasks defined via language descriptions. However, previous methods attempting to integrate large pre-trained models with daily tasks are not competent in ma… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  14. arXiv:2408.10501  [pdf, other]

    cs.IT eess.SP

    Generative Diffusion Models for High Dimensional Channel Estimation

    Authors: Xingyu Zhou, Le Liang, Jing Zhang, Peiwen Jiang, Yong Li, Shi Jin

    Abstract: Along with the prosperity of generative artificial intelligence (AI), its potential for solving conventional challenges in wireless communications has also surfaced. Inspired by this trend, we investigate the application of the advanced diffusion models (DMs), a representative class of generative AI models, to high dimensional wireless channel estimation. By capturing the structure of multiple-inp… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  15. arXiv:2408.10489  [pdf, other]

    quant-ph

    Interplay of Quantum Resources in Nonlocality Tests

    Authors: Hai-Hao Dong, Yuwei Zhu, Su-Yi Cheng, Xingjian Zhang, Cheng-Long Li, Ying-Zhao Li, Hao Li, Lixing You, Xiongfeng Ma, Qiang Zhang, Jian-Wei Pan

    Abstract: Nonlocality, evidenced by the violation of Bell inequalities, not only signifies entanglement but also highlights measurement incompatibility in quantum systems. Utilizing the generalized Clauser-Horne-Shimony-Holt (CHSH) Bell inequality, our high-efficiency optical setup achieves a loophole-free violation of $2.0132$. This result provides a device-independent lower bound on entanglement, quantifi… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 15 pages, 9 figures

  16. arXiv:2408.10287  [pdf]

    physics.optics cs.AI eess.IV

    Recognizing Beam Profiles from Silicon Photonics Gratings using Transformer Model

    Authors: Yu Dian Lim, Hong Yu Li, Simon Chun Kiat Goh, Xiangyu Wang, Peng Zhao, Chuan Seng Tan

    Abstract: Over the past decade, there has been extensive work in developing integrated silicon photonics (SiPh) gratings for the optical addressing of trapped ion qubits in the ion trap quantum computing community. However, when viewing beam profiles from infrared (IR) cameras, it is often difficult to determine the corresponding heights where the beam profiles are located. In this work, we developed transf… ▽ More

    Submitted 22 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  17. arXiv:2408.10189  [pdf, other]

    cs.LG cs.AI

    Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models

    Authors: Aviv Bick, Kevin Y. Li, Eric P. Xing, J. Zico Kolter, Albert Gu

    Abstract: Transformer architectures have become a dominant paradigm for domains like language modeling but suffer in many inference settings due to their quadratic-time self-attention. Recently proposed subquadratic architectures, such as Mamba, have shown promise, but have been pretrained with substantially less computational resources than the strongest Transformer models. In this work, we present a metho… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  18. arXiv:2408.10154  [pdf, other]

    cs.CV cs.RO

    LoopSplat: Loop Closure by Registering 3D Gaussian Splats

    Authors: Liyuan Zhu, Yue Li, Erik Sandström, Shengyu Huang, Konrad Schindler, Iro Armeni

    Abstract: Simultaneous Localization and Mapping (SLAM) based on 3D Gaussian Splats (3DGS) has recently shown promise towards more accurate, dense 3D scene maps. However, existing 3DGS-based methods fail to address the global consistency of the scene via loop closure and/or global bundle adjustment. To this end, we propose LoopSplat, which takes RGB-D images as input and performs dense mapping with 3DGS subm… ▽ More

    Submitted 19 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

    Comments: Project page: https://loopsplat.github.io/

  19. arXiv:2408.10056  [pdf, other]

    math.RT math.RA

    Finite dimensional 2-cyclic Jacobian algebras

    Authors: Yiyu Li, Liangang Peng

    Abstract: In this paper, we start with a class of quivers containing only 2-cycles and loops, referred to as 2-cyclic quivers. We prove that there exists a potential on these quivers that ensures the resulting quiver with potential is Jacobian-finite. As an application, we first demonstrate through covering theory that a Jacobian-finite potential exists on a class of 2-acyclic quivers. Secondly, by using th… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  20. arXiv:2408.10007  [pdf, other]

    cs.CV

    P3P: Pseudo-3D Pre-training for Scaling 3D Masked Autoencoders

    Authors: Xuechao Chen, Ying Chen, Jialin Li, Qiang Nie, Yong Liu, Qixing Huang, Yang Li

    Abstract: 3D pre-training is crucial to 3D perception tasks. However, limited by the difficulties in collecting clean 3D data, 3D pre-training consistently faced data scaling challenges. Inspired by semi-supervised learning leveraging limited labeled data and a large amount of unlabeled data, in this work, we propose a novel self-supervised pre-training framework utilizing the real 3D data and the pseudo-3D… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Under review. Pre-print

  21. arXiv:2408.09984  [pdf, other]

    cs.CV

    Boosting Open-Domain Continual Learning via Leveraging Intra-domain Category-aware Prototype

    Authors: Yadong Lu, Shitian Zhao, Boxiang Yun, Dongsheng Jiang, Yin Li, Qingli Li, Yan Wang

    Abstract: Despite recent progress in enhancing the efficacy of Open-Domain Continual Learning (ODCL) in Vision-Language Models (VLM), failing to (1) correctly identify the Task-ID of a test image and (2) use only the category set corresponding to the Task-ID, while preserving the knowledge related to each domain, cannot address the two primary challenges of ODCL: forgetting old knowledge and maintaining zer… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  22. arXiv:2408.09935  [pdf, other]

    cs.CR

    Privacy Technologies for Financial Intelligence

    Authors: Yang Li, Thilina Ranbaduge, Kee Siong Ng

    Abstract: Financial crimes like terrorism financing and money laundering can have real impacts on society, including the abuse and mismanagement of public funds, increase in societal problems such as drug trafficking and illicit gambling with attendant economic costs, and loss of innocent lives in the case of terrorism activities. Complex financial crimes can be hard to detect primarily because data related… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  23. arXiv:2408.09850  [pdf, other]

    quant-ph

    Enhancing quantum phase synchronization through squeezed-reservoir engineering

    Authors: Xing Xiao, Tian-Xiang Lu, Wo-Jun Zhong, Yan-Ling Li

    Abstract: We investigate the enhancement of quantum phase synchronization in a two-level system (TLS) coupled to a squeezed reservoir. Our study reveals that the squeezed reservoir induces a stable limit cycle in the TLS, enhancing the quantum phase synchronization. We utilize the Husimi $Q$-function to describe the phase portrait of the driven TLS, and the $S$-function to quantitatively illustrate the effe… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 6 pages, 4 figures, comments are welcome!

  24. arXiv:2408.09845  [pdf, other]

    cs.SI physics.soc-ph

    Predicting Long-term Dynamics of Complex Networks via Identifying Skeleton in Hyperbolic Space

    Authors: Ruikun Li, Huandong Wang, Jinghua Piao, Qingmin Liao, Yong Li

    Abstract: Learning complex network dynamics is fundamental for understanding, modeling, and controlling real-world complex systems. Though great efforts have been made to predict the future states of nodes on networks, the capability of capturing long-term dynamics remains largely limited. This is because they overlook the fact that long-term dynamics in complex network are predominantly governed by their i… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  25. TDNetGen: Empowering Complex Network Resilience Prediction with Generative Augmentation of Topology and Dynamics

    Authors: Chang Liu, Jingtao Ding, Yiwen Song, Yong Li

    Abstract: Predicting the resilience of complex networks, which represents the ability to retain fundamental functionality amidst external perturbations or internal failures, plays a critical role in understanding and improving real-world complex systems. Traditional theoretical approaches grounded in nonlinear dynamical systems rely on prior knowledge of network dynamics. On the other hand, data-driven appr… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  26. arXiv:2408.09815  [pdf, other]

    cs.LG cs.HC

    A Population-to-individual Tuning Framework for Adapting Pretrained LM to On-device User Intent Prediction

    Authors: Jiahui Gong, Jingtao Ding, Fanjin Meng, Guilong Chen, Hong Chen, Shen Zhao, Haisheng Lu, Yong Li

    Abstract: Mobile devices, especially smartphones, can support rich functions and have developed into indispensable tools in daily life. With the rise of generative AI services, smartphones can potentially transform into personalized assistants, anticipating user needs and scheduling services accordingly. Predicting user intents on smartphones, and reflecting anticipated activities based on past interactions… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: accepted by KDD 2024

  27. arXiv:2408.09787  [pdf, other]

    cs.CL cs.CV cs.MM

    Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation

    Authors: Yunxin Li, Haoyuan Shi, Baotian Hu, Longyue Wang, Jiashun Zhu, Jinyi Xu, Zhen Zhao, Min Zhang

    Abstract: Traditional animation generation methods depend on training generative models with human-labelled data, entailing a sophisticated multi-stage pipeline that demands substantial human effort and incurs high training costs. Due to limited prompting plans, these methods typically produce brief, information-poor, and context-incoherent animations. To overcome these limitations and automate the animatio… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Accepted by SIGGRAPH Asia 2024, Project and Codes: https://github.com/HITsz-TMG/Anim-Director

  28. arXiv:2408.09743  [pdf, other]

    cs.CV cs.AI cs.CL

    R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation

    Authors: Xiao Wang, Yuehang Li, Fuling Wang, Shiao Wang, Chuanfu Li, Bo Jiang

    Abstract: Inspired by the tremendous success of Large Language Models (LLMs), existing X-ray medical report generation methods attempt to leverage large models to achieve better performance. They usually adopt a Transformer to extract the visual features of a given X-ray image, and then, feed them into the LLM for text generation. How to extract more effective information for the LLMs to help them improve f… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: In Peer Review

  29. arXiv:2408.09707  [pdf, other]

    physics.optics physics.app-ph

    Optical trapping with optical magnetic field and photonic Hall effect forces

    Authors: Yanzeng Li, Emmanuel Valenton, Spoorthi Nagasamudram, John Parker, Marcos Perez, Uttam Manna, Mahua Biswas, Stuart A. Rice, Norbert F. Scherer

    Abstract: Optical trapping is having ever-increasing impact in science $-$ particularly biophysics, photonics and most recently in quantum optomechanics $-$ owing to its superior capability for manipulating nanoscale structures and materials. However, essentially all experimental optical trapping studies in the optical dipole regime have, to date, been dominated by the interaction between a material's elect… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  30. arXiv:2408.09705  [pdf, other]

    cs.LG cs.SI

    Community-Centric Graph Unlearning

    Authors: Yi Li, Shichao Zhang, Guixian Zhang, Debo Cheng

    Abstract: Graph unlearning technology has become increasingly important since the advent of the `right to be forgotten' and the growing concerns about the privacy and security of artificial intelligence. Graph unlearning aims to quickly eliminate the effects of specific data on graph neural networks (GNNs). However, most existing deterministic graph unlearning frameworks follow a balanced partition-submodel… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  31. arXiv:2408.09695  [pdf, other]

    cs.LG cs.AI physics.ao-ph

    LightWeather: Harnessing Absolute Positional Encoding to Efficient and Scalable Global Weather Forecasting

    Authors: Yisong Fu, Fei Wang, Zezhi Shao, Chengqing Yu, Yujie Li, Zhao Chen, Zhulin An, Yongjun Xu

    Abstract: Recently, Transformers have gained traction in weather forecasting for their capability to capture long-term spatial-temporal correlations. However, their complex architectures result in large parameter counts and extended training times, limiting their practical application and scalability to global-scale forecasting. This paper aims to explore the key factor for accurate weather forecasting and… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  32. arXiv:2408.09689  [pdf, other]

    hep-ph nucl-th

    Gravitational form factor $D$ of charmonium from shear stress

    Authors: Tianyang Hu, Xianghui Cao, Siqi Xu, Yang Li, Xingbo Zhao, James P. Vary

    Abstract: Based on our recent analysis of the hadronic matrix element of the stress-energy tensor in covariant light front dynamics, we extract the charmonium gravitational form factor $D(Q^2)$ from shear stress $T^{12}$. This is in contrast to our recent work using the (light-front) energy density $T^{+-}$. Indeed, by comparing these two currents, we identify terms that are responsible for the violation of… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 9 pages, 4 figures

  33. arXiv:2408.09535  [pdf, other]

    hep-ph hep-th nucl-th

    Dissecting a strongly coupled scalar nucleon

    Authors: Xianghui Cao, Yang Li, James P. Vary

    Abstract: We continue our investigation of the stress within a strongly coupled scalar nucleon, and now dissect the gravitational form factors into contributions from its constituents, the (mock) nucleon and the (mock) pion. The computation is based on a non-perturbative solution of the scalar Yukawa model in the light-front Hamiltonian formalism with a Fock sector expansion including up to one nucleon and… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 11 pages, 4 figures

  34. arXiv:2408.09491  [pdf, other]

    cs.SD eess.AS

    A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition

    Authors: Yangze Li, Xiong Wang, Songjun Cao, Yike Zhang, Long Ma, Lei Xie

    Abstract: Audio-LLM introduces audio modality into a large language model (LLM) to enable a powerful LLM to recognize, understand, and generate audio. However, during speech recognition in noisy environments, we observed the presence of illusions and repetition issues in audio-LLM, leading to substitution and insertion errors. This paper proposes a transcription prompt-based audio-LLM by introducing an ASR… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

  35. arXiv:2408.09482  [pdf, other]

    astro-ph.GA astro-ph.IM

    First Discovery and Confirmation of PN Candidates Found from AI and Deep Learning Techniques Applied to VPHAS+ Survey Data

    Authors: Yushan Li, Quentin Parker, Peng Jia

    Abstract: Context. We have developed deep learning (DL) and AI-based tools to search extant narrow-band wide-field H$α$ surveys of the Galactic Plane for elusive planetary nebulae (PNe) which are hidden in dense star fields towards the Galactic center. They are faint, low-surface brightness, usually resolved sources, which are not discovered by previous automatic searches that depend on photometric data for… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

  36. arXiv:2408.09474  [pdf, other]

    cs.CR cs.CL cs.CV

    Image-Based Geolocation Using Large Vision-Language Models

    Authors: Yi Liu, Junchen Ding, Gelei Deng, Yuekang Li, Tianwei Zhang, Weisong Sun, Yaowen Zheng, Jingquan Ge, Yang Liu

    Abstract: Geolocation is now a vital aspect of modern life, offering numerous benefits but also presenting serious privacy concerns. The advent of large vision-language models (LVLMs) with advanced image-processing capabilities introduces new risks, as these models can inadvertently reveal sensitive geolocation information. This paper presents the first in-depth study analyzing the challenges posed by tradi… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

  37. arXiv:2408.09438  [pdf, ps, other]

    cs.MM cs.AI cs.CV cs.SD

    Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition

    Authors: Qifei Li, Yingming Gao, Yuhua Wen, Cong Wang, Ya Li

    Abstract: To address the limitation in multimodal emotion recognition (MER) performance arising from inter-modal information fusion, we propose a novel MER framework based on multitask learning where fusion occurs after alignment, called Foal-Net. The framework is designed to enhance the effectiveness of modality fusion and includes two auxiliary tasks: audio-video emotion alignment (AVEL) and cross-modal e… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: The paper has been accepted by INTERSPEECH 2024

  38. arXiv:2408.09417  [pdf]

    cond-mat.str-el

    Discovery of terahertz-frequency orbitally-coupled magnons in a kagome ferromagnet

    Authors: Mengqian Che, Weizhao Chen, Maoyuan Wang, F. Michael Bartram, Liangyang Liu, Xuebin Dong, Jinjin Liu, Yidian Li, Hao Lin, Zhiwei Wang, Enke Liu, Yugui Yao, Zhe Yuan, Guang-Ming Zhang, Luyi Yang

    Abstract: In ferromagnetic materials, magnons - quanta of spin waves - typically resonate in the gigahertz range. Beyond conventional magnons, while theoretical studies have predicted magnons associated with orbital magnetic moments, their direct observation has remained challenging. Here, we present the discovery of two distinct terahertz orbitally-coupled magnon resonances in the topological kagome ferrom… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

  39. arXiv:2408.09402  [pdf, other]

    hep-ph physics.plasm-ph

    Transition signatures for electron-positron pair creation in space-time inhomogeneous electric field

    Authors: C. K. Li, X. X. Zhou, Q. Chen, B. An, Y. J. Li, N. S. Lin, Y. Wan

    Abstract: The process of electron-positron pair creation through multi-photon absorption in a space-time dependent electric field is analyzed using computational quantum field theory. Our findings reveal two distinct pair creation channels: the symmetric and asymmetric transition channels. We propose that the asymmetric transition channel arises from the inherent spatial inhomogeneity of intense laser pulse… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

  40. arXiv:2408.09398  [pdf, other]

    math.DS

    Quantitative uniform exponential acceleration of averages along decaying waves

    Authors: Zhicheng Tong, Yong Li

    Abstract: In this study, utilizing a specific exponential weighting function, we investigate the uniform exponential convergence of weighted Birkhoff averages along decaying waves and delve into several related variants. A key distinction from traditional scenarios is evident here: despite reduced regularity in observables, our method still maintains exponential convergence. In particular, we develop new te… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 28 pages, 1 figure

    MSC Class: 37A25; 37A30; 37A46

  41. arXiv:2408.09395  [pdf, other]

    cs.CV

    OU-CoViT: Copula-Enhanced Bi-Channel Multi-Task Vision Transformers with Dual Adaptation for OU-UWF Images

    Authors: Yang Li, Jianing Deng, Chong Zhong, Danjuan Yang, Meiyan Li, A. H. Welsh, Aiyi Liu, Xingtao Zhou, Catherine C. Liu, Bo Fu

    Abstract: Myopia screening using cutting-edge ultra-widefield (UWF) fundus imaging and joint modeling of multiple discrete and continuous clinical scores presents a promising new paradigm for multi-task problems in Ophthalmology. The bi-channel framework that arises from the Ophthalmic phenomenon of ``interocular asymmetries'' of both eyes (OU) calls for new employment on the SOTA transformer-based models.… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

  42. arXiv:2408.09380

    cs.AI cs.IR

    ELASTIC: Efficient Linear Attention for Sequential Interest Compression

    Authors: Jiaxin Deng, Shiyao Wang, Song Lu, Yinfeng Li, Xinchen Luo, Yuanjun Liu, Peixing Xu, Guorui Zhou

    Abstract: State-of-the-art sequential recommendation models heavily rely on transformer's attention mechanism. However, the quadratic computational and memory complexities of self attention have limited its scalability for modeling users' long range behaviour sequences. To address this problem, we propose ELASTIC, an Efficient Linear Attention for SequenTial Interest Compression, requiring only linear time… ▽ More

    Submitted 20 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: We hereby withdraw this paper from arXiv due to incomplete experiments. Upon further review, we have determined that additional experimental work is necessary to fully validate our findings and conclusions

  43. arXiv:2408.09377  [pdf, other]

    cs.LG cs.IT stat.ML

    Mutual Information Multinomial Estimation

    Authors: Yanzhi Chen, Zijing Ou, Adrian Weller, Yingzhen Li

    Abstract: Estimating mutual information (MI) is a fundamental yet challenging task in data science and machine learning. This work proposes a new estimator for mutual information. Our main discovery is that a preliminary estimate of the data distribution can dramatically help estimate. This preliminary estimate serves as a bridge between the joint and the marginal distribution, and by comparing with this br…

    Submitted 18 August, 2024; originally announced August 2024.

  44. arXiv:2408.09369  [pdf, other]

    eess.IV cs.CV

    Flemme: A Flexible and Modular Learning Platform for Medical Images

    Authors: Guoqing Zhang, Jingyun Yang, Yang Li

    Abstract: As the rapid development of computer vision and the emergence of powerful network backbones and architectures, the application of deep learning in medical imaging has become increasingly significant. Unlike natural images, medical images lack huge volumes of data but feature more modalities, making it difficult to train a general model that has satisfactory performance across various datasets. In…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 8 pages, 6 figures

  45. arXiv:2408.09260  [pdf]

    astro-ph.IM astro-ph.EP physics.ins-det

    Analysis of the Effect of Tilted Corner Cube Reflector Arrays on Lunar Laser Ranging

    Authors: Jin Cao, Rufeng Tang, Kai Huang, Zhulian Li, Yongzhang Yang, Kai Huang, Jintao Li, Yuqiang Li

    Abstract: This paper primarily investigates the effect of the tilt of corner cube reflector (CCR) arrays on lunar laser ranging (LLR). A mathematical model was established to study the random errors caused by the tilt of the CCR arrays. The study found that, ideally, when the laser ranging pulse width is 10 picoseconds or less, it is possible to distinguish from which specific corner cubes within the CCR ar…

    Submitted 21 August, 2024; v1 submitted 17 August, 2024; originally announced August 2024.

    Journal ref: Remote Sens. 2024, 16(16), 3030

  46. arXiv:2408.09150  [pdf, other]

    cs.CL cs.AI

    CogLM: Tracking Cognitive Development of Large Language Models

    Authors: Xinglin Wang, Peiwen Yuan, Shaoxiong Feng, Yiwei Li, Boyuan Pan, Heda Wang, Yao Hu, Kan Li

    Abstract: Piaget's Theory of Cognitive Development (PTC) posits that the development of cognitive levels forms the foundation for human learning across various abilities. As Large Language Models (LLMs) have recently shown remarkable abilities across a wide variety of tasks, we are curious about the cognitive levels of current LLMs: to what extent they have developed and how this development has been achiev…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: under review

  47. arXiv:2408.09141  [pdf, other]

    astro-ph.HE

    Evidence for hybrid gamma-ray emission from the supernova remnant G150.3+4.5

    Authors: Yuan Li, Siming Liu, Gwenael Giacinti

    Abstract: The supernova remnant (SNR) G150.3+4.5 was first identified in radio, exhibiting a hard GeV spectrum and a $\sim 1.5^\circ$ radius. Radio observations revealed a bright arc with an index of $\sim -0.40$, which stands in contrast to the index of $\sim -0.69$ for the rest. This arc is coincident with the point-like \emph{Fermi} source 4FGL J0426.5+5434 and KM2A source 1LHAASO J0428+5531. The rest of…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 13 pages, 8 figures, accepted for publication in A&A

  48. arXiv:2408.09097  [pdf, other]

    cs.CV cs.AI

    Depth-guided Texture Diffusion for Image Semantic Segmentation

    Authors: Wei Sun, Yuan Li, Qixiang Ye, Jianbin Jiao, Yanzhao Zhou

    Abstract: Depth information provides valuable insights into the 3D structure especially the outline of objects, which can be utilized to improve the semantic segmentation tasks. However, a naive fusion of depth information can disrupt feature and compromise accuracy due to the modality gap between the depth and the vision. In this work, we introduce a Depth-guided Texture Diffusion approach that effectively…

    Submitted 17 August, 2024; originally announced August 2024.

  49. arXiv:2408.09064  [pdf, other]

    cs.CV cs.LG

    MoRA: LoRA Guided Multi-Modal Disease Diagnosis with Missing Modality

    Authors: Zhiyi Shi, Junsik Kim, Wanhua Li, Yicong Li, Hanspeter Pfister

    Abstract: Multi-modal pre-trained models efficiently extract and fuse features from different modalities with low memory requirements for fine-tuning. Despite this efficiency, their application in disease diagnosis is under-explored. A significant challenge is the frequent occurrence of missing modalities, which impairs performance. Additionally, fine-tuning the entire pre-trained model demands substantial…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Accepted by MICCAI 2024

  50. arXiv:2408.09051  [pdf, other]

    astro-ph.CO

    AI-assisted super-resolution cosmological simulations IV: An emulator for deterministic realizations

    Authors: Xiaowen Zhang, Patrick Lachance, Ankita Dasgupta, Rupert A. C. Croft, Tiziana Di Matteo, Yueying Ni, Simeon Bird, Yin Li

    Abstract: Super-resolution (SR) models in cosmological simulations use deep learning (DL) to rapidly supplement low-resolution (LR) runs with statistically correct, fine details. The SR technique preserves large-scale structures by conditioning on a low-resolution (LR) version of the simulation. On smaller scales, the generative deep learning (DL) process is stochastic, resulting in numerous possible SR rea…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 16 pages, 15 figures