Showing 51–100 of 17,264 results for author: Li, Y

  1. arXiv:2408.13986  [pdf, other]

    cs.LG cs.AI cs.CL cs.IR

    AgentMove: Predicting Human Mobility Anywhere Using Large Language Model based Agentic Framework

    Authors: Jie Feng, Yuwei Du, Jie Zhao, Yong Li

    Abstract: Human mobility prediction plays a crucial role in various real-world applications. Although deep learning based models have shown promising results over the past decade, their reliance on extensive private mobility data for training and their inability to perform zero-shot predictions have hindered further advancements. Recently, attempts have been made to apply large language models (LLMs) to mo… ▽ More

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 13 pages

  2. arXiv:2408.13980  [pdf, other]

    cs.CV

    FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation

    Authors: Daixun Li, Weiying Xie, Mingxiang Cao, Yunke Wang, Jiaqing Zhang, Yunsong Li, Leyuan Fang, Chang Xu

    Abstract: Multimodal image fusion and segmentation enhance scene understanding in autonomous driving by integrating data from various sensors. However, current models struggle to efficiently segment densely packed elements in such scenes, due to the absence of comprehensive fusion features that can guide mid-process fine-tuning and focus attention on relevant areas. The Segment Anything Model (SAM) has emer… ▽ More

    Submitted 25 August, 2024; originally announced August 2024.

  3. arXiv:2408.13977  [pdf, other]

    cs.HC

    Say Your Reason: Extract Contextual Rules In Situ for Context-aware Service Recommendation

    Authors: Yuxuan Li, Jiahui Li, Lihang Pan, Chun Yu, Yuanchun Shi

    Abstract: This paper introduces SayRea, an interactive system that facilitates the extraction of contextual rules for personalized context-aware service recommendations in mobile scenarios. The system monitors a user's execution of registered services on their smartphones (via accessibility service) and proactively requests a single-sentence reason from the user. By utilizing a Large Language Model (LLM), S… ▽ More

    Submitted 25 August, 2024; originally announced August 2024.

  4. arXiv:2408.13759  [pdf, other]

    cs.RO

    MASQ: Multi-Agent Reinforcement Learning for Single Quadruped Robot Locomotion

    Authors: Qi Liu, Jingxiang Guo, Sixu Lin, Shuaikang Ma, Jinxuan Zhu, Yanjie Li

    Abstract: This paper proposes a novel method to improve locomotion learning for a single quadruped robot using multi-agent deep reinforcement learning (MARL). Many existing methods use single-agent reinforcement learning for an individual robot or MARL for the cooperative task in multi-robot systems. Unlike existing methods, this paper proposes using MARL for the locomotion learning of a single quadruped ro… ▽ More

    Submitted 25 August, 2024; originally announced August 2024.

  5. arXiv:2408.13750  [pdf, other]

    cs.AI cs.MA

    Multi-Agent Target Assignment and Path Finding for Intelligent Warehouse: A Cooperative Multi-Agent Deep Reinforcement Learning Perspective

    Authors: Qi Liu, Jianqi Gao, Dongjie Zhu, Xizheng Pang, Pengbin Chen, Jingxiang Guo, Yanjie Li

    Abstract: Multi-agent target assignment and path planning (TAPF) are two key problems in intelligent warehouses. However, most literature addresses only one of these two problems. In this study, we propose a method to simultaneously solve target assignment and path planning from the perspective of cooperative multi-agent deep reinforcement learning (RL). To the best of our knowledge, this is the fir… ▽ More

    Submitted 25 August, 2024; originally announced August 2024.

  6. arXiv:2408.13738  [pdf, other]

    cs.CL

    Poor-Supervised Evaluation for SuperLLM via Mutual Consistency

    Authors: Peiwen Yuan, Shaoxiong Feng, Yiwei Li, Xinglin Wang, Boyuan Pan, Heda Wang, Yao Hu, Kan Li

    Abstract: The guidance from capability evaluations has greatly propelled the progress of both human society and Artificial Intelligence. However, as LLMs evolve, it becomes challenging to construct evaluation benchmarks for them with accurate labels on hard tasks that approach the boundaries of human capabilities. To credibly conduct evaluation without accurate labels (denoted as poor-supervised evaluation)… ▽ More

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: ACL findings

  7. arXiv:2408.13728  [pdf, other]

    cs.CV

    3D-RCNet: Learning from Transformer to Build a 3D Relational ConvNet for Hyperspectral Image Classification

    Authors: Haizhao Jing, Liuwei Wan, Xizhe Xue, Haokui Zhang, Ying Li

    Abstract: Recently, the Vision Transformer (ViT) model has replaced the classical Convolutional Neural Network (ConvNet) in various computer vision tasks due to its superior performance. Even in the hyperspectral image (HSI) classification field, ViT-based methods show promising potential. Nevertheless, ViT encounters notable difficulties in processing HSI data. Its self-attention mechanism, which exhibits… ▽ More

    Submitted 25 August, 2024; originally announced August 2024.

  8. arXiv:2408.13705  [pdf, other]

    cs.CL cs.SD eess.AS

    Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval

    Authors: Lifeng Zhou, Yuke Li, Rui Deng, Yuting Yang, Haoqi Zhu

    Abstract: The success of speech-image retrieval relies on establishing an effective alignment between speech and image. Existing methods often model cross-modal interaction through simple cosine similarity of the global feature of each modality, which falls short in capturing fine-grained details within modalities. To address this issue, we introduce an effective framework and a novel learning task named cro… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2408.13119

  9. arXiv:2408.13687  [pdf, other]

    quant-ph

    Quantum error correction below the surface code threshold

    Authors: Rajeev Acharya, Laleh Aghababaie-Beni, Igor Aleiner, Trond I. Andersen, Markus Ansmann, Frank Arute, Kunal Arya, Abraham Asfaw, Nikita Astrakhantsev, Juan Atalaya, Ryan Babbush, Dave Bacon, Brian Ballard, Joseph C. Bardin, Johannes Bausch, Andreas Bengtsson, Alexander Bilmes, Sam Blackwell, Sergio Boixo, Gina Bortoli, Alexandre Bourassa, Jenna Bovaird, Leon Brill, Michael Broughton, David A. Browne, et al. (224 additional authors not shown)

    Abstract: Quantum error correction provides a path to reach practical quantum computing by combining multiple physical qubits into a logical qubit, where the logical error rate is suppressed exponentially as more qubits are added. However, this exponential suppression only occurs if the physical error rate is below a critical threshold. In this work, we present two surface code memories operating below this… ▽ More

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: 10 pages, 4 figures, Supplementary Information

  10. arXiv:2408.13578  [pdf, other]

    physics.geo-ph

    Adaptive Graded Denoising of Seismic Data Based on Noise Estimation and Local Similarity

    Authors: Xueting Yang, Yong Li, Zhangquan Liao, Yingtian Liu, Junheng Peng

    Abstract: Seismic data denoising is an important part of seismic data processing, which directly relates to the follow-up processing of seismic data. To address this issue, many authors have proposed methods based on rank reduction, sparse transformation, domain transformation, and deep learning. However, when the seismic data is noisy, complex and uneven, these methods often lead to over-denoising or under… ▽ More

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: This article has been submitted to geophysics

    MSC Class: 86-10 ACM Class: I.4.4

  11. arXiv:2408.13499  [pdf, other]

    cs.CV

    R2G: Reasoning to Ground in 3D Scenes

    Authors: Yixuan Li, Zan Wang, Wei Liang

    Abstract: We propose Reasoning to Ground (R2G), a neural symbolic model that grounds the target objects within 3D scenes in a reasoning manner. In contrast to prior works, R2G explicitly models the 3D scene with a semantic concept-based scene graph; recurrently simulates the attention transferring across object entities; thus makes the process of grounding the target objects with the highest probability int… ▽ More

    Submitted 24 August, 2024; originally announced August 2024.

  12. arXiv:2408.13471  [pdf, other]

    cs.LG cs.AI

    Disentangled Generative Graph Representation Learning

    Authors: Xinyue Hu, Zhibin Duan, Xinyang Liu, Yuxin Li, Bo Chen, Mingyuan Zhou

    Abstract: Recently, generative graph models have shown promising results in learning graph representations through self-supervised methods. However, most existing generative graph representation learning (GRL) approaches rely on random masking across the entire graph, which overlooks the entanglement of learned representations. This oversight results in non-robustness and a lack of explainability. Furthermo… ▽ More

    Submitted 24 August, 2024; originally announced August 2024.

  13. arXiv:2408.13457  [pdf, other]

    cs.CL cs.AI

    Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning

    Authors: Xinglin Wang, Shaoxiong Feng, Yiwei Li, Peiwen Yuan, Yueqi Zhang, Boyuan Pan, Heda Wang, Yao Hu, Kan Li

    Abstract: Self-consistency (SC), a widely used decoding strategy for chain-of-thought reasoning, shows significant gains across various multi-step reasoning tasks but comes with a high cost due to multiple sampling with the preset size. Its variants, Adaptive self-consistency (ASC) and Early-stopping self-consistency (ESC), dynamically adjust the number of samples based on the posterior distribution of a se… ▽ More

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: Preprint

  14. arXiv:2408.13385  [pdf, other]

    cs.CV

    MICM: Rethinking Unsupervised Pretraining for Enhanced Few-shot Learning

    Authors: Zhenyu Zhang, Guangyao Chen, Yixiong Zou, Zhimeng Huang, Yuhua Li, Ruixuan Li

    Abstract: Humans exhibit a remarkable ability to learn quickly from a limited number of labeled samples, a capability that starkly contrasts with that of current machine learning systems. Unsupervised Few-Shot Learning (U-FSL) seeks to bridge this divide by reducing reliance on annotated datasets during initial training phases. In this work, we first quantitatively assess the impacts of Masked Image Modelin… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: ACMMM 2024 (Oral)

  15. arXiv:2408.13373  [pdf, other]

    cs.CV

    Learning Unknowns from Unknowns: Diversified Negative Prototypes Generator for Few-Shot Open-Set Recognition

    Authors: Zhenyu Zhang, Guangyao Chen, Yixiong Zou, Yuhua Li, Ruixuan Li

    Abstract: Few-shot open-set recognition (FSOR) is a challenging task that requires a model to recognize known classes and identify unknown classes with limited labeled data. Existing approaches, particularly Negative-Prototype-Based methods, generate negative prototypes based solely on known class data. However, as the unknown space is infinite while the known space is limited, these methods suffer from lim… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: ACMMM 2024

  16. Optimal Dispatch Strategy for a Multi-microgrid Cooperative Alliance Using a Two-Stage Pricing Mechanism

    Authors: Yonghui Nie, Zhi Li, Jie Zhang, Lei Gao, Yang Li, Hengyu Zhou

    Abstract: To coordinate resources among multi-level stakeholders and enhance the integration of electric vehicles (EVs) into multi-microgrids, this study proposes an optimal dispatch strategy within a multi-microgrid cooperative alliance using a nuanced two-stage pricing mechanism. Initially, the strategy assesses electric energy interactions between microgrids and distribution networks to establish a found… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE Transactions on Sustainable Energy, Paper no. TSTE-00122-2024

  17. arXiv:2408.13252  [pdf, other]

    cs.CV

    LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation

    Authors: Shuai Yang, Jing Tan, Mengchen Zhang, Tong Wu, Yixuan Li, Gordon Wetzstein, Ziwei Liu, Dahua Lin

    Abstract: 3D immersive scene generation is a challenging yet critical task in computer vision and graphics. A desired virtual 3D scene should 1) exhibit omnidirectional view consistency, and 2) allow for free exploration in complex scene hierarchies. Existing methods either rely on successive scene expansion via inpainting or employ panorama representation to represent large FOV scene environments. However,… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Project page: https://ys-imtech.github.io/projects/LayerPano3D/

  18. arXiv:2408.13134  [pdf, ps, other]

    math.NA math.PR

    Optimal order time discretizations for stochastic semilinear wave equations with multiplicative noise

    Authors: Xiaobing Feng, Yukun Li, Liet Vo

    Abstract: This paper is concerned with developing and analyzing two novel implicit temporal discretization methods for the stochastic semilinear wave equations with multiplicative noise. The proposed methods are natural extensions of well-known time-discrete schemes for deterministic wave equations, hence, they are easy to implement. It is proved that both methods are energy-stable. Moreover, the first meth… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 28 pages, 0 figures, 3 tables

    MSC Class: 65N12; 65N15; 65N30

  19. arXiv:2408.13119  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Coarse-to-fine Alignment Makes Better Speech-image Retrieval

    Authors: Lifeng Zhou, Yuke Li

    Abstract: In this paper, we propose a novel framework for speech-image retrieval. We utilize speech-image contrastive (SIC) learning tasks to align speech and image representations at a coarse level and speech-image matching (SIM) learning tasks to further refine the fine-grained cross-modal alignment. SIC and SIM learning tasks are jointly trained in a unified manner. To optimize the learning process, we u… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

  20. arXiv:2408.12914  [pdf, ps, other]

    eess.SP

    A Recursion-Based SNR Determination Method for Short Packet Transmission: Analysis and Applications

    Authors: Chengzhe Yin, Rui Zhang, Yongzhao Li, Yuhan Ruan, Tao Li, Jiaheng Lu

    Abstract: The short packet transmission (SPT) has gained much attention in recent years. In SPT, the most significant characteristic is that the finite blocklength code (FBC) is adopted. With FBC, the signal-to-noise ratio (SNR) cannot be expressed as an explicit function with respect to the other transmission parameters. This raises the following two problems for the resource allocation in SPTs: (i) The ex… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

  21. arXiv:2408.12897  [pdf, other]

    eess.IV cs.CV

    When Diffusion MRI Meets Diffusion Model: A Novel Deep Generative Model for Diffusion MRI Generation

    Authors: Xi Zhu, Wei Zhang, Yijie Li, Lauren J. O'Donnell, Fan Zhang

    Abstract: Diffusion MRI (dMRI) is an advanced imaging technique characterizing tissue microstructure and white matter structural connectivity of the human brain. The demand for high-quality dMRI data is growing, driven by the need for better resolution and improved tissue contrast. However, acquiring high-quality dMRI data is expensive and time-consuming. In this context, deep generative modeling emerges as… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 11 pages, 3 figures

  22. arXiv:2408.12821  [pdf, other]

    cs.CV cs.AI

    Examining the Commitments and Difficulties Inherent in Multimodal Foundation Models for Street View Imagery

    Authors: Zhenyuan Yang, Xuhui Lin, Qinyi He, Ziye Huang, Zhengliang Liu, Hanqi Jiang, Peng Shu, Zihao Wu, Yiwei Li, Stephen Law, Gengchen Mai, Tianming Liu, Tao Yang

    Abstract: The emergence of Large Language Models (LLMs) and multimodal foundation models (FMs) has generated heightened interest in their applications that integrate vision and language. This paper investigates the capabilities of ChatGPT-4V and Gemini Pro for Street View Imagery, Built Environment, and Interior by evaluating their performance across various tasks. The assessments include street furniture i… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

  23. arXiv:2408.12803  [pdf, other]

    cs.LG cs.AI cs.IR

    Multi-Treatment Multi-Task Uplift Modeling for Enhancing User Growth

    Authors: Yuxiang Wei, Zhaoxin Qiu, Yingjie Li, Yuke Sun, Xiaoling Li

    Abstract: As a key component in boosting online user growth, uplift modeling aims to measure individual user responses (e.g., whether to play the game) to various treatments, such as gaming bonuses, thereby enhancing business outcomes. However, previous research typically considers a single-task, single-treatment setting, where only one treatment exists and the overall treatment effect is measured by a sing… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

  24. arXiv:2408.12798  [pdf, other]

    cs.AI

    BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models

    Authors: Yige Li, Hanxun Huang, Yunhan Zhao, Xingjun Ma, Jun Sun

    Abstract: Generative Large Language Models (LLMs) have made significant strides across various tasks, but they remain vulnerable to backdoor attacks, where specific triggers in the prompt cause the LLM to generate adversary-desired responses. While most backdoor research has focused on vision or text classification tasks, backdoor attacks in text generation have been largely overlooked. In this work, we int… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

  25. arXiv:2408.12767  [pdf, other]

    cs.NE cs.AI cs.AR cs.LG

    When In-memory Computing Meets Spiking Neural Networks -- A Perspective on Device-Circuit-System-and-Algorithm Co-design

    Authors: Abhishek Moitra, Abhiroop Bhattacharjee, Yuhang Li, Youngeun Kim, Priyadarshini Panda

    Abstract: This review explores the intersection of bio-plausible artificial intelligence in the form of Spiking Neural Networks (SNNs) with the analog In-Memory Computing (IMC) domain, highlighting their collective potential for low-power edge computing environments. Through detailed investigation at the device, circuit, and system levels, we highlight the pivotal synergies between SNNs and IMC architecture… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 19 Pages, 13 Figures

  26. arXiv:2408.12748  [pdf, other]

    cs.CL cs.AI cs.LG

    SLM Meets LLM: Balancing Latency, Interpretability and Consistency in Hallucination Detection

    Authors: Mengya Hu, Rui Xu, Deren Lei, Yaxi Li, Mingyu Wang, Emily Ching, Eslam Kamal, Alex Deng

    Abstract: Large language models (LLMs) are highly capable but face latency challenges in real-time applications, such as conducting online hallucination detection. To overcome this issue, we propose a novel framework that leverages a small language model (SLM) classifier for initial detection, followed by an LLM as a constrained reasoner to generate detailed explanations for detected hallucinated content. This… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: preprint under review

  27. arXiv:2408.12725  [pdf, other]

    physics.ins-det hep-ex

    DUNE Phase II: Scientific Opportunities, Detector Concepts, Technological Solutions

    Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, T. Alves, H. Amar, P. Amedo, J. Anderson, C. Andreopoulos, M. Andreotti, et al. (1347 additional authors not shown)

    Abstract: The international collaboration designing and constructing the Deep Underground Neutrino Experiment (DUNE) at the Long-Baseline Neutrino Facility (LBNF) has developed a two-phase strategy toward the implementation of this leading-edge, large-scale science project. The 2023 report of the US Particle Physics Project Prioritization Panel (P5) reaffirmed this vision and strongly endorsed DUNE Phase I… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

    Report number: FERMILAB-TM-2833-LBNF

  28. arXiv:2408.12451  [pdf, other]

    cond-mat.quant-gas cond-mat.mes-hall cond-mat.str-el quant-ph

    Dissipation and Interaction-Controlled Non-Hermitian Skin Effects

    Authors: Yang Li, Zhao-Fan Cai, Tao Liu, Franco Nori

    Abstract: Non-Hermitian skin effects (NHSEs) have recently been investigated extensively at the single-particle level. When many-body interactions become dominant, novel non-Hermitian physical phenomena can emerge. In this work, we theoretically study NHSEs controlled by dissipation and interaction. We consider a 1D zigzag Bose-Hubbard lattice, subject to magnetic flux, staggered onsite single-particle loss… ▽ More

    Submitted 24 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

    Comments: 16 pages, 9 figures; Comments are welcome

  29. arXiv:2408.12420  [pdf, other]

    cs.AI

    Dataset | Mindset = Explainable AI | Interpretable AI

    Authors: Caesar Wu, Rajkumar Buyya, Yuan Fang Li, Pascal Bouvry

    Abstract: We often use "explainable Artificial Intelligence (XAI)" and "interpretable AI (IAI)" interchangeably when we apply various XAI tools for a given dataset to explain the reasons that underpin machine learning (ML) outputs. However, these notions can sometimes be confusing because interpretation often has a subjective connotation, while explanations lean towards objective facts. We argue that XAI i… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

  30. arXiv:2408.12414  [pdf, other]

    cs.DB

    BIPeC: A Combined Change-Point Analyzer to Identify Performance Regressions in Large-scale Database Systems

    Authors: Zhan Lyu, Thomas Bach, Yong Li, Nguyen Minh Le, Lars Hoemke

    Abstract: Performance testing in large-scale database systems like SAP HANA is a crucial yet labor-intensive task, involving extensive manual analysis of thousands of measurements, such as CPU time and elapsed time. Manual maintenance of these metrics is time-consuming and susceptible to human error, making early detection of performance regressions challenging. We address these issues by proposing an autom… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

  31. arXiv:2408.12373  [pdf, other]

    cs.LG cs.AI

    Cell-ontology guided transcriptome foundation model

    Authors: Xinyu Yuan, Zhihao Zhan, Zuobai Zhang, Manqi Zhou, Jianan Zhao, Boyu Han, Yue Li, Jian Tang

    Abstract: Transcriptome foundation models (TFMs) hold great promise for deciphering the transcriptomic language that dictates diverse cell functions by self-supervised learning on large-scale single-cell gene expression data, and ultimately unraveling the complex mechanisms of human diseases. However, current TFMs treat cells as independent samples and ignore the taxonomic relationships between cell types, whi… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: All anonymous reviewers' constructive suggestions are appreciated. The next version will be updated soon

  32. Basis-independent quantum coherence and its distribution under relativistic motion

    Authors: Ming-Ming Du, Hong-Wei Li, Zhen Tao, Shu-Ting Shen, Xiao-Jing Yan, Xi-Yun Li, Wei Zhong, Yu-Bo Sheng, Lan Zhou

    Abstract: Recent studies have increasingly focused on the effect of relativistic motion on quantum coherence. Prior research predominantly examined the influence of relative motion on basis-dependent quantum coherence, underscoring its susceptibility to decoherence under accelerated conditions. Yet, the effect of relativistic motion on basis-independent quantum coherence, which is critical for understanding… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 7 pages, 3 figures

  33. arXiv:2408.12236  [pdf, other]

    cs.AI

    MedDiT: A Knowledge-Controlled Diffusion Transformer Framework for Dynamic Medical Image Generation in Virtual Simulated Patient

    Authors: Yanzeng Li, Cheng Zeng, Jinchao Zhang, Jie Zhou, Lei Zou

    Abstract: Medical education relies heavily on Simulated Patients (SPs) to provide a safe environment for students to practice clinical skills, including medical image analysis. However, the high cost of recruiting qualified SPs and the lack of diverse medical imaging datasets have presented significant challenges. To address these issues, this paper introduces MedDiT, a novel knowledge-controlled conversati… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

  34. arXiv:2408.12201  [pdf, ps, other]

    math.DG math.AP

    Prescribing positive curvature with conical singularities on $\mathbb S^2$

    Authors: Jingyi Chen, Yuxiang Li, Yunqing Wu

    Abstract: For conformal metrics with conical singularities and positive curvature on $\mathbb S^2$, we prove a convergence theorem and apply it to obtain a criterion for nonexistence in an open region of the prescribing data. The core of our study is a fine analysis of the bubble trees and an area identity in the convergence process.

    Submitted 22 August, 2024; originally announced August 2024.

  35. arXiv:2408.12195  [pdf, ps, other]

    math.DG math.AP

    Prescribing negative curvature with cusps and conical singularities on compact surface

    Authors: Jingyi Chen, Yuxiang Li, Yunqing Wu

    Abstract: On a compact surface, we prove existence and uniqueness of the conformal metric whose curvature is prescribed by a negative function away from finitely many points where the metric has prescribed angles presenting cusps or conical singularities.

    Submitted 22 August, 2024; originally announced August 2024.

  36. arXiv:2408.12161  [pdf, other]

    cs.CV

    Rebalancing Multi-Label Class-Incremental Learning

    Authors: Kaile Du, Yifan Zhou, Fan Lyu, Yuyang Li, Junzhou Xie, Yixi Shen, Fuyuan Hu, Guangcan Liu

    Abstract: Multi-label class-incremental learning (MLCIL) is essential for real-world multi-label applications, allowing models to learn new labels while retaining previously learned knowledge continuously. However, recent MLCIL approaches can only achieve suboptimal performance due to the oversight of the positive-negative imbalance problem, which manifests at both the label and loss levels because of the t… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

  37. arXiv:2408.12076  [pdf, other]

    cs.CL cs.AI

    ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM

    Authors: Zhaochen Su, Jun Zhang, Xiaoye Qu, Tong Zhu, Yanshu Li, Jiashuo Sun, Juntao Li, Min Zhang, Yu Cheng

    Abstract: Large language models (LLMs) have achieved impressive advancements across numerous disciplines, yet the critical issue of knowledge conflicts, a major source of hallucinations, has rarely been studied. Only a few studies have explored the conflicts between the inherent knowledge of LLMs and the retrieved contextual knowledge. However, a thorough assessment of knowledge conflict in LLMs is still missin… ▽ More

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Under Review

  38. arXiv:2408.11982  [pdf, other]

    eess.IV cs.CV cs.MM

    AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results

    Authors: Maksim Smirnov, Aleksandr Gushchin, Anastasia Antsiferova, Dmitry Vatolin, Radu Timofte, Ziheng Jia, Zicheng Zhang, Wei Sun, Jiaying Qian, Yuqin Cao, Yinan Sun, Yuxin Zhu, Xiongkuo Min, Guangtao Zhai, Kanjar De, Qing Luo, Ao-Xiang Zhang, Peng Zhang, Haibo Lei, Linyan Jiang, Yaqing Li, Wenhui Meng, Xiaoheng Tan, Haiqiang Wang, Xiaozhong Xu, et al. (11 additional authors not shown)

    Abstract: Video quality assessment (VQA) is a crucial task in the development of video compression standards, as it directly impacts the viewer experience. This paper presents the results of the Compressed Video Quality Assessment challenge, held in conjunction with the Advances in Image Manipulation (AIM) workshop at ECCV 2024. The challenge aimed to evaluate the performance of VQA methods on a diverse dat… ▽ More

    Submitted 28 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

  39. arXiv:2408.11850  [pdf, other]

    cs.CL

    Parallel Speculative Decoding with Adaptive Draft Length

    Authors: Tianyu Liu, Yun Li, Qitan Lv, Kai Liu, Jianchen Zhu, Winston Hu

    Abstract: Speculative decoding (SD), where an extra draft model is employed to provide multiple \textit{draft} tokens first and then the original target model verifies these tokens in parallel, has shown great power for LLM inference acceleration. However, existing SD methods suffer from the mutual waiting problem, i.e., the target model gets stuck when the draft model is \textit{guessing} tokens, and vice… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

  40. arXiv:2408.11849  [pdf, other]

    cs.CL cs.AI eess.AS

    Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation

    Authors: Yinghao Aaron Li, Xilin Jiang, Jordan Darefsky, Ge Zhu, Nima Mesgarani

    Abstract: The rapid advancement of large language models (LLMs) has significantly propelled the development of text-based chatbots, demonstrating their capability to engage in coherent and contextually relevant dialogues. However, extending these advancements to enable end-to-end speech-to-speech conversation bots remains a formidable challenge, primarily due to the extensive dataset and computational resou… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: CoLM 2024

  41. arXiv:2408.11843  [pdf, other]

    cs.CL cs.AI

    Editable Fairness: Fine-Grained Bias Mitigation in Language Models

    Authors: Ruizhe Chen, Yichen Li, Jianfei Yang, Joey Tianyi Zhou, Zuozhu Liu

    Abstract: Generating fair and accurate predictions plays a pivotal role in deploying large language models (LLMs) in the real world. However, existing debiasing methods inevitably generate unfair or incorrect predictions as they are designed and evaluated to achieve parity across different social groups but leave aside individual commonsense facts, resulting in modified knowledge that elicits unreasonable o… ▽ More

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2405.09341

  42. arXiv:2408.11824   

    cs.HC cs.AI

    AppAgent v2: Advanced Agent for Flexible Mobile Interactions

    Authors: Yanda Li, Chi Zhang, Wanqi Yang, Bin Fu, Pei Cheng, Xin Chen, Ling Chen, Yunchao Wei

    Abstract: With the advancement of Multimodal Large Language Models (MLLM), LLM-driven visual agents are increasingly impacting software interfaces, particularly those with graphical user interfaces. This work introduces a novel LLM-based multimodal agent framework for mobile devices. This framework, capable of navigating mobile devices, emulates human-like interactions. Our agent constructs a flexible actio… ▽ More

    Submitted 23 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

    Comments: Pre-print version, some content needs to be supplemented

  43. arXiv:2408.11681  [pdf, other]

    hep-ph

    Variational autoencoder inverse mapper for extraction of Compton form factors: Benchmarks and conditional learning

    Authors: Fayaz Hossen, Douglas Adams, Joshua Bautista, Yaohang Li, Gia-Wei Chern, Simonetta Liuti, Marie Boer, Marija Cuic, Gari R. Goldstein, Michael Engelhardt, Huey-Wen Li

    Abstract: Deeply virtual exclusive scattering processes (DVES) serve as precise probes of nucleon quark and gluon distributions in coordinate space. These distributions are derived from generalized parton distributions (GPDs) via Fourier transform relative to proton momentum transfer. QCD factorization theorems enable DVES to be parameterized by Compton form factors (CFFs), which are convolutions of GPDs wi…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 12 pages, 9 figures

  44. arXiv:2408.11463  [pdf, other]

    cs.CV

    Low-Light Object Tracking: A Benchmark

    Authors: Pengzhi Zhong, Xiaoyu Guo, Defeng Huang, Xiaojun Peng, Yian Li, Qijun Zhao, Shuiwang Li

    Abstract: In recent years, the field of visual tracking has made significant progress with the application of large-scale training datasets. These datasets have supported the development of sophisticated algorithms, enhancing the accuracy and stability of visual object tracking. However, most research has primarily focused on favorable illumination circumstances, neglecting the challenges of tracking in low…

    Submitted 21 August, 2024; originally announced August 2024.

  45. arXiv:2408.11449  [pdf, other]

    cs.AI

    Enabling Small Models for Zero-Shot Classification through Model Label Learning

    Authors: Jia Zhang, Zhi Zhou, Lan-Zhe Guo, Yu-Feng Li

    Abstract: Vision-language models (VLMs) like CLIP have demonstrated impressive zero-shot ability in image classification tasks by aligning text and images but suffer inferior performance compared with task-specific expert models. On the contrary, expert models excel in their specialized domains but lack zero-shot ability for new tasks. How to obtain both the high performance of expert models and zero-shot a…

    Submitted 21 August, 2024; originally announced August 2024.

  46. T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval

    Authors: Yili Li, Jing Yu, Keke Gai, Bang Liu, Gang Xiong, Qi Wu

    Abstract: Current text-video retrieval methods mainly rely on cross-modal matching between queries and videos to calculate their similarity scores, which are then sorted to obtain retrieval results. This method considers the matching between each candidate video and the query, but it incurs a significant time cost and will increase notably with the increase of candidates. Generative models are common in nat…

    Submitted 21 August, 2024; originally announced August 2024.

  47. arXiv:2408.11426  [pdf, other]

    cs.RO

    AS-LIO: Spatial Overlap Guided Adaptive Sliding Window LiDAR-Inertial Odometry for Aggressive FOV Variation

    Authors: Tianxiang Zhang, Xuanxuan Zhang, Zongbo Liao, Xin Xia, You Li

    Abstract: LiDAR-Inertial Odometry (LIO) demonstrates outstanding accuracy and stability in general low-speed and smooth motion scenarios. However, in high-speed and intense motion scenarios, such as sharp turns, two primary challenges arise: firstly, due to the limitations of IMU frequency, the error in estimating significantly non-linear motion states escalates; secondly, drastic changes in the Field of Vi…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 8 pages, 6 figures

  48. arXiv:2408.11329  [pdf, ps, other]

    eess.SP

    Full-Duplex ISAC-Enabled D2D Underlaid Cellular Networks: Joint Transceiver Beamforming and Power Allocation

    Authors: Tao Jiang, Ming Jin, Qinghua Guo, Yinhong Liu, Yaming Li

    Abstract: Integrating device-to-device (D2D) communication into cellular networks can significantly reduce the transmission burden on base stations (BSs). Besides, integrated sensing and communication (ISAC) is envisioned as a key feature in future wireless networks. In this work, we consider a full-duplex ISAC-based D2D underlaid system, and propose a joint beamforming and power allocation scheme to impro…

    Submitted 21 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to IEEE Transactions on Wireless Communications on 7 June, 2024

  49. arXiv:2408.11298  [pdf, other]

    hep-ph nucl-th

    Towards a first principles light-front Hamiltonian for the nucleon

    Authors: Siqi Xu, Yiping Liu, Chandan Mondal, Jiangshan Lan, Xingbo Zhao, Yang Li, James P. Vary

    Abstract: We solve the nucleon's wave functions from the eigenstates of the light-front quantum chromodynamics Hamiltonian for the first time, using a fully relativistic and nonperturbative approach based on light-front quantization, without an explicit confining potential. These eigenstates are determined for the three-quark, three-quark-gluon, and three-quark-quark-antiquark Fock representations, making t…

    Submitted 20 August, 2024; originally announced August 2024.

  50. arXiv:2408.10994  [pdf, other]

    quant-ph

    Microsatellite-based real-time quantum key distribution

    Authors: Yang Li, Wen-Qi Cai, Ji-Gang Ren, Chao-Ze Wang, Meng Yang, Liang Zhang, Hui-Ying Wu, Liang Chang, Jin-Cai Wu, Biao Jin, Hua-Jian Xue, Xue-Jiao Li, Hui Liu, Guang-Wen Yu, Xue-Ying Tao, Ting Chen, Chong-Fei Liu, Wen-Bin Luo, Jie Zhou, Hai-Lin Yong, Yu-Huai Li, Feng-Zhi Li, Cong Jiang, Hao-Ze Chen, Chao Wu, et al. (16 additional authors not shown)

    Abstract: A quantum network provides an infrastructure connecting quantum devices with revolutionary computing, sensing, and communication capabilities. As the best-known application of a quantum network, quantum key distribution (QKD) shares secure keys guaranteed by the laws of quantum mechanics. A quantum satellite constellation offers a solution to facilitate the quantum network on a global scale. The M…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 40 pages, 8 figures