Skip to main content

Showing 1–50 of 2,338 results for author: Li, L

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.13181  [pdf, other

    cs.CV

    Training-Free Large Model Priors for Multiple-in-One Image Restoration

    Authors: Xuanhua He, Lang Li, Yingying Wang, Hui Zheng, Ke Cao, Keyu Yan, Rui Li, Chengjun Xie, Jie Zhang, Man Zhou

    Abstract: Image restoration aims to reconstruct the latent clear images from their degraded versions. Despite the notable achievement, existing methods predominantly focus on handling specific degradation types and thus require specialized models, impeding real-world applications in dynamic degradation scenarios. To address this issue, we propose Large Model Driven Image Restoration framework (LMDIR), a nov… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2407.13096  [pdf, other

    cs.PF

    DSO: A GPU Energy Efficiency Optimizer by Fusing Dynamic and Static Information

    Authors: Qiang Wang, Laiyi Li, Weile Luo, Yijia Zhang, Bingqiang Wang

    Abstract: Increased reliance on graphics processing units (GPUs) for high-intensity computing tasks raises challenges regarding energy consumption. To address this issue, dynamic voltage and frequency scaling (DVFS) has emerged as a promising technique for conserving energy while maintaining the quality of service (QoS) of GPU applications. However, existing solutions using DVFS are hindered by inefficiency… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  3. arXiv:2407.13076  [pdf, other

    cs.MA cs.NI eess.SP

    Matching-Driven Deep Reinforcement Learning for Energy-Efficient Transmission Parameter Allocation in Multi-Gateway LoRa Networks

    Authors: Ziqi Lin, Xu Zhang, Shimin Gong, Lanhua Li, Zhou Su, Bo Gu

    Abstract: Long-range (LoRa) communication technology, distinguished by its low power consumption and long communication range, is widely used in the Internet of Things. Nevertheless, the LoRa MAC layer adopts pure ALOHA for medium access control, which may suffer from severe packet collisions as the network scale expands, consequently reducing the system energy efficiency (EE). To address this issue, it is… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  4. arXiv:2407.13054  [pdf, other

    cs.AI

    Comprehensive Review and Empirical Evaluation of Causal Discovery Algorithms for Numerical Data

    Authors: Wenjin Niu, Zijun Gao, Liyan Song, Lingbo Li

    Abstract: Causal analysis has become an essential component in understanding the underlying causes of phenomena across various fields. Despite its significance, the existing literature on causal discovery algorithms is fragmented, with inconsistent methodologies and a lack of comprehensive evaluations. This study addresses these gaps by conducting an exhaustive review and empirical evaluation of causal disc… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  5. arXiv:2407.12782  [pdf, other

    cs.LG cs.CV

    Contrastive Adversarial Training for Unsupervised Domain Adaptation

    Authors: Jiahong Chen, Zhilin Zhang, Lucy Li, Behzad Shahrasbi, Arjun Mishra

    Abstract: Domain adversarial training has shown its effective capability for finding domain invariant feature representations and been successfully adopted for various domain adaptation tasks. However, recent advances of large models (e.g., vision transformers) and emerging of complex adaptation scenarios (e.g., DomainNet) make adversarial training being easily biased towards source domain and hardly adapte… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  6. arXiv:2407.12504  [pdf, other

    cs.CL

    Case2Code: Learning Inductive Reasoning with Synthetic Data

    Authors: Yunfan Shao, Linyang Li, Yichuan Ma, Peiji Li, Demin Song, Qinyuan Cheng, Shimin Li, Xiaonan Li, Pengyu Wang, Qipeng Guo, Hang Yan, Xipeng Qiu, Xuanjing Huang, Dahua Lin

    Abstract: Complex reasoning is an impressive ability shown by large language models (LLMs). Most LLMs are skilled in deductive reasoning, such as chain-of-thought prompting or iterative tool-using to solve challenging tasks step-by-step. In this paper, we hope to focus on evaluating and teaching LLMs to conduct inductive reasoning, that is, LLMs are supposed to infer underlying rules by observing examples o… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  7. arXiv:2407.12385  [pdf, other

    cs.IR

    RankTower: A Synergistic Framework for Enhancing Two-Tower Pre-Ranking Model

    Authors: YaChen Yan, Liubo Li

    Abstract: In large-scale ranking systems, cascading architectures have been widely adopted to achieve a balance between efficiency and effectiveness. The pre-ranking module plays a vital role in selecting a subset of candidates for the subsequent ranking module. It is crucial for the pre-ranking model to maintain a balance between efficiency and accuracy to adhere to online latency constraints. In this pape… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  8. arXiv:2407.12371  [pdf, other

    cs.CV cs.AI

    HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects

    Authors: Xintao Lv, Liang Xu, Yichao Yan, Xin Jin, Congsheng Xu, Shuwen Wu, Yifan Liu, Lincheng Li, Mengxiao Bi, Wenjun Zeng, Xiaokang Yang

    Abstract: Generating human-object interactions (HOIs) is critical with the tremendous advances of digital avatars. Existing datasets are typically limited to humans interacting with a single object while neglecting the ubiquitous manipulation of multiple objects. Thus, we propose HIMO, a large-scale MoCap dataset of full-body human interacting with multiple objects, containing 3.3K 4D HOI sequences and 4.08… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Project page: https://lvxintao.github.io/himo, accepted by ECCV 2024

  9. arXiv:2407.11683  [pdf, other

    cs.CV

    Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning

    Authors: Yunbin Tu, Liang Li, Li Su, Chenggang Yan, Qingming Huang

    Abstract: Change captioning aims to succinctly describe the semantic change between a pair of similar images, while being immune to distractors (illumination and viewpoint changes). Under these distractors, unchanged objects often appear pseudo changes about location and scale, and certain objects might overlap others, resulting in perturbational and discrimination-degraded features between two images. Howe… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  10. arXiv:2407.11541  [pdf, other

    eess.IV cs.CV

    Uniformly Accelerated Motion Model for Inter Prediction

    Authors: Zhuoyuan Li, Yao Li, Chuanbo Tang, Li Li, Dong Liu, Feng Wu

    Abstract: Inter prediction is a key technology to reduce the temporal redundancy in video coding. In natural videos, there are usually multiple moving objects with variable velocity, resulting in complex motion fields that are difficult to represent compactly. In Versatile Video Coding (VVC), existing inter prediction methods usually assume uniform speed motion between consecutive frames and use the linear… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures

  11. arXiv:2407.11466  [pdf, other

    cs.CY

    Navigating the Data Trading Crossroads: An Interdisciplinary Survey

    Authors: Yi Yu, Jingru Yu, Xuhong Wang, Juanjuan Li, Yilun Lin, Conghui He, Yanqing Yang, Yu Qiao, Li Li, Fei-Yue Wang

    Abstract: Data has been increasingly recognized as a critical factor in the future economy. However, constructing an efficient data trading market faces challenges such as privacy breaches, data monopolies, and misuse. Despite numerous studies proposing algorithms to protect privacy and methods for pricing data, a comprehensive understanding of these issues and systemic solutions remain elusive. This paper… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  12. arXiv:2407.11100  [pdf, other

    cs.CR cs.AI cs.CL

    Building Intelligence Identification System via Large Language Model Watermarking: A Survey and Beyond

    Authors: Xuhong Wang, Haoyu Jiang, Yi Yu, Jingru Yu, Yilun Lin, Ping Yi, Yingchun Wang, Qiao Yu, Li Li, Fei-Yue Wang

    Abstract: Large Language Models (LLMs) are increasingly integrated into diverse industries, posing substantial security risks due to unauthorized replication and misuse. To mitigate these concerns, robust identification mechanisms are widely acknowledged as an effective strategy. Identification systems for LLMs now rely heavily on watermarking technology to manage and protect intellectual property and ensur… ▽ More

    Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 59 pages, 7 figures

  13. arXiv:2407.10937  [pdf, other

    cs.CV

    IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation

    Authors: Yuanhao Zhai, Kevin Lin, Linjie Li, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, David Doermann, Junsong Yuan, Zicheng Liu, Lijuan Wang

    Abstract: Significant advances have been made in human-centric video generation, yet the joint video-depth generation problem remains underexplored. Most existing monocular depth estimation methods may not generalize well to synthesized images or videos, and multi-view-based methods have difficulty controlling the human appearance and motion. In this work, we present IDOL (unIfied Dual-mOdal Latent diffusio… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; project page: https://yhzhai.github.io/idol/

  14. arXiv:2407.10926  [pdf, other

    eess.IV cs.CV

    In-Loop Filtering via Trained Look-Up Tables

    Authors: Zhuoyuan Li, Jiacheng Li, Yao Li, Li Li, Dong Liu, Feng Wu

    Abstract: In-loop filtering (ILF) is a key technology for removing the artifacts in image/video coding standards. Recently, neural network-based in-loop filtering methods achieve remarkable coding gains beyond the capability of advanced video coding standards, which becomes a powerful coding tool candidate for future video coding standards. However, the utilization of deep neural networks brings heavy time… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 11 pages, 6 figures

  15. arXiv:2407.10205  [pdf, other

    quant-ph cs.ET math.CO

    Parallel Ising Annealer via Gradient-based Hamiltonian Monte Carlo

    Authors: Hao Wang, Zixuan Liu, Zhixin Xie, Langyu Li, Zibo Miao, Wei Cui, Yu Pan

    Abstract: Ising annealer is a promising quantum-inspired computing architecture for combinatorial optimization problems. In this paper, we introduce an Ising annealer based on the Hamiltonian Monte Carlo, which updates the variables of all dimensions in parallel. The main innovation is the fusion of an approximate gradient-based approach into the Ising annealer which introduces significant acceleration and… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  16. arXiv:2407.10159  [pdf, other

    cs.CV cs.LG cs.RO

    RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation

    Authors: Li Li, Hubert P. H. Shum, Toby P. Breckon

    Abstract: 3D point clouds play a pivotal role in outdoor scene perception, especially in the context of autonomous driving. Recent advancements in 3D LiDAR segmentation often focus intensely on the spatial positioning and distribution of points for accurate segmentation. However, these methods, while robust in variable conditions, encounter challenges due to sole reliance on coordinates and point intensity,… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; 18 pages, 6 figures, 7 tables; Code at https://github.com/l1997i/rapid_seg

    Journal ref: Eur. Conf. Comput. Vis. (ECCV 2024)

  17. arXiv:2407.09966  [pdf, other

    cs.CV eess.IV

    Optimizing ROI Benefits Vehicle ReID in ITS

    Authors: Mei Qiu, Lauren Ann Christopher, Lingxi Li, Stanley Chien, Yaobin Chen

    Abstract: Vehicle re-identification (ReID) is a computer vision task that matches the same vehicle across different cameras or viewpoints in a surveillance system. This is crucial for Intelligent Transportation Systems (ITS), where the effectiveness is influenced by the regions from which vehicle images are cropped. This study explores whether optimal vehicle detection regions, guided by detection confidenc… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

  18. arXiv:2407.09710  [pdf, other

    quant-ph cs.PL

    DisQ: A Markov Decision Process Based Language for Quantum Distributed Systems

    Authors: Le Chang, Saitej Yavvari, Rance Cleaveland, Samik Basu, Liyi Li

    Abstract: The development of quantum computers has reached a great milestone, in spite of restrictions on important quantum resources: the number of qubits being entangled at a single-location quantum computer. Recently, there has been some work to combine single-location quantum computing and quantum networking techniques to develop distributed quantum systems such that large entangled qubit groups can be… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Version 1

  19. arXiv:2407.09191  [pdf, other

    cs.CV cs.AI

    From Easy to Hard: Learning Curricular Shape-aware Features for Robust Panoptic Scene Graph Generation

    Authors: Hanrong Shi, Lin Li, Jun Xiao, Yueting Zhuang, Long Chen

    Abstract: Panoptic Scene Graph Generation (PSG) aims to generate a comprehensive graph-structure representation based on panoptic segmentation masks. Despite remarkable progress in PSG, almost all existing methods neglect the importance of shape-aware features, which inherently focus on the contours and boundaries of objects. To bridge this gap, we propose a model-agnostic Curricular shApe-aware FEature (CA… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by IJCV

  20. arXiv:2407.08951  [pdf, other

    cs.SD eess.AS

    Audio Spotforming Using Nonnegative Tensor Factorization with Attractor-Based Regularization

    Authors: Shoma Ayano, Li Li, Shogo Seki, Daichi Kitamura

    Abstract: Spotforming is a target-speaker extraction technique that uses multiple microphone arrays. This method applies beamforming (BF) to each microphone array, and the common components among the BF outputs are estimated as the target source. This study proposes a new common component extraction method based on nonnegative tensor factorization (NTF) for higher model interpretability and more robust spot… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted at EUSIPCO2024

  21. arXiv:2407.08722  [pdf, other

    cs.RO cs.CV cs.LG

    Unifying 3D Representation and Control of Diverse Robots with a Single Camera

    Authors: Sizhe Lester Li, Annan Zhang, Boyuan Chen, Hanna Matusik, Chao Liu, Daniela Rus, Vincent Sitzmann

    Abstract: Mirroring the complex structures and diverse functions of natural organisms is a long-standing challenge in robotics. Modern fabrication techniques have dramatically expanded feasible hardware, yet deploying these systems requires control software to translate desired motions into actuator commands. While conventional robots can easily be modeled as rigid links connected via joints, it remains an… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Project Page: https://sizhe-li.github.io/publication/neural_jacobian_field

  22. arXiv:2407.08351  [pdf, other

    cs.CL cs.LG

    AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models

    Authors: Xiang Lisa Li, Evan Zheran Liu, Percy Liang, Tatsunori Hashimoto

    Abstract: Evaluation is critical for assessing capabilities, tracking scientific progress, and informing model selection. In this paper, we present three desiderata for a good benchmark for language models: (i) salience (e.g., knowledge about World War II is more salient than a random day in history), (ii) novelty (i.e., the benchmark reveals new trends in model rankings not shown by previous benchmarks), a… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: preprint

  23. arXiv:2407.07860  [pdf, other

    cs.CV

    Controlling Space and Time with Diffusion Models

    Authors: Daniel Watson, Saurabh Saxena, Lala Li, Andrea Tagliasacchi, David J. Fleet

    Abstract: We present 4DiM, a cascaded diffusion model for 4D novel view synthesis (NVS), conditioned on one or more images of a general scene, and a set of camera poses and timestamps. To overcome challenges due to limited availability of 4D training data, we advocate joint training on 3D (with camera pose), 4D (pose+time) and video (time but no pose) data and propose a new architecture that enables the sam… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  24. arXiv:2407.07842  [pdf, other

    cs.CV

    Study on Aspect Ratio Variability toward Robustness of Vision Transformer-based Vehicle Re-identification

    Authors: Mei Qiu, Lauren Christopher, Lingxi Li

    Abstract: Vision Transformers (ViTs) have excelled in vehicle re-identification (ReID) tasks. However, non-square aspect ratios of image or video input might significantly affect the re-identification performance. To address this issue, we propose a novel ViT-based ReID framework in this paper, which fuses models trained on a variety of aspect ratios. Our main contributions are threefold: (i) We analyze asp… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  25. arXiv:2407.07699  [pdf, other

    cs.IT

    Transmission Design for XL-RIS-Aided Massive MIMO System with Visibility Regions

    Authors: Luchu Li, Kangda Zhi, Cunhua Pan

    Abstract: This paper proposes a two-timescale transmission scheme for extremely large-scale (XL)-reconfigurable intelligent surfaces (RIS)-aided massive multi-input multi-output (MIMO) systems considering visibility regions (VRs). The beamforming of base stations (BS) is designed based on rapidly changing instantaneous channel state information (CSI), while the phase shifts of RIS are configured based on sl… ▽ More

    Submitted 17 May, 2024; originally announced July 2024.

  26. arXiv:2407.07503  [pdf, other

    cs.CV cs.IR

    Metasurface-based Snapshot Shortwave-Infrared Hyperspectral Image Reconstruction with Inter and Intra Prior Learning Network

    Authors: Linqiang Li, Jinglei Hao, Yongqiang Zhao, Pan Liu, Haofang Yan, Ziqin Zhang, Seong G. Kong

    Abstract: Shortwave-infrared(SWIR) spectral information,ranging from 1 μm to 2.5μm, breaks the limitations of traditional color cameras in acquiring scene information and has been used in many fields. However, conventional SWIR hyperspectral imaging systems face challenges due to their bulky setups and low acquisition speed. In this work, we introduce a snapshot SWIR hyperspectral imaging system based on a… ▽ More

    Submitted 10 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: 10 pages,5 figures

  27. arXiv:2407.06705  [pdf, other

    cs.NI eess.SP

    Integrated Sensing and Communications for Resource Allocation in Non-Terrestrial Networks

    Authors: Israel Leyva-Mayorga, Fabio Saggese, Lintao Li, Petar Popovski

    Abstract: The integration of Non-Terrestrial Networks (NTNs) with Low Earth Orbit (LEO) satellite constellations into 5G and Beyond is essential to achieve truly global connectivity. A distinctive characteristic of LEO mega-constellations is that they constitute a global infrastructure with predictable dynamics, which enables the pre-planned allocation of the radio resources. However, the different bands th… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Submitted for publication to IEEE Transactions on Wireless Communications

  28. arXiv:2407.06597  [pdf, other

    cs.AI

    TVR-Ranking: A Dataset for Ranked Video Moment Retrieval with Imprecise Queries

    Authors: Renjie Liang, Li Li, Chongzhi Zhang, Jing Wang, Xizhou Zhu, Aixin Sun

    Abstract: In this paper, we propose the task of \textit{Ranked Video Moment Retrieval} (RVMR) to locate a ranked list of matching moments from a collection of videos, through queries in natural language. Although a few related tasks have been proposed and studied by CV, NLP, and IR communities, RVMR is the task that best reflects the practical setting of moment search. To facilitate research in RVMR, we dev… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  29. arXiv:2407.06573  [pdf, other

    cs.SE

    LLM for Mobile: An Initial Roadmap

    Authors: Daihang Chen, Yonghui Liu, Mingyi Zhou, Yanjie Zhao, Haoyu Wang, Shuai Wang, Xiao Chen, Tegawendé F. Bissyandé, Jacques Klein, Li Li

    Abstract: When mobile meets LLMs, mobile app users deserve to have more intelligent usage experiences. For this to happen, we argue that there is a strong need to appl LLMs for the mobile ecosystem. We therefore provide a research roadmap for guiding our fellow researchers to achieve that as a whole. In this roadmap, we sum up six directions that we believe are urgently required for research to enable nativ… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  30. arXiv:2407.06540  [pdf, other

    cs.CV cs.AI

    General and Task-Oriented Video Segmentation

    Authors: Mu Chen, Liulei Li, Wenguan Wang, Ruijie Quan, Yi Yang

    Abstract: We present GvSeg, a general video segmentation framework for addressing four different video segmentation tasks (i.e., instance, semantic, panoptic, and exemplar-guided) while maintaining an identical architectural design. Currently, there is a trend towards developing general video segmentation solutions that can be applied across multiple tasks. This streamlines research endeavors and simplifies… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; Project page: https://github.com/kagawa588/GvSeg

  31. arXiv:2407.06113  [pdf, other

    cs.CV

    C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition

    Authors: Rongchang Li, Zhenhua Feng, Tianyang Xu, Linze Li, Xiao-Jun Wu, Muhammad Awais, Sara Atito, Josef Kittler

    Abstract: Compositional actions consist of dynamic (verbs) and static (objects) concepts. Humans can easily recognize unseen compositions using the learned concepts. For machines, solving such a problem requires a model to recognize unseen actions composed of previously observed verbs and objects, thus requiring, so-called, compositional generalization ability. To facilitate this research, we propose a nove… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  32. arXiv:2407.06078  [pdf, ps, other

    cs.SD

    Few-Shot Keyword Spotting from Mixed Speech

    Authors: Junming Yuan, Ying Shi, LanTian Li, Dong Wang, Askar Hamdulla

    Abstract: Few-shot keyword spotting (KWS) aims to detect unknown keywords with limited training samples. A commonly used approach is the pre-training and fine-tuning framework. While effective in clean conditions, this approach struggles with mixed keyword spotting -- simultaneously detecting multiple keywords blended in an utterance, which is crucial in real-world applications. Previous research has propos… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: accepted by INTERSPEECH 2024

  33. arXiv:2407.05975  [pdf, other

    cs.CL cs.AI

    LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages

    Authors: Yinquan Lu, Wenhao Zhu, Lei Li, Yu Qiao, Fei Yuan

    Abstract: Large Language Models~(LLMs) demonstrate remarkable translation capabilities in high-resource language tasks, yet their performance in low-resource languages is hindered by insufficient multilingual data during pre-training. To address this, we dedicate 35,000 A100-SXM4-80GB GPU hours in conducting extensive multilingual continual pre-training on the LLaMA series models, enabling translation suppo… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  34. arXiv:2407.05616  [pdf, other

    cs.CV

    Explainable Image Recognition via Enhanced Slot-attention Based Classifier

    Authors: Bowen Wang, Liangzhi Li, Jiahao Zhang, Yuta Nakashima, Hajime Nagahara

    Abstract: The imperative to comprehend the behaviors of deep learning models is of utmost importance. In this realm, Explainable Artificial Intelligence (XAI) has emerged as a promising avenue, garnering increasing interest in recent years. Despite this, most existing methods primarily depend on gradients or input perturbation, which often fails to embed explanations directly within the model's decision-mak… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 16 pages, 12 figures

  35. arXiv:2407.05554  [pdf, other

    cs.CV

    PANS: Probabilistic Airway Navigation System for Real-time Robust Bronchoscope Localization

    Authors: Qingyao Tian, Zhen Chen, Huai Liao, Xinyan Huang, Bingyu Yang, Lujie Li, Hongbin Liu

    Abstract: Accurate bronchoscope localization is essential for pulmonary interventions, by providing six degrees of freedom (DOF) in airway navigation. However, the robustness of current vision-based methods is often compromised in clinical practice, and they struggle to perform in real-time and to generalize across cases unseen during training. To overcome these challenges, we propose a novel Probabilistic… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  36. arXiv:2407.05388  [pdf, other

    cs.CV

    Forest2Seq: Revitalizing Order Prior for Sequential Indoor Scene Synthesis

    Authors: Qi Sun, Hang Zhou, Wengang Zhou, Li Li, Houqiang Li

    Abstract: Synthesizing realistic 3D indoor scenes is a challenging task that traditionally relies on manual arrangement and annotation by expert designers. Recent advances in autoregressive models have automated this process, but they often lack semantic understanding of the relationships and hierarchies present in real-world scenes, yielding limited performance. In this paper, we propose Forest2Seq, a fram… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  37. arXiv:2407.05104  [pdf, other

    cs.CY

    Crowdsourced reviews reveal substantial disparities in public perceptions of parking

    Authors: Lingyao Li, Songhua Hu, Ly Dinh, Libby Hemphill

    Abstract: Due to increased reliance on private vehicles and growing travel demand, parking remains a longstanding urban challenge globally. Quantifying parking perceptions is paramount as it enables decision-makers to identify problematic areas and make informed decisions on parking management. This study introduces a cost-effective and widely accessible data source, crowdsourced online reviews, to investig… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  38. arXiv:2407.04206  [pdf, other

    math.NA cs.CE

    Computational Graph Representation of Equations System Constructors in Hierarchical Circuit Simulation

    Authors: Zichao Long, Lin Li, Lei Han, Xianglong Meng, Chongjun Ding, Ruiyan Li, Wu Jiang, Fuchen Ding, Jiaqing Yue, Zhichao Li, Yisheng Hu, Ding Li, Heng Liao

    Abstract: Equations system constructors of hierarchical circuits play a central role in device modeling, nonlinear equations solving, and circuit design automation. However, existing constructors present limitations in applications to different extents. For example, the costs of developing and reusing device models -- especially coarse-grained equivalent models of circuit modules -- remain high while parame… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  39. arXiv:2407.03966  [pdf, other

    cs.SD cs.AI eess.AS

    Serialized Output Training by Learned Dominance

    Authors: Ying Shi, Lantian Li, Shi Yin, Dong Wang, Jiqing Han

    Abstract: Serialized Output Training (SOT) has showcased state-of-the-art performance in multi-talker speech recognition by sequentially decoding the speech of individual speakers. To address the challenging label-permutation issue, prior methods have relied on either the Permutation Invariant Training (PIT) or the time-based First-In-First-Out (FIFO) rule. This study presents a model-based serialization st… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: accepted by INTERSPEECH 2024

  40. arXiv:2407.02530  [pdf, ps, other

    quant-ph cs.DS

    Unifying quantum spatial search, state transfer and uniform sampling on graphs: simple and exact

    Authors: Qingwen Wang, Ying Jiang, Lvzhou Li

    Abstract: This article presents a novel and succinct algorithmic framework via alternating quantum walks, unifying quantum spatial search, state transfer and uniform sampling on a large class of graphs. Using the framework, we can achieve exact uniform sampling over all vertices and perfect state transfer between any two vertices, provided that eigenvalues of Laplacian matrix of the graph are all integers.… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: This manuscript has some overlap with arXiv:2307.16133. More precisely, it is an advanced version of arXiv:2307.16133, which not only modifies the paper structure and some results but also adds several new results

  41. arXiv:2407.01942  [pdf, other

    cs.AI cs.CL cs.CV

    Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness

    Authors: Khyathi Raghavi Chandu, Linjie Li, Anas Awadalla, Ximing Lu, Jae Sung Park, Jack Hessel, Lijuan Wang, Yejin Choi

    Abstract: The ability to acknowledge the inevitable uncertainty in their knowledge and reasoning is a prerequisite for AI systems to be truly truthful and reliable. In this paper, we present a taxonomy of uncertainty specific to vision-language AI systems, distinguishing between epistemic uncertainty (arising from a lack of information) and aleatoric uncertainty (due to inherent unpredictability), and furth… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 26 pages

  42. arXiv:2407.01220  [pdf, other

    cs.CV

    Fast and Efficient: Mask Neural Fields for 3D Scene Segmentation

    Authors: Zihan Gao, Lingling Li, Licheng Jiao, Fang Liu, Xu Liu, Wenping Ma, Yuwei Guo, Shuyuan Yang

    Abstract: Understanding 3D scenes is a crucial challenge in computer vision research with applications spanning multiple domains. Recent advancements in distilling 2D vision-language foundation models into neural fields, like NeRF and 3DGS, enables open-vocabulary segmentation of 3D scenes from 2D multi-view images without the need for precise 3D annotations. While effective, however, the per-pixel distilla… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 16 pages, 7 figures

  43. arXiv:2407.00943  [pdf, other

    cs.DC cs.LG

    FedEx: Expediting Federated Learning over Heterogeneous Mobile Devices by Overlapping and Participant Selection

    Authors: Jiaxiang Geng, Boyu Li, Xiaoqi Qin, Yixuan Li, Liang Li, Yanzhao Hou, Miao Pan

    Abstract: Training latency is critical for the success of numerous intrigued applications ignited by federated learning (FL) over heterogeneous mobile devices. By revolutionarily overlapping local gradient transmission with continuous local computing, FL can remarkably reduce its training latency over homogeneous clients, yet encounter severe model staleness, model drifts, memory cost and straggler issues i… ▽ More

    Submitted 2 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

    Comments: 21 pages, 10 figures, Submitted to Sensys2024

  44. arXiv:2407.00928  [pdf, other

    cs.LG cs.CL

    FoldGPT: Simple and Effective Large Language Model Compression Scheme

    Authors: Songwei Liu, Chao Zeng, Lianqiang Li, Chenqian Yan, Lean Fu, Xing Mei, Fangmin Chen

    Abstract: The demand for deploying large language models(LLMs) on mobile devices continues to increase, driven by escalating data security concerns and cloud costs. However, network bandwidth and memory limitations pose challenges for deploying billion-level models on mobile devices. In this study, we investigate the outputs of different layers across various scales of LLMs and found that the outputs of mos… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  45. arXiv:2407.00280  [pdf, other

    eess.IV cs.CV

    IVCA: Inter-Relation-Aware Video Complexity Analyzer

    Authors: Junqi Liao, Yao Li, Zhuoyuan Li, Li Li, Dong Liu

    Abstract: To meet the real-time analysis requirements of video streaming applications, we propose an inter-relation-aware video complexity analyzer (IVCA) as an extension to VCA. The IVCA addresses the limitation of VCA by considering inter-frame relations, namely motion and reference structure. First, we enhance the accuracy of temporal features by introducing feature-domain motion estimation into the IVCA… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: The report for the solution of second prize winner in ICIP 2024 Grand Challenge on Video Complexity (Team: USTC-iVC_Team1, USTC-iVC_Team2)

  46. arXiv:2406.19922  [pdf, other

    cs.CV

    Parallax-tolerant Image Stitching via Segmentation-guided Multi-homography Warping

    Authors: Tianli Liao, Ce Wang, Lei Li, Guangen Liu, Nan Li

    Abstract: Large parallax between images is an intractable issue in image stitching. Various warping-based methods are proposed to address it, yet the results are unsatisfactory. In this paper, we propose a novel image stitching method using multi-homography warping guided by image segmentation. Specifically, we leverage the Segment Anything Model to segment the target image into numerous contents and partit… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 11 pages, 9 figures

  47. arXiv:2406.19400  [pdf, other

    cs.CV

    Deep Convolutional Neural Networks Meet Variational Shape Compactness Priors for Image Segmentation

    Authors: Kehui Zhang, Lingfeng Li, Hao Liu, Jing Yuan, Xue-Cheng Tai

    Abstract: Shape compactness is a key geometrical property to describe interesting regions in many image segmentation tasks. In this paper, we propose two novel algorithms to solve the introduced image segmentation problem that incorporates a shape-compactness prior. Existing algorithms for such a problem often suffer from computational inefficiency, difficulty in reaching a local minimum, and the need to fi… ▽ More

    Submitted 23 May, 2024; originally announced June 2024.

    Comments: 28 pages

  48. arXiv:2406.18572  [pdf, other

    cs.CV cs.LG

    GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model

    Authors: Ling Li, Yu Ye, Bingchuan Jiang, Wei Zeng

    Abstract: This work tackles the problem of geo-localization with a new paradigm using a large vision-language model (LVLM) augmented with human inference knowledge. A primary challenge here is the scarcity of data for training the LVLM - existing street-view datasets often contain numerous low-quality images lacking visual clues, and lack any reasoning inference. To address the data-quality issue, we devise… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  49. arXiv:2406.18532  [pdf, other

    cs.CL cs.AI cs.LG

    Symbolic Learning Enables Self-Evolving Agents

    Authors: Wangchunshu Zhou, Yixin Ou, Shengwei Ding, Long Li, Jialong Wu, Tiannan Wang, Jiamin Chen, Shuai Wang, Xiaohua Xu, Ningyu Zhang, Huajun Chen, Yuchen Eleanor Jiang

    Abstract: The AI community has been exploring a pathway to artificial general intelligence (AGI) by developing "language agents", which are complex large language models (LLMs) pipelines involving both prompting techniques and tool usage methods. While language agents have demonstrated impressive capabilities for many real-world tasks, a fundamental limitation of current language agents research is that the… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Code available at https://github.com/aiwaves-cn/agents

  50. arXiv:2406.17534  [pdf, other

    cs.CL

    Retrieval-style In-Context Learning for Few-shot Hierarchical Text Classification

    Authors: Huiyao Chen, Yu Zhao, Zulong Chen, Mengjia Wang, Liangyue Li, Meishan Zhang, Min Zhang

    Abstract: Hierarchical text classification (HTC) is an important task with broad applications, while few-shot HTC has gained increasing interest recently. While in-context learning (ICL) with large language models (LLMs) has achieved significant success in few-shot learning, it is not as effective for HTC because of the expansive hierarchical label sets and extremely-ambiguous labels. In this work, we intro… ▽ More

    Submitted 29 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 17 pages