
Showing 1–50 of 147 results for author: Xia, R

Searching in archive cs.
  1. arXiv:2407.06089  [pdf, other]

    cs.CL

    Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models

    Authors: Jinliang Lu, Ziliang Pang, Min Xiao, Yaochen Zhu, Rui Xia, Jiajun Zhang

    Abstract: The remarkable success of Large Language Models (LLMs) has ushered natural language processing (NLP) research into a new era. Despite their diverse capabilities, LLMs trained on different corpora exhibit varying strengths and weaknesses, leading to challenges in maximizing their overall efficiency and versatility. To address these challenges, recent studies have explored collaborative strategies f…

    Submitted 8 July, 2024; originally announced July 2024.

  2. arXiv:2407.04675  [pdf, other]

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li , et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor…

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  3. arXiv:2407.03939  [pdf]

    cs.CV

    SfM on-the-fly: Get better 3D from What You Capture

    Authors: Zongqian Zhan, Yifei Yu, Rui Xia, Wentian Gan, Hong Xie, Giulio Perda, Luca Morelli, Fabio Remondino, Xin Wang

    Abstract: In the last twenty years, Structure from Motion (SfM) has been a constant research hotspot in the fields of photogrammetry, computer vision, robotics etc., whereas real-time performance is just a recent topic of growing interest. This work builds upon the original on-the-fly SfM (Zhan et al., 2024) and presents an updated version with three new advancements to get better 3D from what you capture:…

    Submitted 14 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  4. arXiv:2406.15126  [pdf, other]

    cs.CL

    On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey

    Authors: Lin Long, Rui Wang, Ruixuan Xiao, Junbo Zhao, Xiao Ding, Gang Chen, Haobo Wang

    Abstract: Within the evolving landscape of deep learning, the dilemma of data quantity and quality has been a long-standing problem. The recent advent of Large Language Models (LLMs) offers a data-centric solution to alleviate the limitations of real-world data with synthetic data generation. However, current investigations into this field lack a unified framework and mostly stay on the surface. Therefore,…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: A survey on LLMs-driven synthetic data generation, curation and evaluation

  5. arXiv:2406.14884  [pdf, other]

    cs.CL

    FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents

    Authors: Ruixuan Xiao, Wentao Ma, Ke Wang, Yuchuan Wu, Junbo Zhao, Haobo Wang, Fei Huang, Yongbin Li

    Abstract: LLM-based agents have emerged as promising tools, which are crafted to fulfill complex tasks by iterative planning and action. However, these agents are susceptible to undesired planning hallucinations when lacking specific knowledge for expertise-intensive tasks. To address this, preliminary attempts are made to enhance planning reliability by incorporating external workflow-related knowledge. De…

    Submitted 21 June, 2024; originally announced June 2024.

  6. arXiv:2406.11633  [pdf, other]

    cs.CV

    DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

    Authors: Renqiu Xia, Song Mao, Xiangchao Yan, Hongbin Zhou, Bo Zhang, Haoyang Peng, Jiahao Pi, Daocheng Fu, Wenjie Wu, Hancheng Ye, Shiyang Feng, Bin Wang, Chao Xu, Conghui He, Pinlong Cai, Min Dou, Botian Shi, Sheng Zhou, Yongwei Wang, Bin Wang, Junchi Yan, Fei Wu, Yu Qiao

    Abstract: Scientific documents record research findings and valuable human knowledge, comprising a vast corpus of high-quality data. Leveraging multi-modality data extracted from these documents and assessing large models' abilities to handle scientific document-oriented tasks is therefore meaningful. Despite promising advancements, large models still perform poorly on multi-page scientific document extract…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Homepage of DocGenome: https://unimodal4reasoning.github.io/DocGenome_page 22 pages, 11 figures

  7. arXiv:2406.07961  [pdf, other]

    cs.CV cs.AI

    Accurate Explanation Model for Image Classifiers using Class Association Embedding

    Authors: Ruitao Xie, Jingbang Chen, Limai Jiang, Rui Xiao, Yi Pan, Yunpeng Cai

    Abstract: Image classification is a primary task in data analysis where explainable models are crucially demanded in various applications. Although amounts of methods have been proposed to obtain explainable knowledge from the black-box classifiers, these approaches lack the efficiency of extracting global knowledge regarding the classification task, thus is vulnerable to local traps and often leads to poor…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 40th IEEE International Conference on Data Engineering

  8. arXiv:2406.07571  [pdf, other]

    cs.CY

    Supporting Self-Reflection at Scale with Large Language Models: Insights from Randomized Field Experiments in Classrooms

    Authors: Harsh Kumar, Ruiwei Xiao, Benjamin Lawson, Ilya Musabirov, Jiakai Shi, Xinyuan Wang, Huayin Luo, Joseph Jay Williams, Anna Rafferty, John Stamper, Michael Liut

    Abstract: Self-reflection on learning experiences constitutes a fundamental cognitive process, essential for the consolidation of knowledge and the enhancement of learning efficacy. However, traditional methods to facilitate reflection often face challenges in personalization, immediacy of feedback, engagement, and scalability. Integration of Large Language Models (LLMs) into the reflection process could mi…

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: Accepted at L@S'24

  9. arXiv:2406.07268  [pdf, other]

    cs.MM cs.CL cs.CV

    Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation

    Authors: Jinyuan Li, Ziyan Li, Han Li, Jianfei Yu, Rui Xia, Di Sun, Gang Pan

    Abstract: Grounded Multimodal Named Entity Recognition (GMNER) task aims to identify named entities, entity types and their corresponding visual regions. GMNER task exhibits two challenging attributes: 1) The tenuous correlation between images and text on social media contributes to a notable proportion of named entities being ungroundable. 2) There exists a distinction between coarse-grained noun phrases u…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Extension of our Findings of EMNLP 2023 & ACL 2024 paper

  10. arXiv:2405.18435  [pdf, other]

    eess.IV cs.CV

    QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

    Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag , et al. (55 additional authors not shown)

    Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de…

    Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

    Comments: initial technical report

  11. arXiv:2405.13049  [pdf, other]

    cs.CL cs.AI cs.MM

    SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations

    Authors: Fanfan Wang, Heqing Ma, Jianfei Yu, Rui Xia, Erik Cambria

    Abstract: The ability to understand emotions is an essential component of human-like artificial intelligence, as emotions greatly influence human cognition, decision making, and social interactions. In addition to emotion recognition in conversations, the task of identifying the potential causes behind an individual's emotional state in conversations, is of great importance in many application scenarios. We…

    Submitted 8 July, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

    Comments: Accepted to the 18th International Workshop on Semantic Evaluation (SemEval-2024). 12 pages, 3 figures, 4 Tables

    Journal ref: https://aclanthology.org/2024.semeval-1.277/

  12. arXiv:2405.04645  [pdf, other]

    cs.HC cs.CY

    Enhancing LLM-Based Feedback: Insights from Intelligent Tutoring Systems and the Learning Sciences

    Authors: John Stamper, Ruiwei Xiao, Xinying Hou

    Abstract: The field of Artificial Intelligence in Education (AIED) focuses on the intersection of technology, education, and psychology, placing a strong emphasis on supporting learners' needs with compassion and understanding. The growing prominence of Large Language Models (LLMs) has led to the development of scalable solutions within educational settings, including generating different types of feedback…

    Submitted 11 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted to 25th International Conference on Artificial Intelligence in Education (AIED 2024) BlueSky special track

  13. arXiv:2405.00313  [pdf, other]

    cs.CV

    Streamlining Image Editing with Layered Diffusion Brushes

    Authors: Peyman Gholami, Robert Xiao

    Abstract: Denoising diffusion models have recently gained prominence as powerful tools for a variety of image generation and manipulation tasks. Building on this, we propose a novel tool for real-time editing of images that provides users with fine-grained region-targeted supervision in addition to existing prompt-based controls. Our novel editing technique, termed Layered Diffusion Brushes, leverages promp…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2306.00219

  14. arXiv:2404.15675  [pdf, other]

    cs.IR

    Hi-Gen: Generative Retrieval For Large-Scale Personalized E-commerce Search

    Authors: Yanjing Wu, Yinfu Feng, Jian Wang, Wenji Zhou, Yunan Ye, Rong Xiao

    Abstract: Leveraging generative retrieval (GR) techniques to enhance search systems is an emerging methodology that has shown promising results in recent years. In GR, a text-to-text model maps string queries directly to relevant document identifiers (docIDs), so it dramatically simplifies the whole retrieval process. However, when applying most GR models in large-scale E-commerce for personalized item sear…

    Submitted 24 April, 2024; originally announced April 2024.

  15. arXiv:2404.15353  [pdf, other]

    eess.SP cs.AI cs.LG

    SQUWA: Signal Quality Aware DNN Architecture for Enhanced Accuracy in Atrial Fibrillation Detection from Noisy PPG Signals

    Authors: Runze Yan, Cheng Ding, Ran Xiao, Aleksandr Fedorov, Randall J Lee, Fadi Nahab, Xiao Hu

    Abstract: Atrial fibrillation (AF), a common cardiac arrhythmia, significantly increases the risk of stroke, heart disease, and mortality. Photoplethysmography (PPG) offers a promising solution for continuous AF monitoring, due to its cost efficiency and integration into wearable devices. Nonetheless, PPG signals are susceptible to corruption from motion artifacts and other factors often encountered in ambu…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 15 pages; 9 figures; 2024 Conference on Health, Inference, and Learning (CHIL)

  16. arXiv:2404.11889  [pdf, other]

    eess.IV cs.CV

    Multi-view X-ray Image Synthesis with Multiple Domain Disentanglement from CT Scans

    Authors: Lixing Tan, Shuang Song, Kangneng Zhou, Chengbo Duan, Lanying Wang, Huayang Ren, Linlin Liu, Wei Zhang, Ruoxiu Xiao

    Abstract: X-ray images play a vital role in the intraoperative processes due to their high resolution and fast imaging speed and greatly promote the subsequent segmentation, registration and reconstruction. However, over-dosed X-rays superimpose potential risks to human health to some extent. Data-driven algorithms from volume scans to X-ray images are restricted by the scarcity of paired X-ray and volume d…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 13 pages, 10 figures

  17. arXiv:2404.02213  [pdf, other]

    cs.HC cs.AI cs.CY

    Exploring How Multiple Levels of GPT-Generated Programming Hints Support or Disappoint Novices

    Authors: Ruiwei Xiao, Xinying Hou, John Stamper

    Abstract: Recent studies have integrated large language models (LLMs) into diverse educational contexts, including providing adaptive programming hints, a type of feedback focuses on helping students move forward during problem-solving. However, most existing LLM-based hint systems are limited to one single hint type. To investigate whether and how different levels of hints can support students' problem-sol…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted CHI 2024 LBW - 10 pages

  18. arXiv:2403.15901  [pdf, other]

    cs.AI cs.CV

    MatchSeg: Towards Better Segmentation via Reference Image Matching

    Authors: Ruiqiang Xiao, Jiayu Huo, Haotian Zheng, Yang Liu, Sebastien Ourselin, Rachel Sparks

    Abstract: Recently, automated medical image segmentation methods based on deep learning have achieved great success. However, they heavily rely on large annotated datasets, which are costly and time-consuming to acquire. Few-shot learning aims to overcome the need for annotated data by using a small labeled dataset, known as a support set, to guide predicting labels for new, unlabeled images, known as the q…

    Submitted 19 June, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

  19. arXiv:2403.15835  [pdf, other]

    cs.CV

    Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression

    Authors: Hancheng Ye, Chong Yu, Peng Ye, Renqiu Xia, Yansong Tang, Jiwen Lu, Tao Chen, Bo Zhang

    Abstract: Recent Vision Transformer Compression (VTC) works mainly follow a two-stage scheme, where the importance score of each model unit is first evaluated or preset in each submodule, followed by the sparsity score evaluation according to the target sparsity constraint. Such a separate evaluation process induces the gap between importance and sparsity score distributions, thus causing high search costs…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024. Our code will be available at www.github.com/HankYe/Once-for-Both

  20. arXiv:2403.02799  [pdf, other]

    cs.CL cs.AI

    DPPA: Pruning Method for Large Language Model to Model Merging

    Authors: Yaochen Zhu, Rui Xia, Jiajun Zhang

    Abstract: Model merging is to combine fine-tuned models derived from multiple domains, with the intent of enhancing the model's proficiency across various domains. The principal concern is the resolution of parameter conflicts. A substantial amount of existing research remedy this issue during the merging stage, with the latest study focusing on resolving this issue throughout the pruning stage. The DARE ap…

    Submitted 5 March, 2024; originally announced March 2024.

  21. arXiv:2402.17213  [pdf, other]

    cs.CV cs.AI

    VCD: Knowledge Base Guided Visual Commonsense Discovery in Images

    Authors: Xiangqing Shen, Yurun Song, Siwei Wu, Rui Xia

    Abstract: Visual commonsense contains knowledge about object properties, relationships, and behaviors in visual data. Discovering visual commonsense can provide a more comprehensive and richer understanding of images, and enhance the reasoning and decision-making capabilities of computer vision systems. However, the visual commonsense defined in existing visual commonsense discovery studies is coarse-graine…

    Submitted 27 February, 2024; originally announced February 2024.

  22. arXiv:2402.12185  [pdf, other]

    cs.CV

    ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning

    Authors: Renqiu Xia, Bo Zhang, Hancheng Ye, Xiangchao Yan, Qi Liu, Hongbin Zhou, Zijun Chen, Min Dou, Botian Shi, Junchi Yan, Yu Qiao

    Abstract: Recently, many versatile Multi-modal Large Language Models (MLLMs) have emerged continuously. However, their capacity to query information depicted in visual charts and engage in reasoning based on the queried contents remains under-explored. In this paper, to comprehensively and rigorously benchmark the ability of the off-the-shelf MLLMs in the chart domain, we construct ChartX, a multi-modal eva…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: Code and dataset are available for downloading at: https://github.com/UniModal4Reasoning/ChartVLM 22 pages, 15 figures

  23. arXiv:2402.11809  [pdf, other]

    cs.CL cs.AI cs.LG

    Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding

    Authors: Hanling Yi, Feng Lin, Hongbin Li, Peiyang Ning, Xiaotian Yu, Rong Xiao

    Abstract: This research aims to accelerate the inference speed of large language models (LLMs) with billions of parameters. We propose Smart Parallel Auto-Correct dEcoding (SPACE), an innovative approach designed for achieving lossless acceleration of LLMs. By integrating semi-autoregressive inference and speculative decoding capabilities, SPACE uniquely enables…

    Submitted 19 May, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024 Findings

  24. arXiv:2402.07913  [pdf, other]

    cs.CL cs.AI cs.HC

    QACP: An Annotated Question Answering Dataset for Assisting Chinese Python Programming Learners

    Authors: Rui Xiao, Lu Han, Xiaoying Zhou, Jiong Wang, Na Zong, Pengyu Zhang

    Abstract: In online learning platforms, particularly in rapidly growing computer programming courses, addressing the thousands of students' learning queries requires considerable human cost. The creation of intelligent assistant large language models (LLMs) tailored for programming education necessitates distinct data support. However, in real application scenarios, the data resources for training such LLMs…

    Submitted 22 February, 2024; v1 submitted 30 January, 2024; originally announced February 2024.

  25. arXiv:2401.13588  [pdf]

    cs.CL cs.AI cs.SE

    Evaluation of General Large Language Models in Contextually Assessing Semantic Concepts Extracted from Adult Critical Care Electronic Health Record Notes

    Authors: Darren Liu, Cheng Ding, Delgersuren Bold, Monique Bouvier, Jiaying Lu, Benjamin Shickel, Craig S. Jabaley, Wenhui Zhang, Soojin Park, Michael J. Young, Mark S. Wainwright, Gilles Clermont, Parisa Rashidi, Eric S. Rosenthal, Laurie Dimisko, Ran Xiao, Joo Heung Yoon, Carl Yang, Xiao Hu

    Abstract: The field of healthcare has increasingly turned its focus towards Large Language Models (LLMs) due to their remarkable performance. However, their performance in actual clinical applications has been underexplored. Traditional evaluations based on question-answering tasks don't fully capture the nuanced contexts. This gap highlights the need for more in-depth and practical assessments of LLMs in r…

    Submitted 24 January, 2024; originally announced January 2024.

  26. arXiv:2401.12522  [pdf, other]

    cs.CL cs.AI cs.LG

    BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models

    Authors: Feng Lin, Hanling Yi, Hongbin Li, Yifan Yang, Xiaotian Yu, Guangming Lu, Rong Xiao

    Abstract: Large language models (LLMs) commonly employ autoregressive generation during inference, leading to high memory bandwidth demand and consequently extended latency. To mitigate this inefficiency, we present Bi-directional Tuning for lossless Acceleration (BiTA), an innovative method expediting LLMs via streamlined semi-autoregressive generation and draft verification. Inspired by the concept of pro…

    Submitted 25 January, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: An appendix has been included. Source code at https://github.com/linfeng93/BiTA

  27. arXiv:2401.02847  [pdf, other]

    cs.CV cs.GR cs.LG

    Generating Non-Stationary Textures using Self-Rectification

    Authors: Yang Zhou, Rongjun Xiao, Dani Lischinski, Daniel Cohen-Or, Hui Huang

    Abstract: This paper addresses the challenge of example-based non-stationary texture synthesis. We introduce a novel two-step approach wherein users first modify a reference texture using standard image editing tools, yielding an initial rough target for the synthesis. Subsequently, our proposed method, termed "self-rectification", automatically refines this target into a coherent, seamless texture, while fa…

    Submitted 30 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

    Comments: Project page: https://github.com/xiaorongjun000/Self-Rectification

  28. arXiv:2312.17120  [pdf, other]

    cs.CL cs.AI cs.LG

    Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math

    Authors: Zengzhi Wang, Rui Xia, Pengfei Liu

    Abstract: High-quality, large-scale corpora are the cornerstone of building foundation models. In this work, we introduce MathPile, a diverse and high-quality math-centric corpus comprising about 9.5 billion tokens. Throughout its creation, we adhered to the principle of "less is more", firmly believing in the supremacy of data quality over quantity, even in the pre-training phase. Our met…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: 37 pages. Working in Progress. https://github.com/GAIR-NLP/MathPile/

  29. arXiv:2312.08718  [pdf, other]

    cs.RO

    Trajectory Planning and Tracking of Hybrid Flying-Crawling Quadrotors

    Authors: Dongnan Hu, Ruihao Xia, Xin Jin, Yang Tang

    Abstract: Hybrid Flying-Crawling Quadrotors (HyFCQs) are transformable robots with the ability of terrestrial and aerial hybrid motion. This article presents a trajectory planning and tracking framework designed for HyFCQs. In this framework, a terrestrial-aerial path-searching method with the crawling limitation of HyFCQs is proposed to guarantee the dynamical feasibility of trajectories. Additionally, a t…

    Submitted 14 May, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

  30. arXiv:2312.07075  [pdf, other]

    cs.RO

    Motion Planning and Control of A Morphing Quadrotor in Restricted Scenarios

    Authors: Guiyang Cui, Ruihao Xia, Xin Jin, Yang Tang

    Abstract: Morphing quadrotors with four external actuators can adapt to different restricted scenarios by changing their geometric structure. However, previous works mainly focus on the improvements in structures and controllers, and existing planning algorithms don't consider the morphological modifications, which leads to safety and dynamic feasibility issues. In this paper, we propose a unified planning…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: 8 pages, 9 figures

  31. arXiv:2312.02300  [pdf]

    cs.LG eess.SP

    Reconsideration on evaluation of machine learning models in continuous monitoring using wearables

    Authors: Cheng Ding, Zhicheng Guo, Cynthia Rudin, Ran Xiao, Fadi B Nahab, Xiao Hu

    Abstract: This paper explores the challenges in evaluating machine learning (ML) models for continuous health monitoring using wearable devices beyond conventional metrics. We state the complexities posed by real-world variability, disease dynamics, user-specific characteristics, and the prevalence of false notifications, necessitating novel evaluation strategies. Drawing insights from large-scale heart stu…

    Submitted 4 December, 2023; originally announced December 2023.

  32. arXiv:2311.18399  [pdf, other]

    eess.AS cs.SD

    Audio Prompt Tuning for Universal Sound Separation

    Authors: Yuzhuo Liu, Xubo Liu, Yan Zhao, Yuanyuan Wang, Rui Xia, Pingchuan Tain, Yuxuan Wang

    Abstract: Universal sound separation (USS) is a task to separate arbitrary sounds from an audio mixture. Existing USS systems are capable of separating arbitrary sources, given a few examples of the target sources as queries. However, separating arbitrary sounds with a single system is challenging, and the robustness is not always guaranteed. In this work, we propose audio prompt tuning (APT), a simple yet…

    Submitted 30 November, 2023; originally announced November 2023.

  33. arXiv:2311.15614  [pdf, other]

    cs.CL

    FreeAL: Towards Human-Free Active Learning in the Era of Large Language Models

    Authors: Ruixuan Xiao, Yiwen Dong, Junbo Zhao, Runze Wu, Minmin Lin, Gang Chen, Haobo Wang

    Abstract: Collecting high-quality labeled data for model training is notoriously time-consuming and labor-intensive for various NLP tasks. While copious solutions, such as active learning for small language models (SLMs) and prevalent in-context learning in the era of large language models (LLMs), have been proposed and alleviate the labeling burden to some extent, their performances are still subject to hu…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted to EMNLP 2023 (Main conference)

  34. In-Context Learning for Knowledge Base Question Answering for Unmanned Systems based on Large Language Models

    Authors: Yunlong Chen, Yaming Zhang, Jianfei Yu, Li Yang, Rui Xia

    Abstract: Knowledge Base Question Answering (KBQA) aims to answer factoid questions based on knowledge bases. However, generating the most appropriate knowledge base query code based on Natural Language Questions (NLQ) poses a significant challenge in KBQA. In this work, we focus on the CCKS2023 Competition of Question Answering with Knowledge Graph Inference for Unmanned Systems. Inspired by the recent suc…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: Runner up of the CCKS 2023 question answering with knowledge graph inference for unmanned systems evaluation task, accepted as an evaluation paper

    ACM Class: I.2.7

  35. arXiv:2310.10219  [pdf, other]

    cs.CV cs.AI

    Using Global Land Cover Product as Prompt for Cropland Mapping via Visual Foundation Model

    Authors: Chao Tao, Aoran Hu, Rong Xiao, Haifeng Li, Yuze Wang

    Abstract: Data-driven deep learning methods have shown great potential in cropland mapping. However, due to multiple factors such as attributes of cropland (topography, climate, crop type) and imaging conditions (viewing angle, illumination, scale), croplands under different scenes demonstrate a great domain gap. This makes it difficult for models trained in the specific scenes to directly generalize to oth…

    Submitted 16 October, 2023; originally announced October 2023.

  36. arXiv:2310.06594  [pdf, other]

    cs.CV

    On the Evaluation and Refinement of Vision-Language Instruction Tuning Datasets

    Authors: Ning Liao, Shaofeng Zhang, Renqiu Xia, Min Cao, Yu Qiao, Junchi Yan

    Abstract: There is an emerging line of research on multimodal instruction tuning, and a line of benchmarks has been proposed for evaluating these models recently. Instead of evaluating the models directly, in this paper, we try to evaluate the Vision-Language Instruction-Tuning (VLIT) datasets. Also, we seek the way of building a dataset for developing an all-powerful VLIT model, which we believe could also…

    Submitted 29 December, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

  37. arXiv:2310.06502  [pdf, other]

    cs.CL

    The Limits of ChatGPT in Extracting Aspect-Category-Opinion-Sentiment Quadruples: A Comparative Analysis

    Authors: Xiancai Xu, Jia-Dong Zhang, Rongchang Xiao, Lei Xiong

    Abstract: Recently, ChatGPT has attracted great attention from both industry and academia due to its surprising abilities in natural language understanding and generation. We are particularly curious about whether it can achieve promising performance on one of the most complex tasks in aspect-based sentiment analysis, i.e., extracting aspect-category-opinion-sentiment quadruples from texts. To this end, in…

    Submitted 10 October, 2023; originally announced October 2023.

  38. arXiv:2310.03293  [pdf, other]

    cs.CL

    A New Dialogue Response Generation Agent for Large Language Models by Asking Questions to Detect User's Intentions

    Authors: Siwei Wu, Xiangqing Shen, Rui Xia

    Abstract: Large Language Models (LLMs), such as ChatGPT, have recently been applied to various NLP tasks due to its open-domain generation capabilities. However, there are two issues with applying LLMs to dialogue tasks. 1. During the dialogue process, users may have implicit intentions that might be overlooked by LLMs. Consequently, generated responses couldn't align with the user's intentions. 2. It is un…

    Submitted 4 October, 2023; originally announced October 2023.

  39. arXiv:2310.02174  [pdf, other]

    cs.CL cs.AI cs.LG

    Ask Again, Then Fail: Large Language Models' Vacillations in Judgment

    Authors: Qiming Xie, Zengzhi Wang, Yi Feng, Rui Xia

    Abstract: We observe that current conversational language models often waver in their judgments when faced with follow-up questions, even if the original judgment was correct. This wavering presents a significant challenge for generating reliable responses and building user trust. To comprehensively assess this issue, we introduce a Follow-up Questioning Mechanism along with two metrics to quantify…

    Submitted 11 June, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted by ACL 2024 main conference

  40. arXiv:2309.12865  [pdf, other]

    cs.CV

    Bridging Sensor Gaps via Attention Gated Tuning for Hyperspectral Image Classification

    Authors: Xizhe Xue, Haokui Zhang, Rong Xiao, Ying Li, Zongwen Bai, Mike Zheng Shou

    Abstract: Data-hungry HSI classification methods require high-quality labeled HSIs, which are often costly to obtain. This characteristic limits the performance potential of data-driven methods when dealing with limited annotated samples. Bridging the domain gap between data acquired from different sensors allows us to utilize abundant labeled data across sensors to break this bottleneck. In this paper, we…

    Submitted 18 July, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

  41. arXiv:2309.11883  [pdf]

    cs.CV cs.RO

    On-the-Fly SfM: What you capture is What you get

    Authors: Zongqian Zhan, Rui Xia, Yifei Yu, Yibo Xu, Xin Wang

    Abstract: Over the last decades, ample achievements have been made in Structure from motion (SfM). However, the vast majority of them work in an offline manner, i.e., images are first captured and then fed together into an SfM pipeline to obtain poses and a sparse point cloud. In this work, on the contrary, we present an on-the-fly SfM: running online SfM while image capturing, the newly taken…

    Submitted 13 February, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

  42. arXiv:2309.11268  [pdf, other]

    cs.CV

    StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding

    Authors: Renqiu Xia, Bo Zhang, Haoyang Peng, Hancheng Ye, Xiangchao Yan, Peng Ye, Botian Shi, Yu Qiao, Junchi Yan

    Abstract: Charts are common in literature across different scientific fields, conveying rich information easily accessible to readers. Current chart-related tasks focus on either chart perception, which refers to extracting information from the visual charts, or performing reasoning given the extracted data, e.g. in a tabular form. In this paper, we aim to establish a unified and label-efficient learning par…

    Submitted 18 February, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: 26 pages, 15 figures. SimChart9K is available for downloading at: https://github.com/UniModal4Reasoning/SimChart9K

  43. arXiv:2309.07408  [pdf, other]

    cs.RO

    An Explicit Method for Fast Monocular Depth Recovery in Corridor Environments

    Authors: Yehao Liu, Ruoyan Xia, Xiaosu Xu, Zijian Wang, Yiqing Ya, Mingze Fan

    Abstract: Monocular cameras are extensively employed in indoor robotics, but their performance is limited in visual odometry, depth estimation, and related applications due to the absence of scale information. Depth estimation refers to the process of estimating a dense depth map from the corresponding input image. Existing researchers mostly address this issue through deep learning-based approaches, yet the…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: 10 pages, 8 figures. arXiv admin note: text overlap with arXiv:2111.08600 by other authors

  44. arXiv:2309.05527  [pdf, other]

    cs.CV

    ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation

    Authors: Bo Zhang, Xinyu Cai, Jiakang Yuan, Donglin Yang, Jianfei Guo, Xiangchao Yan, Renqiu Xia, Botian Shi, Min Dou, Tao Chen, Si Liu, Junchi Yan, Yu Qiao

    Abstract: Domain shifts such as sensor type changes and geographical situation variations are prevalent in Autonomous Driving (AD), which poses a challenge since an AD model relying on previous domain knowledge can hardly be deployed directly to a new domain without additional costs. In this paper, we provide a new perspective and approach for alleviating domain shifts by proposing a Reconstruction-Sim…

    Submitted 25 January, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Accepted by ICLR 2024. Code and simulated points are available at https://github.com/PJLab-ADG/3DTrans#resimad

  45. arXiv:2308.08345  [pdf, other]

    eess.IV cs.CV

    GAEI-UNet: Global Attention and Elastic Interaction U-Net for Vessel Image Segmentation

    Authors: Ruiqiang Xiao, Zhuoyue Wan

    Abstract: Vessel image segmentation plays a pivotal role in medical diagnostics, aiding in the early detection and treatment of vascular diseases. While segmentation based on deep learning has shown promising results, effectively segmenting small structures and maintaining connectivity between them remains challenging. To address these limitations, we propose GAEI-UNet, a novel model that combines global at…

    Submitted 22 August, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: arXiv admin note: text overlap with arXiv:2004.03696 by other authors

  46. arXiv:2308.07723  [pdf, other]

    cs.RO cs.MA

    Extended Preintegration for Relative State Estimation of Leader-Follower Platform

    Authors: Ruican Xia, Hailong Pei

    Abstract: Relative state estimation using exteroceptive sensors suffers from limitations of the field of view (FOV) and false detection, for which proprioceptive sensor (IMU) data are usually used to compensate. Recently, the ego-motion constraint obtained by inertial measurement unit (IMU) preintegration has been extensively used in simultaneous localization and mapping (SLAM) to alleviate the computation bur…

    Submitted 15 August, 2023; originally announced August 2023.

  47. arXiv:2308.06948  [pdf, other]

    cs.CV

    MixBCT: Towards Self-Adapting Backward-Compatible Training

    Authors: Yu Liang, Yufeng Zhang, Shiliang Zhang, Yaowei Wang, Sheng Xiao, Rong Xiao, Xiaoyu Wang

    Abstract: Backward-compatible training circumvents the need for expensive updates to the old gallery database when deploying an advanced new model in the retrieval system. Previous methods achieved backward compatibility by aligning prototypes of the new model with the old one, yet they often overlooked the distribution of old features, limiting their effectiveness when the low quality of the old model resu…

    Submitted 26 May, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

  48. arXiv:2308.05037  [pdf, other]

    eess.AS cs.AI cs.MM cs.SD

    Separate Anything You Describe

    Authors: Xubo Liu, Qiuqiang Kong, Yan Zhao, Haohe Liu, Yi Yuan, Yuzhuo Liu, Rui Xia, Yuxuan Wang, Mark D. Plumbley, Wenwu Wang

    Abstract: Language-queried audio source separation (LASS) is a new paradigm for computational auditory scene analysis (CASA). LASS aims to separate a target sound from an audio mixture given a natural language query, which provides a natural and scalable interface for digital audio applications. Recent works on LASS, despite attaining promising separation performance on specific sources (e.g., musical instr…

    Submitted 27 October, 2023; v1 submitted 9 August, 2023; originally announced August 2023.

    Comments: Code, benchmark and pre-trained models: https://github.com/Audio-AGI/AudioSep

  49. arXiv:2307.15942  [pdf, other]

    cs.CV

    CMDA: Cross-Modality Domain Adaptation for Nighttime Semantic Segmentation

    Authors: Ruihao Xia, Chaoqiang Zhao, Meng Zheng, Ziyan Wu, Qiyu Sun, Yang Tang

    Abstract: Most nighttime semantic segmentation studies are based on domain adaptation approaches and image input. However, limited by the low dynamic range of conventional cameras, images fail to capture structural details and boundary information in low-light conditions. Event cameras, as a new form of vision sensors, are complementary to conventional cameras with their high dynamic range. To this end, we…

    Submitted 29 July, 2023; originally announced July 2023.

    Comments: Accepted to ICCV 2023

  50. arXiv:2307.13259  [pdf, other]

    cs.CV

    GaitFormer: Revisiting Intrinsic Periodicity for Gait Recognition

    Authors: Qian Wu, Ruixuan Xiao, Kaixin Xu, Jingcheng Ni, Boxun Li, Ziyao Xu

    Abstract: Gait recognition aims to distinguish different walking patterns by analyzing video-level human silhouettes, rather than relying on appearance information. Previous research on gait recognition has primarily focused on extracting local or global spatial-temporal representations, while overlooking the intrinsic periodic features of gait sequences, which, when fully utilized, can significantly enhanc…

    Submitted 25 July, 2023; originally announced July 2023.