Skip to main content

Showing 1–50 of 1,133 results for author: Chen, Q

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.13246  [pdf, other

    cs.CV

    STS MICCAI 2023 Challenge: Grand challenge on 2D and 3D semi-supervised tooth segmentation

    Authors: Yaqi Wang, Yifan Zhang, Xiaodiao Chen, Shuai Wang, Dahong Qian, Fan Ye, Feng Xu, Hongyuan Zhang, Qianni Zhang, Chengyu Wu, Yunxiang Li, Weiwei Cui, Shan Luo, Chengkai Wang, Tianhao Li, Yi Liu, Xiang Feng, Huiyu Zhou, Dongyun Liu, Qixuan Wang, Zhouhao Lin, Wei Song, Yuanlin Li, Bing Wang, Chunshi Wang , et al. (2 additional authors not shown)

    Abstract: Computer-aided design (CAD) tools are increasingly popular in modern dental practice, particularly for treatment planning or comprehensive prognosis evaluation. In particular, the 2D panoramic X-ray image efficiently detects invisible caries, impacted teeth and supernumerary teeth in children, while the 3D dental cone beam computed tomography (CBCT) is widely used in orthodontics and endodontics d… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2407.13205  [pdf, ps, other

    cs.CL

    Transformer-based Single-Cell Language Model: A Survey

    Authors: Wei Lan, Guohang He, Mingyang Liu, Qingfeng Chen, Junyue Cao, Wei Peng

    Abstract: The transformers have achieved significant accomplishments in the natural language processing as its outstanding parallel processing capabilities and highly flexible attention mechanism. In addition, increasing studies based on transformers have been proposed to model single-cell data. In this review, we attempt to systematically summarize the single-cell language models and applications based on… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  3. arXiv:2407.11781  [pdf, other

    cs.CV

    SlingBAG: Sliding ball adaptive growth algorithm with differentiable radiation enables super-efficient iterative 3D photoacoustic image reconstruction

    Authors: Shuang Li, Yibing Wang, Jian Gao, Chulhong Kim, Seongwook Choi, Yu Zhang, Qian Chen, Yao Yao, Changhui Li

    Abstract: High-quality 3D photoacoustic imaging (PAI) reconstruction under sparse view or limited view has long been challenging. Traditional 3D iterative-based reconstruction methods suffer from both slow speed and high memory consumption. Recently, in computer graphics, the differentiable rendering has made significant progress, particularly with the rise of 3D Gaussian Splatting. Inspired by these, we in… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  4. arXiv:2407.11700  [pdf, other

    cs.CV eess.IV

    Rate-Distortion-Cognition Controllable Versatile Neural Image Compression

    Authors: Jinming Liu, Ruoyu Feng, Yunpeng Qi, Qiuyu Chen, Zhibo Chen, Wenjun Zeng, Xin Jin

    Abstract: Recently, the field of Image Coding for Machines (ICM) has garnered heightened interest and significant advances thanks to the rapid progress of learning-based techniques for image compression and analysis. Previous studies often require training separate codecs to support various bitrate levels, machine tasks, and networks, thus lacking both flexibility and practicality. To address these challeng… ▽ More

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  5. arXiv:2407.10655  [pdf, other

    cs.CV

    OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer

    Authors: Yu Wang, Xiangbo Su, Qiang Chen, Xinyu Zhang, Teng Xi, Kun Yao, Errui Ding, Gang Zhang, Jingdong Wang

    Abstract: Open-vocabulary object detection focusing on detecting novel categories guided by natural language. In this report, we propose Open-Vocabulary Light-Weighted Detection Transformer (OVLW-DETR), a deployment friendly open-vocabulary detector with strong performance and low latency. Building upon OVLW-DETR, we provide an end-to-end training recipe that transferring knowledge from vision-language mode… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 4 pages

  6. arXiv:2407.10061  [pdf, other

    cs.CV

    InfiniMotion: Mamba Boosts Memory in Transformer for Arbitrary Long Motion Generation

    Authors: Zeyu Zhang, Akide Liu, Qi Chen, Feng Chen, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang

    Abstract: Text-to-motion generation holds potential for film, gaming, and robotics, yet current methods often prioritize short motion generation, making it challenging to produce long motion sequences effectively: (1) Current methods struggle to handle long motion sequences as a single input due to prohibitively high computational cost; (2) Breaking down the generation of long motion sequences into shorter… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

  7. arXiv:2407.09359  [pdf, other

    cs.CV

    A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization

    Authors: Qiyu Chen, Huiyuan Luo, Chengkan Lv, Zhengtao Zhang

    Abstract: Anomaly synthesis strategies can effectively enhance unsupervised anomaly detection. However, existing strategies have limitations in the coverage and controllability of anomaly synthesis, particularly for weak defects that are very similar to normal regions. In this paper, we propose Global and Local Anomaly co-Synthesis Strategy (GLASS), a novel unified framework designed to synthesize a broader… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  8. arXiv:2407.08965  [pdf, other

    cs.CV cs.LG

    Lite-SAM Is Actually What You Need for Segment Everything

    Authors: Jianhai Fu, Yuanjie Yu, Ningchuan Li, Yi Zhang, Qichao Chen, Jianping Xiong, Jun Yin, Zhiyu Xiang

    Abstract: This paper introduces Lite-SAM, an efficient end-to-end solution for the SegEvery task designed to reduce computational costs and redundancy. Lite-SAM is composed of four main components: a streamlined CNN-Transformer hybrid encoder (LiteViT), an automated prompt proposal network (AutoPPN), a traditional prompt encoder, and a mask decoder. All these components are integrated within the SAM framewo… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: ECCV 2024 Accepted

  9. arXiv:2407.08507  [pdf, other

    cs.CV

    Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement

    Authors: Zijie Yue, Miaojing Shi, Hanli Wang, Shuai Ding, Qijun Chen, Shanlin Yang

    Abstract: Facial video-based remote physiological measurement is a promising research area for detecting human vital signs (e.g., heart rate, respiration frequency) in a non-contact way. Conventional approaches are mostly supervised learning, requiring extensive collections of facial videos and synchronously recorded photoplethysmography (PPG) signals. To tackle it, self-supervised learning has recently gai… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  10. arXiv:2407.06846  [pdf, other

    cs.HC

    SilverCycling: Exploring the Impact of Bike-Based Locomotion on Spatial Orientation for Older Adults in VR

    Authors: Qiongyan Chen, Zhiqing Wu, Yucheng Liu, Lei Han, Zisu Li, Ge Lin Kan, Mingming Fan

    Abstract: Spatial orientation is essential for people to effectively navigate and interact with the environment in everyday life. With age-related cognitive decline, providing VR locomotion techniques with better spatial orientation performance for older adults becomes important. Such advancements not only make VR more accessible to older adults but also enable them to reap the potential health benefits of… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 19 pages, 6 figures

  11. arXiv:2407.06084  [pdf, other

    cs.CV

    3D Vision and Language Pretraining with Large-Scale Synthetic Data

    Authors: Dejie Yang, Zhu Xu, Wentao Mo, Qingchao Chen, Siyuan Huang, Yang Liu

    Abstract: 3D Vision-Language Pre-training (3D-VLP) aims to provide a pre-train model which can bridge 3D scenes with natural language, which is an important technique for embodied intelligence. However, current 3D-VLP datasets are hindered by limited scene-level diversity and insufficient fine-grained annotations (only 1.2K scenes and 280K textual annotations in ScanScribe), primarily due to the labor-inten… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: accepted by IJCAI2024

  12. arXiv:2407.05796  [pdf, other

    eess.IV cs.CV

    Poisson Ordinal Network for Gleason Group Estimation Using Bi-Parametric MRI

    Authors: Yinsong Xu, Yipei Wang, Ziyi Shen, Iani J. M. B. Gayo, Natasha Thorley, Shonit Punwani, Aidong Men, Dean Barratt, Qingchao Chen, Yipeng Hu

    Abstract: The Gleason groups serve as the primary histological grading system for prostate cancer, providing crucial insights into the cancer's potential for growth and metastasis. In clinical practice, pathologists determine the Gleason groups based on specimens obtained from ultrasound-guided biopsies. In this study, we investigate the feasibility of directly estimating the Gleason groups from MRI scans t… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: MICCAI 2024

  13. arXiv:2407.05407  [pdf, other

    cs.SD cs.AI eess.AS

    CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens

    Authors: Zhihao Du, Qian Chen, Shiliang Zhang, Kai Hu, Heng Lu, Yexin Yang, Hangrui Hu, Siqi Zheng, Yue Gu, Ziyang Ma, Zhifu Gao, Zhijie Yan

    Abstract: Recent years have witnessed a trend that large language model (LLM) based text-to-speech (TTS) emerges into the mainstream due to their high naturalness and zero-shot capacity. In this paradigm, speech signals are discretized into token sequences, which are modeled by an LLM with text as prompts and reconstructed by a token-based vocoder to waveforms. Obviously, speech tokens play a critical role… ▽ More

    Submitted 9 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: work in progress. arXiv admin note: substantial text overlap with arXiv:2407.04051

  14. arXiv:2407.05283  [pdf, other

    cs.CV

    SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning

    Authors: Yi Feng, Zizhan Guo, Qijun Chen, Rui Fan

    Abstract: Unsupervised monocular depth estimation frameworks have shown promising performance in autonomous driving. However, existing solutions primarily rely on a simple convolutional neural network for ego-motion recovery, which struggles to estimate precise camera poses in dynamic, complicated real-world scenarios. These inaccurately estimated camera poses can inevitably deteriorate the photometric reco… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted by IEEE Transactions on Intelligent Vehicles. Code is available at https://mias.group/SCIPaD

  15. arXiv:2407.04051  [pdf, other

    cs.SD cs.AI eess.AS

    FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

    Authors: Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Haoneng Luo, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang , et al. (8 additional authors not shown)

    Abstract: This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, sp… ▽ More

    Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Work in progress. Authors are listed in alphabetical order by family name

  16. arXiv:2407.03179  [pdf, other

    cs.CV cs.AI cs.LG

    Motion meets Attention: Video Motion Prompts

    Authors: Qixiang Chen, Lei Wang, Piotr Koniusz, Tom Gedeon

    Abstract: Videos contain rich spatio-temporal information. Traditional methods for extracting motion, used in tasks such as action recognition, often rely on visual contents rather than precise motion features. This phenomenon is referred to as 'blind motion extraction' behavior, which proves inefficient in capturing motions of interest due to a lack of motion-guided cues. Recently, attention mechanisms hav… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Research report

  17. arXiv:2407.02318  [pdf, other

    cs.SD cs.CV cs.LG eess.AS

    The Solution for Temporal Sound Localisation Task of ICCV 1st Perception Test Challenge 2023

    Authors: Yurui Huang, Yang Yang, Shou Chen, Xiangyu Wu, Qingguo Chen, Jianfeng Lu

    Abstract: In this paper, we propose a solution for improving the quality of temporal sound localization. We employ a multimodal fusion approach to combine visual and audio features. High-quality visual features are extracted using a state-of-the-art self-supervised pre-training network, resulting in efficient video feature representations. At the same time, audio features serve as complementary information… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  18. arXiv:2407.01885  [pdf, other

    cs.CL cs.AI

    Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application

    Authors: Chuanpeng Yang, Wang Lu, Yao Zhu, Yidong Wang, Qian Chen, Chenlong Gao, Bingjie Yan, Yiqiang Chen

    Abstract: Large Language Models (LLMs) have showcased exceptional capabilities in various domains, attracting significant interest from both academia and industry. Despite their impressive performance, the substantial size and computational demands of LLMs pose considerable challenges for practical deployment, particularly in environments with limited resources. The endeavor to compress language models whil… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 28 pages

  19. arXiv:2406.19820  [pdf, other

    cs.CL cs.AI

    BeamAggR: Beam Aggregation Reasoning over Multi-source Knowledge for Multi-hop Question Answering

    Authors: Zheng Chu, Jingchang Chen, Qianglong Chen, Haotian Wang, Kun Zhu, Xiyuan Du, Weijiang Yu, Ming Liu, Bing Qin

    Abstract: Large language models (LLMs) have demonstrated strong reasoning capabilities. Nevertheless, they still suffer from factual errors when tackling knowledge-intensive tasks. Retrieval-augmented reasoning represents a promising approach. However, significant challenges still persist, including inaccurate and insufficient retrieval for complex questions, as well as difficulty in integrating multi-sourc… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  20. arXiv:2406.18950  [pdf, other

    eess.IV cs.CV

    MMR-Mamba: Multi-Modal MRI Reconstruction with Mamba and Spatial-Frequency Information Fusion

    Authors: Jing Zou, Lanqing Liu, Qi Chen, Shujun Wang, Zhanli Hu, Xiaohan Xing, Jing Qin

    Abstract: Multi-modal MRI offers valuable complementary information for diagnosis and treatment; however, its utility is limited by prolonged scanning times. To accelerate the acquisition process, a practical approach is to reconstruct images of the target modality, which requires longer scanning times, from under-sampled k-space data using the fully-sampled reference modality with shorter scanning times as… ▽ More

    Submitted 7 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 figure

  21. arXiv:2406.18485  [pdf, other

    cs.DC

    LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism

    Authors: Diandian Gu, Peng Sun, Qinghao Hu, Ting Huang, Xun Chen, Yingtong Xiong, Guoteng Wang, Qiaoling Chen, Shangchun Zhao, Jiarui Fang, Yonggang Wen, Tianwei Zhang, Xin Jin, Xuanzhe Liu

    Abstract: Efficiently training LLMs with long sequences is important yet challenged by the massive computation and memory requirements. Sequence parallelism has been proposed to tackle these problems, but existing methods suffer from scalability or efficiency issues. We propose LoongTrain, a novel system to efficiently train LLMs with long sequences at scale. The core of LoongTrain is the 2D-Attention mecha… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  22. arXiv:2406.17960  [pdf, other

    cs.CV cs.AI

    MAGIC: Meta-Ability Guided Interactive Chain-of-Distillation for Effective-and-Efficient Vision-and-Language Navigation

    Authors: Liuyi Wang, Zongtao He, Mengjiao Shen, Jingwei Yang, Chengju Liu, Qijun Chen

    Abstract: Despite the remarkable developments of recent large models in Embodied Artificial Intelligence (E-AI), their integration into robotics is hampered by their excessive parameter sizes and computational demands. Towards the Vision-and-Language Navigation (VLN) task, a core task in E-AI, this paper reveals the great potential of using knowledge distillation for obtaining lightweight student models by… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  23. arXiv:2406.17608  [pdf, other

    cs.CV

    Test-Time Generative Augmentation for Medical Image Segmentation

    Authors: Xiao Ma, Yuhui Tao, Yuhan Zhang, Zexuan Ji, Yizhe Zhang, Qiang Chen

    Abstract: In this paper, we propose a novel approach to enhance medical image segmentation during test time. Instead of employing hand-crafted transforms or functions on the input test image to create multiple views for test-time augmentation, we advocate for the utilization of an advanced domain-fine-tuned generative model (GM), e.g., stable diffusion (SD), for test-time augmentation. Given that the GM has… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 12pages, 2figures

  24. arXiv:2406.17223  [pdf, ps, other

    cs.IT

    On Zero-Error Capacity of Graphs with One Edge

    Authors: Qi Cao, Qi Chen, Baoming Bai

    Abstract: In this paper, we study the zero-error capacity of channels with memory, which are represented by graphs. We provide a method to construct code for any graph with one edge, thereby determining a lower bound on its zero-error capacity. Moreover, this code can achieve zero-error capacity when the symbols in a vertex with degree one are the same. We further apply our method to the one-edge graphs rep… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  25. arXiv:2406.16038  [pdf, other

    cs.CV

    LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control

    Authors: Delin Qu, Qizhi Chen, Pingrui Zhang, Xianqiang Gao, Bin Zhao, Dong Wang, Xuelong Li

    Abstract: This paper aims to advance the progress of physical world interactive scene reconstruction by extending the interactive object reconstruction from single object level to complex scene level. To this end, we first construct one simulated and one real scene-level physical interaction dataset containing 28 scenes with multiple interactive objects per scene. Furthermore, to accurately model the intera… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  26. arXiv:2406.15330  [pdf, other

    cs.AI cs.CL

    Gradient-Mask Tuning Elevates the Upper Limits of LLM Performance

    Authors: Haoling Li, Xin Zhang, Xiao Liu, Yeyun Gong, Yifan Wang, Yujiu Yang, Qi Chen, Peng Cheng

    Abstract: Large language models (LLMs) have revolutionized lots of fields of research. Although it is well-known that fine-tuning is essential for enhancing the capabilities of LLMs, existing research suggests that there is potential redundancy in the fine-tuning process and therefore proposes to update only a subset of parameters. However, these methods fail to leverage the task-specific information to ide… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  27. arXiv:2406.14927  [pdf, other

    cs.CV cs.RO

    Gaussian-Informed Continuum for Physical Property Identification and Simulation

    Authors: Junhao Cai, Yuji Yang, Weihao Yuan, Yisheng He, Zilong Dong, Liefeng Bo, Hui Cheng, Qifeng Chen

    Abstract: This paper studies the problem of estimating physical properties (system identification) through visual observations. To facilitate geometry-aware guidance in physical property estimation, we introduce a novel hybrid framework that leverages 3D Gaussian representation to not only capture explicit shapes but also enable the simulated continuum to deduce implicit shapes during training. We propose a… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 19 pages, 8 figures

  28. arXiv:2406.13940  [pdf, other

    cs.CL

    AutoCAP: Towards Automatic Cross-lingual Alignment Planning for Zero-shot Chain-of-Thought

    Authors: Yongheng Zhang, Qiguang Chen, Min Li, Wanxiang Che, Libo Qin

    Abstract: Cross-lingual chain-of-thought can effectively complete reasoning tasks across languages, which gains increasing attention. Recently, dominant approaches in the literature improve cross-lingual alignment capabilities by integrating reasoning knowledge from different languages. Despite achieving excellent performance, current methods still have two main challenges: (1) Manual language specification… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL2024 Findings

  29. arXiv:2406.13890  [pdf, other

    cs.CL cs.AI

    ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World

    Authors: Weixiang Yan, Haitian Liu, Tengxiao Wu, Qian Chen, Wen Wang, Haoyuan Chai, Jiayi Wang, Weishan Zhao, Yixin Zhang, Renjun Zhang, Li Zhu

    Abstract: LLMs have achieved significant performance progress in various NLP applications. However, LLMs still struggle to meet the strict requirements for accuracy and reliability in the medical field and face many challenges in clinical applications. Existing clinical diagnostic evaluation benchmarks for evaluating medical agents powered by LLMs have severe limitations. Firstly, most existing medical eval… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  30. arXiv:2406.13583  [pdf, other

    cs.CV

    Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation

    Authors: Qian Chen, Lei Zhu, Hangzhou He, Xinliang Zhang, Shuang Zeng, Qiushi Ren, Yanye Lu

    Abstract: The primary goal of continual learning (CL) task in medical image segmentation field is to solve the "catastrophic forgetting" problem, where the model totally forgets previously learned features when it is extended to new categories (class-level) or tasks (task-level). Due to the privacy protection, the historical data labels are inaccessible. Prevalent continual learning methods primarily focus… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  31. arXiv:2406.13425  [pdf, other

    stat.ML cs.LG math.ST

    Coupled Input-Output Dimension Reduction: Application to Goal-oriented Bayesian Experimental Design and Global Sensitivity Analysis

    Authors: Qiao Chen, Elise Arnaud, Ricardo Baptista, Olivier Zahm

    Abstract: We introduce a new method to jointly reduce the dimension of the input and output space of a high-dimensional function. Choosing a reduced input subspace influences which output subspace is relevant and vice versa. Conventional methods focus on reducing either the input or output space, even though both are often reduced simultaneously in practice. Our coupled approach naturally supports goal-orie… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    MSC Class: 65D40; 62F15; 62K05

  32. arXiv:2406.12548  [pdf, other

    cs.CL

    P-Tailor: Customizing Personality Traits for Language Models via Mixture of Specialized LoRA Experts

    Authors: Yuhao Dan, Jie Zhou, Qin Chen, Junfeng Tian, Liang He

    Abstract: Personalized large language models (LLMs) have attracted great attention in many applications, such as intelligent education and emotional support. Most work focuses on controlling the character settings based on the profile (e.g., age, skill, experience, and so on). Conversely, the psychological theory-based personality traits with implicit expression and behavior are not well modeled, limiting t… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  33. arXiv:2406.12195  [pdf, other

    quant-ph cs.LG

    Quantum Compiling with Reinforcement Learning on a Superconducting Processor

    Authors: Z. T. Wang, Qiuhao Chen, Yuxuan Du, Z. H. Yang, Xiaoxia Cai, Kaixuan Huang, Jingning Zhang, Kai Xu, Jun Du, Yinan Li, Yuling Jiao, Xingyao Wu, Wu Liu, Xiliang Lu, Huikai Xu, Yirong Jin, Ruixia Wang, Haifeng Yu, S. P. Zhao

    Abstract: To effectively implement quantum algorithms on noisy intermediate-scale quantum (NISQ) processors is a central task in modern quantum technology. NISQ processors feature tens to a few hundreds of noisy qubits with limited coherence times and gate operations with errors, so NISQ algorithms naturally require employing circuits of short lengths via quantum compilation. Here, we develop a reinforcemen… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  34. arXiv:2406.12036  [pdf, other

    cs.CL cs.AI

    MedCalc-Bench: Evaluating Large Language Models for Medical Calculations

    Authors: Nikhil Khandekar, Qiao Jin, Guangzhi Xiong, Soren Dunn, Serina S Applebaum, Zain Anwar, Maame Sarfo-Gyamfi, Conrad W Safranek, Abid A Anwar, Andrew Zhang, Aidan Gilson, Maxwell B Singer, Amisha Dave, Andrew Taylor, Aidong Zhang, Qingyu Chen, Zhiyong Lu

    Abstract: As opposed to evaluating computation and logic-based reasoning, current benchmarks for evaluating large language models (LLMs) in medicine are primarily focused on question-answering involving domain knowledge and descriptive reasoning. While such qualitative capabilities are vital to medical diagnosis, in real-world scenarios, doctors frequently use clinical calculators that follow quantitative e… ▽ More

    Submitted 30 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Github link: https://github.com/ncbi-nlp/MedCalc-Bench HuggingFace link: https://huggingface.co/datasets/nsk7153/MedCalc-Bench

  35. arXiv:2406.11931  [pdf, other

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen , et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  36. arXiv:2406.11274  [pdf, other

    cs.CL

    Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers

    Authors: Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Hai Yu, Jiaqing Liu, Yukun Ma, Chong Zhang

    Abstract: The Transformer architecture has significantly advanced deep learning, particularly in natural language processing, by effectively managing long-range dependencies. However, as the demand for understanding complex relationships grows, refining the Transformer's architecture becomes critical. This paper introduces Skip-Layer Attention (SLA) to enhance Transformer models by enabling direct attention… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 7 pages, 1 figure

  37. arXiv:2406.11169   

    eess.AS cs.SD

    Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Wen Wang

    Abstract: Training speaker-discriminative and robust speaker verification systems without explicit speaker labels remains a persisting challenge. In this paper, we propose a new self-supervised speaker verification approach, Self-Distillation Prototypes Network (SDPN), which effectively facilitates self-supervised speaker representation learning. SDPN assigns the representation of the augmented views of an… ▽ More

    Submitted 25 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: We update this paper to an earlier paper

  38. arXiv:2406.10671  [pdf

    cs.CL

    Augmenting Biomedical Named Entity Recognition with General-domain Resources

    Authors: Yu Yin, Hyunjae Kim, Xiao Xiao, Chih Hsuan Wei, Jaewoo Kang, Zhiyong Lu, Hua Xu, Meng Fang, Qingyu Chen

    Abstract: Training a neural network-based biomedical named entity recognition (BioNER) model usually requires extensive and costly human annotations. While several studies have employed multi-task learning with multiple BioNER datasets to reduce human effort, this approach does not consistently yield performance improvements and may introduce label ambiguity in different biomedical corpora. We aim to tackle… ▽ More

    Submitted 18 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: We make data, codes, and models publicly available via https://github.com/qingyu-qc/bioner_gerbera

  39. arXiv:2406.10511  [pdf, other

    cs.DC cs.AR cs.PF math.NA

    Efficient Hardware Accelerator Based on Medium Granularity Dataflow for SpTRSV

    Authors: Qian Chen, Xiaofeng Yang, Shengli Lu

    Abstract: Sparse triangular solve (SpTRSV) is widely used in various domains. Numerous studies have been conducted using CPUs, GPUs, and specific hardware accelerators, where dataflow can be categorized into coarse and fine granularity. Coarse dataflow offers good spatial locality but suffers from low parallelism, while fine dataflow provides high parallelism but disrupts the spatial structure, leading to i… ▽ More

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  40. arXiv:2406.10505  [pdf, other

    cs.CL

    CroPrompt: Cross-task Interactive Prompting for Zero-shot Spoken Language Understanding

    Authors: Libo Qin, Fuxuan Wei, Qiguang Chen, Jingxuan Zhou, Shijue Huang, Jiasheng Si, Wenpeng Lu, Wanxiang Che

    Abstract: Slot filling and intent detection are two highly correlated tasks in spoken language understanding (SLU). Recent SLU research attempts to explore zero-shot prompting techniques in large language models to alleviate the data scarcity problem. Nevertheless, the existing prompting work ignores the cross-task interaction information for SLU, which leads to sub-optimal performance. To solve this proble… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  41. arXiv:2406.09095  [pdf, other

    cs.CL

    Modeling Comparative Logical Relation with Contrastive Learning for Text Generation

    Authors: Yuhao Dan, Junfeng Tian, Jie Zhou, Ming Yan, Ji Zhang, Qin Chen, Liang He

    Abstract: Data-to-Text Generation (D2T), a classic natural language generation problem, aims at producing fluent descriptions for structured input data, such as a table. Existing D2T works mainly focus on describing the superficial associative relations among entities, while ignoring the deep comparative logical relations, such as A is better than B in a certain aspect with a corresponding opinion, which is… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  42. arXiv:2406.07983  [pdf, other

    cs.LG

    Meta-Learning Neural Procedural Biases

    Authors: Christian Raymond, Qi Chen, Bing Xue, Mengjie Zhang

    Abstract: The goal of few-shot learning is to generalize and achieve high performance on new unseen learning tasks, where each task has only a limited number of examples available. Gradient-based meta-learning attempts to address this challenging task by learning how to learn new tasks by embedding inductive biases informed by prior learning experiences into the components of the learning algorithm. In this… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  43. arXiv:2406.07115  [pdf, other

    cs.CL cs.AI cs.LG

    Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees

    Authors: Sijia Chen, Yibo Wang, Yi-Feng Wu, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Lijun Zhang

    Abstract: Tool-augmented large language models (LLMs) leverage tools, often in the form of APIs, to enhance their reasoning capabilities on complex tasks, thus taking on the role of intelligent agents interacting with the real world. The recently introduced ToolLLaMA model by Qin et al. [2024] utilizes the depth-first search-based decision tree (DFSDT) method for reasoning with $16000+$ real-world APIs, whi… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  44. arXiv:2406.05810  [pdf, other

    cs.CV

    ControlLoc: Physical-World Hijacking Attack on Visual Perception in Autonomous Driving

    Authors: Chen Ma, Ningfei Wang, Zhengyu Zhao, Qian Wang, Qi Alfred Chen, Chao Shen

    Abstract: Recent research in adversarial machine learning has focused on visual perception in Autonomous Driving (AD) and has shown that printed adversarial patches can attack object detectors. However, it is important to note that AD visual perception encompasses more than just object detection; it also includes Multiple Object Tracking (MOT). MOT enhances the robustness by compensating for object detectio… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  45. arXiv:2406.05800  [pdf, other

    cs.CV cs.CR

    SlowPerception: Physical-World Latency Attack against Visual Perception in Autonomous Driving

    Authors: Chen Ma, Ningfei Wang, Zhengyu Zhao, Qi Alfred Chen, Chao Shen

    Abstract: Autonomous Driving (AD) systems critically depend on visual perception for real-time object detection and multiple object tracking (MOT) to ensure safe driving. However, high latency in these visual perception components can lead to significant safety risks, such as vehicle collisions. While previous research has extensively explored latency attacks within the digital realm, translating these meth… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  46. arXiv:2406.05078  [pdf, other

    cs.IT

    Enhancing LEO Mega-Constellations with Inter-Satellite Links: Vision and Challenges

    Authors: Chenyu Wu, Shuai Han, Qian Chen, Yu Wang, Weixiao Meng, Abderrahim Benslimane

    Abstract: Low Earth orbit (LEO) satellites have been envisioned as a significant component of the sixth generation (6G) network architecture for achieving ubiquitous coverage and seamless access. However, the implementation of LEO satellites is largely restricted by the deployment of ground stations. Inter-satellite links (ISLs) have been regarded as a promising technique to fully exploit the potentials of… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 7 pages, 4 figures

  47. arXiv:2406.04713  [pdf, other

    cs.LG cond-mat.mtrl-sci cs.AI physics.comp-ph stat.ML

    FlowMM: Generating Materials with Riemannian Flow Matching

    Authors: Benjamin Kurt Miller, Ricky T. Q. Chen, Anuroop Sriram, Brandon M Wood

    Abstract: Crystalline materials are a fundamental component in next-generation technologies, yet modeling their distribution presents unique computational challenges. Of the plausible arrangements of atoms in a periodic lattice only a vanishingly small percentage are thermodynamically stable, which is a key indicator of the materials that can be experimentally realized. Two fundamental tasks in this area ar… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: https://github.com/facebookresearch/flowmm

    Journal ref: ICML 2024

  48. arXiv:2406.04321  [pdf, other

    cs.CV cs.LG cs.MM cs.SD

    VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

    Authors: Zeyue Tian, Zhaoyang Liu, Ruibin Yuan, Jiahao Pan, Xiaoqiang Huang, Qifeng Liu, Xu Tan, Qifeng Chen, Wei Xue, Yike Guo

    Abstract: In this work, we systematically study music generation conditioned solely on the video. First, we present a large-scale dataset comprising 190K video-music pairs, including various genres such as movie trailers, advertisements, and documentaries. Furthermore, we propose VidMuse, a simple framework for generating music aligned with video inputs. VidMuse stands out by producing high-fidelity music t… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: The code and datasets will be available at https://github.com/ZeyueT/VidMuse/

  49. arXiv:2406.03496  [pdf, other

    cs.CL cs.AI cs.LG

    Wings: Learning Multimodal LLMs without Text-only Forgetting

    Authors: Yi-Kai Zhang, Shiyin Lu, Yang Li, Yanqing Ma, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, De-Chuan Zhan, Han-Jia Ye

    Abstract: Multimodal large language models (MLLMs), initiated with a trained LLM, first align images with text and then fine-tune on multimodal mixed inputs. However, the MLLM catastrophically forgets the text-only instructions, which do not include images and can be addressed within the initial LLM. In this paper, we present Wings, a novel MLLM that excels in both text-only dialogues and multimodal compreh… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  50. arXiv:2406.03459  [pdf, other

    cs.CV

    LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection

    Authors: Qiang Chen, Xiangbo Su, Xinyu Zhang, Jian Wang, Jiahui Chen, Yunpeng Shen, Chuchu Han, Ziliang Chen, Weixiang Xu, Fanrong Li, Shan Zhang, Kun Yao, Errui Ding, Gang Zhang, Jingdong Wang

    Abstract: In this paper, we present a light-weight detection transformer, LW-DETR, which outperforms YOLOs for real-time object detection. The architecture is a simple stack of a ViT encoder, a projector, and a shallow DETR decoder. Our approach leverages recent advanced techniques, such as training-effective techniques, e.g., improved loss and pretraining, and interleaved window and global attentions for r… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.