Showing 1–50 of 715 results for author: yang, B

Searching in archive cs.
  1. arXiv:2408.13686  [pdf, other]

    cs.SE

    Perception-Guided Fuzzing for Simulated Scenario-Based Testing of Autonomous Driving Systems

    Authors: Tri Minh Triet Pham, Bo Yang, Jinqiu Yang

    Abstract: Autonomous Driving Systems (ADS) have made huge progress and started on-road testing or even commercializing trials. ADS are complex and difficult to test: they receive input data from multiple sensors and make decisions using a combination of multiple deep neural network models and code logic. The safety of ADS is of utmost importance as their misbehavior can result in costly catastrophes, includ…

    Submitted 24 August, 2024; originally announced August 2024.

  2. arXiv:2408.13653  [pdf, other]

    cs.SE

    Evaluating the Robustness of LiDAR-based 3D Obstacles Detection and Its Impacts on Autonomous Driving Systems

    Authors: Tri Minh Triet Pham, Bo Yang, Jinqiu Yang

    Abstract: Autonomous driving systems (ADSs) require real-time input from multiple sensors to make time-sensitive decisions using deep neural networks. This makes the correctness of these decisions crucial to ADSs' adoption as errors can cause significant loss. Sensors such as LiDAR are sensitive to environmental changes and built-in inaccuracies and may fluctuate between frames. While there has been extensi…

    Submitted 24 August, 2024; originally announced August 2024.

  3. arXiv:2408.12816  [pdf, other]

    cs.CV

    O-Mamba: O-shape State-Space Model for Underwater Image Enhancement

    Authors: Chenyu Dong, Chen Zhao, Weiling Cai, Bo Yang

    Abstract: Underwater image enhancement (UIE) faces significant challenges due to complex underwater lighting conditions. Recently, mamba-based methods have achieved promising results in image enhancement tasks. However, these methods commonly rely on Vmamba, which focuses only on spatial information modeling and struggles to deal with the cross-color channel dependency problem in underwater images caused by…

    Submitted 22 August, 2024; originally announced August 2024.

  4. arXiv:2408.09362  [pdf, other]

    cs.CV

    Angle of Arrival Estimation with Transformer: A Sparse and Gridless Method with Zero-Shot Capability

    Authors: Zhaoxuan Zhu, Chulong Chen, Bo Yang

    Abstract: Automotive Multiple-Input Multiple-Output (MIMO) radars have gained significant traction in Advanced Driver Assistance Systems (ADAS) and Autonomous Vehicles (AV) due to their cost-effectiveness, resilience to challenging operating conditions, and extended detection range. To fully leverage the advantages of MIMO radars, it is crucial to develop an Angle of Arrival (AOA) algorithm that delivers hi…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 8 pages, 8 figures

  5. arXiv:2408.06772  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Exploring Domain Shift on Radar-Based 3D Object Detection Amidst Diverse Environmental Conditions

    Authors: Miao Zhang, Sherif Abdulatif, Benedikt Loesch, Marco Altmann, Marius Schwarz, Bin Yang

    Abstract: The rapid evolution of deep learning and its integration with autonomous driving systems have led to substantial advancements in 3D perception using multimodal sensors. Notably, radar sensors show greater robustness compared to cameras and lidar under adverse weather and varying illumination conditions. This study delves into the often-overlooked yet crucial issue of domain shift in 4D radar-based…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 6 pages, 5 figures, 3 tables, accepted in IEEE International Conference on Intelligent Transportation Systems (ITSC) 2024

  6. arXiv:2408.06701  [pdf, other]

    cs.NI cs.LG

    DiffSG: A Generative Solver for Network Optimization with Diffusion Model

    Authors: Ruihuai Liang, Bo Yang, Zhiwen Yu, Bin Guo, Xuelin Cao, Mérouane Debbah, H. Vincent Poor, Chau Yuen

    Abstract: Diffusion generative models, famous for their performance in image generation, are popular in various cross-domain applications. However, their use in the communication community has been mostly limited to auxiliary tasks like data modeling and feature extraction. These models hold greater promise for fundamental problems in network optimization compared to traditional machine learning methods. Di…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 8 pages, 5 figures

  7. arXiv:2408.02174  [pdf, other]

    math.OC cs.GT

    On the Equilibrium of a Class of Leader-Follower Games with Decision-Dependent Chance Constraints

    Authors: Jingxiang Wang, Zhaojian Wang, Bo Yang, Feng Liu, Xinping Guan

    Abstract: In this paper, we study the existence of equilibrium in a single-leader-multiple-follower game with decision-dependent chance constraints (DDCCs), where decision-dependent uncertainties (DDUs) exist in the constraints of followers. DDUs refer to the uncertainties impacted by the leader's strategy, while the leader cannot capture their exact probability distributions. To address such problems, we f…

    Submitted 4 August, 2024; originally announced August 2024.

  8. arXiv:2407.20053  [pdf, other]

    cs.LG physics.ao-ph

    Orca: Ocean Significant Wave Height Estimation with Spatio-temporally Aware Large Language Models

    Authors: Zhe Li, Ronghui Xu, Jilin Hu, Zhong Peng, Xi Lu, Chenjuan Guo, Bin Yang

    Abstract: Significant wave height (SWH) is a vital metric in marine science, and accurate SWH estimation is crucial for various applications, e.g., marine energy development, fishery, early warning systems for potential risks, etc. Traditional SWH estimation methods that are based on numerical models and physical theories are hindered by computational inefficiencies. Recently, machine learning has emerged a…

    Submitted 29 July, 2024; originally announced July 2024.

  9. arXiv:2407.19669  [pdf, other]

    cs.CL cs.IR

    mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval

    Authors: Xin Zhang, Yanzhao Zhang, Dingkun Long, Wen Xie, Ziqi Dai, Jialong Tang, Huan Lin, Baosong Yang, Pengjun Xie, Fei Huang, Meishan Zhang, Wenjie Li, Min Zhang

    Abstract: We present systematic efforts in building long-context multilingual text representation model (TRM) and reranker from scratch for text retrieval. We first introduce a text encoder (base size) enhanced with RoPE and unpadding, pre-trained in a native 8192-token context (longer than 512 of previous multilingual encoders). Then we construct a hybrid TRM and a cross-encoder reranker by contrastive lea…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: 20 pages, 5 figures

  10. arXiv:2407.19422  [pdf, other]

    cs.AI

    A Generic Review of Integrating Artificial Intelligence in Cognitive Behavioral Therapy

    Authors: Meng Jiang, Qing Zhao, Jianqiang Li, Fan Wang, Tianyu He, Xinyan Cheng, Bing Xiang Yang, Grace W. K. Ho, Guanghui Fu

    Abstract: Cognitive Behavioral Therapy (CBT) is a well-established intervention for mitigating psychological issues by modifying maladaptive cognitive and behavioral patterns. However, delivery of CBT is often constrained by resource limitations and barriers to access. Advancements in artificial intelligence (AI) have provided technical support for the digital transformation of CBT. Particularly, the emerge…

    Submitted 28 July, 2024; originally announced July 2024.

  11. arXiv:2407.15431  [pdf, other]

    cs.SI cs.AI cs.LG

    Pre-Training and Prompting for Few-Shot Node Classification on Text-Attributed Graphs

    Authors: Huanjing Zhao, Beining Yang, Yukuo Cen, Junyu Ren, Chenhui Zhang, Yuxiao Dong, Evgeny Kharlamov, Shu Zhao, Jie Tang

    Abstract: The text-attributed graph (TAG) is one kind of important real-world graph-structured data with each node associated with raw texts. For TAGs, traditional few-shot node classification methods directly conduct training on the pre-processed node features and do not consider the raw texts. The performance is highly dependent on the choice of the feature pre-processing method. In this paper, we propose…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Accepted to KDD'24

  12. arXiv:2407.14208  [pdf, other]

    cs.CV

    Memory-Efficient Pseudo-Labeling for Online Source-Free Universal Domain Adaptation using a Gaussian Mixture Model

    Authors: Pascal Schlachter, Simon Wagner, Bin Yang

    Abstract: In practice, domain shifts are likely to occur between training and test data, necessitating domain adaptation (DA) to adjust the pre-trained source model to the target domain. Recently, universal domain adaptation (UniDA) has gained attention for addressing the possibility of an additional category (label) shift between the source and target domain. This means new classes can appear in the target…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Submitted to IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025

  13. arXiv:2407.12565  [pdf, other]

    cs.AR

    SigDLA: A Deep Learning Accelerator Extension for Signal Processing

    Authors: Fangfa Fu, Wenyu Zhang, Zesong Jiang, Zhiyu Zhu, Guoyu Li, Bing Yang, Cheng Liu, Liyi Xiao, Jinxiang Wang, Huawei Li, Xiaowei Li

    Abstract: Deep learning and signal processing are closely correlated in many IoT scenarios such as anomaly detection to empower intelligence of things. Many IoT processors utilize digital signal processors (DSPs) for signal processing and build deep learning frameworks on this basis. While deep learning is usually much more computing-intensive than signal processing, the computing efficiency of deep learnin…

    Submitted 17 July, 2024; originally announced July 2024.

  14. arXiv:2407.12217  [pdf, other]

    cs.CV

    AFIDAF: Alternating Fourier and Image Domain Adaptive Filters as an Efficient Alternative to Attention in ViTs

    Authors: Yunling Zheng, Zeyi Xu, Fanghui Xue, Biao Yang, Jiancheng Lyu, Shuai Zhang, Yingyong Qi, Jack Xin

    Abstract: We propose and demonstrate an alternating Fourier and image domain filtering approach for feature extraction as an efficient alternative to build a vision backbone without using the computationally intensive attention. The performance among the lightweight models reaches the state-of-the-art level on ImageNet-1K classification, and improves downstream tasks on object detection and segmentation con…

    Submitted 16 July, 2024; originally announced July 2024.

  15. arXiv:2407.10794  [pdf, other]

    cs.CL cs.AI

    Graphusion: Leveraging Large Language Models for Scientific Knowledge Graph Fusion and Construction in NLP Education

    Authors: Rui Yang, Boming Yang, Sixun Ouyang, Tianwei She, Aosong Feng, Yuang Jiang, Freddy Lecue, Jinghui Lu, Irene Li

    Abstract: Knowledge graphs (KGs) are crucial in the field of artificial intelligence and are widely applied in downstream tasks, such as enhancing Question Answering (QA) systems. The construction of KGs typically requires significant effort from domain experts. Recently, Large Language Models (LLMs) have been used for knowledge graph construction (KGC); however, most existing approaches focus on a local pe…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 24 pages, 11 figures, 13 tables. arXiv admin note: substantial text overlap with arXiv:2402.14293

  16. arXiv:2407.10671  [pdf, other]

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, et al. (37 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a…

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

  17. arXiv:2407.10084  [pdf, other]

    cs.CV

    Part2Object: Hierarchical Unsupervised 3D Instance Segmentation

    Authors: Cheng Shi, Yulin Zhang, Bin Yang, Jiajin Tang, Yuexin Ma, Sibei Yang

    Abstract: Unsupervised 3D instance segmentation aims to segment objects from a 3D point cloud without any annotations. Existing methods face the challenge of either too loose or too tight clustering, leading to under-segmentation or over-segmentation. To address this issue, we propose Part2Object, hierarchical clustering with object guidance. Part2Object employs multi-layer clustering from points to object…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  18. arXiv:2407.09164  [pdf, other]

    cs.CR cs.AI

    TAPI: Towards Target-Specific and Adversarial Prompt Injection against Code LLMs

    Authors: Yuchen Yang, Hongwei Yao, Bingrun Yang, Yiling He, Yiming Li, Tianwei Zhang, Zhan Qin, Kui Ren

    Abstract: Recently, code-oriented large language models (Code LLMs) have been widely and successfully used to simplify and facilitate code programming. With these tools, developers can easily generate desired complete functional codes based on incomplete code and natural language prompts. However, a few pioneering works revealed that these Code LLMs are also vulnerable, e.g., against backdoor and adversaria…

    Submitted 22 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

  19. arXiv:2407.06881  [pdf, other]

    cs.DS

    Efficient Stochastic Routing in Path-Centric Uncertain Road Networks -- Extended Version

    Authors: Chenjuan Guo, Ronghui Xu, Bin Yang, Ye Yuan, Tung Kieu, Yan Zhao, Christian S. Jensen

    Abstract: The availability of massive vehicle trajectory data enables the modeling of road-network constrained movement as travel-cost distributions rather than just single-valued costs, thereby capturing the inherent uncertainty of movement and enabling improved routing quality. Thus, stochastic routing has been studied extensively in the edge-centric model, where such costs are assigned to the edges in a…

    Submitted 9 July, 2024; originally announced July 2024.

  20. arXiv:2407.05615  [pdf, other]

    cs.CV cs.GR cs.LG cs.RO

    OSN: Infinite Representations of Dynamic 3D Scenes from Monocular Videos

    Authors: Ziyang Song, Jinxi Li, Bo Yang

    Abstract: It has long been challenging to recover the underlying dynamic 3D scene representations from a monocular RGB video. Existing works formulate this problem into finding a single most plausible solution by adding various constraints such as depth priors and strong geometry constraints, ignoring the fact that there could be infinitely many 3D scene representations corresponding to a single dynamic vid…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: ICML 2024. Code and data are available at: https://github.com/vLAR-group/OSN

  21. arXiv:2407.05554  [pdf, other]

    cs.CV

    PANS: Probabilistic Airway Navigation System for Real-time Robust Bronchoscope Localization

    Authors: Qingyao Tian, Zhen Chen, Huai Liao, Xinyan Huang, Bingyu Yang, Lujie Li, Hongbin Liu

    Abstract: Accurate bronchoscope localization is essential for pulmonary interventions, by providing six degrees of freedom (DOF) in airway navigation. However, the robustness of current vision-based methods is often compromised in clinical practice, and they struggle to perform in real-time and to generalize across cases unseen during training. To overcome these challenges, we propose a novel Probabilistic…

    Submitted 7 July, 2024; originally announced July 2024.

  22. arXiv:2407.02887  [pdf, other]

    cs.CV

    Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion

    Authors: Hang Xu, Chen Long, Wenxiao Zhang, Yuan Liu, Zhen Cao, Zhen Dong, Bisheng Yang

    Abstract: In this paper, we explore a novel framework, EGIInet (Explicitly Guided Information Interaction Network), a model for View-guided Point cloud Completion (ViPC) task, which aims to restore a complete point cloud from a partial one with a single view image. In comparison with previous methods that relied on the global semantics of input images, EGIInet efficiently combines the information from two m…

    Submitted 22 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  23. arXiv:2407.00933  [pdf, other]

    cs.DC eess.SP

    Reconfigurable Intelligent Computational Surfaces for MEC-Assisted Autonomous Driving Networks: Design Optimization and Analysis

    Authors: Xueyao Zhang, Bo Yang, Zhiwen Yu, Xuelin Cao, George C. Alexandropoulos, Yan Zhang, Merouane Debbah, Chau Yuen

    Abstract: This paper investigates autonomous driving safety improvement via task offloading from cellular vehicles (CVs) to a multi-access edge computing (MEC) server using vehicle-to-infrastructure (V2I) links. Considering that the latter links can be reused by vehicle-to-vehicle (V2V) communications to improve spectrum utilization, the receiver of the V2I link may suffer from severe interference that can…

    Submitted 30 June, 2024; originally announced July 2024.

  24. arXiv:2407.00875  [pdf, other]

    cs.CL cs.AI

    MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting

    Authors: Tianhao Li, Shangjie Li, Binbin Xie, Deyi Xiong, Baosong Yang

    Abstract: The advent of large language models (LLMs) has predominantly catered to high-resource languages, leaving a disparity in performance for low-resource languages. Conventional Continual Training (CT) approaches to bridge this gap often undermine a model's original linguistic proficiency when expanding to multilingual contexts. Addressing this issue, we introduce a novel MoE-CT architecture, a paradig…

    Submitted 25 June, 2024; originally announced July 2024.

    Comments: 13 pages, 2 figures

  25. arXiv:2407.00615  [pdf, other]

    cs.LG

    GC-Bench: An Open and Unified Benchmark for Graph Condensation

    Authors: Qingyun Sun, Ziying Chen, Beining Yang, Cheng Ji, Xingcheng Fu, Sheng Zhou, Hao Peng, Jianxin Li, Philip S. Yu

    Abstract: Graph condensation (GC) has recently garnered considerable attention due to its ability to reduce large-scale graph datasets while preserving their essential properties. The core concept of GC is to create a smaller, more manageable graph that retains the characteristics of the original graph. Despite the proliferation of graph condensation methods developed in recent years, there is no comprehens…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Preprint, under review)

  26. arXiv:2406.20038  [pdf, other]

    cs.CL

    BioMNER: A Dataset for Biomedical Method Entity Recognition

    Authors: Chen Tang, Bohao Yang, Kun Zhao, Bo Lv, Chenghao Xiao, Frank Guerin, Chenghua Lin

    Abstract: Named entity recognition (NER) stands as a fundamental and pivotal task within the realm of Natural Language Processing. Particularly within the domain of Biomedical Method NER, this task presents notable challenges, stemming from the continual influx of domain-specific terminologies in scholarly literature. Current research in Biomedical Method (BioMethod) NER suffers from a scarcity of resources…

    Submitted 28 June, 2024; originally announced June 2024.

  27. arXiv:2406.19959  [pdf, other]

    cs.SD eess.AS

    RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization

    Authors: Bing Yang, Changsheng Quan, Yabo Wang, Pengyu Wang, Yujie Yang, Ying Fang, Nian Shao, Hui Bu, Xin Xu, Xiaofei Li

    Abstract: The training of deep learning-based multichannel speech enhancement and source localization systems relies heavily on the simulation of room impulse response and multichannel diffuse noise, due to the lack of large-scale real-recorded datasets. However, the acoustic mismatch between simulated and real-world data could degrade the model performance when applied in real-world scenarios. To bridge t…

    Submitted 28 June, 2024; originally announced June 2024.

  28. arXiv:2406.18192  [pdf, other]

    cs.CL cs.AI

    Methodology of Adapting Large English Language Models for Specific Cultural Contexts

    Authors: Wenjing Zhang, Siqi Xiao, Xuejiao Lei, Ning Wang, Huazheng Zhang, Meijuan An, Bikun Yang, Zhaoxiang Liu, Kai Wang, Shiguo Lian

    Abstract: The rapid growth of large language models (LLMs) has emerged as a prominent trend in the field of artificial intelligence. However, current state-of-the-art LLMs are predominantly based on English. They encounter limitations when directly applied to tasks in specific cultural domains, due to deficiencies in domain-specific knowledge and misunderstandings caused by differences in cultural values. To…

    Submitted 26 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: 11 pages, 2 figures

  29. arXiv:2406.18055  [pdf, other]

    cs.IT eess.SP

    Filtering Reconfigurable Intelligent Computational Surface for RF Spectrum Purification

    Authors: Kaining Wang, Bo Yang, Zhiwen Yu, Xuelin Cao, Mérouane Debbah, Chau Yuen

    Abstract: The increasing demand for communication is degrading the electromagnetic (EM) transmission environment due to severe EM interference, significantly reducing the efficiency of the radio frequency (RF) spectrum. Metasurfaces, a promising technology for controlling desired EM waves, have recently received significant attention from both academia and industry. However, the potential impact of out-of-b…

    Submitted 26 June, 2024; originally announced June 2024.

  30. arXiv:2406.17962  [pdf, other]

    cs.CL

    Crafting Customisable Characters with LLMs: Introducing SimsChat, a Persona-Driven Role-Playing Agent Framework

    Authors: Bohao Yang, Dong Liu, Chen Tang, Chenghao Xiao, Kun Zhao, Chao Li, Lin Yuan, Guang Yang, Lanxiao Huang, Chenghua Lin

    Abstract: Large Language Models (LLMs) demonstrate a remarkable ability to comprehend human instructions and generate high-quality text. This capability allows LLMs to function as agents that can emulate human beings at a more sophisticated level, beyond the mere replication of basic human behaviours. However, there has been little exploration of leveraging LLMs to craft characters from diverse aspects. In thi…

    Submitted 16 August, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  31. arXiv:2406.17911  [pdf, other]

    cs.CL

    X-ray Made Simple: Radiology Report Generation and Evaluation with Layman's Terms

    Authors: Kun Zhao, Chenghao Xiao, Chen Tang, Bohao Yang, Kai Ye, Noura Al Moubayed, Liang Zhan, Chenghua Lin

    Abstract: Radiology Report Generation (RRG) has achieved significant progress with the advancements of multimodal generative models. However, the evaluation in the domain suffers from a lack of fair and robust metrics. We reveal that high performance on RRG with existing lexical-based metrics (e.g. BLEU) might be more of a mirage - a model can get a high BLEU only by learning the template of reports. This…

    Submitted 30 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  32. arXiv:2406.15806  [pdf, other]

    cs.RO

    Robust Dynamic Control Barrier Function Based Trajectory Planning for Mobile Manipulator

    Authors: Lihao Xu, Xiaogang Xiong, Bai Yang, Yunjiang Lou

    Abstract: High-dimensional robot dynamic trajectory planning poses many challenges for traditional planning algorithms. Existing planning methods suffer from issues such as long computation times, limited capacity to address intricate obstacle models, and lack of consideration for external disturbances and measurement inaccuracies in these high-dimensional systems. To tackle these challenges, this paper pro…

    Submitted 22 June, 2024; originally announced June 2024.

  33. arXiv:2406.13972  [pdf, other]

    cs.SE

    CREF: An LLM-based Conversational Software Repair Framework for Programming Tutors

    Authors: Boyang Yang, Haoye Tian, Weiguo Pian, Haoran Yu, Haitao Wang, Jacques Klein, Tegawendé F. Bissyandé, Shunfu Jin

    Abstract: Program repair techniques offer cost-saving benefits for debugging within software development and programming education scenarios. With the proven effectiveness of Large Language Models (LLMs) in code-related tasks, researchers have explored their potential for program repair. However, it is crucial to recognize that existing repair benchmarks may have influenced LLM training data, potentially ca…

    Submitted 8 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  34. arXiv:2406.13527  [pdf, other]

    cs.CV

    4K4DGen: Panoramic 4D Generation at 4K Resolution

    Authors: Renjie Li, Panwang Pan, Bangbang Yang, Dejia Xu, Shijie Zhou, Xuanyang Zhang, Zeming Li, Achuta Kadambi, Zhangyang Wang, Zhiwen Fan

    Abstract: The blooming of virtual reality and augmented reality (VR/AR) technologies has driven an increasing demand for the creation of high-quality, immersive, and dynamic environments. However, existing generative techniques either focus solely on dynamic objects or perform outpainting from a single perspective image, failing to meet the needs of VR/AR applications. In this work, we tackle the challengin…

    Submitted 4 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  35. arXiv:2406.13335  [pdf, other]

    cs.NI eess.SP

    AI-Empowered Multiple Access for 6G: A Survey of Spectrum Sensing, Protocol Designs, and Optimizations

    Authors: Xuelin Cao, Bo Yang, Kaining Wang, Xinghua Li, Zhiwen Yu, Chau Yuen, Yan Zhang, Zhu Han

    Abstract: With the rapidly increasing number of bandwidth-intensive terminals capable of intelligent computing and communication, such as smart devices equipped with shallow neural network models, the complexity of multiple access for these intelligent terminals is increasing due to the dynamic network environment and ubiquitous connectivity in 6G systems. Traditional multiple access (MA) design and optimiz…

    Submitted 19 June, 2024; originally announced June 2024.

  36. arXiv:2406.11546  [pdf, other]

    eess.AS cs.CL cs.SD

    GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement

    Authors: Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen

    Abstract: The evolution of speech technology has been spurred by the rapid increase in dataset sizes. Traditional speech models generally depend on a large amount of labeled training data, which is scarce for low-resource languages. This paper presents GigaSpeech 2, a large-scale, multi-domain, multilingual speech recognition corpus. It is designed for low-resource languages and does not rely on paired spee…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Under review

  37. arXiv:2406.11432  [pdf, other]

    cs.CV cs.AI

    AnyTrans: Translate AnyText in the Image with Large Scale Models

    Authors: Zhipeng Qian, Pei Zhang, Baosong Yang, Kai Fan, Yiwei Ma, Derek F. Wong, Xiaoshuai Sun, Rongrong Ji

    Abstract: This paper introduces AnyTrans, an all-encompassing framework for the task of Translate AnyText in the Image (TATI), which includes multilingual text translation and text fusion within images. Our framework leverages the strengths of large-scale models, such as Large Language Models (LLMs) and text-guided diffusion models, to incorporate contextual cues from both textual and visual elements during tr…

    Submitted 17 June, 2024; originally announced June 2024.

  38. arXiv:2406.11327  [pdf, other]

    cs.CV

    ClawMachine: Fetching Visual Tokens as An Entity for Referring and Grounding

    Authors: Tianren Ma, Lingxi Xie, Yunjie Tian, Boyu Yang, Yuan Zhang, David Doermann, Qixiang Ye

    Abstract: An essential topic for multimodal large language models (MLLMs) is aligning vision and language concepts at a finer level. In particular, we devote efforts to encoding visual referential information for tasks such as referring and grounding. Existing methods, including proxy encoding and geometry encoding, incorporate additional syntax to encode the object's location, bringing extra burdens in tra…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Project page: https://github.com/martian422/ClawMachine

  39. arXiv:2406.11194  [pdf, other]

    cs.CL

    In-Context Editing: Learning Knowledge from Self-Induced Distributions

    Authors: Siyuan Qi, Bangcheng Yang, Kailin Jiang, Xiaobo Wang, Jiaqi Li, Yifan Zhong, Yaodong Yang, Zilong Zheng

    Abstract: The existing fine-tuning paradigm for language models is brittle in knowledge editing scenarios, where the model must incorporate new information without extensive retraining. This brittleness often results in overfitting, reduced performance, and unnatural language generation. To address this, we propose Consistent In-Context Editing (ICE), a novel approach that leverages the model's in-context l…

    Submitted 17 June, 2024; originally announced June 2024.

  40. arXiv:2406.10311  [pdf, other]

    cs.CL cs.AI

    CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models

    Authors: Wenjing Zhang, Xuejiao Lei, Zhaoxiang Liu, Meijuan An, Bikun Yang, KaiKai Zhao, Kai Wang, Shiguo Lian

    Abstract: With the profound development of large language models (LLMs), their safety concerns have garnered increasing attention. However, there is a scarcity of Chinese safety benchmarks for LLMs, and the existing safety taxonomies are inadequate, lacking comprehensive safety detection capabilities in authentic Chinese scenarios. In this work, we introduce CHiSafetyBench, a dedicated safety benchmark for e…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 13 pages, 3 figures

  41. arXiv:2406.10307  [pdf, other]

    cs.CL cs.AI

    What is the best model? Application-driven Evaluation for Large Language Models

    Authors: Shiguo Lian, Kaikai Zhao, Xinhui Liu, Xuejiao Lei, Bikun Yang, Wenjing Zhang, Kai Wang, Zhaoxiang Liu

    Abstract: General large language models enhanced with supervised fine-tuning and reinforcement learning from human feedback are increasingly popular in academia and industry as they generalize foundation models to various practical tasks in a prompt manner. To assist users in selecting the best model in practical application scenarios, i.e., choosing the model that meets the application requirements while m…

    Submitted 14 June, 2024; originally announced June 2024.

  42. arXiv:2406.06562  [pdf, other]

    cs.CL cs.AI

    Achieving Sparse Activation in Small Language Models

    Authors: Jifeng Song, Kai Huang, Xiangyu Yin, Boyuan Yang, Wei Gao

    Abstract: Sparse activation, which selectively activates only an input-dependent set of neurons in inference, is a useful technique to reduce the computing cost of Large Language Models (LLMs) without retraining or adaptation efforts. However, whether it can be applied to the recently emerging Small Language Models (SLMs) remains questionable, because SLMs are generally less over-parameterized than LLMs. In…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 15 pages

  43. arXiv:2406.06475  [pdf, other]

    cs.IR cs.AI

    Survey for Landing Generative AI in Social and E-commerce Recsys -- the Industry Perspectives

    Authors: Da Xu, Danqing Zhang, Guangyu Yang, Bo Yang, Shuyuan Xu, Lingling Zheng, Cindy Liang

    Abstract: Recently, generative AI (GAI), with its emerging capabilities, has presented unique opportunities for augmenting and revolutionizing industrial recommender systems (Recsys). Despite growing research efforts at the intersection of these fields, the integration of GAI into industrial Recsys remains in its infancy, largely due to the intricate nature of modern industrial Recsys infrastructure, ope…

    Submitted 10 June, 2024; originally announced June 2024.

  44. arXiv:2406.06073  [pdf, other]

    cs.CL

    Efficient k-Nearest-Neighbor Machine Translation with Dynamic Retrieval

    Authors: Yan Gao, Zhiwei Cao, Zhongjian Miao, Baosong Yang, Shiyu Liu, Min Zhang, Jinsong Su

    Abstract: To achieve non-parametric NMT domain adaptation, $k$-Nearest-Neighbor Machine Translation ($k$NN-MT) constructs an external datastore to store domain-specific translation knowledge, which derives a $k$NN distribution to interpolate the prediction distribution of the NMT model via a linear interpolation coefficient $λ$. Despite its success, $k$NN retrieval at each timestep leads to substantial time…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Findings

  45. Modeling User Retention through Generative Flow Networks

    Authors: Ziru Liu, Shuchang Liu, Bin Yang, Zhenghai Xue, Qingpeng Cai, Xiangyu Zhao, Zijian Zhang, Lantao Hu, Han Li, Peng Jiang

    Abstract: Recommender systems aim to fulfill the user's daily demands. While most existing research focuses on maximizing the user's engagement with the system, it has recently been pointed out that how frequently the users come back for the service also reflects the quality and stability of recommendations. However, optimizing this user retention behavior is non-trivial and poses several challenges includi…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: KDD-ADS 2024

  46. arXiv:2406.00079  [pdf, other]

    cs.LG

    Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling

    Authors: Sili Huang, Jifeng Hu, Zhejian Yang, Liwei Yang, Tao Luo, Hechang Chen, Lichao Sun, Bo Yang

    Abstract: Recent works have shown the remarkable superiority of transformer models in reinforcement learning (RL), where the decision-making problem is formulated as sequential generation. Transformer-based agents could emerge with self-improvement in online environments by providing task contexts, such as multiple trajectories, called in-context RL. However, due to the quadratic computation complexity of a…

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2405.20692; text overlap with arXiv:2305.16554, arXiv:2210.14215 by other authors

  47. arXiv:2405.20986  [pdf, other]

    cs.LG cs.CV

    Uncertainty Quantification for Bird's Eye View Semantic Segmentation: Methods and Benchmarks

    Authors: Linlin Yu, Bowen Yang, Tianhao Wang, Kangshuo Li, Feng Chen

    Abstract: The fusion of raw features from multiple sensors on an autonomous vehicle to create a Bird's Eye View (BEV) representation is crucial for planning and control systems. There is growing interest in using deep learning models for BEV semantic segmentation. Anticipating segmentation errors and improving the explainability of DNNs are essential for autonomous driving, yet they are under-studied. This pape…

    Submitted 31 May, 2024; originally announced May 2024.

  48. arXiv:2405.20692  [pdf, other]

    cs.LG cs.AI

    In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought

    Authors: Sili Huang, Jifeng Hu, Hechang Chen, Lichao Sun, Bo Yang

    Abstract: In-context learning is a promising approach for offline reinforcement learning (RL) to handle online tasks, which can be achieved by providing task prompts. Recent works demonstrated that in-context RL could emerge with self-improvement in a trial-and-error manner when treating RL tasks as an across-episodic sequential prediction problem. Despite the self-improvement not requiring gradient updates…

    Submitted 31 May, 2024; originally announced May 2024.

  49. arXiv:2405.17478  [pdf, other]

    cs.LG stat.ML

    ROSE: Register Assisted General Time Series Forecasting with Decomposed Frequency Learning

    Authors: Yihang Wang, Yuying Qiu, Peng Chen, Kai Zhao, Yang Shu, Zhongwen Rao, Lujia Pan, Bin Yang, Chenjuan Guo

    Abstract: With the increasing collection of time series data from various domains, there arises a strong demand for general time series forecasting models pre-trained on a large number of time-series datasets to support a variety of downstream prediction tasks. Enabling general time series forecasting faces two challenges: how to obtain unified representations from multi-domain time series data, and how to…

    Submitted 24 May, 2024; originally announced May 2024.

  50. arXiv:2405.15924  [pdf, other]

    cs.CL

    SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation

    Authors: Kun Zhao, Bohao Yang, Chen Tang, Chenghua Lin, Liang Zhan

    Abstract: The long-standing one-to-many problem of gold standard responses in open-domain dialogue systems presents challenges for automatic evaluation metrics. Though prior works have demonstrated some success by applying powerful Large Language Models (LLMs), existing approaches still struggle with the one-to-many problem, and exhibit subpar performance in domain-specific scenarios. We assume the commonse…

    Submitted 29 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted to ACL 2024 Findings