
Showing 1–50 of 529 results for author: Gao, C

Searching in archive cs.
  1. arXiv:2407.11906  [pdf, other]

    cs.CV cs.RO

    SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge

    Authors: Hao Ding, Tuxun Lu, Yuqian Zhang, Ruixing Liang, Hongchao Shu, Lalithkumar Seenivasan, Yonghao Long, Qi Dou, Cong Gao, Mathias Unberath

    Abstract: Accurate segmentation of tools in robot-assisted surgery is critical for machine perception, as it facilitates numerous downstream tasks including augmented reality feedback. While current feed-forward neural network-based methods exhibit excellent segmentation performance under ideal conditions, these models have proven susceptible to even minor corruptions, significantly impairing the model's pe…

    Submitted 16 July, 2024; originally announced July 2024.

  2. arXiv:2407.10223  [pdf, other]

    cs.LG cs.CR

    Practical Unlearning for Large Language Models

    Authors: Chongyang Gao, Lixu Wang, Chenkai Weng, Xiao Wang, Qi Zhu

    Abstract: While LLMs have demonstrated impressive performance across various domains and tasks, their security issues have become increasingly severe. Machine unlearning (MU) has emerged as a promising solution to address these issues by removing the influence of undesired data on the target model without compromising its utility in other aspects. MU typically assumes full access to the original training da…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 17 pages, 8 figures. The first two authors contribute equally and they are ordered alphabetically

  3. arXiv:2407.09793  [pdf, other]

    cs.SE

    Uncovering Weaknesses in Neural Code Generation

    Authors: Xiaoli Lian, Shuaisong Wang, Jieping Ma, Fang Liu, Xin Tan, Li Zhang, Lin Shi, Cuiyun Gao

    Abstract: Code generation, the task of producing source code from prompts, has seen significant advancements with the advent of pre-trained large language models (PLMs). Despite these achievements, there lacks a comprehensive taxonomy of weaknesses about the benchmark and the generated code, which risks the community's focus on known issues at the cost of under-explored areas. Our systematic study aims to…

    Submitted 17 July, 2024; v1 submitted 13 July, 2024; originally announced July 2024.

  4. arXiv:2407.09693  [pdf, other]

    cs.LG cs.AI

    A Mathematical Framework, a Taxonomy of Modeling Paradigms, and a Suite of Learning Techniques for Neural-Symbolic Systems

    Authors: Charles Dickens, Connor Pryor, Changyu Gao, Alon Albalak, Eriq Augustine, William Wang, Stephen Wright, Lise Getoor

    Abstract: The field of Neural-Symbolic (NeSy) systems is growing rapidly. Proposed approaches show great promise in achieving symbiotic unions of neural and symbolic methods. However, each NeSy system differs in fundamental ways. There is a pressing need for a unifying theory to illuminate the commonalities and differences in approaches and enable further progress. In this paper, we introduce Neural-Symboli…

    Submitted 12 July, 2024; originally announced July 2024.

  5. arXiv:2407.09690  [pdf, other]

    cs.LG cs.CR math.OC

    Private Heterogeneous Federated Learning Without a Trusted Server Revisited: Error-Optimal and Communication-Efficient Algorithms for Convex Losses

    Authors: Changyu Gao, Andrew Lowy, Xingyu Zhou, Stephen J. Wright

    Abstract: We revisit the problem of federated learning (FL) with private data from people who do not trust the server or other silos/clients. In this context, every silo (e.g. hospital) has data from several people (e.g. patients) and needs to protect the privacy of each person's data (e.g. health records), even if the server and/or other silos try to uncover this data. Inter-Silo Record-Level Differential…

    Submitted 17 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: The 41st International Conference on Machine Learning (ICML 2024)

  6. arXiv:2407.08931  [pdf, other]

    cs.CV

    Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection

    Authors: Xingyu Peng, Yan Bai, Chen Gao, Lirong Yang, Fei Xia, Beipeng Mu, Xiaofei Wang, Si Liu

    Abstract: Open-Vocabulary Detection (OVD) is the task of detecting all interesting objects in a given scene without predefined object classes. Extensive work has been done to deal with the OVD for 2D RGB images, but the exploration of 3D OVD is still limited. Intuitively, lidar point clouds provide 3D information, both object level and scene level, to generate trustful detection results. However, previous l…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: accepted by ECCV 2024

  7. arXiv:2407.08681  [pdf, other]

    cs.RO cs.LG eess.SY

    Hardware Neural Control of CartPole and F1TENTH Race Car

    Authors: Marcin Paluch, Florian Bolli, Xiang Deng, Antonio Rios Navarro, Chang Gao, Tobi Delbruck

    Abstract: Nonlinear model predictive control (NMPC) has proven to be an effective control method, but it is expensive to compute. This work demonstrates the use of hardware FPGA neural network controllers trained to imitate NMPC with supervised learning. We use these Neural Controllers (NCs) implemented on inexpensive embedded FPGA hardware for high frequency control on physical cartpole and F1TENTH race ca…

    Submitted 11 July, 2024; originally announced July 2024.

  8. arXiv:2407.04451  [pdf, other]

    cs.LG cs.AI

    Hindsight Preference Learning for Offline Preference-based Reinforcement Learning

    Authors: Chen-Xiao Gao, Shengjun Fang, Chenjun Xiao, Yang Yu, Zongzhang Zhang

    Abstract: Offline preference-based reinforcement learning (RL), which focuses on optimizing policies using human preferences between pairs of trajectory segments selected from an offline dataset, has emerged as a practical avenue for RL applications. Existing works rely on extracting step-wise reward signals from trajectory-wise preference annotations, assuming that preferences correlate with the cumulative…

    Submitted 5 July, 2024; originally announced July 2024.

  9. arXiv:2407.04051  [pdf, other]

    cs.SD cs.AI eess.AS

    FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

    Authors: Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Haoneng Luo, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang, et al. (8 additional authors not shown)

    Abstract: This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, sp…

    Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Work in progress. Authors are listed in alphabetical order by family name

  10. arXiv:2407.03361  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    PianoBART: Symbolic Piano Music Generation and Understanding with Large-Scale Pre-Training

    Authors: Xiao Liang, Zijian Zhao, Weichao Zeng, Yutong He, Fupeng He, Yiyi Wang, Chengying Gao

    Abstract: Learning musical structures and composition patterns is necessary for both music generation and understanding, but current methods do not make uniform use of learned features to generate and comprehend music simultaneously. In this paper, we propose PianoBART, a pre-trained model that uses BART for both symbolic piano music generation and understanding. We devise a multi-level object selection str…

    Submitted 25 June, 2024; originally announced July 2024.

  11. arXiv:2407.01885  [pdf, other]

    cs.CL cs.AI

    Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application

    Authors: Chuanpeng Yang, Wang Lu, Yao Zhu, Yidong Wang, Qian Chen, Chenlong Gao, Bingjie Yan, Yiqiang Chen

    Abstract: Large Language Models (LLMs) have showcased exceptional capabilities in various domains, attracting significant interest from both academia and industry. Despite their impressive performance, the substantial size and computational demands of LLMs pose considerable challenges for practical deployment, particularly in environments with limited resources. The endeavor to compress language models whil…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 28 pages

  12. arXiv:2407.01183  [pdf, other]

    cs.DB

    TCSR-SQL: Towards Table Content-aware Text-to-SQL with Self-retrieval

    Authors: Wenbo Xu, Liang Yan, Peiyi Han, Haifeng Zhu, Chuanyi Liu, Shaoming Duan, Cuiyun Gao, Yingwei Liang

    Abstract: Large Language Model-based (LLM-based) Text-to-SQL methods have achieved important progress in generating SQL queries for real-world applications. When confronted with table content-aware questions in real-world scenarios, ambiguous data content keywords and non-existent database schema column names within the question lead to the poor performance of existing methods. To solve this problem, we pr…

    Submitted 12 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  13. arXiv:2406.19672  [pdf, other]

    cs.CV

    Beyond First-Order: A Multi-Scale Approach to Finger Knuckle Print Biometrics

    Authors: Chengrui Gao, Ziyuan Yang, Andrew Beng Jin Teoh, Min Zhu

    Abstract: Recently, finger knuckle prints (FKPs) have gained attention due to their rich textural patterns, positioning them as a promising biometric for identity recognition. Prior FKP recognition methods predominantly leverage first-order feature descriptors, which capture intricate texture details but fail to account for structural information. Emerging research, however, indicates that second-order text…

    Submitted 28 June, 2024; originally announced June 2024.

  14. arXiv:2406.18966  [pdf, other]

    cs.CL

    UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models

    Authors: Siyuan Wu, Yue Huang, Chujie Gao, Dongping Chen, Qihui Zhang, Yao Wan, Tianyi Zhou, Xiangliang Zhang, Jianfeng Gao, Chaowei Xiao, Lichao Sun

    Abstract: Large Language Models (LLMs) such as GPT-4 and Llama3 have significantly impacted various fields by enabling high-quality synthetic data generation and reducing dependence on expensive human-generated datasets. Despite this, challenges remain in the areas of generalization, controllability, diversity, and truthfulness within the existing generative frameworks. To address these challenges, this pap…

    Submitted 28 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  15. arXiv:2406.16655  [pdf, other]

    cs.CL

    Large Language Models Are Cross-Lingual Knowledge-Free Reasoners

    Authors: Peng Hu, Sizhe Liu, Changjiang Gao, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Shujian Huang

    Abstract: Large Language Models have demonstrated impressive reasoning capabilities across multiple languages. However, the relationship between capabilities in different languages is less explored. In this work, we decompose the process of reasoning tasks into two separated parts: knowledge retrieval and knowledge-free reasoning, and analyze the cross-lingual transferability of them. With adapted and const…

    Submitted 24 June, 2024; originally announced June 2024.

  16. arXiv:2406.16370  [pdf, other]

    cs.RO

    An Active Search Strategy with Multiple Unmanned Aerial Systems for Multiple Targets

    Authors: Chuanxiang Gao, Xinyi Wang, Xi Chen, Ben M. Chen

    Abstract: The challenge of efficient target searching in vast natural environments has driven the need for advanced multi-UAV active search strategies. This paper introduces a novel method in which global and local information is adeptly merged to avoid issues such as myopia and redundant back-and-forth movements. In addition, a trajectory generation method is used to ensure the search pattern within contin…

    Submitted 24 June, 2024; originally announced June 2024.

  17. arXiv:2406.16121  [pdf, other]

    cs.LG cs.AI

    Diffusion Spectral Representation for Reinforcement Learning

    Authors: Dmitry Shribak, Chen-Xiao Gao, Yitong Li, Chenjun Xiao, Bo Dai

    Abstract: Diffusion-based models have achieved notable empirical successes in reinforcement learning (RL) due to their expressiveness in modeling complex distributions. Despite existing methods being promising, the key challenge of extending existing methods for broader real-world applications lies in the computational cost at inference time, i.e., sampling from a diffusion model is considerably slow as it…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Under review

  18. arXiv:2406.13706  [pdf, other]

    cs.CL cs.AI cs.CY

    Breaking News: Case Studies of Generative AI's Use in Journalism

    Authors: Natalie Grace Brigham, Chongjiu Gao, Tadayoshi Kohno, Franziska Roesner, Niloofar Mireshghallah

    Abstract: Journalists are among the many users of large language models (LLMs). To better understand the journalist-AI interactions, we conduct a study of LLM usage by two news agencies through browsing the WildChat dataset, identifying candidate interactions, and verifying them by matching to online published articles. Our analysis uncovers instances where journalists provide sensitive material such as con…

    Submitted 19 June, 2024; originally announced June 2024.

  19. arXiv:2406.13443  [pdf, other]

    cs.CL

    Dual-Phase Accelerated Prompt Optimization

    Authors: Muchen Yang, Moxin Li, Yongle Li, Zijun Chen, Chongming Gao, Junqi Zhang, Yangyang Li, Fuli Feng

    Abstract: Gradient-free prompt optimization methods have made significant strides in enhancing the performance of closed-source Large Language Models (LLMs) across a wide range of tasks. However, existing approaches make light of the importance of high-quality prompt initialization and the identification of effective optimization directions, thus resulting in substantial optimization steps to obtain satisfa…

    Submitted 19 June, 2024; originally announced June 2024.

  20. arXiv:2406.12235  [pdf, other]

    cs.CV

    Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM

    Authors: Huaxin Zhang, Xiaohao Xu, Xiang Wang, Jialong Zuo, Chuchu Han, Xiaonan Huang, Changxin Gao, Yuehuan Wang, Nong Sang

    Abstract: Towards open-ended Video Anomaly Detection (VAD), existing methods often exhibit biased detection when faced with challenging or unseen events and lack interpretability. To address these drawbacks, we propose Holmes-VAD, a novel framework that leverages precise temporal supervision and rich multimodal instructions to enable accurate anomaly localization and comprehensive explanations. Firstly, tow…

    Submitted 29 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 19 pages, 9 figures

  21. arXiv:2406.10819  [pdf, other]

    cs.CV cs.AI cs.CL

    GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents

    Authors: Dongping Chen, Yue Huang, Siyuan Wu, Jingyu Tang, Liuyi Chen, Yilin Bai, Zhigang He, Chenlong Wang, Huichi Zhou, Yiqiang Li, Tianshuo Zhou, Yue Yu, Chujie Gao, Qihui Zhang, Yi Gui, Zhen Li, Yao Wan, Pan Zhou, Jianfeng Gao, Lichao Sun

    Abstract: Recently, Multimodal Large Language Models (MLLMs) have been used as agents to control keyboard and mouse inputs by directly perceiving the Graphical User Interface (GUI) and generating corresponding code. However, current agents primarily exhibit excellent understanding capabilities in static environments and are predominantly applied in relatively simple domains, such as Web or mobile interfaces…

    Submitted 16 June, 2024; originally announced June 2024.

  22. arXiv:2406.10292  [pdf, other]

    cs.AI cs.CL cs.LG

    Automatically Labeling $200B Life-Saving Datasets: A Large Clinical Trial Outcome Benchmark

    Authors: Chufan Gao, Jathurshan Pradeepkumar, Trisha Das, Shivashankar Thati, Jimeng Sun

    Abstract: The global cost of drug discovery and development exceeds $200 billion annually. The main results of drug discovery and development are the outcomes of clinical trials, which directly influence the regulatory approval of new drug candidates and ultimately affect patient outcomes. Despite their significance, large-scale, high-quality clinical trial outcome data are not readily available to the publ…

    Submitted 13 June, 2024; originally announced June 2024.

  23. arXiv:2406.09829  [pdf, other]

    cs.CV

    Open-Vocabulary Semantic Segmentation with Image Embedding Balancing

    Authors: Xiangheng Shan, Dongyue Wu, Guilin Zhu, Yuanjie Shao, Nong Sang, Changxin Gao

    Abstract: Open-vocabulary semantic segmentation is a challenging task, which requires the model to output semantic masks of an image beyond a close-set vocabulary. Although many efforts have been made to utilize powerful CLIP models to accomplish this task, they are still easily overfitting to training classes due to the natural gaps in semantic information between training and new classes. To overcome this…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: CVPR2024

  24. arXiv:2406.09395  [pdf, other]

    cs.CV

    Modeling Ambient Scene Dynamics for Free-view Synthesis

    Authors: Meng-Li Shih, Jia-Bin Huang, Changil Kim, Rajvi Shah, Johannes Kopf, Chen Gao

    Abstract: We introduce a novel method for dynamic free-view synthesis of ambient scenes from a monocular capture, bringing an immersive quality to the viewing experience. Our method builds upon the recent advancements in 3D Gaussian Splatting (3DGS) that can faithfully reconstruct complex static scenes. Previous attempts to extend 3DGS to represent dynamics have been confined to bounded scenes or require m…

    Submitted 13 June, 2024; originally announced June 2024.

  25. arXiv:2406.09333  [pdf, other]

    cs.CV

    Memory-Efficient Sparse Pyramid Attention Networks for Whole Slide Image Analysis

    Authors: Weiyi Wu, Chongyang Gao, Xinwen Xu, Siting Li, Jiang Gui

    Abstract: Whole Slide Images (WSIs) are crucial for modern pathological diagnosis, yet their gigapixel-scale resolutions and sparse informative regions pose significant computational challenges. Traditional dense attention mechanisms, widely used in computer vision and natural language processing, are impractical for WSI analysis due to the substantial data scale and the redundant processing of uninformativ…

    Submitted 13 June, 2024; originally announced June 2024.

  26. Less Cybersickness, Please: Demystifying and Detecting Stereoscopic Visual Inconsistencies in VR Apps

    Authors: Shuqing Li, Cuiyun Gao, Jianping Zhang, Yujia Zhang, Yepang Liu, Jiazhen Gu, Yun Peng, Michael R. Lyu

    Abstract: The quality of Virtual Reality (VR) apps is vital, particularly the rendering quality of the VR Graphical User Interface (GUI). Different from traditional 2D apps, VR apps create a 3D digital scene for users, by rendering two distinct 2D images for the user's left and right eyes, respectively. Stereoscopic visual inconsistency (denoted as "SVI") issues, however, undermine the rendering process of…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: This work has been accepted at the ACM International Conference on the Foundations of Software Engineering (FSE) 2024, Porto de Galinhas, Brazil. DOI: https://doi.org/10.1145/3660803

  27. arXiv:2406.07393  [pdf, other]

    cs.CL

    Limited Out-of-Context Knowledge Reasoning in Large Language Models

    Authors: Peng Hu, Changjiang Gao, Ruiqi Gao, Jiajun Chen, Shujian Huang

    Abstract: Large Language Models (LLMs) have demonstrated strong capabilities as knowledge bases and significant in-context reasoning capabilities. However, previous work challenges their out-of-context reasoning ability, i.e., the ability to infer information from their training data, instead of from the context or prompt. This paper focuses on a significant facet of out-of-context reasoning: Out-of-Context…

    Submitted 24 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  28. arXiv:2406.05906  [pdf, other]

    cs.CL cs.AI

    TTM-RE: Memory-Augmented Document-Level Relation Extraction

    Authors: Chufan Gao, Xuan Wang, Jimeng Sun

    Abstract: Document-level relation extraction aims to categorize the association between any two entities within a document. We find that previous methods for document-level relation extraction are ineffective in exploiting the full potential of large amounts of training data with varied noise levels. For example, in the ReDocRED benchmark dataset, state-of-the-art methods trained on the large-scale, lower-q…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted in ACL 2024 Main

  29. arXiv:2406.01188  [pdf, other]

    cs.CV

    UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation

    Authors: Xiang Wang, Shiwei Zhang, Changxin Gao, Jiayu Wang, Xiaoqiang Zhou, Yingya Zhang, Luxin Yan, Nong Sang

    Abstract: Recent diffusion-based human image animation techniques have demonstrated impressive success in synthesizing videos that faithfully follow a given reference identity and a sequence of desired movement poses. Despite this, there are still two limitations: i) an extra reference model is required to align the identity image with the main video branch, which significantly increases the optimization bu…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Project page: https://unianimate.github.io/

  30. arXiv:2406.00380  [pdf, other]

    cs.CL cs.AI

    The Best of Both Worlds: Toward an Honest and Helpful Large Language Model

    Authors: Chujie Gao, Qihui Zhang, Dongping Chen, Yue Huang, Siyuan Wu, Zhengyan Fu, Yao Wan, Xiangliang Zhang, Lichao Sun

    Abstract: Large Language Models (LLMs) have achieved remarkable success across various industries due to their exceptional generative capabilities. However, for safe and effective real-world deployments, ensuring honesty and helpfulness is critical. This paper addresses the question: Can we prioritize the helpfulness of LLMs while preserving their honesty? To begin with, we establish exhaustive principles a…

    Submitted 1 June, 2024; originally announced June 2024.

  31. arXiv:2405.20044  [pdf, other]

    cs.CV

    A Point-Neighborhood Learning Framework for Nasal Endoscope Image Segmentation

    Authors: Pengyu Jie, Wanquan Liu, Chenqiang Gao, Yihui Wen, Rui He, Pengcheng Li, Jintao Zhang, Deyu Meng

    Abstract: The lesion segmentation on endoscopic images is challenging due to its complex and ambiguous features. Fully-supervised deep learning segmentation methods can receive good performance based on entirely pixel-level labeled dataset but greatly increase experts' labeling burden. Semi-supervised and weakly supervised methods can ease labeling burden, but heavily strengthen the learning difficulty. To…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 10 pages, 10 figures

  32. arXiv:2405.19846  [pdf, other]

    cs.CL cs.AI

    Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model

    Authors: Chaochen Gao, Xing Wu, Qi Fu, Songlin Hu

    Abstract: Large language models, initially pre-trained with a limited context length, can better handle longer texts by continuing training on a corpus with extended contexts. However, obtaining effective long-context data is challenging due to the scarcity and uneven distribution of long documents across different domains. To address this issue, we propose a Query-centric data synthesis method, abbreviated…

    Submitted 19 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  33. arXiv:2405.18216  [pdf, other]

    cs.SE

    A Survey on Modern Code Review: Progresses, Challenges and Opportunities

    Authors: Zezhou Yang, Cuiyun Gao, Zhaoqiang Guo, Zhenhao Li, Kui Liu, Xin Xia, Yuming Zhou

    Abstract: Over the past decade, modern code review (MCR) has been deemed as a crucial practice of software quality assurance, which is applied to improve software quality and transfer development knowledge within a software team. Despite its importance, MCR is often a complicated and time-consuming activity for practitioners. In recent years, many studies that are dedicated to the comprehension and the impr…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 62 pages

  34. arXiv:2405.15161  [pdf, other]

    cs.CR cs.CV

    Are You Copying My Prompt? Protecting the Copyright of Vision Prompt for VPaaS via Watermark

    Authors: Huali Ren, Anli Yan, Chong-zhi Gao, Hongyang Yan, Zhenxin Zhang, Jin Li

    Abstract: Visual Prompt Learning (VPL) differs from traditional fine-tuning methods in reducing significant resource consumption by avoiding updating pre-trained model parameters. Instead, it focuses on learning an input perturbation, a visual prompt, added to downstream task data for making predictions. Since learning generalizable prompts requires expert design and creation, which is technically demanding…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 11 pages, 7 figures

  35. arXiv:2405.14377  [pdf, other]

    cs.LG cs.AI

    CoMERA: Computing- and Memory-Efficient Training via Rank-Adaptive Tensor Optimization

    Authors: Zi Yang, Samridhi Choudhary, Xinfeng Xie, Cao Gao, Siegfried Kunzmann, Zheng Zhang

    Abstract: Training large AI models such as deep learning recommendation systems and foundation language (or multi-modal) models costs massive GPUs and computing time. The high training cost has become only affordable to big tech companies, meanwhile also causing increasing concerns about the environmental impact. This paper presents CoMERA, a Computing- and Memory-Efficient training method via Rank-Adaptive…

    Submitted 23 May, 2024; originally announced May 2024.

  36. arXiv:2405.13816  [pdf, other]

    cs.CL

    Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners

    Authors: Shimao Zhang, Changjiang Gao, Wenhao Zhu, Jiajun Chen, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Shujian Huang

    Abstract: Recently, Large Language Models (LLMs) have shown impressive language capabilities. While most of the existing LLMs have very unbalanced performance across different languages, multilingual alignment based on translation parallel data is an effective method to enhance the LLMs' multilingual capabilities. In this work, we discover and comprehensively investigate the spontaneous multilingual alignme…

    Submitted 18 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  37. arXiv:2405.12195  [pdf, other]

    cs.SE

    Developers' Perceptions on the Impact of ChatGPT in Software Development: A Survey

    Authors: Thiago S. Vaillant, Felipe Deveza de Almeida, Paulo Anselmo M. S. Neto, Cuiyun Gao, Jan Bosch, Eduardo Santana de Almeida

    Abstract: As Large Language Models (LLMs), including ChatGPT and analogous systems, continue to advance, their robust natural language processing capabilities and diverse applications have garnered considerable attention. Nonetheless, despite the increasing acknowledgment of the convergence of Artificial Intelligence (AI) and Software Engineering (SE), there is a lack of studies involving the impact of this…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 31 pages, 9 figures

    ACM Class: D.2.0

  38. Modeling User Fatigue for Sequential Recommendation

    Authors: Nian Li, Xin Ban, Cheng Ling, Chen Gao, Lantao Hu, Peng Jiang, Kun Gai, Yong Li, Qingmin Liao

    Abstract: Recommender systems filter out information that meets user interests. However, users may be tired of the recommendations that are too similar to the content they have been exposed to in a short historical period, which is the so-called user fatigue. Despite the significance for a better user experience, user fatigue is seldom explored by existing recommenders. In fact, there are three main challen…

    Submitted 22 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

    Comments: SIGIR 2024

  39. arXiv:2405.11377  [pdf, other]

    stat.ML cs.LG stat.ME

    Causal Customer Churn Analysis with Low-rank Tensor Block Hazard Model

    Authors: Chenyin Gao, Zhiming Zhang, Shu Yang

    Abstract: This study introduces an innovative method for analyzing the impact of various interventions on customer churn, using the potential outcomes framework. We present a new causal model, the tensorized latent factor block hazard model, which incorporates tensor completion methods for a principled causal analysis of customer churn. A crucial element of our approach is the formulation of a 1-bit tensor…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: Accepted for publication in ICML, 2024

  40. arXiv:2405.11233  [pdf, other]

    cs.SE

    Bridge and Hint: Extending Pre-trained Language Models for Long-Range Code

    Authors: Yujia Chen, Cuiyun Gao, Zezhou Yang, Hongyu Zhang, Qing Liao

    Abstract: In the field of code intelligence, effectively modeling long-range code poses a significant challenge. Existing pre-trained language models (PLMs) such as UniXcoder have achieved remarkable success, but they still face difficulties with long code inputs. This is mainly due to their limited capacity to maintain contextual continuity and memorize the key information over long-range code. To alleviat…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: Accepted by ISSTA 2024

  41. arXiv:2405.09508  [pdf, other]

    cs.CL cs.LG

    Modeling Bilingual Sentence Processing: Evaluating RNN and Transformer Architectures for Cross-Language Structural Priming

    Authors: Bushi Xiao, Chao Gao, Demi Zhang

    Abstract: This study evaluates the performance of Recurrent Neural Network (RNN) and Transformer in replicating cross-language structural priming: a key indicator of abstract grammatical representations in human language processing. Focusing on Chinese-English priming, which involves two typologically distinct languages, we examine how these models handle the robust phenomenon of structural priming, where e…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 9 pages, 6 figures

  42. Treatment Effect Estimation for User Interest Exploration on Recommender Systems

    Authors: Jiaju Chen, Wenjie Wang, Chongming Gao, Peng Wu, Jianxiong Wei, Qingsong Hua

    Abstract: Recommender systems learn personalized user preferences from user feedback like clicks. However, user feedback is usually biased towards partially observed interests, leaving many users' hidden interests unexplored. Existing approaches typically mitigate the bias, increase recommendation diversity, or use bandit algorithms to balance exploration-exploitation trade-offs. Nevertheless, they fail to…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted to SIGIR 2024

  43. arXiv:2405.06964  [pdf, other]

    cs.RO cs.AI

    ManiFoundation Model for General-Purpose Robotic Manipulation of Contact Synthesis with Arbitrary Objects and Robots

    Authors: Zhixuan Xu, Chongkai Gao, Zixuan Liu, Gang Yang, Chenrui Tie, Haozhuo Zheng, Haoyu Zhou, Weikun Peng, Debang Wang, Tianyi Chen, Zhouliang Yu, Lin Shao

    Abstract: To substantially enhance robot intelligence, there is a pressing need to develop a large model that enables general-purpose robots to proficiently undertake a broad spectrum of manipulation tasks, akin to the versatile task-planning ability exhibited by LLMs. The vast diversity in objects, robots, and manipulation tasks presents huge challenges. Our work introduces a comprehensive framework to dev…

    Submitted 11 May, 2024; originally announced May 2024.

  44. arXiv:2405.06662  [pdf, ps, other]

    q-bio.BM cs.CL cs.LG

    Language Interaction Network for Clinical Trial Approval Estimation

    Authors: Chufan Gao, Tianfan Fu, Jimeng Sun

    Abstract: Clinical trial outcome prediction seeks to estimate the likelihood that a clinical trial will successfully reach its intended endpoint. This process predominantly involves the development of machine learning models that utilize a variety of data sources such as descriptions of the clinical trials, characteristics of the drug molecules, and specific disease conditions being targeted. Accurate predi…

    Submitted 26 April, 2024; originally announced May 2024.

  45. arXiv:2405.06227  [pdf, other]

    cs.CV

    MaskMatch: Boosting Semi-Supervised Learning Through Mask Autoencoder-Driven Feature Learning

    Authors: Wenjin Zhang, Keyi Li, Sen Yang, Chenyang Gao, Wanzhao Yang, Sifan Yuan, Ivan Marsic

    Abstract: Conventional methods in semi-supervised learning (SSL) often face challenges related to limited data utilization, mainly due to their reliance on threshold-based techniques for selecting high-confidence unlabeled data during training. Various efforts (e.g., FreeMatch) have been made to enhance data utilization by tweaking the thresholds, yet none have managed to use 100% of the available data. To…

    Submitted 9 May, 2024; originally announced May 2024.

  46. arXiv:2405.04994  [pdf, other]

    cs.SE

    NAVRepair: Node-type Aware C/C++ Code Vulnerability Repair

    Authors: Ruoke Wang, Zongjie Li, Chaozheng Wang, Yang Xiao, Cuiyun Gao

    Abstract: The rapid advancement of deep learning has led to the development of Large Language Models (LLMs). In the field of vulnerability repair, previous research has leveraged rule-based fixing, pre-trained models, and LLM's prompt engineering. However, existing approaches have limitations in terms of the integration of code structure with error types. Besides, due to certain features of C/C++ language,…

    Submitted 8 May, 2024; originally announced May 2024.

  47. arXiv:2405.04044  [pdf, other]

    cs.CV

    DMOFC: Discrimination Metric-Optimized Feature Compression

    Authors: Changsheng Gao, Yiheng Jiang, Li Li, Dong Liu, Feng Wu

    Abstract: Feature compression, as an important branch of video coding for machines (VCM), has attracted significant attention and exploration. However, the existing methods mainly focus on intra-feature similarity, such as the Mean Squared Error (MSE) between the reconstructed and original features, while neglecting the importance of inter-feature relationships. In this paper, we analyze the inter-feature r…

    Submitted 7 May, 2024; originally announced May 2024.

  48. arXiv:2405.03905  [pdf, other]

    cs.AR cs.CV cs.SD eess.AS

    A 65nm 36nJ/Decision Bio-inspired Temporal-Sparsity-Aware Digital Keyword Spotting IC with 0.6V Near-Threshold SRAM

    Authors: Qinyu Chen, Kwantae Kim, Chang Gao, Sheng Zhou, Taekwang Jang, Tobi Delbruck, Shih-Chii Liu

    Abstract: This paper introduces, to the best of the authors' knowledge, the first fine-grained temporal sparsity-aware keyword spotting (KWS) IC leveraging temporal similarities between neighboring feature vectors extracted from input frames and network hidden states, eliminating unnecessary operations and memory accesses. This KWS IC, featuring a bio-inspired delta-gated recurrent neural network (ΔRNN) cla…

    Submitted 6 May, 2024; originally announced May 2024.

  49. arXiv:2405.03652  [pdf]

    cs.CV

    Field-of-View Extension for Diffusion MRI via Deep Generative Models

    Authors: Chenyu Gao, Shunxing Bao, Michael Kim, Nancy Newlin, Praitayini Kanakaraj, Tianyuan Yao, Gaurav Rudravaram, Yuankai Huo, Daniel Moyer, Kurt Schilling, Walter Kukull, Arthur Toga, Derek Archer, Timothy Hohman, Bennett Landman, Zhiyuan Li

    Abstract: Purpose: In diffusion MRI (dMRI), the volumetric and bundle analyses of whole-brain tissue microstructure and connectivity can be severely impeded by an incomplete field-of-view (FOV). This work aims to develop a method for imputing the missing slices directly from existing dMRI scans with an incomplete FOV. We hypothesize that the imputed image with complete FOV can improve the whole-brain tracto…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 20 pages, 11 figures

  50. arXiv:2405.01615  [pdf, other]

    cs.NE cs.LG

    Hard-Thresholding Meets Evolution Strategies in Reinforcement Learning

    Authors: Chengqian Gao, William de Vazelhes, Hualin Zhang, Bin Gu, Zhiqiang Xu

    Abstract: Evolution Strategies (ES) have emerged as a competitive alternative for model-free reinforcement learning, showcasing exemplary performance in tasks like Mujoco and Atari. Notably, they shine in scenarios with imperfect reward functions, making them invaluable for real-world applications where dense reward signals may be elusive. Yet, an inherent assumption in ES, that all input features are task-…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 16 pages, including proofs in the appendix