Showing 1–50 of 432 results for author: Zhou, Q

  1. arXiv:2407.12940  [pdf, other

    cs.RO cs.CV

    KiGRAS: Kinematic-Driven Generative Model for Realistic Agent Simulation

    Authors: Jianbo Zhao, Jiaheng Zhuang, Qibin Zhou, Taiyu Ban, Ziyao Xu, Hangning Zhou, Junhe Wang, Guoan Wang, Zhiheng Li, Bin Li

    Abstract: Trajectory generation is a pivotal task in autonomous driving. Recent studies have introduced the autoregressive paradigm, leveraging the state transition model to approximate future trajectory distributions. This paradigm closely mirrors the real-world trajectory generation process and has achieved notable success. However, its potential is limited by the ineffective representation of realistic t… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  2. arXiv:2407.12428  [pdf, other

    cs.SE

    Context-Aware Fuzzing for Robustness Enhancement of Deep Learning Models

    Authors: Haipeng Wang, Zhengyuan Wei, Qilin Zhou, Wing-Kwong Chan

    Abstract: In the testing-retraining pipeline for enhancing the robustness property of deep learning (DL) models, many state-of-the-art robustness-oriented fuzzing techniques are metric-oriented. The pipeline generates adversarial examples as test cases via such a DL testing technique and retrains the DL model under test with test suites that contain these test cases. On the one hand, the strategies of these… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: The official version of this paper is to appear in ACM Transactions on Software Engineering and Methodology (accepted in July 2024)

  3. arXiv:2407.08801  [pdf, other

    cs.CV

    DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding

    Authors: Jincen Jiang, Qianyu Zhou, Yuhang Li, Xuequan Lu, Meili Wang, Lizhuang Ma, Jian Chang, Jian Jun Zhang

    Abstract: Recent point cloud understanding research suffers from performance drops on unseen data, due to the distribution shifts across different domains. While recent studies use Domain Generalization (DG) techniques to mitigate this by learning domain-invariant features, most are designed for a single task and neglect the potential of testing data. Despite In-Context Learning (ICL) showcasing multi-task… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  4. arXiv:2407.08019  [pdf, other

    cs.CV

    Coherent and Multi-modality Image Inpainting via Latent Space Optimization

    Authors: Lingzhi Pan, Tong Zhang, Bingyuan Chen, Qi Zhou, Wei Ke, Sabine Süsstrunk, Mathieu Salzmann

    Abstract: With the advancements in denoising diffusion probabilistic models (DDPMs), image inpainting has significantly evolved from merely filling information based on nearby regions to generating content conditioned on various prompts such as text, exemplar images, and sketches. However, existing methods, such as model fine-tuning and simple concatenation of latent vectors, often result in generation fail… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  5. arXiv:2407.07325  [pdf, other

    cs.CV cs.CL cs.MM eess.IV

    HiLight: Technical Report on the Motern AI Video Language Model

    Authors: Zhiting Wang, Qiangong Zhou, Kangjie Yang, Zongyang Liu, Xin Mao

    Abstract: This technical report presents the implementation of a state-of-the-art video encoder for video-text modal alignment and a video conversation framework called HiLight, which features dual visual towers. The work is divided into two main parts: 1. alignment of video and text modalities; 2. convenient and efficient way to interact with users. Our goal is to address the task of video comprehension in t…

    Submitted 11 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  6. arXiv:2407.05396  [pdf, other

    cs.CR cs.AI

    Evolutionary Trigger Detection and Lightweight Model Repair Based Backdoor Defense

    Authors: Qi Zhou, Zipeng Ye, Yubo Tang, Wenjian Luo, Yuhui Shi, Yan Jia

    Abstract: Deep Neural Networks (DNNs) have been widely used in many areas such as autonomous driving and face recognition. However, DNN model is fragile to backdoor attack. A backdoor in the DNN model can be activated by a poisoned input with trigger and leads to wrong prediction, which causes serious security issues in applications. It is challenging for current defenses to eliminate the backdoor effective… ▽ More

    Submitted 14 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: 13 pages, 9 figures

  7. arXiv:2407.05285  [pdf, other

    cs.LG cs.AI cs.CR

    Gradient Diffusion: A Perturbation-Resilient Gradient Leakage Attack

    Authors: Xuan Liu, Siqi Cai, Qihua Zhou, Song Guo, Ruibin Li, Kaiwei Lin

    Abstract: Recent years have witnessed the vulnerability of Federated Learning (FL) against gradient leakage attacks, where the private training data can be recovered from the exchanged gradients, making gradient protection a critical issue for the FL training process. Existing solutions often resort to perturbation-based mechanisms, such as differential privacy, where each participating client injects a spe… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  8. arXiv:2407.04057  [pdf, other

    cs.LG

    TALENT: A Tabular Analytics and Learning Toolbox

    Authors: Si-Yang Liu, Hao-Run Cai, Qi-Le Zhou, Han-Jia Ye

    Abstract: Tabular data is one of the most common data sources in machine learning. Although a wide range of classical methods demonstrate practical utilities in this field, deep learning methods on tabular data are becoming promising alternatives due to their flexibility and ability to capture complex interactions within the data. Considering that deep tabular methods have diverse design philosophies, inclu… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  9. arXiv:2407.00956  [pdf, other

    cs.LG

    A Closer Look at Deep Learning on Tabular Data

    Authors: Han-Jia Ye, Si-Yang Liu, Hao-Run Cai, Qi-Le Zhou, De-Chuan Zhan

    Abstract: Tabular data is prevalent across various domains in machine learning. Although Deep Neural Network (DNN)-based methods have shown promising performance comparable to tree-based ones, in-depth evaluation of these methods is challenging due to varying performance ranks across diverse datasets. In this paper, we propose a comprehensive benchmark comprising 300 tabular datasets, covering a wide range… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  10. arXiv:2407.00934  [pdf, other

    cs.CL

    CLEME2.0: Towards More Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction

    Authors: Jingheng Ye, Zishan Xu, Yinghui Li, Xuxin Cheng, Linlin Song, Qingyu Zhou, Hai-Tao Zheng, Ying Shen, Xin Su

    Abstract: The paper focuses on improving the interpretability of Grammatical Error Correction (GEC) metrics, which receives little attention in previous studies. To bridge the gap, we propose CLEME2.0, a reference-based evaluation strategy that can describe four elementary dimensions of GEC systems, namely hit-correction, error-correction, under-correction, and over-correction. They collectively contribute… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 16 pages, 8 tables, 2 figures. Under review

  11. arXiv:2406.17758  [pdf, other

    cs.CV

    MotionBooth: Motion-Aware Customized Text-to-Video Generation

    Authors: Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, Yining Li, Yunhai Tong, Kai Chen

    Abstract: In this work, we present MotionBooth, an innovative framework designed for animating customized subjects with precise control over both object and camera movements. By leveraging a few images of a specific object, we efficiently fine-tune a text-to-video model to capture the object's shape and attributes accurately. Our approach presents subject region loss and video preservation loss to enhance t… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Project page at https://jianzongwu.github.io/projects/motionbooth

  12. arXiv:2406.13154  [pdf, other

    stat.ML cs.AI cs.LG

    Conditional score-based diffusion models for solving inverse problems in mechanics

    Authors: Agnimitra Dasgupta, Harisankar Ramaswamy, Javier Murgoitio Esandi, Ken Foo, Runze Li, Qifa Zhou, Brendan Kennedy, Assad Oberai

    Abstract: We propose a framework to perform Bayesian inference using conditional score-based diffusion models to solve a class of inverse problems in mechanics involving the inference of a specimen's spatially varying material properties from noisy measurements of its mechanical response to loading. Conditional score-based diffusion models are generative models that learn to approximate the score function o… ▽ More

    Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  13. arXiv:2406.12670  [pdf, other

    cs.AI cs.LG

    Stealth edits for provably fixing or attacking large language models

    Authors: Oliver J. Sutton, Qinghua Zhou, Wei Wang, Desmond J. Higham, Alexander N. Gorban, Alexander Bastounis, Ivan Y. Tyukin

    Abstract: We reveal new methods and the theoretical foundations of techniques for editing large language models. We also show how the new theory can be used to assess the editability of models and to expose their susceptibility to previously unknown malicious attacks. Our theoretical approach shows that a single metric (a specific measure of the intrinsic dimensionality of the model's features) is fundament… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 24 pages, 9 figures. Open source implementation: https://github.com/qinghua-zhou/stealth-edits

    MSC Class: 68T07; 68T50; 68W40 ACM Class: I.2.7; F.2.0

  14. arXiv:2406.11739  [pdf, other

    cs.CV

    V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

    Authors: Jiaqi Wang, Yuhang Zang, Pan Zhang, Tao Chu, Yuhang Cao, Zeyi Sun, Ziyu Liu, Xiaoyi Dong, Tong Wu, Dahua Lin, Zeming Chen, Zhi Wang, Lingchen Meng, Wenhao Yao, Jianwei Yang, Sihong Wu, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Peixi Wu, Bosong Chai, Xuan Nie, Longquan Yan, Zeyu Wang, Qifan Zhou, et al. (9 additional authors not shown)

    Abstract: Detecting objects in real-world scenes is a complex task due to various challenges, including the vast range of object categories, and potential encounters with previously unknown or unseen objects. The challenges necessitate the development of public benchmarks and challenges to advance the field of object detection. Inspired by the success of previous COCO and LVIS Challenges, we organize the V3… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  15. arXiv:2406.09201  [pdf, other

    cs.CV

    Enhanced Object Detection: A Study on Vast Vocabulary Object Detection Track for V3Det Challenge 2024

    Authors: Peixi Wu, Bosong Chai, Xuan Nie, Longquan Yan, Zeyu Wang, Qifan Zhou, Boning Wang, Yansong Peng, Hebei Li

    Abstract: In this technical report, we present our findings from the research conducted on the Vast Vocabulary Visual Detection (V3Det) dataset for Supervised Vast Vocabulary Visual Detection task. How to deal with complex categories and detection boxes has become a difficulty in this track. The original supervised detector is not suitable for this task. We have designed a series of improvements, including… ▽ More

    Submitted 21 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Journal ref: Second Place in CVPR 2024 Vast Vocabulary Visual Detection Challenge

  16. arXiv:2406.08634  [pdf, other

    eess.IV cs.CV cs.LG

    Unveiling Incomplete Modality Brain Tumor Segmentation: Leveraging Masked Predicted Auto-Encoder and Divergence Learning

    Authors: Zhongao Sun, Jiameng Li, Yuhan Wang, Jiarong Cheng, Qing Zhou, Chun Li

    Abstract: Brain tumor segmentation remains a significant challenge, particularly in the context of multi-modal magnetic resonance imaging (MRI) where missing modality images are common in clinical settings, leading to reduced segmentation accuracy. To address this issue, we propose a novel strategy, which is called masked predicted pre-training, enabling robust feature learning from incomplete modality data… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  17. arXiv:2406.05852  [pdf, other

    cs.CV cs.GR

    RefGaussian: Disentangling Reflections from 3D Gaussian Splatting for Realistic Rendering

    Authors: Rui Zhang, Tianyue Luo, Weidong Yang, Ben Fei, Jingyi Xu, Qingyuan Zhou, Keyi Liu, Ying He

    Abstract: 3D Gaussian Splatting (3D-GS) has made a notable advancement in the field of neural rendering, 3D scene reconstruction, and novel view synthesis. Nevertheless, 3D-GS encounters the main challenge when it comes to accurately representing physical reflections, especially in the case of total reflection and semi-reflection that are commonly found in real-world scenes. This limitation causes reflectio… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  18. arXiv:2406.05397  [pdf, other

    cs.SE

    Metamorphic Relation Generation: State of the Art and Visions for Future Research

    Authors: Rui Li, Huai Liu, Pak-Lok Poon, Dave Towey, Chang-Ai Sun, Zheng Zheng, Zhi Quan Zhou, Tsong Yueh Chen

    Abstract: Metamorphic testing has become one mainstream technique to address the notorious oracle problem in software testing, thanks to its great successes in revealing real-life bugs in a wide variety of software systems. Metamorphic relations, the core component of metamorphic testing, have continuously attracted research interests from both academia and industry. In the last decade, a rapidly increasing… ▽ More

    Submitted 10 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: Accepted by International Workshop on Software Engineering in 2030

  19. arXiv:2405.18649  [pdf, other

    cs.CL cs.AI cs.SE

    Training LLMs to Better Self-Debug and Explain Code

    Authors: Nan Jiang, Xiaopeng Li, Shiqi Wang, Qiang Zhou, Soneya Binta Hossain, Baishakhi Ray, Varun Kumar, Xiaofei Ma, Anoop Deoras

    Abstract: In the domain of code generation, self-debugging is crucial. It allows LLMs to refine their generated code based on execution feedback. This is particularly important because generating correct solutions in one attempt proves challenging for complex tasks. Prior works on self-debugging mostly focus on prompting methods by providing LLMs with few-shot examples, which work poorly on small open-sourc… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  20. arXiv:2405.18258  [pdf, other

    cs.CV cs.AI cs.CL

    Text-only Synthesis for Image Captioning

    Authors: Qing Zhou, Junlin Huang, Qiang Li, Junyu Gao, Qi Wang

    Abstract: From paired image-text training to text-only training for image captioning, the pursuit of relaxing the requirements for high-cost and large-scale annotation of good quality data remains consistent. In this paper, we propose Text-only Synthesis for Image Captioning (ToCa), which further advances this relaxation with fewer human labor and less computing time. Specifically, we deconstruct caption te… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  21. arXiv:2405.16940  [pdf, other

    cs.CV

    Adversarial Attacks on Both Face Recognition and Face Anti-spoofing Models

    Authors: Fengfan Zhou, Qianyu Zhou, Xiangtai Li, Xuequan Lu, Lizhuang Ma, Hefei Ling

    Abstract: Adversarial attacks on Face Recognition (FR) systems have proven highly effective in compromising pure FR models, yet adversarial examples may be ineffective to the complete FR systems as Face Anti-Spoofing (FAS) models are often incorporated and can detect a significant number of them. To address this under-explored and essential problem, we propose a novel setting of adversarially attacking both… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  22. arXiv:2405.16126  [pdf, other

    math.OC cs.LG

    Near-Optimal Distributed Minimax Optimization under the Second-Order Similarity

    Authors: Qihao Zhou, Haishan Ye, Luo Luo

    Abstract: This paper considers the distributed convex-concave minimax optimization under the second-order similarity. We propose stochastic variance-reduced optimistic gradient sliding (SVOGS) method, which takes the advantage of the finite-sum structure in the objective by involving the mini-batch client sampling and variance reduction. We prove SVOGS can achieve the $\varepsilon$-duality gap within commun… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

  23. arXiv:2405.15358  [pdf, ps, other

    stat.ML cs.LG

    Coordinated Multi-Neighborhood Learning on a Directed Acyclic Graph

    Authors: Stephen Smith, Qing Zhou

    Abstract: Learning the structure of causal directed acyclic graphs (DAGs) is useful in many areas of machine learning and artificial intelligence, with wide applications. However, in the high-dimensional setting, it is challenging to obtain good empirical and theoretical results without strong and often restrictive assumptions. Additionally, it is questionable whether all of the variables purported to be in… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 13 pages, 6 figures

  24. arXiv:2405.14444  [pdf

    cs.CV

    DuEDL: Dual-Branch Evidential Deep Learning for Scribble-Supervised Medical Image Segmentation

    Authors: Yitong Yang, Xinli Xu, Haigen Hu, Haixia Long, Qianwei Zhou, Qiu Guan

    Abstract: Despite the recent progress in medical image segmentation with scribble-based annotations, the segmentation results of most models are still not robust and generalizable enough in open environments. Evidential deep learning (EDL) has recently been proposed as a promising solution to model predictive uncertainty and improve the reliability of medical image segmentation. However directly applying…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 14 pages, 2 figures

  25. arXiv:2405.13872  [pdf, other

    cs.AI cs.CL cs.CV

    Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models

    Authors: Qiji Zhou, Ruochen Zhou, Zike Hu, Panzhong Lu, Siyang Gao, Yue Zhang

    Abstract: Recent advancements in Chain-of-Thought (CoT) and related rationale-based works have significantly improved the performance of Large Language Models (LLMs) in complex reasoning tasks. With the evolution of Multimodal Large Language Models (MLLMs), enhancing their capability to tackle complex multimodal reasoning problems is a crucial frontier. However, incorporating multimodal rationales in CoT ha… ▽ More

    Submitted 28 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: Correct the case title

  26. StoryVerse: Towards Co-authoring Dynamic Plot with LLM-based Character Simulation via Narrative Planning

    Authors: Yi Wang, Qian Zhou, David Ledo

    Abstract: Automated plot generation for games enhances the player's experience by providing rich and immersive narrative experience that adapts to the player's actions. Traditional approaches adopt a symbolic narrative planning method which limits the scale and complexity of the generated plot by requiring extensive knowledge engineering work. Recent advancements use Large Language Models (LLMs) to drive th… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Proceedings of the 19th international conference on the foundations of digital games 2024

  27. arXiv:2405.12914  [pdf, other

    cs.CV

    An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation

    Authors: Zhiyu Tan, Mengping Yang, Luozheng Qin, Hao Yang, Ye Qian, Qiang Zhou, Cheng Zhang, Hao Li

    Abstract: One critical prerequisite for faithful text-to-image generation is the accurate understanding of text inputs. Existing methods leverage the text encoder of the CLIP model to represent input prompts. However, the pre-trained CLIP model can merely encode English with a maximum token length of 77. Moreover, the model capacity of the text encoder from CLIP is relatively limited compared to Large Langu… ▽ More

    Submitted 18 July, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: To appear in ECCV-2024, Project page: https://llm-conditioned-diffusion.github.io/

  28. arXiv:2405.12543  [pdf, other

    cs.CV cs.AI

    Like Humans to Few-Shot Learning through Knowledge Permeation of Vision and Text

    Authors: Yuyu Jia, Qing Zhou, Wei Huang, Junyu Gao, Qi Wang

    Abstract: Few-shot learning aims to generalize the recognizer from seen categories to an entirely novel scenario. With only a few support samples, several advanced methods initially introduce class names as prior knowledge for identifying novel classes. However, obstacles still impede achieving a comprehensive understanding of how to harness the mutual advantages of visual and textual knowledge. In this pap… ▽ More

    Submitted 22 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  29. arXiv:2405.07668  [pdf, other

    cs.SE cs.AI cs.CR

    CrossCert: A Cross-Checking Detection Approach to Patch Robustness Certification for Deep Learning Models

    Authors: Qilin Zhou, Zhengyuan Wei, Haipeng Wang, Bo Jiang, W. K. Chan

    Abstract: Patch robustness certification is an emerging kind of defense technique against adversarial patch attacks with provable guarantees. There are two research lines: certified recovery and certified detection. They aim to label malicious samples with provable guarantees correctly and issue warnings for malicious samples predicted to non-benign labels with provable guarantees, respectively. However, ex… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 23 pages, 2 figures, accepted by FSE 2024 (The ACM International Conference on the Foundations of Software Engineering)

  30. arXiv:2405.04536  [pdf, other

    cs.CV cs.AI cs.LG

    When Training-Free NAS Meets Vision Transformer: A Neural Tangent Kernel Perspective

    Authors: Qiqi Zhou, Yichen Zhu

    Abstract: This paper investigates the Neural Tangent Kernel (NTK) to search vision transformers without training. In contrast with the previous observation that NTK-based metrics can effectively predict CNNs performance at initialization, we empirically show their inefficacy in the ViT search space. We hypothesize that the fundamental feature learning preference within ViT contributes to the ineffectiveness… ▽ More

    Submitted 15 March, 2024; originally announced May 2024.

    Comments: ICASSP2024 oral

  31. arXiv:2405.02653  [pdf, other

    cs.AI

    Isopignistic Canonical Decomposition via Belief Evolution Network

    Authors: Qianli Zhou, Tianxiang Zhan, Yong Deng

    Abstract: Developing a general information processing model in uncertain environments is fundamental for the advancement of explainable artificial intelligence. Dempster-Shafer theory of evidence is a well-known and effective reasoning method for representing epistemic uncertainty, which is closely related to subjective probability theory and possibility theory. Although they can be transformed to each othe… ▽ More

    Submitted 4 May, 2024; originally announced May 2024.

  32. arXiv:2404.18155  [pdf, other

    cs.CV

    ShapeMoiré: Channel-Wise Shape-Guided Network for Image Demoiréing

    Authors: Jinming Cao, Sicheng Shen, Qiu Zhou, Yifang Yin, Yangyan Li, Roger Zimmermann

    Abstract: Photographing optoelectronic displays often introduces unwanted moiré patterns due to analog signal interference between the pixel grids of the display and the camera sensor arrays. This work identifies two problems that are largely ignored by existing image demoiréing approaches: 1) moiré patterns vary across different channels (RGB); 2) repetitive patterns are constantly observed. However, emplo… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 12 pages

  33. arXiv:2404.11996  [pdf, other

    cs.AI

    DST-GTN: Dynamic Spatio-Temporal Graph Transformer Network for Traffic Forecasting

    Authors: Songtao Huang, Hongjin Song, Tianqi Jiang, Akbar Telikani, Jun Shen, Qingguo Zhou, Binbin Yong, Qiang Wu

    Abstract: Accurate traffic forecasting is essential for effective urban planning and congestion management. Deep learning (DL) approaches have gained colossal success in traffic forecasting but still face challenges in capturing the intricacies of traffic dynamics. In this paper, we identify and address this challenges by emphasizing that spatial features are inherently dynamic and change over time. A novel… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

  34. arXiv:2404.11595  [pdf, other

    cs.SE

    A Deep Dive into Large Language Models for Automated Bug Localization and Repair

    Authors: Soneya Binta Hossain, Nan Jiang, Qiang Zhou, Xiaopeng Li, Wen-Hao Chiang, Yingjun Lyu, Hoan Nguyen, Omer Tripp

    Abstract: Large language models (LLMs) have shown impressive effectiveness in various software engineering tasks, including automated program repair (APR). In this study, we take a deep dive into automated bug fixing utilizing LLMs. In contrast to many deep learning-based APR methods that assume known bug locations, rely on line-level localization tools, or address bug prediction and fixing in one step, our… ▽ More

    Submitted 10 May, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  35. arXiv:2404.09778  [pdf, other

    cs.CV

    The Devil is in the Few Shots: Iterative Visual Knowledge Completion for Few-shot Learning

    Authors: Yaohui Li, Qifeng Zhou, Haoxing Chen, Jianbing Zhang, Xinyu Dai, Hao Zhou

    Abstract: Contrastive Language-Image Pre-training (CLIP) has shown powerful zero-shot learning performance. Few-shot learning aims to further enhance the transfer capability of CLIP by giving few images in each class, aka 'few shots'. Most existing methods either implicitly learn from the few shots by incorporating learnable prompts or adapters, or explicitly embed them in a cache model for inference. Howev… ▽ More

    Submitted 18 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  36. arXiv:2404.09245  [pdf, other

    cs.MM cs.CV

    Arena: A Patch-of-Interest ViT Inference Acceleration System for Edge-Assisted Video Analytics

    Authors: Haosong Peng, Wei Feng, Hao Li, Yufeng Zhan, Qihua Zhou, Yuanqing Xia

    Abstract: The advent of edge computing has made real-time intelligent video analytics feasible. Previous works, based on traditional model architecture (e.g., CNN, RNN, etc.), employ various strategies to filter out non-region-of-interest content to minimize bandwidth and computation consumption but show inferior performance in adverse environments. Recently, visual foundation models based on transformers h… ▽ More

    Submitted 14 April, 2024; originally announced April 2024.

  37. arXiv:2404.07794  [pdf, other

    cs.CV

    DGMamba: Domain Generalization via Generalized State Space Model

    Authors: Shaocong Long, Qianyu Zhou, Xiangtai Li, Xuequan Lu, Chenhao Ying, Yuan Luo, Lizhuang Ma, Shuicheng Yan

    Abstract: Domain generalization (DG) aims at solving distribution shift problems in various scenes. Existing approaches are based on Convolution Neural Networks (CNNs) or Vision Transformers (ViTs), which suffer from limited receptive fields or quadratic complexities issues. Mamba, as an emerging state space model (SSM), possesses superior linear complexity and global receptive fields. Despite this, it can…

    Submitted 9 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  38. arXiv:2404.06892  [pdf, other

    cs.CV

    SparseAD: Sparse Query-Centric Paradigm for Efficient End-to-End Autonomous Driving

    Authors: Diankun Zhang, Guoan Wang, Runwen Zhu, Jianbo Zhao, Xiwu Chen, Siyu Zhang, Jiahao Gong, Qibin Zhou, Wenyuan Zhang, Ningzi Wang, Feiyang Tan, Hangning Zhou, Ziyao Xu, Haotian Yao, Chi Zhang, Xiaojun Liu, Xiaoguang Di, Bin Li

    Abstract: End-to-End paradigms use a unified framework to implement multi-tasks in an autonomous driving system. Despite simplicity and clarity, the performance of end-to-end autonomous driving methods on sub-tasks is still far behind the single-task methods. Meanwhile, the widely used dense BEV features in previous end-to-end methods make it costly to extend to more modalities or tasks. In this paper, we p… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

  39. arXiv:2404.05522  [pdf, other

    cs.MM

    3DMambaIPF: A State Space Model for Iterative Point Cloud Filtering via Differentiable Rendering

    Authors: Qingyuan Zhou, Weidong Yang, Ben Fei, Jingyi Xu, Rui Zhang, Keyi Liu, Yeqi Luo, Ying He

    Abstract: Noise is an inevitable aspect of point cloud acquisition, necessitating filtering as a fundamental task within the realm of 3D vision. Existing learning-based filtering methods have shown promising capabilities on small-scale synthetic or real-world datasets. Nonetheless, the effectiveness of these methods is constrained when dealing with a substantial quantity of point clouds. This limitation pri… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

  40. arXiv:2404.04859  [pdf, other

    cs.LG stat.ML

    Demystifying Lazy Training of Neural Networks from a Macroscopic Viewpoint

    Authors: Yuqing Li, Tao Luo, Qixuan Zhou

    Abstract: In this paper, we advance the understanding of neural network training dynamics by examining the intricate interplay of various factors introduced by weight parameters in the initialization process. Motivated by the foundational work of Luo et al. (J. Mach. Learn. Res., Vol. 22, Iss. 1, No. 71, pp 3327-3373), we explore the gradient descent dynamics of neural networks through the lens of macroscop… ▽ More

    Submitted 7 April, 2024; originally announced April 2024.

  41. arXiv:2404.01954  [pdf, other

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t… ▽ More

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  42. arXiv:2404.00947  [pdf, other

    cs.IR

    Towards an In-Depth Comprehension of Case Relevance for Better Legal Retrieval

    Authors: Haitao Li, You Chen, Zhekai Ge, Qingyao Ai, Yiqun Liu, Quan Zhou, Shuai Huo

    Abstract: Legal retrieval techniques play an important role in preserving the fairness and equality of the judicial system. As an annually well-known international competition, COLIEE aims to advance the development of state-of-the-art retrieval models for legal texts. This paper elaborates on the methodology employed by the TQM team in COLIEE2024. Specifically, we explored various lexical matching and seman…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 16 pages

  43. arXiv:2404.00257   

    cs.CV cs.AI cs.LG eess.IV

    YOLOOC: YOLO-based Open-Class Incremental Object Detection with Novel Class Discovery

    Authors: Qian Wan, Xiang Xiang, Qinhao Zhou

    Abstract: Because of its use in practice, open-world object detection (OWOD) has gotten a lot of attention recently. The challenge is how can a model detect novel classes and then incrementally learn them without forgetting previously known classes. Previous approaches hinge on strongly-supervised or weakly-supervised novel-class data for novel-class detection, which may not apply to real applications. We c…

    Submitted 22 April, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: Withdrawn because it was submitted without consent of the first author. In addition, this submission has some errors

  44. arXiv:2403.19979  [pdf, other]

    cs.CV cs.AI cs.LG

    Semantically-Shifted Incremental Adapter-Tuning is A Continual ViTransformer

    Authors: Yuwen Tan, Qinhao Zhou, Xiang Xiang, Ke Wang, Yuchuan Wu, Yongbin Li

    Abstract: Class-incremental learning (CIL) aims to enable models to continuously learn new classes while overcoming catastrophic forgetting. The introduction of pre-trained models has brought new tuning paradigms to CIL. In this paper, we revisit different parameter-efficient tuning (PET) methods within the context of continual learning. We observe that adapter tuning demonstrates superiority over prompt-ba…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: To appear at CVPR 2024

  45. arXiv:2403.19962  [pdf, other]

    cs.CL cs.AI cs.LG

    Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning

    Authors: Qinhao Zhou, Zihan Zhang, Xiang Xiang, Ke Wang, Yuchuan Wu, Yongbin Li

    Abstract: Open-source pre-trained Large Language Models (LLMs) exhibit strong language understanding and generation capabilities, making them highly successful in a variety of tasks. However, when used as agents for dealing with complex problems in the real world, their performance is far inferior to large commercial models such as ChatGPT and GPT-4. As intelligent agents, LLMs need to have the capabilities…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: To appear at NAACL 2024

  46. arXiv:2403.19737  [pdf, ps, other]

    math.CO cs.CG

    Piercing independent sets in graphs without large induced matching

    Authors: Jiangdong Ai, Hong Liu, Zixiang Xu, Qiang Zhou

    Abstract: Given a graph $G$, denote by $h(G)$ the smallest size of a subset of $V(G)$ which intersects every maximum independent set of $G$. We prove that any graph $G$ without induced matching of size $t$ satisfies $h(G)\le \omega(G)^{3t-3+o(1)}$. This resolves a conjecture of Hajebi, Li and Spirkl (Hitting all maximum stable sets in $P_{5}$-free graphs, JCTB 2024).

    Submitted 28 March, 2024; originally announced March 2024.

  47. arXiv:2403.19334  [pdf, other]

    cs.CV

    Test-Time Domain Generalization for Face Anti-Spoofing

    Authors: Qianyu Zhou, Ke-Yue Zhang, Taiping Yao, Xuequan Lu, Shouhong Ding, Lizhuang Ma

    Abstract: Face Anti-Spoofing (FAS) is pivotal in safeguarding facial recognition systems against presentation attacks. While domain generalization (DG) methods have been developed to enhance FAS performance, they predominantly focus on learning domain-invariant features during training, which may not guarantee generalizability to unseen data that differs largely from the source distributions. Our insight is…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  48. arXiv:2403.18551  [pdf, other]

    cs.CV

    Attention Calibration for Disentangled Text-to-Image Personalization

    Authors: Yanbing Zhang, Mengping Yang, Qin Zhou, Zhe Wang

    Abstract: Recent thrilling progress in large-scale text-to-image (T2I) models has unlocked unprecedented synthesis quality of AI-generated content (AIGC) including image generation, 3D and video composition. Further, personalized techniques enable appealing customized production of a novel concept given only several images as reference. However, an intriguing problem persists: Is it possible to capture mult…

    Submitted 11 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: CVPR 2024 (Oral)

  49. arXiv:2403.12052  [pdf, other]

    cs.CV

    A Dataset and Benchmark for Copyright Infringement Unlearning from Text-to-Image Diffusion Models

    Authors: Rui Ma, Qiang Zhou, Yizhu Jin, Daquan Zhou, Bangjun Xiao, Xiuyu Li, Yi Qu, Aishani Singh, Kurt Keutzer, Jingtong Hu, Xiaodong Xie, Zhen Dong, Shanghang Zhang, Shiji Zhou

    Abstract: Copyright law confers upon creators the exclusive rights to reproduce, distribute, and monetize their creative works. However, recent progress in text-to-image generation has introduced formidable challenges to copyright enforcement. These technologies enable the unauthorized learning and replication of copyrighted content, artistic creations, and likenesses, leading to the proliferation of unregu…

    Submitted 21 June, 2024; v1 submitted 4 January, 2024; originally announced March 2024.

    Comments: 20 pages, 7 figures, 3 tables

  50. The Effects of Generative AI on Design Fixation and Divergent Thinking

    Authors: Samangi Wadinambiarachchi, Ryan M. Kelly, Saumya Pareek, Qiushi Zhou, Eduardo Velloso

    Abstract: Generative AI systems have been heralded as tools for augmenting human creativity and inspiring divergent thinking, though with little empirical evidence for these claims. This paper explores the effects of exposure to AI-generated images on measures of design fixation and divergent thinking in a visual ideation task. Through a between-participants experiment (N=60), we found that support from an…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted at the CHI Conference on Human Factors in Computing Systems (CHI 24), 18 pages, 15 figures