Showing 1–50 of 296 results for author: Hu, R

Searching in archive cs.

  1. arXiv:2407.10687

    cs.CV cs.GR

    FRI-Net: Floorplan Reconstruction via Room-wise Implicit Representation

    Authors: Honghao Xu, Juzhan Xu, Zeyu Huang, Pengfei Xu, Hui Huang, Ruizhen Hu

    Abstract: In this paper, we introduce a novel method called FRI-Net for 2D floorplan reconstruction from 3D point cloud. Existing methods typically rely on corner regression or box regression, which lack consideration for the global shapes of rooms. To address these issues, we propose a novel approach using a room-wise implicit representation with structural regularization to characterize the shapes of room…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  2. arXiv:2407.08348

    cs.AI cs.CL cs.LG

    Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On

    Authors: Liang Zeng, Liangjun Zhong, Liang Zhao, Tianwen Wei, Liu Yang, Jujie He, Cheng Cheng, Rui Hu, Yang Liu, Shuicheng Yan, Han Fang, Yahui Zhou

    Abstract: In this paper, we investigate the underlying factors that potentially enhance the mathematical reasoning capabilities of large language models (LLMs). We argue that the data scaling law for math reasoning capabilities in modern LLMs is far from being saturated, highlighting how the model's quality improves with increases in data quantity. To support this claim, we introduce the Skywork-Math model…

    Submitted 17 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  3. arXiv:2407.08093

    eess.IV cs.AI cs.CV eess.SP

    MemWarp: Discontinuity-Preserving Cardiac Registration with Memorized Anatomical Filters

    Authors: Hang Zhang, Xiang Chen, Renjiu Hu, Dongdong Liu, Gaolei Li, Rongguang Wang

    Abstract: Many existing learning-based deformable image registration methods impose constraints on deformation fields to ensure they are globally smooth and continuous. However, this assumption does not hold in cardiac image registration, where different anatomical regions exhibit asymmetric motions during respiration and movements due to sliding organs within the chest. Consequently, such global constraint…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 11 pages, 2 figures, 2 tables

  4. arXiv:2407.05578

    cs.CV

    FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance

    Authors: Jiedong Zhuang, Jiaqi Hu, Lianrui Mu, Rui Hu, Xiaoyu Liang, Jiangnan Ye, Haoji Hu

    Abstract: CLIP has achieved impressive zero-shot performance after pre-training on a large-scale dataset consisting of paired image-text data. Previous works have utilized CLIP by incorporating manually designed visual prompts like colored circles and blur masks into the images to guide the model's attention, showing enhanced zero-shot performance in downstream tasks. Although these methods have achieved pr…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  5. arXiv:2407.01945

    cs.CV

    Indoor 3D Reconstruction with an Unknown Camera-Projector Pair

    Authors: Zhaoshuai Qi, Yifeng Hao, Rui Hu, Wenyou Chang, Jiaqi Yang, Yanning Zhang

    Abstract: Structured light-based method with a camera-projector pair (CPP) plays a vital role in indoor 3D reconstruction, especially for scenes with weak textures. Previous methods usually assume known intrinsics, which are pre-calibrated from known objects, or self-calibrated from multi-view observations. It is still challenging to reliably recover CPP intrinsics from only two views without any known obje…

    Submitted 2 July, 2024; originally announced July 2024.

  6. arXiv:2406.20076

    cs.CV

    EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model

    Authors: Yuxuan Zhang, Tianheng Cheng, Rui Hu, Lei Liu, Heng Liu, Longjin Ran, Xiaoxin Chen, Wenyu Liu, Xinggang Wang

    Abstract: Segment Anything Model (SAM) has attracted widespread attention for its superior interactive segmentation capabilities with visual prompts while lacking further exploration of text prompts. In this paper, we empirically investigate what text prompt encoders (e.g., CLIP or LLM) are good for adapting SAM for referring expression segmentation and introduce the Early Vision-language Fusion-based SAM (…

    Submitted 3 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: Preprint. Code and models are available at: https://github.com/hustvl/EVF-SAM

  7. arXiv:2406.18045

    cs.CL cs.AI

    PharmaGPT: Domain-Specific Large Language Models for Bio-Pharmaceutical and Chemistry

    Authors: Linqing Chen, Weilei Wang, Zilong Bai, Peng Xu, Yan Fang, Jie Fang, Wentao Wu, Lizhi Zhou, Ruiji Zhang, Yubin Xia, Chaobo Xu, Ran Hu, Licong Xu, Qijun Cai, Haoran Hua, Jing Sun, Jin Liu, Tian Qiu, Haowen Liu, Meng Hu, Xiuwen Li, Fei Gao, Yufu Wang, Lin Tie, Chaochao Wang, et al. (11 additional authors not shown)

    Abstract: Large language models (LLMs) have revolutionized Natural Language Processing (NLP) by minimizing the need for complex feature engineering. However, the application of LLMs in specialized domains like biopharmaceuticals and chemistry remains largely unexplored. These fields are characterized by intricate terminologies, specialized knowledge, and a high demand for precision areas where general purpo…

    Submitted 9 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  8. arXiv:2406.11145

    cs.CV

    Federated Face Forgery Detection Learning with Personalized Representation

    Authors: Decheng Liu, Zhan Dang, Chunlei Peng, Nannan Wang, Ruimin Hu, Xinbo Gao

    Abstract: Deep generator technology can produce high-quality fake videos that are indistinguishable, posing a serious social threat. Traditional forgery detection methods directly centralized training on data and lacked consideration of information sharing in non-public video data scenarios and data privacy. Naturally, the federated learning strategy can be applied for privacy protection, which aggregates m…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: The code is publicly available

  9. arXiv:2406.10933

    cs.CV

    Improving Adversarial Robustness via Decoupled Visual Representation Masking

    Authors: Decheng Liu, Tao Chen, Chunlei Peng, Nannan Wang, Ruimin Hu, Xinbo Gao

    Abstract: Deep neural networks are proven to be vulnerable to fine-designed adversarial examples, and adversarial defense algorithms draw more and more attention nowadays. Pre-processing based defense is a major strategy, as well as learning robust feature representation has been proven an effective way to boost generalization. However, existing defense works lack considering different depth-level visual fe…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: The code is publicly available

  10. arXiv:2406.10125

    cs.CV

    MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report

    Authors: Zhongyu Yang, Mai Liu, Jinluo Xie, Yueming Zhang, Chen Shen, Wei Shao, Jichao Jiao, Tengfei Xing, Runbo Hu, Pengfei Xu

    Abstract: Autonomous driving without high-definition (HD) maps demands a higher level of active scene understanding. In this competition, the organizers provided the multi-perspective camera images and standard-definition (SD) maps to explore the boundaries of scene reasoning capabilities. We found that most existing algorithms construct Bird's Eye View (BEV) features from these multi-perspective images and…

    Submitted 14 June, 2024; originally announced June 2024.

  11. arXiv:2406.06563

    cs.CL cs.AI

    Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models

    Authors: Tianwen Wei, Bo Zhu, Liang Zhao, Cheng Cheng, Biye Li, Weiwei Lü, Peng Cheng, Jianhao Zhang, Xiaoyu Zhang, Liang Zeng, Xiaokun Wang, Yutuan Ma, Rui Hu, Shuicheng Yan, Han Fang, Yahui Zhou

    Abstract: In this technical report, we introduce the training methodologies implemented in the development of Skywork-MoE, a high-performance mixture-of-experts (MoE) large language model (LLM) with 146 billion parameters and 16 experts. It is initialized from the pre-existing dense checkpoints of our Skywork-13B model. We explore the comparative effectiveness of upcycling versus training from scratch initi…

    Submitted 2 June, 2024; originally announced June 2024.

  12. arXiv:2406.01069

    cs.CV

    UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment

    Authors: Hantao Zhou, Longxiang Tang, Rui Yang, Guanyi Qin, Yan Zhang, Runze Hu, Xiu Li

    Abstract: Image Quality Assessment (IQA) and Image Aesthetic Assessment (IAA) aim to simulate human subjective perception of image visual quality and aesthetic appeal. Existing methods typically address these tasks independently due to distinct learning objectives. However, they neglect the underlying interconnectedness of both tasks, which hinders the learning of task-agnostic shared representations for hu…

    Submitted 3 June, 2024; originally announced June 2024.

  13. arXiv:2406.00605

    cs.CL cs.AI

    LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models

    Authors: Liang Zhao, Tianwen Wei, Liang Zeng, Cheng Cheng, Liu Yang, Peng Cheng, Lijie Wang, Chenxia Li, Xuejie Wu, Bo Zhu, Yimeng Gan, Rui Hu, Shuicheng Yan, Han Fang, Yahui Zhou

    Abstract: We introduce LongSkywork, a long-context Large Language Model (LLM) capable of processing up to 200,000 tokens. We provide a training recipe for efficiently extending context length of LLMs. We identify that the critical element in enhancing long-context processing capability is to incorporate a long-context SFT stage following the standard SFT stage. A mere 200 iterations can convert the standard…

    Submitted 1 June, 2024; originally announced June 2024.

  14. arXiv:2405.19740

    cs.CL cs.AI cs.CY

    PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations

    Authors: Jiatong Li, Renjun Hu, Kunzhe Huang, Yan Zhuang, Qi Liu, Mengxiao Zhu, Xing Shi, Wei Lin

    Abstract: Expert-designed close-ended benchmarks serve as vital tools in assessing the knowledge capacity of large language models (LLMs). Despite their widespread use, concerns have mounted regarding their reliability due to limited test scenarios and an unavoidable risk of data contamination. To rectify this, we present PertEval, a toolkit devised for in-depth probing of LLMs' knowledge capacity through k…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 23 pages, 12 figures, 10 tables

  15. arXiv:2405.19433

    cs.CL

    Beyond Agreement: Diagnosing the Rationale Alignment of Automated Essay Scoring Methods based on Linguistically-informed Counterfactuals

    Authors: Yupei Wang, Renfen Hu, Zhe Zhao

    Abstract: While current automated essay scoring (AES) methods show high agreement with human raters, their scoring mechanisms are not fully explored. Our proposed method, using counterfactual intervention assisted by Large Language Models (LLMs), reveals that when scoring essays, BERT-like models primarily focus on sentence-level features, while LLMs are attuned to conventions, language complexity, as well…

    Submitted 29 May, 2024; originally announced May 2024.

  16. arXiv:2405.13325

    cs.CL cs.AI cs.IR

    DEGAP: Dual Event-Guided Adaptive Prefixes for Templated-Based Event Argument Extraction with Slot Querying

    Authors: Guanghui Wang, Dexi Liu, Jian-Yun Nie, Qizhi Wan, Rong Hu, Xiping Liu, Wanlong Liu, Jiaming Liu

    Abstract: Recent advancements in event argument extraction (EAE) involve incorporating useful auxiliary information into models during training and inference, such as retrieved instances and event templates. These methods face two challenges: (1) the retrieval results may be irrelevant and (2) templates are developed independently for each event without considering their possible relationship. In this work,…

    Submitted 15 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  17. LGTM: Local-to-Global Text-Driven Human Motion Diffusion Model

    Authors: Haowen Sun, Ruikun Zheng, Haibin Huang, Chongyang Ma, Hui Huang, Ruizhen Hu

    Abstract: In this paper, we introduce LGTM, a novel Local-to-Global pipeline for Text-to-Motion generation. LGTM utilizes a diffusion-based architecture and aims to address the challenge of accurately translating textual descriptions into semantically coherent human motion in computer animation. Specifically, traditional methods often struggle with semantic discrepancies, particularly in aligning specific m…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 9 pages, 7 figures, SIGGRAPH 2024

  18. arXiv:2405.03221

    cs.CV cs.GR cs.LG

    Spatial and Surface Correspondence Field for Interaction Transfer

    Authors: Zeyu Huang, Honghao Xu, Haibin Huang, Chongyang Ma, Hui Huang, Ruizhen Hu

    Abstract: In this paper, we introduce a new method for the task of interaction transfer. Given an example interaction between a source object and an agent, our method can automatically infer both surface and spatial relationships for the agent and target objects within the same category, yielding more accurate and valid transfers. Specifically, our method characterizes the example interaction using a combin…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted to SIGGRAPH 2024, project page at https://vcc.tech/research/2024/InterTransfer

  19. arXiv:2405.01258

    cs.CV cs.RO eess.IV

    Towards Consistent Object Detection via LiDAR-Camera Synergy

    Authors: Kai Luo, Hao Wu, Kefu Yi, Kailun Yang, Wei Hao, Rongdong Hu

    Abstract: As human-machine interaction continues to evolve, the capacity for environmental perception is becoming increasingly crucial. Integrating the two most common types of sensory data, images, and point clouds, can enhance detection accuracy. However, currently, no model exists that can simultaneously detect an object's position in both point clouds and images and ascertain their corresponding relatio…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: The source code will be made publicly available at https://github.com/xifen523/COD

  20. arXiv:2404.17569

    cs.CV

    MaPa: Text-driven Photorealistic Material Painting for 3D Shapes

    Authors: Shangzhan Zhang, Sida Peng, Tao Xu, Yuanbo Yang, Tianrun Chen, Nan Xue, Yujun Shen, Hujun Bao, Ruizhen Hu, Xiaowei Zhou

    Abstract: This paper aims to generate materials for 3D meshes from text descriptions. Unlike existing methods that synthesize texture maps, we propose to generate segment-wise procedural material graphs as the appearance representation, which supports high-quality rendering and provides substantial flexibility in editing. Instead of relying on extensive paired data, i.e., 3D meshes with material graphs and…

    Submitted 25 June, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: SIGGRAPH 2024. Project page: https://zju3dv.github.io/MaPa

  21. arXiv:2404.15596

    cs.SE cs.CR

    VulEval: Towards Repository-Level Evaluation of Software Vulnerability Detection

    Authors: Xin-Cheng Wen, Xinchen Wang, Yujia Chen, Ruida Hu, David Lo, Cuiyun Gao

    Abstract: Deep Learning (DL)-based methods have proven to be effective for software vulnerability detection, with a potential for substantial productivity enhancements for detecting vulnerabilities. Current methods mainly focus on detecting single functions (i.e., intra-procedural vulnerabilities), ignoring the more complex inter-procedural vulnerability detection scenarios in practice. For example, develop…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 12 pages

  22. arXiv:2404.14949

    cs.CV

    Multi-Modal Prompt Learning on Blind Image Quality Assessment

    Authors: Wensheng Pan, Timin Gao, Yan Zhang, Runze Hu, Xiawu Zheng, Enwei Zhang, Yuting Gao, Yutao Liu, Yunhang Shen, Ke Li, Shengchuan Zhang, Liujuan Cao, Rongrong Ji

    Abstract: Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly. Currently, leveraging semantic information to enhance IQA is a crucial research direction. Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semant…

    Submitted 18 May, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

  23. arXiv:2404.13279

    cs.CR eess.IV eess.SP

    Backdoor Attacks and Defenses on Semantic-Symbol Reconstruction in Semantic Communications

    Authors: Yuan Zhou, Rose Qingyang Hu, Yi Qian

    Abstract: Semantic communication is of crucial importance for the next-generation wireless communication networks. The existing works have developed semantic communication frameworks based on deep learning. However, systems powered by deep learning are vulnerable to threats such as backdoor attacks and adversarial attacks. This paper delves into backdoor attacks targeting deep learning-enabled semantic comm…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by IEEE ICC 2024

  24. arXiv:2404.11035

    cs.IT cs.DC cs.NI

    Approximate Wireless Communication for Lossy Gradient Updates in IoT Federated Learning

    Authors: Xiang Ma, Haijian Sun, Rose Qingyang Hu, Yi Qian

    Abstract: Federated learning (FL) has emerged as a distributed machine learning (ML) technique that can protect local data privacy for participating clients and improve system efficiency. Instead of sharing raw data, FL exchanges intermediate learning parameters, such as gradients, among clients. This article presents an efficient wireless communication approach tailored for FL parameter transmission, espec…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: submitted to IEEE journals for publication

  25. arXiv:2404.10342

    cs.CV cs.MM

    Referring Flexible Image Restoration

    Authors: Runwei Guan, Rongsheng Hu, Zhuhao Zhou, Tianlang Xue, Ka Lok Man, Jeremy Smith, Eng Gee Lim, Weiping Ding, Yutao Yue

    Abstract: In reality, images often exhibit multiple degradations, such as rain and fog at night (triple degradations). However, in many cases, individuals may not want to remove all degradations, for instance, a blurry lens revealing a beautiful snowy landscape (double degradations). In such scenarios, people may only desire to deblur. These situations and requirements shed light on a new challenge in image…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 15 pages, 19 figures

  26. arXiv:2404.10332

    cs.CV cs.AI

    Prescribing the Right Remedy: Mitigating Hallucinations in Large Vision-Language Models via Targeted Instruction Tuning

    Authors: Rui Hu, Yahan Tu, Jitao Sang

    Abstract: Despite achieving outstanding performance on various cross-modal tasks, current large vision-language models (LVLMs) still suffer from hallucination issues, manifesting as inconsistencies between their generated responses and the corresponding images. Prior research has implicated that the low quality of instruction data, particularly the skewed balance between positive and negative samples, is a…

    Submitted 16 April, 2024; originally announced April 2024.

  27. arXiv:2404.09150

    cs.RO cs.GR

    Learning Cross-hand Policies for High-DOF Reaching and Grasping

    Authors: Qijin She, Shishun Zhang, Yunfan Ye, Ruizhen Hu, Kai Xu

    Abstract: Reaching-and-grasping is a fundamental skill for robotic manipulation, but existing methods usually train models on a specific gripper and cannot be reused on another gripper. In this paper, we propose a novel method that can learn a unified policy model that can be easily transferred to different dexterous grippers. Our method consists of two stages: a gripper-agnostic policy model that predicts…

    Submitted 15 July, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: ECCV 2024

  28. Proceedings 15th Workshop on Programming Language Approaches to Concurrency and Communication-cEntric Software

    Authors: Diana Costa, Raymond Hu

    Abstract: This volume contains the proceedings of PLACES 2024, the 15th edition of the Workshop on Programming Language Approaches to Concurrency and Communication-cEntric Software. The PLACES workshop series offers a forum for researchers from different fields to exchange new ideas about the challenges of modern and future programming, where concurrency and distribution are the norm rather than a marginal…

    Submitted 4 April, 2024; originally announced April 2024.

    Journal ref: EPTCS 401, 2024

  29. arXiv:2403.16830

    cs.NI eess.SP

    Exploring Communication Technologies, Standards, and Challenges in Electrified Vehicle Charging

    Authors: Xiang Ma, Yuan Zhou, Hanwen Zhang, Qun Wang, Haijian Sun, Hongjie Wang, Rose Qingyang Hu

    Abstract: As public awareness of environmental protection continues to grow, the trend of integrating more electric vehicles (EVs) into the transportation sector is rising. Unlike conventional internal combustion engine (ICE) vehicles, EVs can minimize carbon emissions and potentially achieve autonomous driving. However, several obstacles hinder the widespread adoption of EVs, such as their constrained driv…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: submitted to IET Communication as a survey paper

  30. arXiv:2403.16477

    cs.IT eess.SP

    Safeguarding Next Generation Multiple Access Using Physical Layer Security Techniques: A Tutorial

    Authors: Lu Lv, Dongyang Xu, Rose Qingyang Hu, Yinghui Ye, Long Yang, Xianfu Lei, Xianbin Wang, Dong In Kim, Arumugam Nallanathan

    Abstract: Driven by the ever-increasing requirements of ultra-high spectral efficiency, ultra-low latency, and massive connectivity, the forefront of wireless research calls for the design of advanced next generation multiple access schemes to facilitate provisioning of these stringent demands. This inspires the embrace of non-orthogonal multiple access (NOMA) in future wireless communication networks. Neve…

    Submitted 21 May, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Invited paper by Proceedings of the IEEE

  31. arXiv:2403.15612

    cs.CV

    InterFusion: Text-Driven Generation of 3D Human-Object Interaction

    Authors: Sisi Dai, Wenhao Li, Haowen Sun, Haibin Huang, Chongyang Ma, Hui Huang, Kai Xu, Ruizhen Hu

    Abstract: In this study, we tackle the complex task of generating 3D human-object interactions (HOI) from textual descriptions in a zero-shot text-to-3D manner. We identify and address two key challenges: the unsatisfactory outcomes of direct text-to-3D methods in HOI, largely due to the lack of paired text-interaction data, and the inherent difficulties in simultaneously generating multiple concepts with c…

    Submitted 16 July, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: ECCV 2024; Project page https://sisidai.github.io/InterFusion/; Code available at https://github.com/sisidai/InterFusion

  32. arXiv:2403.14354

    cs.CV

    LDTR: Transformer-based Lane Detection with Anchor-chain Representation

    Authors: Zhongyu Yang, Chen Shen, Wei Shao, Tengfei Xing, Runbo Hu, Pengfei Xu, Hua Chai, Ruini Xue

    Abstract: Despite recent advances in lane detection methods, scenarios with limited- or no-visual-clue of lanes due to factors such as lighting conditions and occlusion remain challenging and crucial for automated driving. Moreover, current lane representations require complex post-processing and struggle with specific instances. Inspired by the DETR architecture, we propose LDTR, a transformer-based model…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted by CVM 2024 and CVMJ. 16 pages, 14 figures

  33. arXiv:2403.11873

    cs.CL

    CO3: Low-resource Contrastive Co-training for Generative Conversational Query Rewrite

    Authors: Yifei Yuan, Chen Shi, Runze Wang, Liyi Chen, Renjun Hu, Zengming Zhang, Feijun Jiang, Wai Lam

    Abstract: Generative query rewrite generates reconstructed query rewrites using the conversation history while rely heavily on gold rewrite pairs that are expensive to obtain. Recently, few-shot learning is gaining increasing popularity for this task, whereas these methods are sensitive to the inherent noise due to limited data size. Besides, both attempts face performance degradation when there exists lang…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted to COLING 2024

  34. arXiv:2403.11780

    cs.SD cs.AI cs.LG eess.AS

    Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt

    Authors: Yongqi Wang, Ruofan Hu, Rongjie Huang, Zhiqing Hong, Ruiqi Li, Wenrui Liu, Fuming You, Tao Jin, Zhou Zhao

    Abstract: Recent singing-voice-synthesis (SVS) methods have achieved remarkable audio quality and naturalness, yet they lack the capability to control the style attributes of the synthesized singing explicitly. We propose Prompt-Singer, the first SVS method that enables attribute controlling on singer gender, vocal range and volume with natural language. We adopt a model architecture based on a decoder-only…

    Submitted 9 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted by NAACL 2024 (main conference)

  35. arXiv:2403.11529

    cs.CV

    Video Object Segmentation with Dynamic Query Modulation

    Authors: Hantao Zhou, Runze Hu, Xiu Li

    Abstract: Storing intermediate frame segmentations as memory for long-range context modeling, spatial-temporal memory-based methods have recently showcased impressive results in semi-supervised video object segmentation (SVOS). However, these methods face two key limitations: 1) relying on non-local pixel-level matching to read memory, resulting in noisy retrieved features for segmentation; 2) segmenting ea…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted by ICME2024

  36. arXiv:2403.08191

    cs.RO

    Synchronized Dual-arm Rearrangement via Cooperative mTSP

    Authors: Wenhao Li, Shishun Zhang, Sisi Dai, Hui Huang, Ruizhen Hu, Xiaohong Chen, Kai Xu

    Abstract: Synchronized dual-arm rearrangement is widely studied as a common scenario in industrial applications. It often faces scalability challenges due to the computational complexity of robotic arm rearrangement and the high-dimensional nature of dual-arm planning. To address these challenges, we formulated the problem as cooperative mTSP, a variant of mTSP where agents share cooperative costs, and util…

    Submitted 12 March, 2024; originally announced March 2024.

  37. arXiv:2403.06069

    eess.IV cs.CV cs.LG

    Implicit Image-to-Image Schrodinger Bridge for CT Super-Resolution and Denoising

    Authors: Yuang Wang, Siyeop Yoon, Pengfei Jin, Matthew Tivnan, Zhennong Chen, Rui Hu, Li Zhang, Zhiqiang Chen, Quanzheng Li, Dufan Wu

    Abstract: Conditional diffusion models have gained recognition for their effectiveness in image restoration tasks, yet their iterative denoising process, starting from Gaussian noise, often leads to slow inference speeds. As a promising alternative, the Image-to-Image Schrödinger Bridge (I2SB) initializes the generative process from corrupted images and integrates training techniques from conditional diffus…

    Submitted 9 March, 2024; originally announced March 2024.

  38. arXiv:2402.13634

    cs.RO cs.LG

    Learning Dual-arm Object Rearrangement for Cartesian Robots

    Authors: Shishun Zhang, Qijin She, Wenhao Li, Chenyang Zhu, Yongjun Wang, Ruizhen Hu, Kai Xu

    Abstract: This work focuses on the dual-arm object rearrangement problem abstracted from a realistic industrial scenario of Cartesian robots. The goal of this problem is to transfer all the objects from sources to targets with the minimum total completion time. To achieve the goal, the core idea is to develop an effective object-to-arm task assignment strategy for minimizing the cumulative task execution ti…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: 7 pages, 9 figures, conference

  39. arXiv:2402.12688

    cs.CR

    Robust-Wide: Robust Watermarking against Instruction-driven Image Editing

    Authors: Runyi Hu, Jie Zhang, Ting Xu, Jiwei Li, Tianwei Zhang

    Abstract: Instruction-driven image editing allows users to quickly edit an image according to text instructions in a forward pass. Nevertheless, malicious users can easily exploit this technique to create fake images, which could cause a crisis of trust and harm the rights of the original image owners. Watermarking is a common solution to trace such malicious behavior. Unfortunately, instruction-driven imag…

    Submitted 16 July, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: European Conference on Computer Vision (ECCV) 2024

  40. arXiv:2402.08147

    cs.SE cs.AI cs.LG cs.LO cs.PL

    VerMCTS: Synthesizing Multi-Step Programs using a Verifier, a Large Language Model, and Tree Search

    Authors: David Brandfonbrener, Simon Henniger, Sibi Raja, Tarun Prasad, Chloe Loughridge, Federico Cassano, Sabrina Ruixin Hu, Jianang Yang, William E. Byrd, Robert Zinkov, Nada Amin

    Abstract: Large Language Models (LLMs) can generate useful code, but often the code they generate cannot be trusted to be sound. In this paper, we present VerMCTS, an approach to begin to resolve this issue by generating verified programs in Dafny and Coq. VerMCTS uses a logical verifier in concert with an LLM to guide a modified Monte Carlo Tree Search (MCTS). This approach leverages the verifier to gain i…

    Submitted 24 May, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  41. arXiv:2402.02334

    cs.LG cs.AI

    Arithmetic Feature Interaction Is Necessary for Deep Tabular Learning

    Authors: Yi Cheng, Renjun Hu, Haochao Ying, Xing Shi, Jian Wu, Wei Lin

    Abstract: Until recently, the question of the effective inductive bias of deep models on tabular data has remained unanswered. This paper investigates the hypothesis that arithmetic feature interaction is necessary for deep tabular learning. To test this point, we create a synthetic tabular dataset with a mild feature interaction assumption and examine a modified transformer architecture enabling arithmetic…

    Submitted 19 March, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    Comments: 11 pages, 8 figures, to be published at AAAI 2024

    ACM Class: I.2.4

  42. arXiv:2401.13169  [pdf, other]

    cs.CR cs.SE

    ReposVul: A Repository-Level High-Quality Vulnerability Dataset

    Authors: Xinchen Wang, Ruida Hu, Cuiyun Gao, Xin-Cheng Wen, Yujia Chen, Qing Liao

    Abstract: Open-Source Software (OSS) vulnerabilities bring great challenges to the software security and pose potential risks to our society. Enormous efforts have been devoted into automated vulnerability detection, among which deep learning (DL)-based approaches have proven to be the most effective. However, the current labeled data present the following limitations: (1) Tangled Patches: Developers may su…

    Submitted 8 February, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: Accepted by ICSE 2024 Industry Challenge Track

  43. arXiv:2401.13020  [pdf, other]

    eess.SY cs.LG

    A Safe Reinforcement Learning Algorithm for Supervisory Control of Power Plants

    Authors: Yixuan Sun, Sami Khairy, Richard B. Vilim, Rui Hu, Akshay J. Dave

    Abstract: Traditional control theory-based methods require tailored engineering for each system and constant fine-tuning. In power plant control, one often needs to obtain a precise representation of the system dynamics and carefully design the control scheme accordingly. Model-free Reinforcement learning (RL) has emerged as a promising solution for control tasks due to its ability to learn from trial-and-e…

    Submitted 23 January, 2024; originally announced January 2024.

  44. arXiv:2401.11949  [pdf, other]

    cs.CV

    Feature Denoising Diffusion Model for Blind Image Quality Assessment

    Authors: Xudong Li, Jingyuan Zheng, Runze Hu, Yan Zhang, Ke Li, Yunhang Shen, Xiawu Zheng, Yutao Liu, ShengChuan Zhang, Pingyang Dai, Rongrong Ji

    Abstract: Blind Image Quality Assessment (BIQA) aims to evaluate image quality in line with human perception, without reference benchmarks. Currently, deep learning BIQA methods typically depend on using features from high-level tasks for transfer learning. However, the inherent differences between BIQA and these high-level tasks inevitably introduce noise into the quality-aware features. In this paper, we…

    Submitted 22 January, 2024; originally announced January 2024.

  45. arXiv:2401.11767  [pdf, other]

    cs.CV

    Concealed Object Segmentation with Hierarchical Coherence Modeling

    Authors: Fengyang Xiao, Pan Zhang, Chunming He, Runze Hu, Yutao Liu

    Abstract: Concealed object segmentation (COS) is a challenging task that involves localizing and segmenting those concealed objects that are visually blended with their surrounding environments. Despite achieving remarkable success, existing COS segmenters still struggle to achieve complete segmentation results in extremely concealed scenarios. In this paper, we propose a Hierarchical Coherence Modeling (HC…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: Accepted to CICAI 2023. 13 pages, 6 figures, 4 tables

  46. arXiv:2401.09833  [pdf, other]

    eess.IV cs.AI cs.CV

    Slicer Networks

    Authors: Hang Zhang, Xiang Chen, Rongguang Wang, Renjiu Hu, Dongdong Liu, Gaolei Li

    Abstract: In medical imaging, scans often reveal objects with varied contrasts but consistent internal intensities or textures. This characteristic enables the use of low-frequency approximations for tasks such as segmentation and deformation field estimation. Yet, integrating this concept into neural network architectures for medical image analysis remains underexplored. In this paper, we propose the Slice…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: 8 figures and 3 tables

  47. arXiv:2401.08023  [pdf, other]

    cs.IT cs.CV

    Spatial Channel State Information Prediction with Generative AI: Towards Holographic Communication and Digital Radio Twin

    Authors: Lihao Zhang, Haijian Sun, Yong Zeng, Rose Qingyang Hu

    Abstract: As 5G technology becomes increasingly established, the anticipation for 6G is growing, which promises to deliver faster and more reliable wireless connections via cutting-edge radio technologies. However, efficient management method of the large-scale antenna arrays deployed by those radio technologies is crucial. Traditional management methods are mainly reactive, usually based on feedback from u…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: submitted to IEEE for potential publication

  48. arXiv:2401.05646  [pdf, other]

    cs.CV

    Masked Attribute Description Embedding for Cloth-Changing Person Re-identification

    Authors: Chunlei Peng, Boyu Wang, Decheng Liu, Nannan Wang, Ruimin Hu, Xinbo Gao

    Abstract: Cloth-changing person re-identification (CC-ReID) aims to match persons who change clothes over long periods. The key challenge in CC-ReID is to extract clothing-independent features, such as face, hairstyle, body shape, and gait. Current research mainly focuses on modeling body shape using multi-modal biological features (such as silhouettes and sketches). However, it does not fully leverage the…

    Submitted 2 July, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

  49. arXiv:2401.05458  [pdf, other]

    cs.LG cs.AI

    CoLafier: Collaborative Noisy Label Purifier With Local Intrinsic Dimensionality Guidance

    Authors: Dongyu Zhang, Ruofan Hu, Elke Rundensteiner

    Abstract: Deep neural networks (DNNs) have advanced many machine learning tasks, but their performance is often harmed by noisy labels in real-world data. Addressing this, we introduce CoLafier, a novel approach that uses Local Intrinsic Dimensionality (LID) for learning with noisy labels. CoLafier consists of two subnets: LID-dis and LID-gen. LID-dis is a specialized classifier. Trained with our uniquely c…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: This work is accepted by SIAM International Conference on Data Mining (SDM24)

  50. arXiv:2401.01571  [pdf, other]

    cs.SE cs.PL

    CodeFuse-Query: A Data-Centric Static Code Analysis System for Large-Scale Organizations

    Authors: Xiaoheng Xie, Gang Fan, Xiaojun Lin, Ang Zhou, Shijie Li, Xunjin Zheng, Yinan Liang, Yu Zhang, Na Yu, Haokun Li, Xinyu Chen, Yingzhuang Chen, Yi Zhen, Dejun Dong, Xianjin Fu, Jinzhou Su, Fuxiong Pan, Pengshuai Luo, Youzheng Feng, Ruoxiang Hu, Jing Fan, Jinguo Zhou, Xiao Xiao, Peng Di

    Abstract: In the domain of large-scale software development, the demands for dynamic and multifaceted static code analysis exceed the capabilities of traditional tools. To bridge this gap, we present CodeFuse-Query, a system that redefines static code analysis through the fusion of Domain Optimized System Design and Logic Oriented Computation Design. CodeFuse-Query reimagines code analysis as a data compu…

    Submitted 3 January, 2024; originally announced January 2024.