
Showing 1–50 of 261 results for author: Peng, Z

Searching in archive cs.
  1. arXiv:2408.16303  [pdf, other]

    eess.IV cs.CV

    Enhanced Control for Diffusion Bridge in Image Restoration

    Authors: Conghan Yue, Zhengwei Peng, Junlong Ma, Dongyu Zhang

    Abstract: Image restoration refers to the process of restoring a damaged low-quality image back to its corresponding high-quality image. Typically, we use convolutional neural networks to directly learn the mapping from low-quality images to high-quality images, achieving image restoration. Recently, a special type of diffusion bridge model has achieved more advanced results in image restoration. It can tran…

    Submitted 29 August, 2024; originally announced August 2024.

  2. arXiv:2408.14144  [pdf, other]

    cs.LG cs.DC

    Neighborhood and Global Perturbations Supported SAM in Federated Learning: From Local Tweaks To Global Awareness

    Authors: Boyuan Li, Zihao Peng, Yafei Li, Mingliang Xu, Shengbo Chen, Baofeng Ji, Cong Shen

    Abstract: Federated Learning (FL) can be coordinated under the orchestration of a central server to collaboratively build a privacy-preserving model without the need for data exchange. However, participant data heterogeneity leads to local optima divergence, subsequently affecting convergence outcomes. Recent research has focused on global sharpness-aware minimization (SAM) and dynamic regularization techni…

    Submitted 29 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  3. arXiv:2408.12880  [pdf, other]

    cs.AI

    Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey

    Authors: Qika Lin, Yifan Zhu, Xin Mei, Ling Huang, Jingying Ma, Kai He, Zhen Peng, Erik Cambria, Mengling Feng

    Abstract: The rapid development of artificial intelligence has constantly reshaped the field of intelligent healthcare and medicine. As a vital technology, multimodal learning has increasingly garnered interest due to data complementarity, comprehensive modeling form, and great application potential. Currently, numerous researchers are dedicating their attention to this field, conducting extensive studies a…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 21 pages, 6 figures

  4. arXiv:2408.11744  [pdf]

    cs.AI cs.CV

    JieHua Paintings Style Feature Extracting Model using Stable Diffusion with ControlNet

    Authors: Yujia Gu, Haofeng Li, Xinyu Fang, Zihan Peng, Yinan Peng

    Abstract: This study proposes a novel approach to extract stylistic features of Jiehua: the utilization of the Fine-tuned Stable Diffusion Model with ControlNet (FSDMC) to refine depiction techniques from artists' Jiehua. The training data for FSDMC is based on the open-source Jiehua artist's work collected from the Internet, which were subsequently manually constructed in the format of (Original Image, Cann…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: accepted by ICCSMT 2024

  5. arXiv:2408.09357  [pdf, other]

    cs.GR cs.AI cs.SD eess.AS

    Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation

    Authors: Xukun Zhou, Fengxin Li, Ziqiao Peng, Kejian Wu, Jun He, Biao Qin, Zhaoxin Fan, Hongyan Liu

    Abstract: Audio-driven 3D face animation is increasingly vital in live streaming and augmented reality applications. While remarkable progress has been observed, most existing approaches are designed for specific individuals with predefined speaking styles, thus neglecting the adaptability to varied speaking styles. To address this limitation, this paper introduces MetaFace, a novel methodology meticulously…

    Submitted 18 August, 2024; originally announced August 2024.

  6. arXiv:2408.08561  [pdf]

    cs.CV

    A New Chinese Landscape Paintings Generation Model based on Stable Diffusion using DreamBooth

    Authors: Yujia Gu, Xinyu Fang, Xueyuan Deng, Zihan Peng, Yinan Peng

    Abstract: This study mainly introduces a method combining the Stable Diffusion Model (SDM) and Parameter-Efficient Fine-Tuning method for generating Chinese Landscape Paintings. This training process is accelerated by combining LoRA with pre-trained SDM and DreamBooth with pre-trained SDM, respectively. On the Chinese Landscape Paintings Internet dataset used in this paper, this study finds that SDM combine…

    Submitted 22 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

    Comments: accepted by AHPCAI

  7. arXiv:2408.08444  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    W-RAG: Weakly Supervised Dense Retrieval in RAG for Open-domain Question Answering

    Authors: Jinming Nian, Zhiyuan Peng, Qifan Wang, Yi Fang

    Abstract: In knowledge-intensive tasks such as open-domain question answering (OpenQA), Large Language Models (LLMs) often struggle to generate factual answers relying solely on their internal (parametric) knowledge. To address this limitation, Retrieval-Augmented Generation (RAG) systems enhance LLMs by retrieving relevant information from external sources, thereby positioning the retriever as a pivotal co…

    Submitted 15 August, 2024; originally announced August 2024.

  8. arXiv:2408.04325  [pdf, other]

    eess.AS cs.CL

    HydraFormer: One Encoder For All Subsampling Rates

    Authors: Yaoxun Xu, Xingchen Song, Zhiyong Wu, Di Wu, Zhendong Peng, Binbin Zhang

    Abstract: In automatic speech recognition, subsampling is essential for tackling diverse scenarios. However, the inadequacy of a single subsampling rate to address various real-world situations often necessitates training and deploying multiple models, consequently increasing associated costs. To address this issue, we propose HydraFormer, comprising HydraSub, a Conformer-based encoder, and a BiTransformer-…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: accepted by ICME 2024

  9. arXiv:2408.03191  [pdf, other]

    cs.RO

    Integrated Intention Prediction and Decision-Making with Spectrum Attention Net and Proximal Policy Optimization

    Authors: Xiao Zhou, Chengzhen Meng, Wenru Liu, Zengqi Peng, Ming Liu, Jun Ma

    Abstract: For autonomous driving in highly dynamic environments, it is anticipated to predict the future behaviors of surrounding vehicles (SVs) and make safe and effective decisions. However, modeling the inherent coupling effect between the prediction and decision-making modules has been a long-standing challenge, especially when there is a need to maintain appropriate computational efficiency. To tackle…

    Submitted 6 August, 2024; originally announced August 2024.

  10. LessonPlanner: Assisting Novice Teachers to Prepare Pedagogy-Driven Lesson Plans with Large Language Models

    Authors: Haoxiang Fan, Guanzheng Chen, Xingbo Wang, Zhenhui Peng

    Abstract: Preparing a lesson plan, e.g., a detailed road map with strategies and materials for instructing a 90-minute class, is beneficial yet challenging for novice teachers. Large language models (LLMs) can ease this process by generating adaptive content for lesson plans, which would otherwise require teachers to create from scratch or search existing resources. In this work, we first conduct a formativ…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 20 pages

  11. arXiv:2407.20053  [pdf, other]

    cs.LG physics.ao-ph

    Orca: Ocean Significant Wave Height Estimation with Spatio-temporally Aware Large Language Models

    Authors: Zhe Li, Ronghui Xu, Jilin Hu, Zhong Peng, Xi Lu, Chenjuan Guo, Bin Yang

    Abstract: Significant wave height (SWH) is a vital metric in marine science, and accurate SWH estimation is crucial for various applications, e.g., marine energy development, fishery, early warning systems for potential risks, etc. Traditional SWH estimation methods that are based on numerical models and physical theories are hindered by computational inefficiencies. Recently, machine learning has emerged a…

    Submitted 29 July, 2024; originally announced July 2024.

  12. arXiv:2407.18487  [pdf, other]

    cs.CV

    SMPISD-MTPNet: Scene Semantic Prior-Assisted Infrared Ship Detection Using Multi-Task Perception Networks

    Authors: Chen Hu, Xiaogang Dong, Yian Huang, Lele Wang, Liang Xu, Tian Pu, Zhenming Peng

    Abstract: Infrared ship detection (IRSD) has received increasing attention in recent years due to the robustness of infrared images to adverse weather. However, a large number of false alarms may occur in complex scenes. To address these challenges, we propose the Scene Semantic Prior-Assisted Multi-Task Perception Network (SMPISD-MTPNet), which includes three stages: scene semantic extraction, deep feature…

    Submitted 25 July, 2024; originally announced July 2024.

  13. arXiv:2407.18064  [pdf, other]

    cs.HC

    ComPeer: A Generative Conversational Agent for Proactive Peer Support

    Authors: Tianjian Liu, Hongzheng Zhao, Yuheng Liu, Xingbo Wang, Zhenhui Peng

    Abstract: Conversational Agents (CAs) acting as peer supporters have been widely studied and demonstrated beneficial for people's mental health. However, previous peer support CAs either are user-initiated or follow predefined rules to initiate the conversations, which may discourage users from engaging and building relationships with the CAs for long-term benefits. In this paper, we develop ComPeer, a generative…

    Submitted 5 August, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: To appear at the 2024 ACM Symposium on User Interface Software and Technology (UIST); 22 pages (7 figures, 7 tables)

  14. arXiv:2407.16397  [pdf, other]

    cs.LG cs.AI

    On ADMM in Heterogeneous Federated Learning: Personalization, Robustness, and Fairness

    Authors: Shengkun Zhu, Jinshan Zeng, Sheng Wang, Yuan Sun, Xiaodong Li, Yuan Yao, Zhiyong Peng

    Abstract: Statistical heterogeneity is a root cause of tension among accuracy, fairness, and robustness of federated learning (FL), and is key in paving a path forward. Personalized FL (PFL) is an approach that aims to reduce the impact of statistical heterogeneity by developing personalized models for individual users, while also inherently providing benefits in terms of fairness and robustness. However, e…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.06756

  15. Exposure Completing for Temporally Consistent Neural High Dynamic Range Video Rendering

    Authors: Jiahao Cui, Wei Jiang, Zhan Peng, Zhiyu Pan, Zhiguo Cao

    Abstract: High dynamic range (HDR) video rendering from low dynamic range (LDR) videos where frames are of alternate exposure encounters significant challenges, due to the exposure change and absence at each time stamp. The exposure change and absence make existing methods generate flickering HDR results. In this paper, we propose a novel paradigm to render HDR frames via completing the absent exposure info…

    Submitted 4 August, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: 9 pages, 6 figures, accepted by ACM-MM 2024 (poster)

  16. arXiv:2407.12051  [pdf, other]

    q-bio.GN cs.AI cs.LG

    Dy-mer: An Explainable DNA Sequence Representation Scheme using Sparse Recovery

    Authors: Zhiyuan Peng, Yuanbo Tang, Yang Li

    Abstract: DNA sequences encode vital genetic and biological information, yet these unfixed-length sequences cannot serve as the input of common data mining algorithms. Hence, various representation schemes have been developed to transform DNA sequences into fixed-length numerical representations. However, these schemes face difficulties in learning high-quality representations due to the complexity and spar…

    Submitted 6 July, 2024; originally announced July 2024.

  17. arXiv:2407.05726  [pdf, other]

    cs.CV eess.IV

    Gait Patterns as Biomarkers: A Video-Based Approach for Classifying Scoliosis

    Authors: Zirui Zhou, Junhao Liang, Zizhao Peng, Chao Fan, Fengwei An, Shiqi Yu

    Abstract: Scoliosis presents significant diagnostic challenges, particularly in adolescents, where early detection is crucial for effective treatment. Traditional diagnostic and follow-up methods, which rely on physical examinations and radiography, face limitations due to the need for clinical expertise and the risk of radiation exposure, thus restricting their use for widespread early screening. In respon…

    Submitted 23 August, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024

  18. arXiv:2407.00286  [pdf, other]

    cs.NI cs.LG

    Digital Twin-Assisted Data-Driven Optimization for Reliable Edge Caching in Wireless Networks

    Authors: Zifan Zhang, Yuchen Liu, Zhiyuan Peng, Mingzhe Chen, Dongkuan Xu, Shuguang Cui

    Abstract: Optimizing edge caching is crucial for the advancement of next-generation (nextG) wireless networks, ensuring high-speed and low-latency services for mobile users. Existing data-driven optimization approaches often lack awareness of the distribution of random data variables and focus solely on optimizing cache hit rates, neglecting potential reliability concerns, such as base station overload and…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Accepted by IEEE Journal on Selected Areas in Communications (JSAC)

  19. arXiv:2406.19195  [pdf, other]

    cs.LG cs.AI

    Estimating Long-term Heterogeneous Dose-response Curve: Generalization Bound Leveraging Optimal Transport Weights

    Authors: Zeqin Yang, Weilin Chen, Ruichu Cai, Yuguang Yan, Zhifeng Hao, Zhipeng Yu, Zhichao Zou, Zhen Peng, Jiecheng Guo

    Abstract: Long-term causal effect estimation is a significant but challenging problem in many applications. Existing methods rely on ideal assumptions to estimate long-term average effects, e.g., no unobserved confounders or a binary treatment, while in numerous real-world applications, these assumptions could be violated and average effects are unable to provide individual-level suggestions. In this paper, we…

    Submitted 27 June, 2024; originally announced June 2024.

  20. arXiv:2406.16907  [pdf, other]

    eess.SP cs.LG

    RayProNet: A Neural Point Field Framework for Radio Propagation Modeling in 3D Environments

    Authors: Ge Cao, Zhen Peng

    Abstract: The radio wave propagation channel is central to the performance of wireless communication systems. In this paper, we introduce a novel machine learning-empowered methodology for wireless channel modeling. The key ingredients include a point-cloud-based neural network and a Spherical Harmonics encoder with light probes. Our approach offers several significant advantages, including the flexibility…

    Submitted 3 June, 2024; originally announced June 2024.

  21. arXiv:2406.16866  [pdf, other]

    cs.CV

    Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models

    Authors: Jierun Chen, Fangyun Wei, Jinjing Zhao, Sizhe Song, Bohuai Wu, Zhuoxuan Peng, S. -H. Gary Chan, Hongyang Zhang

    Abstract: Referring expression comprehension (REC) involves localizing a target instance based on a textual description. Recent advancements in REC have been driven by large multimodal models (LMMs) like CogVLM, which achieved 92.44% accuracy on RefCOCO. However, this study questions whether existing benchmarks such as RefCOCO, RefCOCO+, and RefCOCOg, capture LMMs' comprehensive capabilities. We begin with…

    Submitted 24 June, 2024; originally announced June 2024.

  22. arXiv:2406.16557  [pdf, other]

    cs.LG cs.CY

    Efficient k-means with Individual Fairness via Exponential Tilting

    Authors: Shengkun Zhu, Jinshan Zeng, Yuan Sun, Sheng Wang, Xiaodong Li, Zhiyong Peng

    Abstract: In location-based resource allocation scenarios, the distances between each individual and the facility are desired to be approximately equal, thereby ensuring fairness. Individually fair clustering is often employed to achieve the principle of treating all points equally, which can be applied in these scenarios. This paper proposes a novel algorithm, tilted k-means (TKM), aiming to achieve indivi…

    Submitted 24 June, 2024; originally announced June 2024.

  23. arXiv:2406.14880  [pdf, other]

    cs.LG cs.LO

    Pathformer: Recursive Path Query Encoding for Complex Logical Query Answering

    Authors: Chongzhi Zhang, Zhiping Peng, Junhao Zheng, Linghao Wang, Ruifeng Shi, Qianli Ma

    Abstract: Complex Logical Query Answering (CLQA) over incomplete knowledge graphs is a challenging task. Recently, Query Embedding (QE) methods are proposed to solve CLQA by performing multi-hop logical reasoning. However, most of them only consider historical query context information while ignoring future information, which leads to their failure to capture the complex dependencies behind the elements of…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE

  24. arXiv:2406.09386  [pdf, other]

    cs.CV

    SimGen: Simulator-conditioned Driving Scene Generation

    Authors: Yunsong Zhou, Michael Simon, Zhenghao Peng, Sicheng Mo, Hongzi Zhu, Minyi Guo, Bolei Zhou

    Abstract: Controllable synthetic data generation can substantially lower the annotation cost of training data in autonomous driving research and development. Prior works use diffusion models to generate driving images conditioned on the 3D object layout. However, those models are trained on small-scale datasets like nuScenes, which lack appearance and layout diversity. Moreover, the trained models can only…

    Submitted 13 June, 2024; originally announced June 2024.

  25. arXiv:2406.08756  [pdf, other]

    cs.DC cs.LG

    Optimizing Large Model Training through Overlapped Activation Recomputation

    Authors: Ping Chen, Wenjie Zhang, Shuibing He, Yingjie Gu, Zhuwei Peng, Kexin Huang, Xuan Zhan, Weijian Chen, Yi Zheng, Zhefeng Wang, Yanlong Yin, Gang Chen

    Abstract: Large model training has been using recomputation to alleviate the memory pressure and pipelining to exploit the parallelism of data, tensor, and devices. The existing recomputation approaches may incur up to 40% overhead when training real-world models, e.g., the GPT model with 22B parameters. This is because they are executed on demand in the critical training path. In this paper, we design a ne…

    Submitted 27 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 13 pages

  26. arXiv:2406.07539  [pdf, other]

    cs.RO

    BAKU: An Efficient Transformer for Multi-Task Policy Learning

    Authors: Siddhant Haldar, Zhuoran Peng, Lerrel Pinto

    Abstract: Training generalist agents capable of solving diverse tasks is challenging, often requiring large datasets of expert demonstrations. This is particularly problematic in robotics, where each data point requires physical execution of actions in the real world. Thus, there is a pressing need for architectures that can effectively leverage the available training data. In this work, we present BAKU, a…

    Submitted 16 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  27. arXiv:2406.04035  [pdf, other]

    cs.LG cs.AI

    STEMO: Early Spatio-temporal Forecasting with Multi-Objective Reinforcement Learning

    Authors: Wei Shao, Yufan Kang, Ziyan Peng, Xiao Xiao, Lei Wang, Yuhui Yang, Flora D Salim

    Abstract: Accuracy and timeliness are indeed often conflicting goals in prediction tasks. Premature predictions may yield a higher rate of false alarms, whereas delaying predictions to gather more information can render them too late to be useful. In applications such as wildfires, crimes, and traffic jams, timely forecasting is vital for safeguarding human life and property. Consequently, finding a balanc…

    Submitted 18 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted paper in KDD 2024

  28. arXiv:2405.20654  [pdf, other]

    cs.CL cs.IR

    Passage-specific Prompt Tuning for Passage Reranking in Question Answering with Large Language Models

    Authors: Xuyang Wu, Zhiyuan Peng, Krishna Sravanthi Rajanala Sai, Hsin-Tai Wu, Yi Fang

    Abstract: Effective passage retrieval and reranking methods have been widely utilized to identify suitable candidates in open-domain question answering tasks. Recent studies have resorted to LLMs for reranking the retrieved passages by the log-likelihood of the question conditioned on each passage. Although these methods have demonstrated promising results, the performance is notably sensitive to the human-…

    Submitted 20 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: Accepted at Gen-IR@SIGIR24

  29. arXiv:2405.20589  [pdf, other]

    cs.LG cs.AI cs.DC

    Selective Knowledge Sharing for Personalized Federated Learning Under Capacity Heterogeneity

    Authors: Zheng Wang, Zheng Wang, Zhaopeng Peng, Zihui Wang, Cheng Wang

    Abstract: Federated Learning (FL) stands to gain significant advantages from collaboratively training capacity-heterogeneous models, enabling the utilization of private data and computing power from low-capacity devices. However, the focus on personalizing capacity-heterogeneous models based on client-specific data has been limited, resulting in suboptimal local model utility, particularly for low-capacity…

    Submitted 30 May, 2024; originally announced May 2024.

  30. arXiv:2405.18840  [pdf, other]

    cs.CV

    Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation

    Authors: Zelin Peng, Zhengqin Xu, Zhilin Zeng, Yaoming Wang, Lingxi Xie, Qi Tian, Wei Shen

    Abstract: Open-vocabulary semantic segmentation seeks to label each pixel in an image with arbitrary text descriptions. Vision-language foundation models, especially CLIP, have recently emerged as powerful tools for acquiring open-vocabulary capabilities. However, fine-tuning CLIP to equip it with pixel-level prediction ability often suffers from three issues: 1) high computational cost, 2) misalignment between…

    Submitted 29 May, 2024; originally announced May 2024.

  31. arXiv:2405.18291  [pdf, other]

    cs.LG cs.AI cs.DC

    FedSAC: Dynamic Submodel Allocation for Collaborative Fairness in Federated Learning

    Authors: Zihui Wang, Zheng Wang, Lingjuan Lyu, Zhaopeng Peng, Zhicheng Yang, Chenglu Wen, Rongshan Yu, Cheng Wang, Xiaoliang Fan

    Abstract: Collaborative fairness stands as an essential element in federated learning to encourage client participation by equitably distributing rewards based on individual contributions. Existing methods primarily focus on adjusting gradient allocations among clients to achieve collaborative fairness. However, they frequently overlook crucial factors such as maintaining consistency across local models and…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD'24

  32. arXiv:2405.17891  [pdf, other]

    cs.CV

    A Refined 3D Gaussian Representation for High-Quality Dynamic Scene Reconstruction

    Authors: Bin Zhang, Bi Zeng, Zexin Peng

    Abstract: In recent years, Neural Radiance Fields (NeRF) has revolutionized three-dimensional (3D) reconstruction with its implicit representation. Building upon NeRF, 3D Gaussian Splatting (3D-GS) has departed from the implicit representation of neural networks and instead directly represents scenes as point clouds with Gaussian-shaped distributions. While this shift has notably elevated the rendering qual…

    Submitted 28 May, 2024; originally announced May 2024.

  33. RetAssist: Facilitating Vocabulary Learners with Generative Images in Story Retelling Practices

    Authors: Qiaoyi Chen, Siyu Liu, Kaihui Huang, Xingbo Wang, Xiaojuan Ma, Junkai Zhu, Zhenhui Peng

    Abstract: Reading and repeatedly retelling a short story is a common and effective approach to learning the meanings and usages of target words. However, learners often struggle with comprehending, recalling, and retelling the story contexts of these target words. Inspired by the Cognitive Theory of Multimedia Learning, we propose a computational workflow to generate relevant images paired with stories. Bas…

    Submitted 23 May, 2024; originally announced May 2024.

  34. arXiv:2405.03300  [pdf, other]

    cs.IT eess.SP

    Active RIS-Aided Massive MIMO With Imperfect CSI and Phase Noise

    Authors: Zhangjie Peng, Jianchen Zhu, Cunhua Pan, Zaichen Zhang, Daniel Benevides da Costa, Maged Elkashlan, George K. Karagiannidis

    Abstract: Active reconfigurable intelligent surface (RIS) has attracted significant attention as a recently proposed RIS architecture. Owing to its capability to amplify the incident signals, active RIS can mitigate the multiplicative fading effect inherent in the passive RIS-aided system. In this paper, we consider an active RIS-aided uplink multi-user massive multiple-input multiple-output (MIMO) system i…

    Submitted 6 May, 2024; originally announced May 2024.

  35. arXiv:2405.02973  [pdf, other]

    cs.CR

    FairRelay: Fair and Cost-Efficient Peer-to-Peer Content Delivery through Payment Channel Networks

    Authors: Jingyu Liu, Yingjie Xue, Zifan Peng, Chao Lin, Xinyi Huang

    Abstract: Peer-to-Peer (P2P) content delivery, known for scalability and resilience, offers a decentralized alternative to traditional centralized Content Delivery Networks (CDNs). A significant challenge in P2P content delivery remains: the fair compensation of relayers for their bandwidth contributions. Existing solutions employ blockchains for payment settlements; however, they are not practical due to h…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 27 pages, 21 figures

  36. X-SLAM: Scalable Dense SLAM for Task-aware Optimization using CSFD

    Authors: Zhexi Peng, Yin Yang, Tianjia Shao, Chenfanfu Jiang, Kun Zhou

    Abstract: We present X-SLAM, a real-time dense differentiable SLAM system that leverages the complex-step finite difference (CSFD) method for efficient calculation of numerical derivatives, bypassing the need for a large-scale computational graph. The key to our approach is treating the SLAM process as a differentiable function, enabling the calculation of the derivatives of important SLAM parameters throug…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: To be published in ACM SIGGRAPH 2024

  37. RTG-SLAM: Real-time 3D Reconstruction at Scale using Gaussian Splatting

    Authors: Zhexi Peng, Tianjia Shao, Yong Liu, Jingke Zhou, Yin Yang, Jingdong Wang, Kun Zhou

    Abstract: We present Real-time Gaussian SLAM (RTG-SLAM), a real-time 3D reconstruction system with an RGBD camera for large-scale environments using Gaussian splatting. The system features a compact Gaussian representation and a highly efficient on-the-fly Gaussian optimization scheme. We force each Gaussian to be either opaque or nearly transparent, with the opaque ones fitting the surface and dominant col…

    Submitted 8 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: To be published in ACM SIGGRAPH 2024

  38. arXiv:2404.18213  [pdf, other]

    cs.CV cs.AI

    S$^2$Mamba: A Spatial-spectral State Space Model for Hyperspectral Image Classification

    Authors: Guanchun Wang, Xiangrong Zhang, Zelin Peng, Tianyang Zhang, Licheng Jiao

    Abstract: Land cover analysis using hyperspectral images (HSI) remains an open problem due to their low spatial resolution and complex spectral information. Recent studies are primarily dedicated to designing Transformer-based architectures for spatial-spectral long-range dependencies modeling, which is computationally expensive with quadratic complexity. Selective structured state space model (Mamba), whic…

    Submitted 13 August, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: 12 pages, 7 figures

  39. arXiv:2404.17528  [pdf, other]

    cs.CV

    Geometry-aware Reconstruction and Fusion-refined Rendering for Generalizable Neural Radiance Fields

    Authors: Tianqi Liu, Xinyi Ye, Min Shi, Zihao Huang, Zhiyu Pan, Zhan Peng, Zhiguo Cao

    Abstract: Generalizable NeRF aims to synthesize novel views for unseen scenes. Common practices involve constructing variance-based cost volumes for geometry reconstruction and encoding 3D descriptors for decoding novel views. However, existing methods show limited generalization ability in challenging conditions due to inaccurate geometry, sub-optimal descriptors, and decoding strategies. We address these…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024. Project page: https://gefucvpr24.github.io

  40. arXiv:2404.16407  [pdf, other]

    cs.CL eess.AS

    U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF

    Authors: Xingchen Song, Di Wu, Binbin Zhang, Dinghao Zhou, Zhendong Peng, Bo Dang, Fuping Pan, Chao Yang

    Abstract: Scale has opened new frontiers in natural language processing, but at a high cost. In response, by learning to only activate a subset of parameters in training and inference, Mixture-of-Experts (MoE) have been proposed as an energy efficient path to even larger and more capable language models and this shift towards a new generation of foundation models is gaining momentum, particularly within the…

    Submitted 8 August, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    ACM Class: I.2.7

  41. arXiv:2404.13875  [pdf, ps, other]

    cs.IT eess.SP

    Active RIS-Aided Massive MIMO Uplink Systems with Low-Resolution ADCs

    Authors: Zhangjie Peng, Zecheng Lu, Xue Liu, Cunhua Pan, Jiangzhou Wang

    Abstract: This letter considers an active reconfigurable intelligent surface (RIS)-aided multi-user uplink massive multiple-input multiple-output (MIMO) system with low-resolution analog-to-digital converters (ADCs). The letter derives the closed-form approximate expression for the sum achievable rate (AR), where the maximum ratio combination (MRC) processing and low-resolution ADCs are applied at the base st…

    Submitted 22 April, 2024; originally announced April 2024.

  42. arXiv:2404.12887  [pdf, other]

    cs.CV eess.IV

    3D Multi-frame Fusion for Video Stabilization

    Authors: Zhan Peng, Xinyi Ye, Weiyue Zhao, Tianqi Liu, Huiqiang Sun, Baopu Li, Zhiguo Cao

    Abstract: In this paper, we present RStab, a novel framework for video stabilization that integrates 3D multi-frame fusion through volume rendering. Departing from conventional methods, we introduce a 3D multi-frame perspective to generate stabilized images, addressing the challenge of full-frame generation while preserving structure. The core of our approach lies in Stabilized Rendering (SR), a volume rend…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  43. arXiv:2404.11536  [pdf, other]

    cs.LG cs.AI

    FedPFT: Federated Proxy Fine-Tuning of Foundation Models

    Authors: Zhaopeng Peng, Xiaoliang Fan, Yufan Chen, Zheng Wang, Shirui Pan, Chenglu Wen, Ruisheng Zhang, Cheng Wang

    Abstract: Adapting Foundation Models (FMs) for downstream tasks through Federated Learning (FL) emerges as a promising strategy for protecting data privacy and valuable FMs. Existing methods fine-tune FM by allocating sub-FM to clients in FL, however, leading to suboptimal performance due to insufficient tuning and inevitable error accumulations of gradients. In this paper, we propose Federated Proxy Fine-Tuni…

    Submitted 28 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI'24

  44. arXiv:2404.07644  [pdf, other]

    cs.RO

    2DLIW-SLAM:2D LiDAR-Inertial-Wheel Odometry with Real-Time Loop Closure

    Authors: Bin Zhang, Zexin Peng, Bi Zeng, Junjie Lu

    Abstract: Due to budgetary constraints, indoor navigation typically employs 2D LiDAR rather than 3D LiDAR. However, the utilization of 2D LiDAR in Simultaneous Localization And Mapping (SLAM) frequently encounters challenges related to motion degeneracy, particularly in geometrically similar environments. To address this problem, this paper proposes a robust, accurate, and multi-sensor-fused 2D LiDAR SLAM s…

    Submitted 23 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by Measurement Science and Technology: https://iopscience.iop.org/article/10.1088/1361-6501/ad3ea3/meta

  45. arXiv:2404.04522  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Q-PEFT: Query-dependent Parameter Efficient Fine-tuning for Text Reranking with Large Language Models

    Authors: Zhiyuan Peng, Xuyang Wu, Qifan Wang, Sravanthi Rajanala, Yi Fang

    Abstract: Parameter Efficient Fine-Tuning (PEFT) methods have been extensively utilized in Large Language Models (LLMs) to improve downstream tasks without the cost of fine-tuning the whole LLM. Recent studies have shown how to effectively use PEFT for fine-tuning LLMs in ranking tasks with convincing performance; however, there are some limitations, including the learned prompt being fixed for different doc…

    Submitted 11 April, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

  46. arXiv:2404.02475  [pdf, other]

    cs.HC

    PromptRPA: Generating Robotic Process Automation on Smartphones from Textual Prompts

    Authors: Tian Huang, Chun Yu, Weinan Shi, Zijian Peng, David Yang, Weiqi Sun, Yuanchun Shi

    Abstract: Robotic Process Automation (RPA) offers a valuable solution for efficiently automating tasks on the graphical user interface (GUI), by emulating human interactions, without modifying existing code. However, its broader adoption is constrained by the need for expertise in both scripting languages and workflow design. To address this challenge, we present PromptRPA, a system designed to comprehend v…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 34 pages

  47. arXiv:2403.16812  [pdf, other]

    cs.HC cs.AI

    Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making

    Authors: Shuai Ma, Qiaoyi Chen, Xinru Wang, Chengbo Zheng, Zhenhui Peng, Ming Yin, Xiaojuan Ma

    Abstract: In AI-assisted decision-making, humans often passively review the AI's suggestion and decide whether to accept or reject it as a whole. In such a paradigm, humans are found to rarely trigger analytical thinking and face difficulties in communicating the nuances of conflicting opinions to the AI when disagreements occur. To tackle this challenge, we propose Human-AI Deliberation, a novel framework to p…

    Submitted 25 March, 2024; originally announced March 2024.

  48. arXiv:2403.16265  [pdf, other]

    cs.CL

    Connecting the Dots: Inferring Patent Phrase Similarity with Retrieved Phrase Graphs

    Authors: Zhuoyi Peng, Yi Yang

    Abstract: We study the patent phrase similarity inference task, which measures the semantic similarity between two patent phrases. As patent documents employ legal and highly technical language, existing semantic textual similarity methods that use localized contextual information do not perform satisfactorily in inferring patent phrase similarity. To address this, we introduce a graph-augmented approach to…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Findings of NAACL 2024

  49. arXiv:2403.13674  [pdf, other]

    cs.RO

    Reward-Driven Automated Curriculum Learning for Interaction-Aware Self-Driving at Unsignalized Intersections

    Authors: Zengqi Peng, Xiao Zhou, Lei Zheng, Yubin Wang, Jun Ma

    Abstract: In this work, we present a reward-driven automated curriculum reinforcement learning approach for interaction-aware self-driving at unsignalized intersections, taking into account the uncertainties associated with surrounding vehicles (SVs). These uncertainties encompass both the driving intentions of the SVs and the number of SVs. To deal with this problem, the curriculum set is specifi…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 8 pages, 6 figures

  50. arXiv:2403.09974  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery

    Authors: Enguang Wang, Zhimao Peng, Zhengyuan Xie, Fei Yang, Xialei Liu, Ming-Ming Cheng

    Abstract: Given unlabelled datasets containing both old and new categories, generalized category discovery (GCD) aims to accurately discover new classes while correctly classifying old classes, leveraging the class concepts learned from labeled samples. Current GCD methods only use a single visual modality of information, resulting in poor classification of visually similar classes. As a different modality,…

    Submitted 10 July, 2024; v1 submitted 14 March, 2024; originally announced March 2024.