Showing 1–50 of 181 results for author: Zhu, P

Searching in archive cs.
  1. arXiv:2408.14089 [pdf, other]

    cs.IT eess.SP

    Mini-Slot-Assisted Short Packet URLLC: Differential or Coherent Detection?

    Authors: Canjian Zheng, Fu-Chun Zheng, Jingjing Luo, Pengcheng Zhu, Xiaohu You, Daquan Feng

    Abstract: One of the primary challenges in short-packet ultra-reliable and low-latency communications (URLLC) is achieving reliable channel estimation and data detection while minimizing the impact on latency. Given the small packet size in mini-slot-assisted URLLC, pilot-based coherent detection alone can hardly meet the seemingly contradictory requirements of high cha…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 14 pages, 8 figures, journal

  2. arXiv:2408.12247 [pdf, other]

    cs.AI

    Enhanced Fine-Tuning of Lightweight Domain-Specific Q&A Model Based on Large Language Models

    Authors: Shenglin Zhang, Pengtian Zhu, Minghua Ma, Jiagang Wang, Yongqian Sun, Dongwen Li, Jingyu Wang, Qianying Guo, Xiaolei Hua, Lin Zhu, Dan Pei

    Abstract: Large language models (LLMs) excel at general question-answering (Q&A) but often fall short in specialized domains due to a lack of domain-specific knowledge. Commercial companies face the dual challenges of privacy protection and resource constraints when fine-tuning LLMs. This paper proposes a novel framework, Self-Evolution, designed to address these issues by leveraging lightweigh…

    Submitted 22 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

  3. arXiv:2408.11411 [pdf, other]

    cs.CV

    SelfDRSC++: Self-Supervised Learning for Dual Reversed Rolling Shutter Correction

    Authors: Wei Shang, Dongwei Ren, Wanying Zhang, Qilong Wang, Pengfei Zhu, Wangmeng Zuo

    Abstract: Modern consumer cameras commonly employ the rolling shutter (RS) imaging mechanism, which captures images by scanning scenes row by row, resulting in RS distortion for dynamic scenes. To correct RS distortion, existing methods adopt fully supervised learning, which requires high-framerate global shutter (GS) images as ground truth for supervision. In this paper, we propose an enhanc…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 13 pages, 9 figures, and the code is available at https://github.com/shangwei5/SelfDRSC_plusplus

    ACM Class: I.4.3

  4. arXiv:2408.10861 [pdf, other]

    cs.RO cs.HC

    DVRP-MHSI: Dynamic Visualization Research Platform for Multimodal Human-Swarm Interaction

    Authors: Pengming Zhu, Zhiwen Zeng, Weijia Yao, Wei Dai, Huimin Lu, Zongtan Zhou

    Abstract: In recent years, there has been a significant amount of research on algorithms and control methods for distributed collaborative robots. However, the emergence of collective behavior in a swarm is still difficult to predict and control. Nevertheless, human interaction with the swarm helps render the swarm more predictable and controllable, as human operators can utilize intuition or knowledge that…

    Submitted 20 August, 2024; originally announced August 2024.

  5. arXiv:2408.10581 [pdf, other]

    cs.CV

    Multi-view Hand Reconstruction with a Point-Embedded Transformer

    Authors: Lixin Yang, Licheng Zhong, Pengxiang Zhu, Xinyu Zhan, Junxiao Kong, Jian Xu, Cewu Lu

    Abstract: This work introduces a novel and generalizable multi-view Hand Mesh Reconstruction (HMR) model, named POEM, designed for practical use in real-world hand motion capture scenarios. The advances of the POEM model consist of two main aspects. First, concerning the modeling of the problem, we propose embedding a static basis point within the multi-view stereo space. A point represents a natural form o…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Generalizable multi-view Hand Mesh Reconstruction (HMR) model. Extension of the original work at CVPR2023

  6. arXiv:2408.10463 [pdf, other]

    cs.SD cs.LG eess.AS

    Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting

    Authors: Hyun Jin Park, Dhruuv Agarwal, Neng Chen, Rentao Sun, Kurt Partridge, Justin Chen, Harry Zhang, Pai Zhu, Jacob Bartel, Kyle Kastner, Gary Wang, Andrew Rosenberg, Quan Wang

    Abstract: The keyword spotting (KWS) problem requires large amounts of real speech training data to achieve high accuracy across diverse populations. Utilizing large amounts of text-to-speech (TTS) synthesized data can reduce the cost and time associated with KWS development. However, TTS data may contain artifacts not present in real speech, which the KWS model can exploit (overfit), leading to degraded ac…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: to be published in a Workshop at Interspeech 2024, Synthetic Data's Transformative Role in Foundational Speech Models

  7. arXiv:2408.09698 [pdf, other]

    cs.IR cs.AI

    Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation

    Authors: Yuyang Ye, Zhi Zheng, Yishan Shen, Tianshu Wang, Hengruo Zhang, Peijun Zhu, Runlong Yu, Kai Zhang, Hui Xiong

    Abstract: Recent advances in Large Language Models (LLMs) have demonstrated significant potential in the field of Recommendation Systems (RSs). Most existing studies have focused on converting user behavior logs into textual prompts and leveraging techniques such as prompt tuning to adapt LLMs to recommendation tasks. Meanwhile, research interest has recently grown in multimodal recommendation systems tha…

    Submitted 20 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  8. arXiv:2408.00952 [pdf, ps, other]

    cs.IT eess.SP

    A Primer on Near-Field Communications for Next-Generation Multiple Access

    Authors: Chongjun Ouyang, Zhaolin Wang, Yan Chen, Xidong Mu, Peiying Zhu

    Abstract: Multiple-antenna technologies are advancing toward the development of extremely large aperture arrays and the utilization of extremely high frequencies, driving the progress of next-generation multiple access (NGMA). This evolution is accompanied by the emergence of near-field communications (NFC), characterized by spherical-wave propagation, which introduces additional range dimensions to the cha…

    Submitted 8 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

    Comments: 34 pages

  9. arXiv:2407.20455 [pdf, other]

    cs.CV

    Learning Feature-Preserving Portrait Editing from Generated Pairs

    Authors: Bowei Chen, Tiancheng Zhi, Peihao Zhu, Shen Sang, Jing Liu, Linjie Luo

    Abstract: Portrait editing is challenging for existing techniques due to difficulties in preserving subject features like identity. In this paper, we propose a training-based method leveraging auto-generated paired data to learn desired editing while ensuring the preservation of unchanged subject features. Specifically, we design a data generation process to create reasonably good training pairs for desired…

    Submitted 29 July, 2024; originally announced July 2024.

  10. arXiv:2407.18879 [pdf, other]

    cs.SD cs.LG eess.AS

    Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model

    Authors: Hyun Jin Park, Dhruuv Agarwal, Neng Chen, Rentao Sun, Kurt Partridge, Justin Chen, Harry Zhang, Pai Zhu, Jacob Bartel, Kyle Kastner, Gary Wang, Andrew Rosenberg, Quan Wang

    Abstract: This paper explores the use of TTS-synthesized training data for the KWS (keyword spotting) task while minimizing development cost and time. Keyword spotting models require a huge amount of training data to be accurate, and obtaining such training data can be costly. In the current state of the art, TTS models can generate large amounts of natural-sounding data, which can help reduce cost and time f…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: to be published in a Workshop at Interspeech 2024, Synthetic Data's Transformative Role in Foundational Speech Models

  11. arXiv:2407.16840 [pdf, other]

    eess.AS cs.AI

    Synth4Kws: Synthesized Speech for User Defined Keyword Spotting in Low Resource Environments

    Authors: Pai Zhu, Dhruuv Agarwal, Jacob W. Bartel, Kurt Partridge, Hyun Jin Park, Quan Wang

    Abstract: One of the challenges in developing a high-quality custom keyword spotting (KWS) model is the lengthy and expensive process of collecting training data covering a wide range of languages, phrases and speaking styles. We introduce Synth4Kws, a framework to leverage Text-to-Speech (TTS) synthesized data for custom KWS in different resource settings. With no real data, we found increasing TTS phrase…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 5 pages, 5 figures, 2 tables. The paper is accepted at the Interspeech SynData4GenAI 2024 Workshop (https://syndata4genai.org/#call-for-papers)

  12. arXiv:2407.05376 [pdf, other]

    cs.RO

    Rethinking Closed-loop Planning Framework for Imitation-based Model Integrating Prediction and Planning

    Authors: Jiayu Guo, Mingyue Feng, Pengfei Zhu, Chengjun Li, Jian Pu

    Abstract: In recent years, the integration of prediction and planning through neural networks has received substantial attention. Despite extensive studies on it, there is a noticeable gap in understanding the operation of such models within a closed-loop planning setting. To bridge this gap, we propose a novel closed-loop planning framework compatible with neural networks engaged in joint prediction and pl…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 7 pages, 5 figures

  13. arXiv:2406.12297 [pdf, other]

    cs.LG cs.AI

    Faithful Density-Peaks Clustering via Matrix Computations on MPI Parallelization System

    Authors: Ji Xu, Tianlong Xiao, Jinye Yang, Panpan Zhu

    Abstract: Density peaks clustering (DP) can detect clusters of arbitrary shape and cluster data in non-Euclidean spaces, but its quadratic complexity in both computation and storage makes it difficult to scale to big data. Various approaches have been proposed in this regard, including MapReduce-based distributed computing, multi-core parallelism, presentation transformation (e.g., kd-tree,…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: This paper presents a novel approach, FaithPDP, that takes advantage of both hardware (the multi-core architecture of CPUs) and modern programming languages (Python or Matlab for efficient vector and matrix computation) to achieve clustering results identical to the vanilla DP algorithm, while reducing the computing complexity to pseudo-linear

  14. arXiv:2406.03751 [pdf, other]

    cs.LG

    Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting

    Authors: Yifan Hu, Peiyuan Liu, Peng Zhu, Dawei Cheng, Tao Dai

    Abstract: Transformer-based and MLP-based methods have emerged as leading approaches in time series forecasting (TSF). While Transformer-based methods excel in capturing long-range dependencies, they suffer from high computational complexities and tend to overfit. Conversely, MLP-based methods offer computational efficiency and adeptness in modeling temporal dynamics, but they struggle with capturing comple…

    Submitted 6 June, 2024; originally announced June 2024.

  15. Node Injection Attack Based on Label Propagation Against Graph Neural Network

    Authors: Peican Zhu, Zechen Pan, Keke Tang, Xiaodong Cui, Jinhuan Wang, Qi Xuan

    Abstract: Graph Neural Networks (GNNs) have achieved remarkable success in various graph learning tasks, such as node classification, link prediction, and graph classification. The key to the success of GNNs lies in their effective representation of structural information through neighborhood aggregation. However, an attacker can easily perturb the aggregation process by injecting fake nodes, which reveals that G…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by TCSS; DOI: 10.1109/TCSS.2024.3395794

  16. arXiv:2405.11276 [pdf, other]

    cs.CV

    Visible and Clear: Finding Tiny Objects in Difference Map

    Authors: Bing Cao, Haiyu Yao, Pengfei Zhu, Qinghua Hu

    Abstract: Tiny object detection is one of the key challenges in the field of object detection. The performance of most generic detectors dramatically decreases in tiny object detection tasks. The main challenge lies in extracting effective features of tiny objects. Existing methods usually perform generation-based feature enhancement, which is seriously affected by spurious textures and artifacts, making it…

    Submitted 11 July, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

    Comments: Accepted by ECCV 2024

  17. arXiv:2405.06241 [pdf, other]

    cs.CV cs.RO

    MGS-SLAM: Monocular Sparse Tracking and Gaussian Mapping with Depth Smooth Regularization

    Authors: Pengcheng Zhu, Yaoming Zhuang, Baoquan Chen, Li Li, Chengdong Wu, Zhanlin Liu

    Abstract: This letter introduces a novel framework for dense Visual Simultaneous Localization and Mapping (VSLAM) based on Gaussian Splatting. Recently, Gaussian Splatting-based SLAM has yielded promising results, but it relies on RGB-D input and is weak in tracking. To address these limitations, we uniquely integrate advanced sparse visual odometry with a dense Gaussian Splatting scene representation for the fi…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  18. arXiv:2405.04000 [pdf, other]

    cs.RO eess.SY

    Distributed Invariant Kalman Filter for Cooperative Localization using Matrix Lie Groups

    Authors: Yizhi Zhou, Yufan Liu, Pengxiang Zhu, Xuan Wang

    Abstract: This paper studies the problem of Cooperative Localization (CL) for multi-robot systems, where a group of mobile robots jointly localize themselves by using measurements from onboard sensors and shared information from other robots. We propose a novel distributed invariant Kalman Filter (DInEKF) based on Lie group theory to solve the CL problem in a 3-D environment. Unlike the standard EKF wh…

    Submitted 7 May, 2024; originally announced May 2024.

  19. arXiv:2404.19578 [pdf, ps, other]

    cs.IT

    New EVENODD+ Codes with More Flexible Parameters and Lower Complexity

    Authors: Panyu Zhu

    Abstract: EVENODD+ codes are binary maximum distance separable (MDS) array codes for correcting double disk failures in RAID-6 with asymptotically optimal encoding/decoding/update complexities. However, the number of bits stored in each disk of EVENODD+ codes should be an odd number minus one. In this paper, we present a new construction of EVENODD+ codes that have more flexible parameters. The number of bi…

    Submitted 30 April, 2024; originally announced April 2024.

  20. arXiv:2404.19242 [pdf, other]

    cs.CV eess.IV stat.ME

    A Minimal Set of Parameters Based Depth-Dependent Distortion Model and Its Calibration Method for Stereo Vision Systems

    Authors: Xin Ma, Puchen Zhu, Xiao Li, Xiaoyin Zheng, Jianshu Zhou, Xuchen Wang, Kwok Wai Samuel Au

    Abstract: Depth position strongly affects lens distortion, especially in close-range photography, which limits the measurement accuracy of existing stereo vision systems. Moreover, traditional depth-dependent distortion models and their calibration methods have remained complicated. In this work, we propose a minimal set of parameters based depth-dependent distortion model (MDM), which considers the radial an…

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted for publication in IEEE Transactions on Instrumentation and Measurement

  21. arXiv:2404.15744 [pdf, other]

    cs.LG cs.AI cs.CR

    A General Black-box Adversarial Attack on Graph-based Fake News Detectors

    Authors: Peican Zhu, Zechen Pan, Yang Liu, Jiwei Tian, Keke Tang, Zhen Wang

    Abstract: Graph Neural Network (GNN)-based fake news detectors apply various methods to construct graphs, aiming to learn distinctive news embeddings for classification. Since the construction details are unknown to attackers in a black-box scenario, it is unrealistic to conduct classical adversarial attacks that require a specific adjacency matrix. In this paper, we propose the first general black-box…

    Submitted 25 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI2024

  22. arXiv:2404.14109 [pdf, other]

    cs.CV

    CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective

    Authors: Wencheng Zhu, Xin Zhou, Pengfei Zhu, Yu Wang, Qinghua Hu

    Abstract: In this paper, we present a simple yet effective contrastive knowledge distillation approach, which can be formulated as a sample-wise alignment problem with intra- and inter-sample constraints. Unlike traditional knowledge distillation methods that concentrate on maximizing feature similarities or preserving class-wise semantic correlations between teacher and student features, our method attempt…

    Submitted 22 April, 2024; originally announced April 2024.

  23. arXiv:2404.09401 [pdf, other]

    cs.CV cs.AI

    Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models

    Authors: Peifei Zhu, Tsubasa Takahashi, Hirokatsu Kataoka

    Abstract: Diffusion Models (DMs) have shown remarkable capabilities in various image-generation tasks. However, there are growing concerns that DMs could be used to imitate unauthorized creations and thus raise copyright issues. To address this issue, we propose a novel framework that embeds personal watermarks in the generation of adversarial examples. Such examples can force DMs to generate images with vi…

    Submitted 19 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: updated references

  24. arXiv:2404.08958 [pdf, other]

    cs.CV cs.CL cs.LG

    AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning

    Authors: Yuwei Tang, Zhenyi Lin, Qilong Wang, Pengfei Zhu, Qinghua Hu

    Abstract: Recently, pre-trained vision-language models (e.g., CLIP) have shown great potential in few-shot learning and attracted a lot of research interest. Although efforts have been made to improve the few-shot ability of CLIP, the key factors behind the effectiveness of existing methods have not been well studied, limiting further exploration of CLIP's potential in few-shot learning. In this paper, we first introdu…

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  25. arXiv:2404.07721 [pdf, other]

    eess.SP cs.IT

    Trainable Joint Channel Estimation, Detection and Decoding for MIMO URLLC Systems

    Authors: Yi Sun, Hong Shen, Bingqing Li, Wei Xu, Pengcheng Zhu, Nan Hu, Chunming Zhao

    Abstract: The receiver design for multi-input multi-output (MIMO) ultra-reliable and low-latency communication (URLLC) systems can be a tough task due to the use of short channel codes and few pilot symbols. Consequently, error propagation can occur in traditional turbo receivers, leading to performance degradation. Moreover, the processing delay induced by information exchange between different modules may…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 17 pages, 12 figures, accepted by IEEE Transactions on Wireless Communications

  26. arXiv:2403.18923 [pdf, other]

    cs.NE cs.AI cs.LG

    Nature-Guided Cognitive Evolution for Predicting Dissolved Oxygen Concentrations in North Temperate Lakes

    Authors: Runlong Yu, Robert Ladwig, Xiang Xu, Peijun Zhu, Paul C. Hanson, Yiqun Xie, Xiaowei Jia

    Abstract: Predicting dissolved oxygen (DO) concentrations in north temperate lakes requires a comprehensive study of phenological patterns across various ecosystems, which highlights the significance of selecting phenological features and feature interactions. Process-based models are limited by partial process knowledge or oversimplified feature representations, while machine learning models face challenge…

    Submitted 15 February, 2024; originally announced March 2024.

  27. arXiv:2403.15765 [pdf, other]

    cs.CV cs.AI cs.IR

    Towards Human-Like Machine Comprehension: Few-Shot Relational Learning in Visually-Rich Documents

    Authors: Hao Wang, Tang Li, Chenhui Chu, Nengjun Zhu, Rui Wang, Pinpin Zhu

    Abstract: Key-value relations are prevalent in Visually-Rich Documents (VRDs), often depicted in distinct spatial regions accompanied by specific color and font styles. These non-textual cues serve as important indicators that greatly enhance human comprehension and acquisition of such relation triplets. However, current document AI approaches often fail to consider this valuable prior information related t…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: 13 pages, 7 figures, accepted by LREC-COLING 2024

  28. arXiv:2403.12494 [pdf, other]

    cs.CV

    Task-Customized Mixture of Adapters for General Image Fusion

    Authors: Pengfei Zhu, Yang Sun, Bing Cao, Qinghua Hu

    Abstract: General image fusion aims at integrating important information from multi-source images. However, due to the significant cross-task gap, the respective fusion mechanism varies considerably in practice, resulting in limited performance across subtasks. To handle this problem, we propose a novel task-customized mixture of adapters (TC-MoA) for general image fusion, adaptively prompting various fusio…

    Submitted 23 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  29. arXiv:2403.06687 [pdf, other]

    cs.LG cs.CV

    Advancing Graph Neural Networks with HL-HGAT: A Hodge-Laplacian and Attention Mechanism Approach for Heterogeneous Graph-Structured Data

    Authors: Jinghan Huang, Qiufeng Chen, Yijun Bian, Pengli Zhu, Nanguang Chen, Moo K. Chung, Anqi Qiu

    Abstract: Graph neural networks (GNNs) have proven effective in capturing relationships among nodes in a graph. This study introduces a novel perspective by considering a graph as a simplicial complex, encompassing nodes, edges, triangles, and $k$-simplices, enabling the definition of graph-structured data on any $k$-simplices. Our contribution is the Hodge-Laplacian heterogeneous graph attention network (H…

    Submitted 22 April, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  30. arXiv:2403.03346 [pdf, other]

    cs.CV

    Enhancing Vision-Language Pre-training with Rich Supervisions

    Authors: Yuan Gao, Kunyu Shi, Pengkai Zhu, Edouard Belval, Oren Nuriel, Srikar Appalaraju, Shabnam Ghadar, Vijay Mahadevan, Zhuowen Tu, Stefano Soatto

    Abstract: We propose Strongly Supervised pre-training with ScreenShots (S4), a novel pre-training paradigm for Vision-Language Models using data from large-scale web screenshot rendering. Using web screenshots unlocks a treasure trove of visual and textual cues that are absent from image-text pairs. In S4, we leverage the inherent tree-structured hierarchy of HTML elements and the spatial localiza…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  31. arXiv:2403.00014 [pdf, other]

    cs.SI cs.AI cs.LG

    GIN-SD: Source Detection in Graphs with Incomplete Nodes via Positional Encoding and Attentive Fusion

    Authors: Le Cheng, Peican Zhu, Keke Tang, Chao Gao, Zhen Wang

    Abstract: Source detection in graphs has demonstrated robust efficacy in the domain of rumor source identification. Although recent solutions have enhanced performance by leveraging deep neural networks, they often require complete user data. In this paper, we address a more challenging task, rumor source detection with incomplete user data, and propose a novel framework, i.e., Source Detection in Graphs wi…

    Submitted 27 February, 2024; originally announced March 2024.

    Comments: The paper is accepted by AAAI24

    Report number: Vol. 38, No. 1, 55-63

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence 2024

  32. arXiv:2402.16699 [pdf, other]

    cs.RO

    SwarmPRM: Probabilistic Roadmap Motion Planning for Large-Scale Swarm Robotic Systems

    Authors: Yunze Hu, Xuru Yang, Kangjie Zhou, Qinghang Liu, Kang Ding, Han Gao, Pingping Zhu, Chang Liu

    Abstract: Large-scale swarm robotic systems consisting of numerous cooperative agents show considerable promise for performing autonomous tasks across various sectors. Nonetheless, traditional motion planning approaches often face a trade-off between scalability and solution quality due to the exponential growth of the joint state space of robots. In response, this work proposes SwarmPRM, a hierarchical, sc…

    Submitted 24 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Submitted to IROS 2024

  33. arXiv:2402.16690 [pdf, other]

    cs.RO

    Risk-Aware Non-Myopic Motion Planner for Large-Scale Robotic Swarm Using CVaR Constraints

    Authors: Xuru Yang, Yunze Hu, Han Gao, Kang Ding, Zhaoyang Li, Pingping Zhu, Ying Sun, Chang Liu

    Abstract: Swarm robotics has garnered significant attention due to its ability to accomplish elaborate and synchronized tasks. Existing methodologies for motion planning of swarm robotic systems mainly encounter difficulties in scalability and safety guarantees. To address these limitations, we propose a Risk-aware swarm mOtion planner using conditional ValuE at Risk (ROVER) that systematically navigates lar…

    Submitted 28 August, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: accepted to IROS 2024

  34. arXiv:2402.11091 [pdf, other]

    cs.MA cs.RO

    A Novel Multivariate Skew-Normal Mixture Model and Its Application in Path-Planning for Very-Large-Scale Robotic Systems

    Authors: Pingping Zhu, Chang Liu, Peter Estephan

    Abstract: This paper addresses the path-planning challenge for very large-scale robotic systems (VLSR) operating in complex and cluttered environments. VLSR systems consist of numerous cooperative agents or robots working together autonomously. Traditionally, many approaches for VLSR systems are developed based on Gaussian mixture models (GMMs), where the GMMs represent agents' evolving spatial distribution…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: American Control Conference (ACC) 2024, July 10 - 12, 2024

  35. arXiv:2402.10586 [pdf, other]

    cs.CL cs.AI

    Threads of Subtlety: Detecting Machine-Generated Texts Through Discourse Motifs

    Authors: Zae Myung Kim, Kwang Hee Lee, Preston Zhu, Vipul Raheja, Dongyeop Kang

    Abstract: With the advent of large language models (LLMs), the line between human-crafted and machine-generated texts has become increasingly blurred. This paper investigates discernible and unique linguistic properties of texts written by humans, particularly uncovering the underlying discourse structures of texts beyond their surface structures. Introducing a novel metho…

    Submitted 6 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: 26 pages, accepted at ACL 2024 (Main)

  36. arXiv:2401.06595 [pdf, other]

    cs.LG cs.AI

    Every Node is Different: Dynamically Fusing Self-Supervised Tasks for Attributed Graph Clustering

    Authors: Pengfei Zhu, Qian Wang, Yu Wang, Jialu Li, Qinghua Hu

    Abstract: Attributed graph clustering is an unsupervised task that partitions nodes into different groups. Self-supervised learning (SSL) shows great potential in handling this task, and some recent studies simultaneously learn multiple SSL tasks to further boost performance. Currently, different SSL tasks are assigned the same set of weights for all graph nodes. However, we observe that some graph nodes wh…

    Submitted 12 January, 2024; originally announced January 2024.

  37. arXiv:2401.06521 [pdf, other]

    cs.CV

    Exploring Diverse Representations for Open Set Recognition

    Authors: Yu Wang, Junxian Mu, Pengfei Zhu, Qinghua Hu

    Abstract: Open set recognition (OSR) requires the model to classify samples that belong to closed sets while rejecting unknown samples during testing. Currently, generative models often perform better than discriminative models in OSR, but recent studies show that generative models may be computationally infeasible or unstable on complex tasks. In this paper, we provide insights into OSR and find that learning…

    Submitted 12 January, 2024; originally announced January 2024.

    Comments: 9 pages, 4 figures. Accepted to AAAI 2024

  38. arXiv:2401.02916 [pdf, other]

    cs.CV

    Uncovering the human motion pattern: Pattern Memory-based Diffusion Model for Trajectory Prediction

    Authors: Yuxin Yang, Pengfei Zhu, Mengshi Qi, Huadong Ma

    Abstract: Human trajectory forecasting is a critical challenge in fields such as robotics and autonomous driving. Due to the inherent uncertainty of human actions and intentions in real-world scenarios, various unexpected occurrences may arise. To uncover latent motion patterns in human behavior, we introduce a novel memory-based method, named Motion Pattern Priors Memory Network. Our method involves constr…

    Submitted 8 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

  39. arXiv:2312.16850 [pdf, other]

    cs.SD eess.AS

    Accent-VITS: accent transfer for end-to-end TTS

    Authors: Linhan Ma, Yongmao Zhang, Xinfa Zhu, Yi Lei, Ziqian Ning, Pengcheng Zhu, Lei Xie

    Abstract: Accent transfer aims to transfer an accent from a source speaker to synthetic speech in the target speaker's voice. The main challenge is how to effectively disentangle speaker timbre and accent, which are entangled in speech. This paper presents a VITS-based end-to-end accent transfer model named Accent-VITS. Based on the main structure of VITS, Accent-VITS makes substantial improvements to enable…

    Submitted 29 December, 2023; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: Accepted by NCMMSC2023

  40. arXiv:2312.16409 [pdf, other]

    cs.LG cs.CV

    Dynamic Sub-graph Distillation for Robust Semi-supervised Continual Learning

    Authors: Yan Fan, Yu Wang, Pengfei Zhu, Qinghua Hu

    Abstract: Continual learning (CL) has shown promising results and comparable performance to learning at once in a fully supervised manner. However, CL strategies typically require a large number of labeled samples, making their real-life deployment challenging. In this work, we focus on semi-supervised continual learning (SSCL), where the model progressively learns from partially labeled data with unknown c…

    Submitted 26 December, 2023; originally announced December 2023.

  41. arXiv:2312.10611 [pdf, other]

    cs.CV cs.AI

    Bi-directional Adapter for Multi-modal Tracking

    Authors: Bing Cao, Junliang Guo, Pengfei Zhu, Qinghua Hu

    Abstract: Due to the rapid development of computer vision, single-modal (RGB) object tracking has made significant progress in recent years. Considering the limitation of a single imaging sensor, multi-modal images (RGB, infrared, etc.) are introduced to compensate for this deficiency for all-weather object tracking in complex environments. However, as acquiring sufficient multi-modal tracking data is hard wh…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024. Code is available at https://github.com/SparkTempest/BAT

  42. How does spatial structure affect psychological restoration? A method based on Graph Neural Networks and Street View Imagery

    Authors: Haoran Ma, Yan Zhang, Pengyuan Liu, Fan Zhang, Pengyu Zhu

    Abstract: The Attention Restoration Theory (ART) presents a theoretical framework with four essential indicators (being away, extent, fascination, and compatibility) for comprehending urban and natural restoration quality. However, previous studies relied on non-sequential data and spatially independent methods, which overlook the impact of spatial structure, defined here as the positional relationships bet…

    Submitted 29 November, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: 33 pages, 7 figures, Under review

  43. arXiv:2311.08623  [pdf, other]

    cs.CV cs.CL cs.LG

    DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models

    Authors: Peng Tang, Pengkai Zhu, Tian Li, Srikar Appalaraju, Vijay Mahadevan, R. Manmatha

    Abstract: Encoder-decoder transformer models have achieved great success on various vision-language (VL) tasks, but they suffer from high inference latency. Typically, the decoder accounts for most of the latency because of auto-regressive decoding. To accelerate inference, we propose Dynamic Early Exit on Decoder (DEED). We build a multi-exit encoder-decoder transformer model…

    Submitted 14 November, 2023; originally announced November 2023.

  44. arXiv:2311.05717  [pdf, other]

    cs.RO

    PL-CVIO: Point-Line Cooperative Visual-Inertial Odometry

    Authors: Yanyu Zhang, Pengxiang Zhu, Wei Ren

    Abstract: Low-feature environments are one of the main Achilles' heels of geometric computer vision (CV) algorithms. In human-built scenes, which often have few features, lines can complement points. In this paper, we present a multi-robot cooperative visual-inertial navigation system (VINS) using both point and line features. By utilizing the covariance intersection (CI) update within the m…

    Submitted 9 November, 2023; originally announced November 2023.

  45. arXiv:2311.03650  [pdf, other]

    cs.CV

    Image Generation and Learning Strategy for Deep Document Forgery Detection

    Authors: Yamato Okamoto, Osada Genki, Iu Yahiro, Rintaro Hasegawa, Peifei Zhu, Hirokatsu Kataoka

    Abstract: In recent years, document processing has flourished and brought numerous benefits. However, reported cases of forged document images have risen significantly. In particular, recent advances in deep neural network (DNN) methods for generative tasks may amplify the threat of document forgery. Traditional approaches for forged document images created by prevalent copy-move methods are…

    Submitted 6 November, 2023; originally announced November 2023.

  46. arXiv:2311.03419  [pdf, other]

    eess.AS cs.LG cs.SD

    Personalizing Keyword Spotting with Speaker Information

    Authors: Beltrán Labrador, Pai Zhu, Guanlong Zhao, Angelo Scorza Scarpati, Quan Wang, Alicia Lozano-Diez, Alex Park, Ignacio López Moreno

    Abstract: Keyword spotting systems often struggle to generalize to a diverse population with various accents and age groups. To address this challenge, we propose a novel approach that integrates speaker information into keyword spotting using Feature-wise Linear Modulation (FiLM), a recent method for learning from multiple sources of information. We explore both Text-Dependent and Text-Independent speaker…

    Submitted 6 November, 2023; originally announced November 2023.

  47. arXiv:2311.02835  [pdf]

    cs.CV

    Flexible Multi-Generator Model with Fused Spatiotemporal Graph for Trajectory Prediction

    Authors: Peiyuan Zhu, Fengxia Han, Hao Deng

    Abstract: Trajectory prediction plays a vital role in automotive radar systems, facilitating precise tracking and decision-making in autonomous driving. Generative adversarial networks with the ability to learn a distribution over future trajectories tend to predict out-of-distribution samples, which typically occurs when the distribution of forthcoming paths comprises a blend of various manifolds that may…

    Submitted 5 November, 2023; originally announced November 2023.

  48. arXiv:2310.14581  [pdf, other]

    cs.CV cs.AI

    Leveraging Image-Text Similarity and Caption Modification for the DataComp Challenge: Filtering Track and BYOD Track

    Authors: Shuhei Yokoo, Peifei Zhu, Yuchi Ishikawa, Mikihiro Tanaka, Masayoshi Kondo, Hirokatsu Kataoka

    Abstract: Large web-crawled datasets have already played an important role in learning multimodal features with high generalization capabilities. However, very few studies have investigated the details of data design or how to improve it. Recently, the DataComp challenge was designed to find the best training data for fixed models. This paper presents our solution to both filtering track…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted at the ICCV 2023 Workshop on Towards the Next Generation of Computer Vision Datasets: DataComp Track

  49. arXiv:2310.00592  [pdf, other]

    quant-ph cs.ET

    Nearest neighbor synthesis of CNOT circuits on general quantum architectures

    Authors: Xinyu Chen, Mingqiang Zhu, Xueyun Cheng, Pengcheng Zhu, Zhijin Guan

    Abstract: In recent years, quantum computing has entered the Noisy Intermediate-Scale Quantum (NISQ) era. However, NISQ devices have inherent limitations in terms of connectivity and hardware noise, necessitating the transformation of quantum logic circuits for correct execution on NISQ chips. The synthesis of CNOT circuits considering physical constraints can transform quantum algorithms into low-level quantum…

    Submitted 1 October, 2023; originally announced October 2023.

  50. arXiv:2309.15496  [pdf, other]

    eess.AS cs.SD

    DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion

    Authors: Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Shuai Wang, Jixun Yao, Lei Xie, Mengxiao Bi

    Abstract: Voice conversion is becoming increasingly popular, and a growing number of application scenarios require models with streaming inference capabilities. The recently proposed DualVC attempts to achieve this objective through streaming model architecture design and intra-model knowledge distillation along with hybrid predictive coding to compensate for the lack of future information. However, DualVC…

    Submitted 18 January, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP2024