Skip to main content

Showing 1–50 of 436 results for author: Xiao, Z

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.12229  [pdf, other

    eess.AS cs.AI eess.SP

    Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech

    Authors: Haibin Wu, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Daniel Tompkins, Chung-Hsien Tsai, Canrun Li, Zhen Xiao, Sheng Zhao, Jinyu Li, Naoyuki Kanda

    Abstract: People change their tones of voice, often accompanied by nonverbal vocalizations (NVs) such as laughter and cries, to convey rich emotions. However, most text-to-speech (TTS) systems lack the capability to generate speech with rich emotions, including NVs. This paper introduces EmoCtrl-TTS, an emotion-controllable zero-shot TTS that can generate highly emotional speech with NVs for any speaker. Em… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: See https://aka.ms/emoctrl-tts for demo samples

  2. arXiv:2407.11084  [pdf, other

    eess.IV cs.CV

    A Survey of Distance-Based Vessel Trajectory Clustering: Data Pre-processing, Methodologies, Applications, and Experimental Evaluation

    Authors: Maohan Liang, Ryan Wen Liu, Ruobin Gao, Zhe Xiao, Xiaocai Zhang, Hua Wang

    Abstract: Vessel trajectory clustering, a crucial component of the maritime intelligent transportation systems, provides valuable insights for applications such as anomaly detection and trajectory prediction. This paper presents a comprehensive survey of the most prevalent distance-based vessel trajectory clustering methods, which encompass two main steps: trajectory similarity measurement and clustering. I… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

  3. arXiv:2407.10550  [pdf, other

    cs.CV

    Learning Natural Consistency Representation for Face Forgery Video Detection

    Authors: Daichi Zhang, Zihao Xiao, Shikun Li, Fanzhao Lin, Jianmin Li, Shiming Ge

    Abstract: Face Forgery videos have elicited critical social public concerns and various detectors have been proposed. However, fully-supervised detectors may lead to easily overfitting to specific forgery methods or videos, and existing self-supervised detectors are strict on auxiliary tasks, such as requiring audio or multi-modalities, leading to limited generalization and robustness. In this paper, we exa… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  4. arXiv:2407.05502  [pdf, other

    cs.CL cs.AI cs.IR

    Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models

    Authors: Nikhil Sharma, Kenton Murray, Ziang Xiao

    Abstract: With Retrieval Augmented Generation (RAG), Large Language Models (LLMs) are playing a pivotal role in information search and are being adopted globally. Although the multilingual capability of LLMs offers new opportunities to bridge the language barrier, do these capabilities translate into real-life scenarios where linguistic divide and knowledge conflicts between multilingual sources are known o… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  5. arXiv:2407.04224  [pdf, other

    cs.RO

    PA-LOCO: Learning Perturbation-Adaptive Locomotion for Quadruped Robots

    Authors: Zhiyuan Xiao, Xinyu Zhang, Xiang Zhou, Qingrui Zhang

    Abstract: Numerous locomotion controllers have been designed based on Reinforcement Learning (RL) to facilitate blind quadrupedal locomotion traversing challenging terrains. Nevertheless, locomotion control is still a challenging task for quadruped robots traversing diverse terrains amidst unforeseen disturbances. Recently, privileged learning has been employed to learn reliable and robust quadrupedal locom… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 8 pages, Accepted by IROS 2024

  6. arXiv:2407.04064  [pdf, other

    cs.RO

    Collision Avoidance for Multiple UAVs in Unknown Scenarios with Causal Representation Disentanglement

    Authors: Jiafan Zhuang, Zihao Xia, Gaofei Han, Boxi Wang, Wenji Li, Dongliang Wang, Zhifeng Hao, Ruichu Cai, Zhun Fan

    Abstract: Deep reinforcement learning (DRL) has achieved remarkable progress in online path planning tasks for multi-UAV systems. However, existing DRL-based methods often suffer from performance degradation when tackling unseen scenarios, since the non-causal factors in visual representations adversely affect policy learning. To address this issue, we propose a novel representation learning approach, \ie,… ▽ More

    Submitted 15 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  7. arXiv:2407.04056  [pdf, other

    cs.RO

    Robust Policy Learning for Multi-UAV Collision Avoidance with Causal Feature Selection

    Authors: Jiafan Zhuang, Gaofei Han, Zihao Xia, Boxi Wang, Wenji Li, Dongliang Wang, Zhifeng Hao, Ruichu Cai, Zhun Fan

    Abstract: In unseen and complex outdoor environments, collision avoidance navigation for unmanned aerial vehicle (UAV) swarms presents a challenging problem. It requires UAVs to navigate through various obstacles and complex backgrounds. Existing collision avoidance navigation methods based on deep reinforcement learning show promising performance but suffer from poor generalization abilities, resulting in… ▽ More

    Submitted 15 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  8. arXiv:2407.04051  [pdf, other

    cs.SD cs.AI eess.AS

    FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

    Authors: Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Haoneng Luo, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang , et al. (8 additional authors not shown)

    Abstract: This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, sp… ▽ More

    Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Work in progress. Authors are listed in alphabetical order by family name

  9. arXiv:2407.02012  [pdf, other

    cs.HC cs.GR

    A Study on Activity Visualization for Smart Watches

    Authors: Zhouxuan Xia, Yu Liu, Fabiola Polidoro

    Abstract: This paper investigates the use of visualization to display activity data on smartwatches by surveying the data visual presentations proposed by 80 smartwatch models currently available on the Chinese e-commerce platform JD.com and, later, surveying the preferences of 41 users concerning these visualizations. The results show that despite radial bar charts are the most popular visualization for ac… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 2 pages, 3 figures, 2023 IEEE 16th Pacific Visualization Symposium (PacificVis)

  10. arXiv:2407.01967  [pdf, other

    cs.CV

    Unleash the Power of Local Representations for Few-Shot Classification

    Authors: Shi Tang, Guiming Luo, Xinchen Ye, Zhiyi Xia

    Abstract: Generalizing to novel classes unseen during training is a key challenge of few-shot classification. Recent metric-based methods try to address this by local representations. However, they are unable to take full advantage of them due to (i) improper supervision for pretraining the feature extractor, and (ii) lack of adaptability in the metric for handling various possible compositions of local fea… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  11. arXiv:2407.01902  [pdf, other

    cs.CR cs.AI cs.CL

    SoP: Unlock the Power of Social Facilitation for Automatic Jailbreak Attack

    Authors: Yan Yang, Zeguan Xiao, Xin Lu, Hongru Wang, Hailiang Huang, Guanhua Chen, Yun Chen

    Abstract: The widespread applications of large language models (LLMs) have brought about concerns regarding their potential misuse. Although aligned with human preference data before release, LLMs remain vulnerable to various malicious attacks. In this paper, we adopt a red-teaming strategy to enhance LLM safety and introduce SoP, a simple yet effective framework to design jailbreak prompts automatically. I… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  12. arXiv:2407.01085  [pdf, other

    cs.LG cs.CL

    Rethinking LLM-based Preference Evaluation

    Authors: Zhengyu Hu, Linxin Song, Jieyu Zhang, Zheyuan Xiao, Jingang Wang, Zhenyu Chen, Jieyu Zhao, Hui Xiong

    Abstract: Recently, large language model (LLM)-based preference evaluation has been widely adopted to compare pairs of model responses. However, a severe bias towards lengthy responses has been observed, raising concerns about the reliability of this evaluation method. In this work, we designed a series of controlled experiments to study the major impacting factors of the metric of LLM-based preference eval… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  13. arXiv:2406.18009  [pdf, other

    eess.AS cs.SD

    E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

    Authors: Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Xu Tan, Yanqing Liu, Sheng Zhao, Naoyuki Kanda

    Abstract: This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and state-of-the-art speaker similarity and intelligibility. In the E2 TTS framework, the text input is converted into a character sequence with filler tokens. The flow-matching-based mel spectrogram generator is then trained based on the… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  14. arXiv:2406.17745  [pdf, ps, other

    cs.IR cs.LG

    Light-weight End-to-End Graph Interest Network for CTR Prediction in E-commerce Search

    Authors: Pipi Peng, Yunqing Jia, Ziqiang Zhou, murmurhash, Zichong Xiao

    Abstract: Click-through-rate (CTR) prediction has an essential impact on improving user experience and revenue in e-commerce search. With the development of deep learning, graph-based methods are well exploited to utilize graph structure extracted from user behaviors and other information to help embedding learning. However, most of the previous graph-based methods mainly focus on recommendation scenarios,… ▽ More

    Submitted 4 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 8 pages, 4 figures

    ACM Class: H.3.3

  15. arXiv:2406.16083  [pdf, other

    eess.IV cs.CV

    Mamba-based Light Field Super-Resolution with Efficient Subspace Scanning

    Authors: Ruisheng Gao, Zeyu Xiao, Zhiwei Xiong

    Abstract: Transformer-based methods have demonstrated impressive performance in 4D light field (LF) super-resolution by effectively modeling long-range spatial-angular correlations, but their quadratic complexity hinders the efficient processing of high resolution 4D inputs, resulting in slow inference speed and high memory cost. As a compromise, most prior work adopts a patch-based strategy, which fails to… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 17 pages,7 figures

  16. arXiv:2406.14558  [pdf, other

    cs.RO cs.AI

    CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics

    Authors: Jiawei Gao, Ziqin Wang, Zeqi Xiao, Jingbo Wang, Tai Wang, Jinkun Cao, Xiaolin Hu, Si Liu, Jifeng Dai, Jiangmiao Pang

    Abstract: Recent years have seen significant advancements in humanoid control, largely due to the availability of large-scale motion capture data and the application of reinforcement learning methodologies. However, many real-world tasks, such as moving large and heavy furniture, require multi-character collaboration. Given the scarcity of data on multi-character collaboration and the efficiency challenges… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  17. arXiv:2406.13086  [pdf, other

    cs.RO cs.AI

    NaviSplit: Dynamic Multi-Branch Split DNNs for Efficient Distributed Autonomous Navigation

    Authors: Timothy K Johnsen, Ian Harshbarger, Zixia Xia, Marco Levorato

    Abstract: Lightweight autonomous unmanned aerial vehicles (UAV) are emerging as a central component of a broad range of applications. However, autonomous navigation necessitates the implementation of perception algorithms, often deep neural networks (DNN), that process the input of sensor observations, such as that from cameras and LiDARs, for control logic. The complexity of such algorithms clashes with th… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Journal ref: WoWMoM 2024

  18. arXiv:2406.10808  [pdf, other

    cs.LG

    Diffusion Model With Optimal Covariance Matching

    Authors: Zijing Ou, Mingtian Zhang, Andi Zhang, Tim Z. Xiao, Yingzhen Li, David Barber

    Abstract: The probabilistic diffusion model has become highly effective across various domains. Typically, sampling from a diffusion model involves using a denoising distribution characterized by a Gaussian with a learned mean and either fixed or learned covariances. In this paper, we leverage the recently proposed full covariance moment matching technique and introduce a novel method for learning covarianc… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  19. arXiv:2406.10090  [pdf, other

    cs.LG

    Over-parameterization and Adversarial Robustness in Neural Networks: An Overview and Empirical Analysis

    Authors: Zhang Chen, Luca Demetrio, Srishti Gupta, Xiaoyi Feng, Zhaoqiang Xia, Antonio Emanuele Cinà, Maura Pintor, Luca Oneto, Ambra Demontis, Battista Biggio, Fabio Roli

    Abstract: Thanks to their extensive capacity, over-parameterized neural networks exhibit superior predictive capabilities and generalization. However, having a large parameter space is considered one of the main suspects of the neural networks' vulnerability to adversarial example -- input samples crafted ad-hoc to induce a desired misclassification. Relevant literature has claimed contradictory remarks in… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    MSC Class: 68T10 ACM Class: I.5

  20. arXiv:2406.09044  [pdf, other

    cs.CL

    MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning

    Authors: Hanqing Wang, Zeguan Xiao, Yixia Li, Shuo Wang, Guanhua Chen, Yun Chen

    Abstract: Efficient finetuning of large language models (LLMs) aims to adapt the LLMs with reduced computation and memory cost. Previous LoRA-based approaches initialize the low-rank matrices with gaussian distribution and zero values, while keeping the original weight matrices frozen. However, the trainable model parameters optimized in an unguided subspace might have interference with the well-learned sub… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  21. arXiv:2406.08723  [pdf, other

    cs.CL

    ECBD: Evidence-Centered Benchmark Design for NLP

    Authors: Yu Lu Liu, Su Lin Blodgett, Jackie Chi Kit Cheung, Q. Vera Liao, Alexandra Olteanu, Ziang Xiao

    Abstract: Benchmarking is seen as critical to assessing progress in NLP. However, creating a benchmark involves many design decisions (e.g., which datasets to include, which metrics to use) that often rely on tacit, untested assumptions about what the benchmark is intended to measure or is actually measuring. There is currently no principled way of analyzing these decisions and how they impact the validity… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  22. arXiv:2406.06485  [pdf, other

    cs.CL cs.AI

    Can Language Models Serve as Text-Based World Simulators?

    Authors: Ruoyao Wang, Graham Todd, Ziang Xiao, Xingdi Yuan, Marc-Alexandre Côté, Peter Clark, Peter Jansen

    Abstract: Virtual environments play a key role in benchmarking advances in complex planning and decision-making tasks but are expensive and complicated to build by hand. Can current language models themselves serve as world simulators, correctly predicting how actions change different world states, thus bypassing the need for extensive manual coding? Our goal is to answer this question in the context of tex… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: ACL 2024

  23. arXiv:2406.05224  [pdf, other

    cs.NE

    ON-OFF Neuromorphic ISING Machines using Fowler-Nordheim Annealers

    Authors: Zihao Chen, Zhili Xiao, Mahmoud Akl, Johannes Leugring, Omowuyi Olajide, Adil Malik, Nik Dennler, Chad Harper, Subhankar Bose, Hector A. Gonzalez, Jason Eshraghian, Riccardo Pignari, Gianvito Urgese, Andreas G. Andreou, Sadasivan Shankar, Christian Mayr, Gert Cauwenberghs, Shantanu Chakrabartty

    Abstract: We introduce NeuroSA, a neuromorphic architecture specifically designed to ensure asymptotic convergence to the ground state of an Ising problem using an annealing process that is governed by the physics of quantum mechanical tunneling using Fowler-Nordheim (FN). The core component of NeuroSA consists of a pair of asynchronous ON-OFF neurons, which effectively map classical simulated annealing (SA… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 36 pages, 8 figures

  24. arXiv:2406.04344  [pdf, other

    cs.LG cs.CL cs.CV

    Verbalized Machine Learning: Revisiting Machine Learning with Language Models

    Authors: Tim Z. Xiao, Robert Bamler, Bernhard Schölkopf, Weiyang Liu

    Abstract: Motivated by the large progress made by large language models (LLMs), we introduce the framework of verbalized machine learning (VML). In contrast to conventional machine learning models that are typically optimized over a continuous parameter space, VML constrains the parameter space to be human-interpretable natural language. Such a constraint leads to a new perspective of function approximation… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Technical Report v1 (92 pages, 15 figures)

  25. arXiv:2406.01587  [pdf, other

    cs.RO

    PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning

    Authors: Yupeng Zheng, Zebin Xing, Qichao Zhang, Bu Jin, Pengfei Li, Yuhang Zheng, Zhongpu Xia, Kun Zhan, Xianpeng Lang, Yaran Chen, Dongbin Zhao

    Abstract: Vehicle motion planning is an essential component of autonomous driving technology. Current rule-based vehicle motion planning methods perform satisfactorily in common scenarios but struggle to generalize to long-tailed situations. Meanwhile, learning-based methods have yet to achieve superior performance over rule-based approaches in large-scale closed-loop scenarios. To address these issues, we… ▽ More

    Submitted 4 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  26. arXiv:2406.00474  [pdf, other

    cs.CV

    Adapting Fine-Grained Cross-View Localization to Areas without Fine Ground Truth

    Authors: Zimin Xia, Yujiao Shi, Hongdong Li, Julian F. P. Kooij

    Abstract: Given a ground-level query image and a geo-referenced aerial image that covers the query's local surroundings, fine-grained cross-view localization aims to estimate the location of the ground camera inside the aerial image. Recent works have focused on developing advanced networks trained with accurate ground truth (GT) locations of ground images. However, the trained models always suffer a perfor… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  27. arXiv:2405.20614  [pdf, other

    cs.CV

    EPIDetect: Video-based convulsive seizure detection in chronic epilepsy mouse model for anti-epilepsy drug screening

    Authors: Junming Ren, Zhoujian Xiao, Yujia Zhang, Yujie Yang, Ling He, Ezra Yoon, Stephen Temitayo Bello, Xi Chen, Dapeng Wu, Micky Tortorella, Jufang He

    Abstract: In the preclinical translational studies, drug candidates with remarkable anti-epileptic efficacy demonstrate long-term suppression of spontaneous recurrent seizures (SRSs), particularly convulsive seizures (CSs), in mouse models of chronic epilepsy. However, the current methods for monitoring CSs have limitations in terms of invasiveness, specific laboratory settings, high cost, and complex opera… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

  28. RealityEffects: Augmenting 3D Volumetric Videos with Object-Centric Annotation and Dynamic Visual Effects

    Authors: Jian Liao, Kevin Van, Zhijie Xia, Ryo Suzuki

    Abstract: This paper introduces RealityEffects, a desktop authoring interface designed for editing and augmenting 3D volumetric videos with object-centric annotations and visual effects. RealityEffects enhances volumetric capture by introducing a novel method for augmenting captured physical motion with embedded, responsive visual effects, referred to as object-centric augmentation. In RealityEffects, users… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: DIS 2024

  29. arXiv:2405.16852  [pdf, other

    cs.LG cs.AI stat.ML

    EM Distillation for One-step Diffusion Models

    Authors: Sirui Xie, Zhisheng Xiao, Diederik P Kingma, Tingbo Hou, Ying Nian Wu, Kevin Patrick Murphy, Tim Salimans, Ben Poole, Ruiqi Gao

    Abstract: While diffusion models can learn complex distributions, sampling requires a computationally expensive iterative process. Existing distillation methods enable efficient sampling, but have notable limitations, such as performance degradation with very few sampling steps, reliance on training data access, or mode-seeking optimization that may fail to capture the full distribution. We propose EM Disti… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  30. arXiv:2405.16605  [pdf, other

    cs.CV

    Demystify Mamba in Vision: A Linear Attention Perspective

    Authors: Dongchen Han, Ziyi Wang, Zhuofan Xia, Yizeng Han, Yifan Pu, Chunjiang Ge, Jun Song, Shiji Song, Bo Zheng, Gao Huang

    Abstract: Mamba is an effective state space model with linear computation complexity. It has recently shown impressive efficiency in dealing with high-resolution inputs across various vision tasks. In this paper, we reveal that the powerful Mamba model shares surprising similarities with linear attention Transformer, which typically underperform conventional Transformer in practice. By exploring the similar… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

  31. arXiv:2405.15843  [pdf, other

    cs.CV cs.AI

    SpotNet: An Image Centric, Lidar Anchored Approach To Long Range Perception

    Authors: Louis Foucard, Samar Khanna, Yi Shi, Chi-Kuei Liu, Quinn Z Shen, Thuyen Ngo, Zi-Xiang Xia

    Abstract: In this paper, we propose SpotNet: a fast, single stage, image-centric but LiDAR anchored approach for long range 3D object detection. We demonstrate that our approach to LiDAR/image sensor fusion, combined with the joint learning of 2D and 3D detection tasks, can lead to accurate 3D object detection with very sparse LiDAR support. Unlike more recent bird's-eye-view (BEV) sensor-fusion methods whi… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  32. arXiv:2405.15125  [pdf, other

    cs.CV

    HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting

    Authors: Yuanhao Cai, Zihao Xiao, Yixun Liang, Minghan Qin, Yulun Zhang, Xiaokang Yang, Yaoyao Liu, Alan Yuille

    Abstract: High dynamic range (HDR) novel view synthesis (NVS) aims to create photorealistic images from novel viewpoints using HDR imaging techniques. The rendered HDR images capture a wider range of brightness levels containing more details of the scene than normal low dynamic range (LDR) images. Existing HDR NVS methods are mainly based on NeRF. They suffer from long training time and slow inference speed… ▽ More

    Submitted 27 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: The first 3D Gaussian Splatting-based method for HDR imaging

  33. arXiv:2405.14870  [pdf, other

    cs.CV cs.RO

    An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models

    Authors: Jiahao Sun, Chunmei Qing, Xiang Xu, Lingdong Kong, Youquan Liu, Li Li, Chenming Zhu, Jingwei Zhang, Zeqi Xiao, Runnan Chen, Tai Wang, Wenwei Zhang, Kai Chen

    Abstract: In the rapidly evolving field of autonomous driving, precise segmentation of LiDAR data is crucial for understanding complex 3D environments. Traditional approaches often rely on disparate, standalone codebases, hindering unified advancements and fair benchmarking across models. To address these challenges, we introduce MMDetection3D-lidarseg, a comprehensive toolbox designed for the efficient tra… ▽ More

    Submitted 30 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Preprint; 17 pages, 4 figures, 7 tables; Code at https://github.com/open-mmlab/mmdetection3d

  34. arXiv:2405.14864  [pdf, other

    cs.CV

    Video Diffusion Models are Training-free Motion Interpreter and Controller

    Authors: Zeqi Xiao, Yifan Zhou, Shuai Yang, Xingang Pan

    Abstract: Video generation primarily aims to model authentic and customized motion across frames, making understanding and controlling the motion a crucial topic. Most diffusion-based studies on video motion focus on motion customization with training-based paradigms, which, however, demands substantial training resources and necessitates retraining for diverse models. Crucially, these approaches do not exp… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Project Page: https://xizaoqu.github.io/moft/

  35. arXiv:2405.11868  [pdf, other

    cs.LG cs.AI cs.CE cs.IR cs.SI

    Towards Graph Contrastive Learning: A Survey and Beyond

    Authors: Wei Ju, Yifan Wang, Yifang Qin, Zhengyang Mao, Zhiping Xiao, Junyu Luo, Junwei Yang, Yiyang Gu, Dongjie Wang, Qingqing Long, Siyu Yi, Xiao Luo, Ming Zhang

    Abstract: In recent years, deep learning on graphs has achieved remarkable success in various domains. However, the reliance on annotated graph data remains a significant bottleneck due to its prohibitive cost and time-intensive nature. To address this challenge, self-supervised learning (SSL) on graphs has gained increasing attention and has made significant progress. SSL enables machine learning models to… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

  36. arXiv:2405.09341  [pdf, other

    cs.CL cs.AI

    Large Language Model Bias Mitigation from the Perspective of Knowledge Editing

    Authors: Ruizhe Chen, Yichen Li, Zikai Xiao, Zuozhu Liu

    Abstract: Existing debiasing methods inevitably make unreasonable or undesired predictions as they are designated and evaluated to achieve parity across different social groups but leave aside individual facts, resulting in modified existing knowledge. In this paper, we first establish a new bias mitigation benchmark BiasKE leveraging existing and additional constructed datasets, which systematically assess… ▽ More

    Submitted 29 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

  37. arXiv:2405.08748  [pdf, other

    cs.CV

    Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

    Authors: Zhimin Li, Jianwei Zhang, Qin Lin, Jiangfeng Xiong, Yanxin Long, Xinchi Deng, Yingfang Zhang, Xingchao Liu, Minbin Huang, Zedong Xiao, Dayou Chen, Jiajun He, Jiahao Li, Wenyue Li, Chen Zhang, Rongwei Quan, Jianxiang Lu, Jiabin Huang, Xiaoyan Yuan, Xiaoxiao Zheng, Yixuan Li, Jihong Zhang, Chao Zhang, Meng Chen, Jie Liu , et al. (20 additional authors not shown)

    Abstract: We present Hunyuan-DiT, a text-to-image diffusion transformer with fine-grained understanding of both English and Chinese. To construct Hunyuan-DiT, we carefully design the transformer structure, text encoder, and positional encoding. We also build from scratch a whole data pipeline to update and evaluate data for iterative model optimization. For fine-grained language understanding, we train a Mu… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Project Page: https://dit.hunyuan.tencent.com/

  38. arXiv:2405.07012  [pdf, other

    cs.CV

    Incorporating Degradation Estimation in Light Field Spatial Super-Resolution

    Authors: Zeyu Xiao, Zhiwei Xiong

    Abstract: Recent advancements in light field super-resolution (SR) have yielded impressive results. In practice, however, many existing methods are limited by assuming fixed degradation models, such as bicubic downsampling, which hinders their robustness in real-world scenarios with complex degradations. To address this limitation, we present LF-DEST, an effective blind Light Field SR method that incorporat… ▽ More

    Submitted 11 May, 2024; originally announced May 2024.

  39. arXiv:2405.04773  [pdf, other

    cs.LG cs.AI cs.IR cs.SI

    Hypergraph-enhanced Dual Semi-supervised Graph Classification

    Authors: Wei Ju, Zhengyang Mao, Siyu Yi, Yifang Qin, Yiyang Gu, Zhiping Xiao, Yifan Wang, Xiao Luo, Ming Zhang

    Abstract: In this paper, we study semi-supervised graph classification, which aims at accurately predicting the categories of graphs in scenarios with limited labeled graphs and abundant unlabeled graphs. Despite the promising capability of graph neural networks (GNNs), they typically require a large number of costly labeled graphs, while a wealth of unlabeled graphs fail to be effectively utilized. Moreove… ▽ More

    Submitted 28 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted by Proceedings of the 41st International Conference on Machine Learning (ICML 2024)

  40. arXiv:2405.03413  [pdf, other

    cs.RO

    A real-time, robust and versatile visual-SLAM framework based on deep learning networks

    Authors: Zhang Xiao, Shuaixin Li

    Abstract: This paper explores how deep learning techniques can improve visual-based SLAM performance in challenging environments. By combining deep feature extraction and deep matching methods, we introduce a versatile hybrid visual SLAM system designed to enhance adaptability in challenging scenarios, such as low-light conditions, dynamic lighting, weak-texture areas, and severe jitter. Our system supports… ▽ More

    Submitted 4 June, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  41. arXiv:2405.03218  [pdf, other

    cs.CV

    Elevator, Escalator or Neither? Classifying Pedestrian Conveyor State Using Inertial Navigation System

    Authors: Tianlang He, Zhiqiu Xia, S. -H. Gary Chan

    Abstract: Classifying a pedestrian in one of the three conveyor states of "elevator," "escalator" and "neither" is fundamental to many applications such as indoor localization and people flow analysis. We estimate, for the first time, the pedestrian conveyor state given the inertial navigation system (INS) readings of accelerometer, gyroscope and magnetometer sampled from the phone. Our problem is challengi… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

  42. arXiv:2405.02844  [pdf, other

    cs.CV

    SMCD: High Realism Motion Style Transfer via Mamba-based Diffusion

    Authors: Ziyun Qian, Zeyu Xiao, Zhenyi Wu, Dingkang Yang, Mingcheng Li, Shunli Wang, Shuaibing Wang, Dongliang Kou, Lihua Zhang

    Abstract: Motion style transfer is a significant research direction in multimedia applications. It enables the rapid switching of different styles of the same motion for virtual digital humans, thus vastly increasing the diversity and realism of movements. It is widely applied in multimedia scenarios such as movies, games, and the Metaverse. However, most of the current work in this field adopts the GAN, wh… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

  43. arXiv:2405.01066  [pdf, other

    cs.CV cs.AI cs.HC

    HandS3C: 3D Hand Mesh Reconstruction with State Space Spatial Channel Attention from RGB images

    Authors: Zixun Jiao, Xihan Wang, Zhaoqiang Xia, Lianhe Shao, Quanli Gao

    Abstract: Reconstructing the hand mesh from one single RGB image is a challenging task because hands are often occluded by other objects. Most previous works attempt to explore more additional information and adopt attention mechanisms for improving 3D reconstruction performance, while it would increase computational complexity simultaneously. To achieve a performance-reserving architecture with high comput… ▽ More

    Submitted 14 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: 12 pages, 6 figures

  44. arXiv:2404.15643  [pdf, ps, other

    cs.IT eess.SP

    Dynamic Beam Coverage for Satellite Communications Aided by Movable-Antenna Array

    Authors: Lipeng Zhu, Xiangyu Pi, Wenyan Ma, Zhenyu Xiao, Rui Zhang

    Abstract: Due to the ultra-dense constellation, efficient beam coverage and interference mitigation are crucial to low-earth orbit (LEO) satellite communication systems, while the conventional directional antennas and fixed-position antenna (FPA) arrays both have limited degrees of freedom (DoFs) in beamforming to adapt to the time-varying coverage requirement of terrestrial users. To address this challenge… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  45. arXiv:2404.12850  [pdf, other

    cs.LG cs.DC

    CaBaFL: Asynchronous Federated Learning via Hierarchical Cache and Feature Balance

    Authors: Zeke Xia, Ming Hu, Dengke Yan, Xiaofei Xie, Tianlin Li, Anran Li, Junlong Zhou, Mingsong Chen

    Abstract: Federated Learning (FL) as a promising distributed machine learning paradigm has been widely adopted in Artificial Intelligence of Things (AIoT) applications. However, the efficiency and inference capability of FL is seriously limited due to the presence of stragglers and data imbalance across massive AIoT devices, respectively. To address the above challenges, we present a novel asynchronous FL a… ▽ More

    Submitted 17 July, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  46. arXiv:2404.12846  [pdf, other

    cs.LG

    KoReA-SFL: Knowledge Replay-based Split Federated Learning Against Catastrophic Forgetting

    Authors: Zeke Xia, Ming Hu, Dengke Yan, Ruixuan Liu, Anran Li, Xiaofei Xie, Mingsong Chen

    Abstract: Although Split Federated Learning (SFL) is good at enabling knowledge sharing among resource-constrained clients, it suffers from the problem of low training accuracy due to the neglect of data heterogeneity and catastrophic forgetting. To address this issue, we propose a novel SFL approach named KoReA-SFL, which adopts a multi-model aggregation mechanism to alleviate gradient divergence caused by… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

  47. arXiv:2404.10378  [pdf, other

    cs.CV cs.AI cs.CY cs.LG

    Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data

    Authors: Ivan DeAndres-Tame, Ruben Tolosana, Pietro Melzi, Ruben Vera-Rodriguez, Minchul Kim, Christian Rathgeb, Xiaoming Liu, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Zhizhou Zhong, Yuge Huang, Yuxi Mi, Shouhong Ding, Shuigeng Zhou, Shuai He, Lingzhi Fu, Heng Cong, Rongyu Zhang, Zhihong Xiao, Evgeny Smirnov, Anton Pimenov, Aleksei Grigorev, Denis Timoshenko, Kaleb Mesfin Asfaw , et al. (33 additional authors not shown)

    Abstract: Synthetic data is gaining increasing relevance for training machine learning models. This is mainly motivated due to several factors such as the lack of real data and intra-class variability, time and errors produced in manual labeling, and in some cases privacy concerns, among others. This paper presents an overview of the 2nd edition of the Face Recognition Challenge in the Era of Synthetic Data… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.10476

    Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRw 2024)

  48. arXiv:2404.08679  [pdf, other

    cs.CL cs.AI cs.LG stat.ML

    Your Finetuned Large Language Model is Already a Powerful Out-of-distribution Detector

    Authors: Andi Zhang, Tim Z. Xiao, Weiyang Liu, Robert Bamler, Damon Wischik

    Abstract: We revisit the likelihood ratio between a pretrained large language model (LLM) and its finetuned variant as a criterion for out-of-distribution (OOD) detection. The intuition behind such a criterion is that, the pretrained LLM has the prior knowledge about OOD data due to its large amount of training data, and once finetuned with the in-distribution data, the LLM has sufficient knowledge to disti… ▽ More

    Submitted 7 April, 2024; originally announced April 2024.

  49. arXiv:2404.06486  [pdf, other

    cs.LG cs.CV

    GO4Align: Group Optimization for Multi-Task Alignment

    Authors: Jiayi Shen, Cheems Wang, Zehao Xiao, Nanne Van Noord, Marcel Worring

    Abstract: This paper proposes \textit{GO4Align}, a multi-task optimization approach that tackles task imbalance by explicitly aligning the optimization across tasks. To achieve this, we design an adaptive group risk minimization strategy, compromising two crucial techniques in implementation: (i) dynamical group assignment, which clusters similar tasks based on task interactions; (ii) risk-guided group indi… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

  50. arXiv:2404.04884  [pdf, other

    cs.CV

    LRNet: Change detection of high-resolution remote sensing imagery via strategy of localization-then-refinement

    Authors: Huan Zhong, Chen Wu, Ziqi Xiao

    Abstract: Change detection, as a research hotspot in the field of remote sensing, has witnessed continuous development and progress. However, the discrimination of boundary details remains a significant bottleneck due to the complexity of surrounding elements between change areas and backgrounds. Discriminating the boundaries of large change areas results in misalignment, while connecting boundaries occurs… ▽ More

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 18 pages, 11 figures