Showing 1–50 of 8,391 results for author: Guo

Searching in archive cs.
  1. arXiv:2408.17383  [pdf, other]

    cs.LG cs.AI

    MoRe Fine-Tuning with 10x Fewer Parameters

    Authors: Wenxuan Tan, Nicholas Roberts, Tzu-Heng Huang, Jitian Zhao, John Cooper, Samuel Guo, Chengyu Duan, Frederic Sala

    Abstract: Parameter-efficient fine-tuning (PEFT) techniques have unlocked the potential to cheaply and easily specialize large pretrained models. However, the most prominent approaches, like low-rank adapters (LoRA), depend on heuristics or rules-of-thumb for their architectural choices -- potentially limiting their performance for new models and architectures. This limitation suggests that techniques from…

    Submitted 30 August, 2024; originally announced August 2024.

  2. arXiv:2408.17175  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

    Authors: Zhen Ye, Peiwen Sun, Jiahe Lei, Hongzhan Lin, Xu Tan, Zheqi Dai, Qiuqiang Kong, Jianyi Chen, Jiahao Pan, Qifeng Liu, Yike Guo, Wei Xue

    Abstract: Recent advancements in audio generation have been significantly propelled by the capabilities of Large Language Models (LLMs). The existing research on audio LLM has primarily focused on enhancing the architecture and scale of audio language models, as well as leveraging larger datasets, and generally, acoustic codecs, such as EnCodec, are used for audio tokenization. However, these codecs were or…

    Submitted 30 August, 2024; originally announced August 2024.

  3. arXiv:2408.17064  [pdf, other]

    cs.CV cs.AI cs.LG

    Instant Adversarial Purification with Adversarial Consistency Distillation

    Authors: Chun Tong Lei, Hon Ming Yam, Zhongliang Guo, Chun Pong Lau

    Abstract: Neural networks, despite their remarkable performance in widespread applications, including image classification, are also known to be vulnerable to subtle adversarial noise. Although some diffusion-based purification methods have been proposed, for example, DiffPure, those methods are time-consuming. In this paper, we propose One Step Control Purification (OSCP), a diffusion-based purification mo…

    Submitted 30 August, 2024; originally announced August 2024.

  4. arXiv:2408.17051  [pdf, other]

    cs.SI

    Service-Oriented AoI Modeling and Analysis for Non-Terrestrial Networks

    Authors: Zheng Guo, Qian Chen, Weixiao Meng

    Abstract: To achieve truly seamless global intelligent connectivity, non-terrestrial networks (NTN) mainly composed of low earth orbit (LEO) satellites and drones are recognized as important components of the future 6G network architecture. Meanwhile, the rapid advancement of the Internet of Things (IoT) has led to the proliferation of numerous applications with stringent requirements for timely information…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 6 pages, 5 figures

  5. arXiv:2408.17031  [pdf, other]

    cs.CR

    Meta-UAD: A Meta-Learning Scheme for User-level Network Traffic Anomaly Detection

    Authors: Tongtong Feng, Qi Qi, Lingqi Guo, Jingyu Wang

    Abstract: Accurate anomaly detection in user-level network traffic is crucial for network security. Compared with existing models that passively detect specific anomaly classes with large labeled training samples, user-level network traffic contains sizeable new anomaly classes with few labeled samples and has an imbalanced, self-similar, and data-hungry nature. Motivated by those limitations, in this paper…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: Under review. arXiv admin note: substantial text overlap with arXiv:2408.14884

  6. arXiv:2408.16768  [pdf, other]

    cs.CV cs.AI cs.CL

    SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners

    Authors: Ziyu Guo, Renrui Zhang, Xiangyang Zhu, Chengzhuo Tong, Peng Gao, Chunyuan Li, Pheng-Ann Heng

    Abstract: We introduce SAM2Point, a preliminary exploration adapting Segment Anything Model 2 (SAM 2) for zero-shot and promptable 3D segmentation. SAM2Point interprets any 3D data as a series of multi-directional videos, and leverages SAM 2 for 3D-space segmentation, without further training or 2D-3D projection. Our framework supports various prompt types, including 3D points, boxes, and masks, and can gen…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Work in progress. Online Demo: https://huggingface.co/spaces/ZiyuG/SAM2Point . Code: https://github.com/ZiyuGuo99/SAM2Point

  7. arXiv:2408.16732  [pdf, other]

    q-bio.NC cs.SD eess.AS q-bio.QM

    Automatic detection of Mild Cognitive Impairment using high-dimensional acoustic features in spontaneous speech

    Authors: Cong Zhang, Wenxing Guo, Hongsheng Dai

    Abstract: This study addresses the TAUKADIAL challenge, focusing on the classification of speech from people with Mild Cognitive Impairment (MCI) and neurotypical controls. We conducted three experiments comparing five machine-learning methods: Random Forests, Sparse Logistic Regression, k-Nearest Neighbors, Sparse Support Vector Machine, and Decision Tree, utilizing 1076 acoustic features automatically ext…

    Submitted 29 August, 2024; originally announced August 2024.

  8. arXiv:2408.16702  [pdf, other]

    cs.HC

    VMC: A Grammar for Visualizing Statistical Model Checks

    Authors: Ziyang Guo, Alex Kale, Matthew Kay, Jessica Hullman

    Abstract: Visualizations play a critical role in validating and improving statistical models. However, the design space of model check visualizations is not well understood, making it difficult for authors to explore and specify effective graphical model checks. VMC defines a model check visualization using four components: (1) samples of distributions of checkable quantities generated from the model, inclu…

    Submitted 29 August, 2024; originally announced August 2024.

  9. arXiv:2408.16699  [pdf, other]

    cs.PL

    To Be or Not To Be: Adding Integrity Constraints to stableKanren to Make a Decision

    Authors: Xiangyu Guo, Ajay Bansal

    Abstract: We integrate integrity constraints to stableKanren to enable a new problem-solving paradigm in combinatorial search problems. stableKanren extends miniKanren to reasoning about contradictions under stable model semantics. However, writing programs to solve combinatorial search problems in stableKanren did not fully utilize the contradiction reasoning. This is mainly due to the lack of control over…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 16 pages, 3 figures, ICFP '24 The miniKanren and Relational Programming Workshop

    MSC Class: 03B70; 68T27; 68T30 ACM Class: D.3.0

  10. arXiv:2408.16498  [pdf, other]

    cs.SE

    A Survey on Evaluating Large Language Models in Code Generation Tasks

    Authors: Liguo Chen, Qi Guo, Hongrui Jia, Zhengran Zeng, Xin Wang, Yijiang Xu, Jian Wu, Yidong Wang, Qing Gao, Jindong Wang, Wei Ye, Shikun Zhang

    Abstract: This paper provides a comprehensive review of the current methods and metrics used to evaluate the performance of Large Language Models (LLMs) in code generation tasks. With the rapid growth in demand for automated software development, LLMs have demonstrated significant potential in the field of code generation. The paper begins by reviewing the historical development of LLMs and their applicatio…

    Submitted 29 August, 2024; originally announced August 2024.

  11. CooTest: An Automated Testing Approach for V2X Communication Systems

    Authors: An Guo, Xinyu Gao, Zhenyu Chen, Yuan Xiao, Jiakai Liu, Xiuting Ge, Weisong Sun, Chunrong Fang

    Abstract: Perceiving the complex driving environment precisely is crucial to the safe operation of autonomous vehicles. With the tremendous advancement of deep learning and communication technology, Vehicle-to-Everything (V2X) collaboration has the potential to address limitations in sensing distant objects and occlusion for a single-agent perception system. However, despite spectacular progress, several co…

    Submitted 29 August, 2024; originally announced August 2024.

    Journal ref: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA '24), September 16–20, 2024, Vienna, Austria

  12. arXiv:2408.16467  [pdf, other]

    cs.NE cs.CV

    Spiking Diffusion Models

    Authors: Jiahang Cao, Hanzhong Guo, Ziqing Wang, Deming Zhou, Hao Cheng, Qiang Zhang, Renjing Xu

    Abstract: Recent years have witnessed Spiking Neural Networks (SNNs) gaining attention for their ultra-low energy consumption and high biological plausibility compared with traditional Artificial Neural Networks (ANNs). Despite their distinguished properties, the application of SNNs in the computationally intensive field of image generation is still under exploration. In this paper, we propose the Spiking D…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE Transactions on Artificial Intelligence

  13. arXiv:2408.16400  [pdf, other]

    cs.CR

    Outside the Comfort Zone: Analysing LLM Capabilities in Software Vulnerability Detection

    Authors: Yuejun Guo, Constantinos Patsakis, Qiang Hu, Qiang Tang, Fran Casino

    Abstract: The significant increase in software production driven by automation and faster development lifecycles has resulted in a corresponding surge in software vulnerabilities. In parallel, the evolving landscape of software vulnerability detection, highlighting the shift from traditional methods to machine learning and large language models (LLMs), provides massive opportunities at the cost of resource-…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Accepted to ESORICS 2024

  14. arXiv:2408.16373  [pdf, other]

    cs.SD eess.AS

    Enabling Beam Search for Language Model-Based Text-to-Speech Synthesis

    Authors: Zehai Tu, Guangyan Zhang, Yiting Lu, Adaeze Adigwe, Simon King, Yiwen Guo

    Abstract: Tokenising continuous speech into sequences of discrete tokens and modelling them with language models (LMs) has led to significant success in text-to-speech (TTS) synthesis. Although these models can generate speech with high quality and naturalness, their synthesised samples can still suffer from artefacts, mispronunciation, word repeating, etc. In this paper, we argue these undesirable properti…

    Submitted 29 August, 2024; originally announced August 2024.

  15. arXiv:2408.16300  [pdf, other]

    cs.NE math.OC

    A Distance Similarity-based Genetic Optimization Algorithm for Satellite Ground Network Planning Considering Feeding Mode

    Authors: Yingying Ren, Qiuli Li, Yangyang Guo, Witold Pedrycz, Lining Xing, Anfeng Liu, Yanjie Song

    Abstract: With the rapid development of the satellite industry, the information transmission network based on communication satellites has gradually become a major and important part of the future satellite ground integration network. However, the low transmission efficiency of the satellite data relay back mission has become a problem that is currently constraining the construction of the system and needs…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 25 pages

  16. arXiv:2408.16257  [pdf, other]

    cs.PL

    Improving stableKanren's Backward Compatibility

    Authors: Xiangyu Guo, Ajay Bansal

    Abstract: We improve the backward compatibility of stableKanren to run miniKanren programs. stableKanren is a miniKanren extension capable of non-monotonic reasoning through stable model semantics. However, standard miniKanren programs that produce infinite results do not run as expected in stableKanren. According to stable model semantics, the contradictions are created by negations. A standard miniKanren'…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 7 pages, 2 figures, ICFP '24 The miniKanren and Relational Programming Workshop

    MSC Class: 03B70; 68T27; 68T30 ACM Class: D.3.0

  17. arXiv:2408.16251  [pdf, other]

    cs.IT eess.SP

    Neural Network-Assisted Hybrid Model Based Message Passing for Parametric Holographic MIMO Near Field Channel Estimation

    Authors: Zhengdao Yuan, Yabo Guo, Dawei Gao, Qinghua Guo, Zhongyong Wang, Chongwen Huang, Ming Jin, Kai-Kit Wong

    Abstract: Holographic multiple-input and multiple-output (HMIMO) is a promising technology with the potential to achieve high energy and spectral efficiencies, enhance system capacity and diversity, etc. In this work, we address the challenge of HMIMO near field (NF) channel estimation, which is complicated by the intricate model introduced by the dyadic Green's function. Despite its complexity, the channel…

    Submitted 29 August, 2024; originally announced August 2024.

  18. arXiv:2408.16212  [pdf, other]

    astro-ph.EP astro-ph.SR cs.LG

    The Application of Machine Learning in Tidal Evolution Simulation of Star-Planet Systems

    Authors: Shuaishuai Guo, Jianheng Guo, KaiFan Ji, Hui Liu, Lei Xing

    Abstract: With the release of a large amount of astronomical data, an increasing number of close-in hot Jupiters have been discovered. Calculating their evolutionary curves using star-planet interaction models presents a challenge. To expedite the generation of evolutionary curves for these close-in hot Jupiter systems, we utilized tidal interaction models established on MESA to create 15,745 samples of sta…

    Submitted 28 August, 2024; originally announced August 2024.

  19. arXiv:2408.16028  [pdf, other]

    cs.CR cs.LG cs.SE

    ANVIL: Anomaly-based Vulnerability Identification without Labelled Training Data

    Authors: Weizhou Wang, Eric Liu, Xiangyu Guo, David Lie

    Abstract: Supervised learning-based software vulnerability detectors often fall short due to the inadequate availability of labelled training data. In contrast, Large Language Models (LLMs) such as GPT-4, are not trained on labelled data, but when prompted to detect vulnerabilities, LLM prediction accuracy is only marginally better than random guessing. In this paper, we explore a different approach by refr…

    Submitted 27 August, 2024; originally announced August 2024.

  20. arXiv:2408.15999  [pdf]

    q-bio.QM cs.LG

    Q-MRS: A Deep Learning Framework for Quantitative Magnetic Resonance Spectra Analysis

    Authors: Christopher J. Wu, Lawrence S. Kegeles, Jia Guo

    Abstract: Magnetic resonance spectroscopy (MRS) is an established technique for studying tissue metabolism, particularly in central nervous system disorders. While powerful and versatile, MRS is often limited by challenges associated with data quality, processing, and quantification. Existing MRS quantification methods face difficulties in balancing model complexity and reproducibility during spectral model…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 8 pages, 4 figures, and 3 tables for the main body; 9 pages, 4 figures, and 3 tables for the supplementary material

  21. arXiv:2408.15915  [pdf, other]

    cs.CV cs.AI cs.CL

    Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models

    Authors: Yuncheng Yang, Yulei Qin, Tong Wu, Zihan Xu, Gang Li, Pengcheng Guo, Hang Shao, Yucheng Shi, Ke Li, Xing Sun, Jie Yang, Yun Gu

    Abstract: The cultivation of expertise for large language models (LLMs) to solve tasks of specific areas often requires special-purpose tuning with calibrated behaviors on the expected stable outputs. To avoid huge cost brought by manual preparation of instruction datasets and training resources up to hundreds of hours, the exploitation of open knowledge including a wealth of low rank adaptation (LoRA) mode…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 28 pages, 12 tables, 10 figures

  22. arXiv:2408.15663  [pdf, other]

    cs.RO

    NeuroVE: Brain-inspired Linear-Angular Velocity Estimation with Spiking Neural Networks

    Authors: Xiao Li, Xieyuanli Chen, Ruibin Guo, Yujie Wu, Zongtan Zhou, Fangwen Yu, Huimin Lu

    Abstract: Vision-based ego-velocity estimation is a fundamental problem in robot state estimation. However, the constraints of frame-based cameras, including motion blur and insufficient frame rates in dynamic settings, readily lead to the failure of conventional velocity estimation techniques. Mammals exhibit a remarkable ability to accurately estimate their ego-velocity during aggressive movement. Hence,…

    Submitted 28 August, 2024; originally announced August 2024.

  23. arXiv:2408.15580  [pdf, other]

    cs.CV

    Hierarchical Visual Categories Modeling: A Joint Representation Learning and Density Estimation Framework for Out-of-Distribution Detection

    Authors: Jinglun Li, Xinyu Zhou, Pinxue Guo, Yixuan Sun, Yiwen Huang, Weifeng Ge, Wenqiang Zhang

    Abstract: Detecting out-of-distribution inputs for visual recognition models has become critical in safe deep learning. This paper proposes a novel hierarchical visual category modeling scheme to separate out-of-distribution data from in-distribution data through joint representation learning and statistical modeling. We learn a mixture of Gaussian models for each in-distribution category. There are many Ga…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Accepted by ICCV2023

  24. arXiv:2408.15566  [pdf, other]

    cs.CV

    TagOOD: A Novel Approach to Out-of-Distribution Detection via Vision-Language Representations and Class Center Learning

    Authors: Jinglun Li, Xinyu Zhou, Kaixun Jiang, Lingyi Hong, Pinxue Guo, Zhaoyu Chen, Weifeng Ge, Wenqiang Zhang

    Abstract: Multimodal fusion, leveraging data like vision and language, is rapidly gaining traction. This enriched data representation improves performance across various tasks. Existing methods for out-of-distribution (OOD) detection, a critical area where AI models encounter unseen data in real-world scenarios, rely heavily on whole-image features. These image-level features can include irrelevant informat…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Accepted by ACMMM2024

  25. arXiv:2408.15251  [pdf, other]

    cs.CV cs.LG

    TrajFM: A Vehicle Trajectory Foundation Model for Region and Task Transferability

    Authors: Yan Lin, Tonglong Wei, Zeyu Zhou, Haomin Wen, Jilin Hu, Shengnan Guo, Youfang Lin, Huaiyu Wan

    Abstract: Vehicle trajectories provide valuable movement information that supports various downstream tasks and powers real-world applications. A desirable trajectory learning model should transfer between different regions and tasks without retraining, thus improving computational efficiency and effectiveness with limited training data. However, a model's ability to transfer across regions is limited by th…

    Submitted 9 August, 2024; originally announced August 2024.

  26. arXiv:2408.15076  [pdf, other]

    cs.LG cs.AI

    MiWaves Reinforcement Learning Algorithm

    Authors: Susobhan Ghosh, Yongyi Guo, Pei-Yao Hung, Lara Coughlin, Erin Bonar, Inbal Nahum-Shani, Maureen Walton, Susan Murphy

    Abstract: The escalating prevalence of cannabis use poses a significant public health challenge globally. In the U.S., cannabis use is more prevalent among emerging adults (EAs) (ages 18-25) than any other age group, with legalization in multiple states contributing to a public perception that cannabis is less risky than in prior decades. To address this growing concern, we developed MiWaves, a reinforc…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2402.17739

  27. arXiv:2408.14977  [pdf, other]

    eess.IV cs.CV

    LN-Gen: Rectal Lymph Nodes Generation via Anatomical Features

    Authors: Weidong Guo, Hantao Zhang, Shouhong Wan, Bingbing Zou, Wanqin Wang, Peiquan Jin

    Abstract: Accurate segmentation of rectal lymph nodes is crucial for the staging and treatment planning of rectal cancer. However, the complexity of the surrounding anatomical structures and the scarcity of annotated data pose significant challenges. This study introduces a novel lymph node synthesis technique aimed at generating diverse and realistic synthetic rectal lymph node samples to mitigate the reli…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 8 pages

  28. arXiv:2408.14972  [pdf, other]

    cs.CL

    AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems

    Authors: Chi-Min Chan, Jianxuan Yu, Weize Chen, Chunyang Jiang, Xinyu Liu, Weijie Shi, Zhiyuan Liu, Wei Xue, Yike Guo

    Abstract: The rapid advancement of large language models (LLMs) has led to the rise of LLM-based agents. Recent research shows that multi-agent systems (MAS), where each agent plays a specific role, can outperform individual LLMs. However, configuring an MAS for a task remains challenging, with performance only observable post-execution. Inspired by scaling laws in LLM development, we investigate whether MA…

    Submitted 27 August, 2024; originally announced August 2024.

  29. arXiv:2408.14957  [pdf, other]

    cs.CV

    Applying ViT in Generalized Few-shot Semantic Segmentation

    Authors: Liyuan Geng, Jinhong Xia, Yuanhe Guo

    Abstract: This paper explores the capability of ViT-based models under the generalized few-shot semantic segmentation (GFSS) framework. We conduct experiments with various combinations of backbone models, including ResNets and pretrained Vision Transformer (ViT)-based models, along with decoders featuring a linear classifier, UPerNet, and Mask Transformer. The structure made of DINOv2 and linear classifier…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 7 pages, 4 figures

  30. arXiv:2408.14909  [pdf, other]

    cs.CL cs.LG cs.NE

    SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models

    Authors: Shuaijie Shen, Chao Wang, Renzhuo Huang, Yan Zhong, Qinghai Guo, Zhichao Lu, Jianguo Zhang, Luziwei Leng

    Abstract: Known as low energy consumption networks, spiking neural networks (SNNs) have gained a lot of attention within the past decades. While SNNs are increasingly competitive with artificial neural networks (ANNs) for vision tasks, they are rarely used for long sequence tasks, despite their intrinsic temporal dynamics. In this work, we develop spiking state space models (SpikingSSMs) for long sequence lea…

    Submitted 27 August, 2024; originally announced August 2024.

  31. arXiv:2408.14757  [pdf, other]

    cs.CV cs.LG

    Learning effective pruning at initialization from iterative pruning

    Authors: Shengkai Liu, Yaofeng Cheng, Fusheng Zha, Wei Guo, Lining Sun, Zhenshan Bing, Chenguang Yang

    Abstract: Pruning at initialization (PaI) reduces training costs by removing weights before training, which becomes increasingly crucial with the growing network size. However, current PaI methods still have a large accuracy gap with iterative pruning, especially at high sparsity levels. This raises an intriguing question: can we get inspiration from iterative pruning to improve the PaI performance? In the…

    Submitted 26 August, 2024; originally announced August 2024.

  32. arXiv:2408.14744  [pdf, other]

    cs.CV cs.AI

    RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models

    Authors: Junyao Ge, Yang Zheng, Kaitai Guo, Jimin Liang

    Abstract: Abundant, well-annotated multimodal data in remote sensing are pivotal for aligning complex visual remote sensing (RS) scenes with human language, enabling the development of specialized vision language models across diverse RS interpretation tasks. However, annotating RS images with rich linguistic semantics at scale demands expertise in RS and substantial human labor, making it costly and often…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Submitted to ISPRS

    ACM Class: I.4.8; I.2.10

  33. arXiv:2408.14690  [pdf, other]

    cs.CL cs.AI

    Training-Free Activation Sparsity in Large Language Models

    Authors: James Liu, Pragaash Ponnusamy, Tianle Cai, Han Guo, Yoon Kim, Ben Athiwaratkun

    Abstract: Activation sparsity can enable practical inference speedups in large language models (LLMs) by reducing the compute and memory-movement required for matrix multiplications during the forward pass. However, existing methods face limitations that inhibit widespread adoption. Some approaches are tailored towards older models with ReLU-based sparsity, while others require extensive continued pre-train…

    Submitted 26 August, 2024; originally announced August 2024.

  34. arXiv:2408.14600  [pdf, other]

    cs.CV

    PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection

    Authors: Yidi Li, Jiahao Wen, Bin Ren, Wenhao Li, Zhenhuan Xu, Hao Guo, Hong Liu, Nicu Sebe

    Abstract: The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection. However, this combination often struggles with capturing semantic information effectively. Moreover, relying solely on point features within regions of interest can lead to information loss and limitations in local feature representation. To tackle these challenges, we propose a novel two…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 3D Object Detection

  35. arXiv:2408.14585  [pdf, other]

    cs.CV cs.SD eess.AS

    Global-Local Distillation Network-Based Audio-Visual Speaker Tracking with Incomplete Modalities

    Authors: Yidi Li, Yihan Li, Yixin Guo, Bin Ren, Zhenhuan Xu, Hao Guo, Hong Liu, Nicu Sebe

    Abstract: In speaker tracking research, integrating and complementing multi-modal data is a crucial strategy for improving the accuracy and robustness of tracking systems. However, tracking with incomplete modalities remains a challenging issue due to noisy observations caused by occlusion, acoustic noise, and sensor failures. Especially when there is missing data in multiple modalities, the performance of…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Audio-Visual Speaker Tracking with Incomplete Modalities

  36. arXiv:2408.14472  [pdf, other]

    cs.RO cs.AI eess.SY

    Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning

    Authors: Xinyang Gu, Yen-Jen Wang, Xiang Zhu, Chengming Shi, Yanjiang Guo, Yichen Liu, Jianyu Chen

    Abstract: Humanoid robots, with their human-like skeletal structure, are especially suited for tasks in human-centric environments. However, this structure is accompanied by additional challenges in locomotion controller design, especially in complex real-world environments. As a result, existing humanoid robots are limited to relatively simple terrains, either with model-based control or model-free reinfor…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Robotics: Science and Systems (RSS), 2024. (Best Paper Award Finalist)

  37. arXiv:2408.14158  [pdf, other]

    cs.DC cs.AI

    Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

    Authors: Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Wenjun Gao, Kang Guan, Jianzhong Guo, Yongqiang Guo, Zhe Fu, Ying He, Panpan Huang, Jiashi Li, Wenfeng Liang, Xiaodong Liu, Xin Liu, Yiyuan Liu, Yuxuan Liu, Shanghao Lu, Xuan Lu, Xiaotao Nie, Tian Pei, et al. (27 additional authors not shown)

    Abstract: The rapid progress in Deep Learning (DL) and Large Language Models (LLMs) has exponentially increased demands of computational power and bandwidth. This, combined with the high costs of faster computing chips and interconnects, has significantly inflated High Performance Computing (HPC) construction costs. To address these challenges, we introduce the Fire-Flyer AI-HPC architecture, a synergistic…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: This is the preprint version of the paper accepted for presentation at the 2024 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'24). © 2024 IEEE. Personal use of this material is permitted. For other uses, permission from IEEE must be obtained. Please refer to IEEE Xplore for the final published version

  38. arXiv:2408.13988  [pdf, other]

    cs.CV cs.AI

    Automatic Medical Report Generation: Methods and Applications

    Authors: Li Guo, Anas M. Tahir, Dong Zhang, Z. Jane Wang, Rabab K. Ward

    Abstract: The increasing demand for medical imaging has surpassed the capacity of available radiologists, leading to diagnostic delays and potential misdiagnoses. Artificial intelligence (AI) techniques, particularly in automatic medical report generation (AMRG), offer a promising solution to this dilemma. This review comprehensively examines AMRG methods from 2021 to 2024. It (i) presents solutions to prim…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 42 pages and 9 figures

  39. arXiv:2408.13896  [pdf, other]

    cs.CV cs.CR

    RT-Attack: Jailbreaking Text-to-Image Models via Random Token

    Authors: Sensen Gao, Xiaojun Jia, Yihao Huang, Ranjie Duan, Jindong Gu, Yang Liu, Qing Guo

    Abstract: Recently, Text-to-Image (T2I) models have achieved remarkable success in image generation and editing, yet these models still have many potential issues, particularly in generating inappropriate or Not-Safe-For-Work (NSFW) content. Strengthening attacks and uncovering such vulnerabilities can advance the development of reliable and practical T2I models. Most of the previous works treat T2I models as…

    Submitted 27 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

  40. arXiv:2408.13893  [pdf, other]

    cs.SD cs.CL eess.AS

    SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models

    Authors: Dongchao Yang, Rongjie Huang, Yuanyuan Wang, Haohan Guo, Dading Chong, Songxiang Liu, Xixin Wu, Helen Meng

    Abstract: Scaling Text-to-speech (TTS) to large-scale datasets has been demonstrated as an effective method for improving the diversity and naturalness of synthesized speech. At the high level, previous large-scale TTS models can be categorized into either Auto-regressive (AR) based (e.g., VALL-E) or Non-auto-regressive (NAR) based models (e.g., NaturalSpeech 2/3). Although these works dem…

    Submitted 28 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: Submitted to TASLP

  41. arXiv:2408.13877  [pdf, other]

    cs.CV

    Camouflaged Object Tracking: A Benchmark

    Authors: Xiaoyu Guo, Pengzhi Zhong, Hao Zhang, Ling Huang, Defeng Huang, Shuiwang Li

    Abstract: Visual tracking has seen remarkable advancements, largely driven by the availability of large-scale training datasets that have enabled the development of highly accurate and robust algorithms. While significant progress has been made in tracking general objects, research on more challenging scenarios, such as tracking camouflaged objects, remains limited. Camouflaged objects, which blend seamless…

    Submitted 25 August, 2024; originally announced August 2024.

  42. arXiv:2408.13759  [pdf, other]

    cs.RO

    MASQ: Multi-Agent Reinforcement Learning for Single Quadruped Robot Locomotion

    Authors: Qi Liu, Jingxiang Guo, Sixu Lin, Shuaikang Ma, Jinxuan Zhu, Yanjie Li

    Abstract: This paper proposes a novel method to improve locomotion learning for a single quadruped robot using multi-agent deep reinforcement learning (MARL). Many existing methods use single-agent reinforcement learning for an individual robot or MARL for the cooperative task in multi-robot systems. Unlike existing methods, this paper proposes using MARL for the locomotion learning of a single quadruped ro…

    Submitted 25 August, 2024; originally announced August 2024.

  43. arXiv:2408.13750  [pdf, other]

    cs.AI cs.MA

    Multi-Agent Target Assignment and Path Finding for Intelligent Warehouse: A Cooperative Multi-Agent Deep Reinforcement Learning Perspective

    Authors: Qi Liu, Jianqi Gao, Dongjie Zhu, Xizheng Pang, Pengbin Chen, Jingxiang Guo, Yanjie Li

    Abstract: Multi-agent target assignment and path planning (TAPF) are two key problems in intelligent warehouses. However, most of the literature addresses only one of these two problems. In this study, we propose a method to simultaneously solve target assignment and path planning from the perspective of cooperative multi-agent deep reinforcement learning (RL). To the best of our knowledge, this is the fir…

    Submitted 25 August, 2024; originally announced August 2024.

  44. arXiv:2408.13454  [pdf, other]

    cs.CV

    AdaOcc: Adaptive-Resolution Occupancy Prediction

    Authors: Chao Chen, Ruoyu Wang, Yuliang Guo, Cheng Zhao, Xinyu Huang, Chen Feng, Liu Ren

    Abstract: Autonomous driving in complex urban scenarios requires 3D perception to be both comprehensive and precise. Traditional 3D perception methods focus on object detection, resulting in sparse representations that lack environmental detail. Recent approaches estimate 3D occupancy around vehicles for a more comprehensive scene representation. However, dense 3D occupancy prediction increases computationa…

    Submitted 23 August, 2024; originally announced August 2024.

  45. arXiv:2408.13399  [pdf, other]

    cs.IR cs.AI

    Transforming Location Retrieval at Airbnb: A Journey from Heuristics to Reinforcement Learning

    Authors: Dillon Davis, Huiji Gao, Weiwei Guo, Thomas Legrand, Malay Haldar, Alex Deng, Han Zhao, Liwei He, Sanjeev Katariya

    Abstract: The Airbnb search system grapples with many unique challenges as it continues to evolve. We oversee a marketplace that is nuanced by geography, diversity of homes, and guests with a variety of preferences. Crafting an efficient search system that can accommodate diverse guest needs, while showcasing relevant homes lies at the heart of Airbnb's success. Airbnb search has many challenges that parall…

    Submitted 23 August, 2024; originally announced August 2024.

  46. arXiv:2408.13370  [pdf, other]

    cs.CV cs.GR

    BiGS: Bidirectional Gaussian Primitives for Relightable 3D Gaussian Splatting

    Authors: Zhenyuan Liu, Yu Guo, Xinyuan Li, Bernd Bickel, Ran Zhang

    Abstract: We present Bidirectional Gaussian Primitives, an image-based novel view synthesis technique designed to represent and render 3D objects with surface and volumetric materials under dynamic illumination. Our approach integrates light intrinsic decomposition into the Gaussian splatting framework, enabling real-time relighting of 3D objects. To unify surface and volumetric material within a cohesive a…

    Submitted 23 August, 2024; originally announced August 2024.

  47. arXiv:2408.13193  [pdf, other]

    cs.CG

    Critical Point Extraction from Multivariate Functional Approximation

    Authors: Guanqun Ma, David Lenz, Tom Peterka, Hanqi Guo, Bei Wang

    Abstract: Advances in high-performance computing require new ways to represent large-scale scientific data to support data storage, data transfers, and data analysis within scientific workflows. Multivariate functional approximation (MFA) has recently emerged as a new continuous meshless representation that approximates raw discrete data with a set of piecewise smooth functions. An MFA model of data thus of…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: TopoInVis 2024, 11 pages with 1-page appendix

  48. arXiv:2408.13126  [pdf, other]

    cs.CV

    CathAction: A Benchmark for Endovascular Intervention Understanding

    Authors: Baoru Huang, Tuan Vo, Chayun Kongtongvattana, Giulio Dagnino, Dennis Kundrat, Wenqiang Chi, Mohamed Abdelaziz, Trevor Kwok, Tudor Jianu, Tuong Do, Hieu Le, Minh Nguyen, Hoan Nguyen, Erman Tjiputra, Quang Tran, Jianyang Xie, Yanda Meng, Binod Bhattarai, Zhaorui Tan, Hongbin Liu, Hong Seng Gan, Wei Wang, Xi Yang, Qiufeng Wang, Jionglong Su, et al. (13 additional authors not shown)

    Abstract: Real-time visual feedback from catheterization analysis is crucial for enhancing surgical safety and efficiency during endovascular interventions. However, existing datasets are often limited to specific tasks, small scale, and lack the comprehensive annotations necessary for broader endovascular intervention understanding. To tackle these limitations, we introduce CathAction, a large-scale datase…

    Submitted 30 August, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

    Comments: 10 pages. Webpage: https://airvlab.github.io/cathaction/

  49. arXiv:2408.13036  [pdf, other]

    cs.CV

    S4D: Streaming 4D Real-World Reconstruction with Gaussians and 3D Control Points

    Authors: Bing He, Yunuo Chen, Guo Lu, Li Song, Wenjun Zhang

    Abstract: Recently, dynamic scene reconstruction using Gaussians has garnered increased interest. Mainstream approaches typically employ a global deformation field to warp a 3D scene in the canonical space. However, the inherently low-frequency nature of implicit neural fields often leads to ineffective representations of complex motions. Moreover, their structural rigidity can hinder adaptation to scen…

    Submitted 23 August, 2024; originally announced August 2024.

  50. arXiv:2408.13005  [pdf, other]

    cs.CV

    EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation

    Authors: Cong Wang, Jiaxi Gu, Panwen Hu, Haoyu Zhao, Yuanfan Guo, Jianhua Han, Hang Xu, Xiaodan Liang

    Abstract: Following the advancements in text-guided image generation technology exemplified by Stable Diffusion, video generation is gaining increased attention in the academic community. However, relying solely on text guidance for video generation has serious limitations, as videos contain much richer content than images, especially in terms of motion. This information can hardly be adequately described w…

    Submitted 23 August, 2024; originally announced August 2024.