Showing 1–50 of 653 results for author: Yu, W

Searching in archive cs.
  1. arXiv:2407.10943  [pdf, other]

    cs.RO cs.CV

    GRUtopia: Dream General Robots in a City at Scale

    Authors: Hanqing Wang, Jiahe Chen, Wensi Huang, Qingwei Ben, Tai Wang, Boyu Mi, Tao Huang, Siheng Zhao, Yilun Chen, Sizhe Yang, Peizhou Cao, Wenye Yu, Zichao Ye, Jialun Li, Junfeng Long, Zirui Wang, Huiling Wang, Ying Zhao, Zhongying Tu, Yu Qiao, Dahua Lin, Jiangmiao Pang

    Abstract: Recent works have been exploring the scaling laws in the field of Embodied AI. Given the prohibitive costs of collecting real-world data, we believe the Simulation-to-Real (Sim2Real) paradigm is a crucial step for scaling the learning of embodied models. This paper introduces project GRUtopia, the first simulated interactive 3D society designed for various robots. It features several advancements:…

    Submitted 15 July, 2024; originally announced July 2024.

  2. arXiv:2407.10701  [pdf, other]

    cs.CL

    DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems

    Authors: Anni Zou, Wenhao Yu, Hongming Zhang, Kaixin Ma, Deng Cai, Zhuosheng Zhang, Hai Zhao, Dong Yu

    Abstract: Recently, there has been a growing interest among large language model (LLM) developers in LLM-based document reading systems, which enable users to upload their own documents and pose questions related to the document contents, going beyond simple reading comprehension tasks. Consequently, these systems have been carefully designed to tackle challenges such as file parsing, metadata extraction, m…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Work in progress

  3. Towards Robust Recommendation via Decision Boundary-aware Graph Contrastive Learning

    Authors: Jiakai Tang, Sunhao Dai, Zexu Sun, Xu Chen, Jun Xu, Wenhui Yu, Lantao Hu, Peng Jiang, Han Li

    Abstract: In recent years, graph contrastive learning (GCL) has received increasing attention in recommender systems due to its effectiveness in reducing bias caused by data sparsity. However, most existing GCL models rely on heuristic approaches and usually assume entity independence when constructing contrastive views. We argue that these methods struggle to strike a balance between semantic invariance an…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: KDD 2024

  4. arXiv:2407.09324  [pdf, other]

    cs.LG cs.AI cs.IT

    Provable Privacy Advantages of Decentralized Federated Learning via Distributed Optimization

    Authors: Wenrui Yu, Qiongxiu Li, Milan Lopuhaä-Zwakenberg, Mads Græsbøll Christensen, Richard Heusdens

    Abstract: Federated learning (FL) emerged as a paradigm designed to improve data privacy by enabling data to reside at its source, thus embedding privacy as a core consideration in FL architectures, whether centralized or decentralized. Contrasting with recent findings by Pasquini et al., which suggest that decentralized FL does not empirically offer any additional privacy or security benefits over centrali…

    Submitted 12 July, 2024; originally announced July 2024.

  5. arXiv:2407.09013  [pdf, ps, other]

    cs.AI cs.LG

    Procedural Content Generation via Generative Artificial Intelligence

    Authors: Xinyu Mao, Wanli Yu, Kazunori D Yamada, Michael R. Zielewski

    Abstract: The attempt to utilize machine learning in PCG has been made in the past. In this survey paper, we investigate how generative artificial intelligence (AI), which saw a significant increase in interest in the mid-2010s, is being used for PCG. We review applications of generative AI for the creation of various types of content, including terrains, items, and even storylines. While generative AI is e…

    Submitted 12 July, 2024; originally announced July 2024.

  6. PAIL: Performance based Adversarial Imitation Learning Engine for Carbon Neutral Optimization

    Authors: Yuyang Ye, Lu-An Tang, Haoyu Wang, Runlong Yu, Wenchao Yu, Erhu He, Haifeng Chen, Hui Xiong

    Abstract: Achieving carbon neutrality within industrial operations has become increasingly imperative for sustainable development. It is both a significant challenge and a key opportunity for operational optimization in industry 4.0. In recent years, Deep Reinforcement Learning (DRL) based methods offer promising enhancements for sequential optimization processes and can be used for reducing carbon emission…

    Submitted 11 July, 2024; originally announced July 2024.

  7. arXiv:2407.07775  [pdf, other]

    cs.RO cs.AI

    Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs

    Authors: Hao-Tien Lewis Chiang, Zhuo Xu, Zipeng Fu, Mithun George Jacob, Tingnan Zhang, Tsang-Wei Edward Lee, Wenhao Yu, Connor Schenck, David Rendleman, Dhruv Shah, Fei Xia, Jasmine Hsu, Jonathan Hoech, Pete Florence, Sean Kirmani, Sumeet Singh, Vikas Sindhwani, Carolina Parada, Chelsea Finn, Peng Xu, Sergey Levine, Jie Tan

    Abstract: An elusive goal in navigation research is to build an intelligent agent that can understand multimodal instructions including natural language and image, and perform useful navigation. To achieve this, we study a widely useful category of navigation tasks we call Multimodal Instruction Navigation with demonstration Tours (MINT), in which the environment prior is provided through a previously recor…

    Submitted 12 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  8. arXiv:2407.07304  [pdf, other]

    cs.AI

    Inference Performance Optimization for Large Language Models on CPUs

    Authors: Pujiang He, Shan Zhou, Wenhuan Huang, Changqing Li, Duyi Wang, Bin Guo, Chen Meng, Sheng Gui, Weifei Yu, Yi Xie

    Abstract: Large language models (LLMs) have shown exceptional performance and vast potential across diverse tasks. However, the deployment of LLMs with high performance in low-resource environments has garnered significant attention in the industry. When GPU hardware resources are limited, we can explore alternative options on CPUs. To mitigate the financial burden and alleviate constraints imposed by hardw…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 5 pages, 6 figures, ICML 2024 on Foundation Models in the Wild

  9. arXiv:2407.05540  [pdf, other]

    cs.CV

    GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation

    Authors: Chenxin Li, Xinyu Liu, Cheng Wang, Yifan Liu, Weihao Yu, Jing Shao, Yixuan Yuan

    Abstract: Recent advances in learning multi-modal representation have witnessed the success in biomedical domains. While established techniques enable handling multi-modal information, the challenges are posed when extended to various clinical modalities and practical modality-missing setting due to the inherent modality gaps. To tackle these, we propose an innovative Modality-prompted Heterogeneous Graph fo…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  10. arXiv:2407.05413  [pdf, other]

    cs.AI cs.CL cs.LG

    SBoRA: Low-Rank Adaptation with Regional Weight Updates

    Authors: Lai-Man Po, Yuyang Liu, Haoxuan Wu, Tianqi Zhang, Wing-Yin Yu, Zeyu Jiang, Kun Li

    Abstract: This paper introduces Standard Basis LoRA (SBoRA), a novel parameter-efficient fine-tuning approach for Large Language Models that builds upon the pioneering works of Low-Rank Adaptation (LoRA) and Orthogonal Adaptation. SBoRA further reduces the computational and memory requirements of LoRA while enhancing learning performance. By leveraging orthogonal standard basis vectors to initialize one of…

    Submitted 10 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: 15 pages, 2 figures

  11. arXiv:2407.03971  [pdf, other]

    cs.CV

    MineNetCD: A Benchmark for Global Mining Change Detection on Remote Sensing Imagery

    Authors: Weikang Yu, Xiaokang Zhang, Xiao Xiang Zhu, Richard Gloaguen, Pedram Ghamisi

    Abstract: Monitoring changes triggered by mining activities is crucial for industrial controlling, environmental management and regulatory compliance, yet it poses significant challenges due to the vast and often remote locations of mining sites. Remote sensing technologies have increasingly become indispensable to detect and analyze these changes over time. We thus introduce MineNetCD, a comprehensive benc…

    Submitted 4 July, 2024; originally announced July 2024.

  12. arXiv:2407.02190  [pdf, other]

    cs.RO

    I2EKF-LO: A Dual-Iteration Extended Kalman Filter Based LiDAR Odometry

    Authors: Wenlu Yu, Jie Xu, Chengwei Zhao, Lijun Zhao, Thien-Minh Nguyen, Shenghai Yuan, Mingming Bai, Lihua Xie

    Abstract: LiDAR odometry is a pivotal technology in the fields of autonomous driving and autonomous mobile robotics. However, most of the current works focus on nonlinear optimization methods, and there are still many challenges in using the traditional Iterative Extended Kalman Filter (IEKF) framework to tackle the problem: IEKF only iterates over the observation equation, relying on a rough estimate of the…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted by IROS 2024

  13. arXiv:2407.01950  [pdf, other]

    cs.RO cs.AI

    LDP: A Local Diffusion Planner for Efficient Robot Navigation and Collision Avoidance

    Authors: Wenhao Yu, Jie Peng, Huanyu Yang, Junrui Zhang, Yifan Duan, Jianmin Ji, Yanyong Zhang

    Abstract: The conditional diffusion model has been demonstrated as an efficient tool for learning robot policies, owing to its advancement to accurately model the conditional distribution of policies. The intricate nature of real-world scenarios, characterized by dynamic obstacles and maze-like structures, underscores the complexity of robot local navigation decision-making as a conditional distribution pro…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 8 pages, 6 figures, accepted by IROS 2024

  14. arXiv:2407.01875  [pdf, ps, other]

    cs.AI

    Spatio-Temporal Graphical Counterfactuals: An Overview

    Authors: Mingyu Kang, Duxin Chen, Ziyuan Pu, Jianxi Gao, Wenwu Yu

    Abstract: Counterfactual thinking is a critical yet challenging topic for artificial intelligence to learn knowledge from data and ultimately improve their performances for new scenarios. Many research works, including Potential Outcome Model and Structural Causal Model, have been proposed to realize it. However, their modelings, theoretical foundations and application approaches are usually different. More…

    Submitted 1 July, 2024; originally announced July 2024.

  15. arXiv:2407.01029  [pdf, other]

    cs.CV

    EndoSparse: Real-Time Sparse View Synthesis of Endoscopic Scenes using Gaussian Splatting

    Authors: Chenxin Li, Brandon Y. Feng, Yifan Liu, Hengyu Liu, Cheng Wang, Weihao Yu, Yixuan Yuan

    Abstract: 3D reconstruction of biological tissues from a collection of endoscopic images is a key to unlock various important downstream surgical applications with 3D capabilities. Existing methods employ various advanced neural rendering techniques for photorealistic view synthesis, but they often struggle to recover accurate 3D representations when only sparse observations are available, which is usually…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI2024

  16. arXiv:2407.00029  [pdf, other]

    cs.DC

    Distributed Inference Performance Optimization for LLMs on CPUs

    Authors: Pujiang He, Shan Zhou, Changqing Li, Wenhuan Huang, Weifei Yu, Duyi Wang, Chen Meng, Sheng Gui

    Abstract: Large language models (LLMs) hold tremendous potential for addressing numerous real-world challenges, yet they typically demand significant computational resources and memory. Deploying LLMs onto a resource-limited hardware device with restricted memory capacity presents considerable challenges. Distributed computing emerges as a prevalent strategy to mitigate single-node memory constraints and ex…

    Submitted 16 May, 2024; originally announced July 2024.

    Comments: 4 pages, 3 figures, Practical ML for Low Resource Settings Workshop @ ICLR 2024

  17. arXiv:2406.20019  [pdf, other]

    cs.IT

    Capacity Bounds for Broadcast Channels with Bidirectional Conferencing Decoders

    Authors: Reza K. Farsani, Wei Yu

    Abstract: The two-user broadcast channel (BC) with receivers connected by cooperative links of given capacities, known as conferencing decoders, is considered. A novel outer bound on the capacity region is established. This outer bound is derived using multiple applications of the Csiszár-Körner identity. New achievable rate regions are also presented. A first achievable rate region is derived by applying M…

    Submitted 28 June, 2024; originally announced June 2024.

  18. arXiv:2406.19820  [pdf, other]

    cs.CL cs.AI

    BeamAggR: Beam Aggregation Reasoning over Multi-source Knowledge for Multi-hop Question Answering

    Authors: Zheng Chu, Jingchang Chen, Qianglong Chen, Haotian Wang, Kun Zhu, Xiyuan Du, Weijiang Yu, Ming Liu, Bing Qin

    Abstract: Large language models (LLMs) have demonstrated strong reasoning capabilities. Nevertheless, they still suffer from factual errors when tackling knowledge-intensive tasks. Retrieval-augmented reasoning represents a promising approach. However, significant challenges still persist, including inaccurate and insufficient retrieval for complex questions, as well as difficulty in integrating multi-sourc…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  19. arXiv:2406.18361  [pdf, other]

    cs.CV cs.AI eess.IV

    Stable Diffusion Segmentation for Biomedical Images with Single-step Reverse Process

    Authors: Tianyu Lin, Zhiguang Chen, Zhonghao Yan, Weijiang Yu, Fudan Zheng

    Abstract: Diffusion models have demonstrated their effectiveness across various generative tasks. However, when applied to medical image segmentation, these models encounter several challenges, including significant resource and time requirements. They also necessitate a multi-step reverse process and multiple samples to produce reliable predictions. To address these challenges, we introduce the first laten…

    Submitted 9 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted at MICCAI 2024. Code and citation info see https://github.com/lin-tianyu/Stable-Diffusion-Seg

  20. arXiv:2406.18008  [pdf, other]

    cs.IT

    Rate-Distortion-Perception Tradeoff for Gaussian Vector Sources

    Authors: Jingjing Qian, Sadaf Salehkalaibar, Jun Chen, Ashish Khisti, Wei Yu, Wuxian Shi, Yiqun Ge, Wen Tong

    Abstract: This paper studies the rate-distortion-perception (RDP) tradeoff for a Gaussian vector source coding problem where the goal is to compress the multi-component source subject to distortion and perception constraints. The purpose of imposing a perception constraint is to ensure visually pleasing reconstructions. This paper studies this RDP setting with either the Kullback-Leibler (KL) divergence or…

    Submitted 25 June, 2024; originally announced June 2024.

  21. arXiv:2406.15877  [pdf, other]

    cs.SE cs.AI cs.CL

    BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

    Authors: Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, et al. (8 additional authors not shown)

    Abstract: Automated software engineering has been greatly empowered by the recent advances in Large Language Models (LLMs) for programming. While current benchmarks have shown that LLMs can perform various software engineering tasks like human developers, the majority of their evaluations are limited to short and self-contained algorithmic tasks. Solving challenging and practical programming tasks requires…

    Submitted 26 June, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

    Comments: 44 pages, 14 figures, 7 tables, built with love by the BigCode community :)

  22. arXiv:2406.15704  [pdf, other]

    cs.CV

    video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

    Authors: Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang

    Abstract: Speech understanding as an element of the more generic video understanding using audio-visual large language models (av-LLMs) is a crucial yet understudied aspect. This paper proposes video-SALMONN, a single end-to-end av-LLM for video processing, which can understand not only visual frame sequences, audio events and music, but speech as well. To obtain fine-grained temporal information required b…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted at ICML 2024. arXiv admin note: substantial text overlap with arXiv:2310.05863

  23. arXiv:2406.12050  [pdf, other]

    cs.CL

    Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning

    Authors: Zhihan Zhang, Zhenwen Liang, Wenhao Yu, Dian Yu, Mengzhao Jia, Dong Yu, Meng Jiang

    Abstract: Supervised fine-tuning enhances the problem-solving abilities of language models across various mathematical reasoning tasks. To maximize such benefits, existing research focuses on broadening the training set with various data augmentation techniques, which is effective for standard single-round question-answering settings. Our work introduces a novel technique aimed at cultivating a deeper under…

    Submitted 17 June, 2024; originally announced June 2024.

  24. arXiv:2406.11551  [pdf, other]

    cs.CV

    Simple Yet Efficient: Towards Self-Supervised FG-SBIR with Unified Sample Feature Alignment

    Authors: Jianan Jiang, Di Wu, Zhilin Jiang, Weiren Yu

    Abstract: Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) aims to minimize the distance between sketches and corresponding images in the embedding space. However, scalability is hindered by the growing complexity of solutions, mainly due to the abstract nature of fine-grained sketches. In this paper, we propose a simple yet efficient approach to narrow the gap between the two modes. It mainly facilitate…

    Submitted 22 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 10 pages, 8 figures, 4 tables

  25. arXiv:2406.11507  [pdf, other]

    cs.CV

    Prior Normality Prompt Transformer for Multi-class Industrial Image Anomaly Detection

    Authors: Haiming Yao, Yunkang Cao, Wei Luo, Weihang Zhang, Wenyong Yu, Weiming Shen

    Abstract: Image anomaly detection plays a pivotal role in industrial inspection. Traditional approaches often demand distinct models for specific categories, resulting in substantial deployment costs. This raises concerns about multi-class anomaly detection, where a unified model is developed for multiple classes. However, applying conventional methods, particularly reconstruction-based models, directly to…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Transactions on Industrial Informatics

  26. arXiv:2406.10744  [pdf, other]

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu, et al. (75 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

  27. arXiv:2406.09742  [pdf, other]

    cs.IR

    IFA: Interaction Fidelity Attention for Entire Lifelong Behaviour Sequence Modeling

    Authors: Wenhui Yu, Chao Feng, Yanze Zhang, Lantao Hu, Peng Jiang, Han Li

    Abstract: The lifelong user behavior sequence provides abundant information of user preference and gains impressive improvement in the recommendation task, however increases computational consumption significantly. To meet the severe latency requirement in online service, a short sub-sequence is sampled based on similarity to the target item. Unfortunately, items not in the sub-sequence are abandoned, leadi…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 7 pages, 2 figures

  28. arXiv:2406.09295  [pdf, other]

    cs.CL cs.CV

    AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models

    Authors: Yuhang Wu, Wenmeng Yu, Yean Cheng, Yan Wang, Xiaohan Zhang, Jiazheng Xu, Ming Ding, Yuxiao Dong

    Abstract: Evaluating the alignment capabilities of large Vision-Language Models (VLMs) is essential for determining their effectiveness as helpful assistants. However, existing benchmarks primarily focus on basic abilities using nonverbal methods, such as yes-no and multiple-choice questions. In this paper, we address this gap by introducing AlignMMBench, a comprehensive alignment benchmark specifically des…

    Submitted 13 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  29. arXiv:2406.09166  [pdf, other]

    cs.CV cs.AI

    Fine-Grained Domain Generalization with Feature Structuralization

    Authors: Wenlong Yu, Dongyue Chen, Qilong Wang, Qinghua Hu

    Abstract: Fine-grained domain generalization (FGDG) is a more challenging task than traditional DG tasks due to its small inter-class variations and relatively large intra-class disparities. When domain distribution changes, the vulnerability of subtle features leads to a severe deterioration in model performance. Nevertheless, humans inherently demonstrate the capacity for generalizing to out-of-distributi…

    Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  30. arXiv:2406.07914  [pdf, other]

    cs.SD eess.AS

    Can Large Language Models Understand Spatial Audio?

    Authors: Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Jun Zhang, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang

    Abstract: This paper explores enabling large language models (LLMs) to understand spatial information from multichannel audio, a skill currently lacking in auditory LLMs. By leveraging LLMs' advanced cognitive and inferential abilities, the aim is to enhance understanding of 3D environments via audio. We study 3 spatial audio tasks: sound source localization (SSL), far-field speech recognition (FSR), and lo…

    Submitted 14 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  31. arXiv:2406.07333  [pdf, other]

    cs.CV

    Global-Regularized Neighborhood Regression for Efficient Zero-Shot Texture Anomaly Detection

    Authors: Haiming Yao, Wei Luo, Yunkang Cao, Yiheng Zhang, Wenyong Yu, Weiming Shen

    Abstract: Texture surface anomaly detection finds widespread applications in industrial settings. However, existing methods often necessitate gathering numerous samples for model training. Moreover, they predominantly operate within a close-set detection framework, limiting their ability to identify anomalies beyond the training dataset. To tackle these challenges, this paper introduces a novel zero-shot te…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: SUBMISSION TO IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS

  32. arXiv:2406.06420  [pdf, other]

    cs.LG

    An Improved Empirical Fisher Approximation for Natural Gradient Descent

    Authors: Xiaodong Wu, Wenyi Yu, Chao Zhang, Philip Woodland

    Abstract: Approximate Natural Gradient Descent (NGD) methods are an important family of optimisers for deep learning models, which use approximate Fisher information matrices to pre-condition gradients during training. The empirical Fisher (EF) method approximates the Fisher information matrix empirically by reusing the per-sample gradients collected during back-propagation. Despite its ease of implementati…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 33 pages, 11 figures, 7 tables

  33. arXiv:2406.05491  [pdf, other]

    cs.CV cs.CR

    One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models

    Authors: Hao Fang, Jiawei Kong, Wenbo Yu, Bin Chen, Jiawei Li, Shutao Xia, Ke Xu

    Abstract: Vision-Language Pre-training (VLP) models trained on large-scale image-text pairs have demonstrated unprecedented capability in many practical applications. However, previous studies have revealed that VLP models are vulnerable to adversarial samples crafted by a malicious adversary. While existing attacks have achieved great success in improving attack effect and transferability, they all focus o…

    Submitted 8 June, 2024; originally announced June 2024.

  34. arXiv:2406.04649  [pdf, other]

    cs.CV

    SMART: Scene-motion-aware human action recognition framework for mental disorder group

    Authors: Zengyuan Lai, Jiarui Yang, Songpengcheng Xia, Qi Wu, Zhen Sun, Wenxian Yu, Ling Pei

    Abstract: Patients with mental disorders often exhibit risky abnormal actions, such as climbing walls or hitting windows, necessitating intelligent video behavior monitoring for smart healthcare with the rising Internet of Things (IoT) technology. However, the development of vision-based Human Action Recognition (HAR) for these actions is hindered by the lack of specialized algorithms and datasets. In this…

    Submitted 7 June, 2024; originally announced June 2024.

  35. arXiv:2406.01549  [pdf, other]

    cs.CL cs.AI

    An Information Bottleneck Perspective for Effective Noise Filtering on Retrieval-Augmented Generation

    Authors: Kun Zhu, Xiaocheng Feng, Xiyuan Du, Yuxuan Gu, Weijiang Yu, Haotian Wang, Qianglong Chen, Zheng Chu, Jingchang Chen, Bing Qin

    Abstract: Retrieval-augmented generation integrates the capabilities of large language models with relevant information retrieved from an extensive corpus, yet encounters challenges when confronted with real-world noisy data. One recent solution is to train a filter module to find relevant content but only achieve suboptimal noise compression. In this paper, we propose to introduce the information bottlenec…

    Submitted 4 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  36. arXiv:2406.01152  [pdf, other]

    cs.RO

    Learning-based legged locomotion; state of the art and future perspectives

    Authors: Sehoon Ha, Joonho Lee, Michiel van de Panne, Zhaoming Xie, Wenhao Yu, Majid Khadiv

    Abstract: Legged locomotion holds the premise of universal mobility, a critical capability for many real-world robotic applications. Both model-based and learning-based approaches have advanced the field of legged locomotion in the past three decades. In recent years, however, a number of factors have dramatically accelerated progress in learning-based methods, including the rise of deep learning, rapid pro…

    Submitted 3 June, 2024; originally announced June 2024.

  37. arXiv:2406.00630  [pdf, other]

    stat.ML cs.LG

    On Non-asymptotic Theory of Recurrent Neural Networks in Temporal Point Processes

    Authors: Zhiheng Chen, Guanhua Fang, Wen Yu

    Abstract: Temporal point process (TPP) is an important tool for modeling and predicting irregularly timed events across various domains. Recently, the recurrent neural network (RNN)-based TPPs have shown practical advantages over traditional parametric TPP models. However, in the current literature, it remains nascent in understanding neural TPPs from theoretical viewpoints. In this paper, we establish the…

    Submitted 2 June, 2024; originally announced June 2024.

  38. arXiv:2406.00276  [pdf]

    cs.LG cs.AI cs.CE physics.data-an

    Non-destructive Degradation Pattern Decoupling for Ultra-early Battery Prototype Verification Using Physics-informed Machine Learning

    Authors: Shengyu Tao, Mengtian Zhang, Zixi Zhao, Haoyang Li, Ruifei Ma, Yunhong Che, Xin Sun, Lin Su, Xiangyu Chen, Zihao Zhou, Heng Chang, Tingwei Cao, Xiao Xiao, Yaojun Liu, Wenjun Yu, Zhongling Xu, Yang Li, Han Hao, Xuan Zhang, Xiaosong Hu, Guangmin Zhou

    Abstract: Manufacturing complexities and uncertainties have impeded the transition from material prototypes to commercial batteries, making prototype verification critical to quality assessment. A fundamental challenge involves deciphering intertwined chemical processes to characterize degradation patterns and their quantitative relationship with battery performance. Here we show that a physics-informed mac…

    Submitted 31 May, 2024; originally announced June 2024.

    ACM Class: J.2; G.3

  39. arXiv:2405.20868  [pdf, other]

    cs.CV cs.CY

    Responsible AI for Earth Observation

    Authors: Pedram Ghamisi, Weikang Yu, Andrea Marinoni, Caroline M. Gevaert, Claudio Persello, Sivasakthy Selvakumaran, Manuela Girotto, Benjamin P. Horton, Philippe Rufin, Patrick Hostert, Fabio Pacifici, Peter M. Atkinson

    Abstract: The convergence of artificial intelligence (AI) and Earth observation (EO) technologies has brought geoscience and remote sensing into an era of unparalleled capabilities. AI's transformative impact on data analysis, particularly derived from EO platforms, holds great promise in addressing global challenges such as environmental monitoring, disaster response and climate change analysis. However, t…

    Submitted 31 May, 2024; originally announced May 2024.

  40. arXiv:2405.20725  [pdf, other]

    cs.AI cs.CV

    GI-NAS: Boosting Gradient Inversion Attacks through Adaptive Neural Architecture Search

    Authors: Wenbo Yu, Hao Fang, Bin Chen, Xiaohang Sui, Chuan Chen, Hao Wu, Shu-Tao Xia, Ke Xu

    Abstract: Gradient Inversion Attacks invert the transmitted gradients in Federated Learning (FL) systems to reconstruct the sensitive data of local clients and have raised considerable privacy concerns. A majority of gradient inversion methods rely heavily on explicit prior knowledge (e.g., a well pre-trained generative model), which is often unavailable in realistic scenarios. To alleviate this issue, rese…

    Submitted 31 May, 2024; originally announced May 2024.

  41. arXiv:2405.20224  [pdf, other]

    cs.CV

    EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images

    Authors: Wangbo Yu, Chaoran Feng, Jiye Tang, Xu Jia, Li Yuan, Yonghong Tian

    Abstract: 3D Gaussian Splatting (3D-GS) has demonstrated exceptional capabilities in 3D scene reconstruction and novel view synthesis. However, its training heavily depends on high-quality, sharp images and accurate camera poses. Fulfilling these requirements can be challenging in non-ideal real-world scenarios, where motion-blurred images are commonly encountered in high-speed moving cameras or low-light e…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Project Page: https://drexubery.github.io/EvaGaussians/

  42. arXiv:2405.19730  [pdf]

    cs.AI cs.CV cs.LG

    Research on Foundation Model for Spatial Data Intelligence: China's 2024 White Paper on Strategic Development of Spatial Data Intelligence

    Authors: Shaohua Wang, Xing Xie, Yong Li, Danhuai Guo, Zhi Cai, Yu Liu, Yang Yue, Xiao Pan, Feng Lu, Huayi Wu, Zhipeng Gui, Zhiming Ding, Bolong Zheng, Fuzheng Zhang, Tao Qin, Jingyuan Wang, Chuang Tao, Zhengchao Chen, Hao Lu, Jiayi Li, Hongyang Chen, Peng Yue, Wenhao Yu, Yao Yao, Leilei Sun, et al. (9 additional authors not shown)

    Abstract: This report focuses on spatial data intelligent large models, delving into the principles, methods, and cutting-edge applications of these models. It provides an in-depth discussion on the definition, development history, current status, and trends of spatial data intelligent large models, as well as the challenges they face. The report systematically elucidates the key technologies of spatial dat…

    Submitted 29 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: in Chinese language

  43. arXiv:2405.19444  [pdf, other]

    cs.AI

    MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions

    Authors: Zhenwen Liang, Dian Yu, Wenhao Yu, Wenlin Yao, Zhihan Zhang, Xiangliang Zhang, Dong Yu

    Abstract: Large language models (LLMs) have demonstrated impressive capabilities in mathematical problem solving, particularly in single-turn question answering formats. However, real-world scenarios often involve mathematical question answering that requires multi-turn or interactive information exchanges, and the performance of LLMs on these tasks is still underexplored. This paper introduces MathChat, a…

    Submitted 29 May, 2024; originally announced May 2024.

  44. arXiv:2405.18966  [pdf, other]

    cs.MS

    svds-C: A Multi-Thread C Code for Computing Truncated Singular Value Decomposition

    Authors: Xu Feng, Wenjian Yu, Yuyang Xie

    Abstract: This article presents svds-C, an open-source and high-performance C program for accurately and robustly computing truncated SVD, e.g. computing several largest singular values and corresponding singular vectors. We have re-implemented the algorithm of svds in Matlab in C based on MKL or OpenBLAS and multi-thread computing to obtain the parallel program named svds-C. svds-C running on shared-memory…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 20 pages, accepted by SoftwareX

  45. arXiv:2405.14691  [pdf, other]

    cs.AI cs.MA

    CityGPT: Towards Urban IoT Learning, Analysis and Interaction with Multi-Agent System

    Authors: Qinghua Guan, Jinhui Ouyang, Di Wu, Weiren Yu

    Abstract: The spatiotemporal data generated by massive sensors in the Internet of Things (IoT) is extremely dynamic, heterogeneous, large scale and time-dependent. It poses great challenges (e.g. accuracy, reliability, and stability) in real-time analysis and decision making for different IoT applications. The complexity of IoT data prevents the common people from gaining a deeper understanding of it. Agent…

    Submitted 23 May, 2024; originally announced May 2024.

  46. arXiv:2405.13427  [pdf, ps, other]

    cs.LG

    Adaptive Fuzzy C-Means with Graph Embedding

    Authors: Qiang Chen, Weizhong Yu, Feiping Nie, Xuelong Li

    Abstract: Fuzzy clustering algorithms can be roughly categorized into two main groups: Fuzzy C-Means (FCM) based methods and mixture model based methods. However, for almost all existing FCM based methods, how to automatically select proper membership degree hyper-parameter values remains a challenging and unsolved problem. Mixture model based methods, while circumventing the difficulty of manually adjus…

    Submitted 22 May, 2024; originally announced May 2024.

  47. arXiv:2405.09113  [pdf, ps, other]

    cs.LG

    Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization

    Authors: Kai Hu, Weichen Yu, Tianjun Yao, Xiang Li, Wenhe Liu, Lijun Yu, Yining Li, Kai Chen, Zhiqiang Shen, Matt Fredrikson

    Abstract: Recent research indicates that large language models (LLMs) are susceptible to jailbreaking attacks that can generate harmful content. This paper introduces a novel token-level attack method, Adaptive Dense-to-Sparse Constrained Optimization (ADC), which effectively jailbreaks several open-source LLMs. Our approach relaxes the discrete jailbreak optimization into a continuous optimization and prog…

    Submitted 15 May, 2024; originally announced May 2024.

  48. arXiv:2405.08301  [pdf, ps, other]

    cs.IT

    Coded Downlink Massive Random Access and a Finite de Finetti Theorem

    Authors: Ryan Song, Kareem M. Attiah, Wei Yu

    Abstract: This paper considers a massive connectivity setting in which a base-station (BS) aims to communicate sources $(X_1,\cdots,X_k)$ to a randomly activated subset of $k$ users, among a large pool of $n$ users, via a common downlink message. Although the identities of the $k$ active users are assumed to be known at the BS, each active user knows only whether it is itself active and does not know the ident…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 14 Pages, submitted to IEEE Transactions on Information Theory

  49. arXiv:2405.07992  [pdf, other]

    cs.CV cs.AI cs.LG

    MambaOut: Do We Really Need Mamba for Vision?

    Authors: Weihao Yu, Xinchao Wang

    Abstract: Mamba, an architecture with an RNN-like token mixer of state space model (SSM), was recently introduced to address the quadratic complexity of the attention mechanism and subsequently applied to vision tasks. Nevertheless, the performance of Mamba for vision is often underwhelming when compared with convolutional and attention-based models. In this paper, we delve into the essence of Mamba, and conce…

    Submitted 20 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Code: https://github.com/yuweihao/MambaOut

  50. arXiv:2405.07404  [pdf]

    cs.LG cs.AI

    Indoor PM2.5 forecasting and the association with outdoor air pollution: a modelling study based on sensor data in Australia

    Authors: Wenhua Yu, Bahareh Nakisa, Seng W. Loke, Svetlana Stevanovic, Yuming Guo, Mohammad Naim Rastgoo

    Abstract: Exposure to poor indoor air quality poses significant health risks, necessitating thorough assessment to mitigate associated dangers. This study aims to predict hourly indoor fine particulate matter (PM2.5) concentrations and investigate their correlation with outdoor PM2.5 levels across 24 distinct buildings in Australia. Indoor air quality data were gathered from 91 monitoring sensors in eight A…

    Submitted 12 May, 2024; originally announced May 2024.