Showing 1–50 of 1,374 results for author: Wang, P

Searching in archive cs.
  1. arXiv:2407.13362  [pdf, other]

    cs.CV

    Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation

    Authors: Pengfei Wang, Yuxi Wang, Shuai Li, Zhaoxiang Zhang, Zhen Lei, Lei Zhang

    Abstract: The scarcity of large-scale 3D-text paired data poses a great challenge on open vocabulary 3D scene understanding, and hence it is popular to leverage internet-scale 2D data and transfer their open vocabulary capabilities to 3D models through knowledge distillation. However, the existing distillation-based 3D scene understanding approaches rely on the representation capacity of 2D models, disregar…

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2407.13331  [pdf, other]

    cs.LG

    Reconstruct the Pruned Model without Any Retraining

    Authors: Pingjie Wang, Ziqing Fan, Shengchao Hu, Zhe Chen, Yanfeng Wang, Yu Wang

    Abstract: Structured pruning is a promising hardware-friendly compression technique for large language models (LLMs), which is expected to be retraining-free to avoid the enormous retraining cost. This retraining-free paradigm involves (1) pruning criteria to define the architecture and (2) distortion reconstruction to restore performance. However, existing methods often emphasize pruning criteria while usi…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 18 pages

  3. arXiv:2407.13113  [pdf, other]

    cs.AI

    Multiobjective Vehicle Routing Optimization with Time Windows: A Hybrid Approach Using Deep Reinforcement Learning and NSGA-II

    Authors: Rixin Wu, Ran Wang, Jie Hao, Qiang Wu, Ping Wang, Dusit Niyato

    Abstract: This paper proposes a weight-aware deep reinforcement learning (WADRL) approach designed to address the multiobjective vehicle routing problem with time windows (MOVRPTW), aiming to use a single deep reinforcement learning (DRL) model to solve the entire multiobjective optimization problem. The Non-dominated sorting genetic algorithm-II (NSGA-II) method is then employed to optimize the outcomes pr…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 13 pages; Under Review; Submitted to IEEE Transactions on Intelligent Transportation Systems

  4. arXiv:2407.12504  [pdf, other]

    cs.CL

    Case2Code: Learning Inductive Reasoning with Synthetic Data

    Authors: Yunfan Shao, Linyang Li, Yichuan Ma, Peiji Li, Demin Song, Qinyuan Cheng, Shimin Li, Xiaonan Li, Pengyu Wang, Qipeng Guo, Hang Yan, Xipeng Qiu, Xuanjing Huang, Dahua Lin

    Abstract: Complex reasoning is an impressive ability shown by large language models (LLMs). Most LLMs are skilled in deductive reasoning, such as chain-of-thought prompting or iterative tool-using to solve challenging tasks step-by-step. In this paper, we hope to focus on evaluating and teaching LLMs to conduct inductive reasoning, that is, LLMs are supposed to infer underlying rules by observing examples o…

    Submitted 17 July, 2024; originally announced July 2024.

  5. arXiv:2407.11946  [pdf, other]

    cs.CV

    Hierarchical Separable Video Transformer for Snapshot Compressive Imaging

    Authors: Ping Wang, Yulun Zhang, Lishun Wang, Xin Yuan

    Abstract: Transformers have achieved the state-of-the-art performance on solving the inverse problem of Snapshot Compressive Imaging (SCI) for video, whose ill-posedness is rooted in the mixed degradation of spatial masking and temporal aliasing. However, previous Transformers lack an insight into the degradation and thus have limited performance and efficiency. In this work, we tailor an efficient reconstr…

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  6. arXiv:2407.10671  [pdf, other]

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, et al. (37 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a…

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

  7. arXiv:2407.10233  [pdf, other]

    cs.CV cs.AI

    Visual Prompt Selection for In-Context Learning Segmentation

    Authors: Wei Suo, Lanqing Lai, Mengyang Sun, Hanwang Zhang, Peng Wang, Yanning Zhang

    Abstract: As a fundamental and extensively studied task in computer vision, image segmentation aims to locate and identify different semantic concepts at the pixel level. Recently, inspired by In-Context Learning (ICL), several generalist segmentation frameworks have been proposed, providing a promising paradigm for segmenting specific objects. However, existing works mostly ignore the value of visual promp…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  8. arXiv:2407.09285  [pdf, other]

    cs.CV

    MetaFood CVPR 2024 Challenge on Physically Informed 3D Food Reconstruction: Methods and Results

    Authors: Jiangpeng He, Yuhao Chen, Gautham Vinod, Talha Ibn Mahmud, Fengqing Zhu, Edward Delp, Alexander Wong, Pengcheng Xi, Ahmad AlMughrabi, Umair Haroon, Ricardo Marques, Petia Radeva, Jiadong Tang, Dianyi Yang, Yu Gao, Zhaoxiang Liang, Yawei Jueluo, Chengyu Shi, Pengyu Wang

    Abstract: The increasing interest in computer vision applications for nutrition and dietary monitoring has led to the development of advanced 3D reconstruction techniques for food items. However, the scarcity of high-quality data and limited collaboration between industry and academia have constrained progress in this field. Building on recent advancements in 3D reconstruction, we host the MetaFood Workshop…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Technical report for MetaFood CVPR 2024 Challenge on Physically Informed 3D Food Reconstruction. arXiv admin note: substantial text overlap with arXiv:2407.01717

  9. arXiv:2407.09051  [pdf, other]

    cs.CV

    DroneMOT: Drone-based Multi-Object Tracking Considering Detection Difficulties and Simultaneous Moving of Drones and Objects

    Authors: Peng Wang, Yongcai Wang, Deying Li

    Abstract: Multi-object tracking (MOT) on static platforms, such as by surveillance cameras, has achieved significant progress, with various paradigms providing attractive performances. However, the effectiveness of traditional MOT methods is significantly reduced when it comes to dynamic platforms like drones. This decrease is attributed to the distinctive challenges in the MOT-on-drone scenario: (1) object…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 8 pages, 6 figures, ICRA 2024

  10. arXiv:2407.08959  [pdf, other]

    cs.CL

    Domain-Hierarchy Adaptation via Chain of Iterative Reasoning for Few-shot Hierarchical Text Classification

    Authors: Ke Ji, Peng Wang, Wenjun Ke, Guozheng Li, Jiajun Liu, Jingsheng Gao, Ziyu Shang

    Abstract: Recently, various pre-trained language models (PLMs) have been proposed to prove their impressive performances on a wide range of few-shot tasks. However, limited by the unstructured prior knowledge in PLMs, it is difficult to maintain consistent performance on complex structured scenarios, such as hierarchical text classification (HTC), especially when the downstream data is extremely scarce. The…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 9 pages, 2 figures, Accepted by IJCAI2024

  11. arXiv:2407.08150  [pdf, other]

    cs.CV

    Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding

    Authors: Minghui Wu, Chenxu Zhao, Anyang Su, Donglin Di, Tianyu Fu, Da An, Min He, Ya Gao, Meng Ma, Kun Yan, Ping Wang

    Abstract: Understanding of video creativity and content often varies among individuals, with differences in focal points and cognitive levels across different ages, experiences, and genders. There is currently a lack of research in this area, and most existing benchmarks suffer from several drawbacks: 1) a limited number of modalities and answers with restrictive length; 2) the content and scenarios within…

    Submitted 16 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MULTIMEDIA 2024

  12. arXiv:2407.08130  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    Spiking Tucker Fusion Transformer for Audio-Visual Zero-Shot Learning

    Authors: Wenrui Li, Penghong Wang, Ruiqin Xiong, Xiaopeng Fan

    Abstract: The spiking neural networks (SNNs) that efficiently encode temporal sequences have shown great potential in extracting audio-visual joint feature representations. However, coupling SNNs (binary spike sequences) with transformers (float-point sequences) to jointly explore the temporal-semantic information still faces challenges. In this paper, we introduce a novel Spiking Tucker Fusion Transformer…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by TIP

  13. arXiv:2407.07457  [pdf, other]

    cs.LG cs.CL

    GLBench: A Comprehensive Benchmark for Graph with Large Language Models

    Authors: Yuhan Li, Peisong Wang, Xiao Zhu, Aochuan Chen, Haiyun Jiang, Deng Cai, Victor Wai Kin Chan, Jia Li

    Abstract: The emergence of large language models (LLMs) has revolutionized the way we interact with graphs, leading to a new paradigm called GraphLLM. Despite the rapid development of GraphLLM methods in recent years, the progress and understanding of this field remain unclear due to the lack of a benchmark with consistent experimental protocols. To bridge this gap, we introduce GLBench, the first comprehen…

    Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2306.10280 by other authors

  14. arXiv:2407.07174  [pdf, other]

    cs.CV

    CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model

    Authors: Xiaoding Yuan, Shitao Tang, Kejie Li, Alan Yuille, Peng Wang

    Abstract: This paper introduces Camera-free Diffusion (CamFreeDiff) model for 360-degree image outpainting from a single camera-free image and text description. This method distinguishes itself from existing strategies, such as MVDiffusion, by eliminating the requirement for predefined camera poses. Instead, our model incorporates a mechanism for predicting homography directly within the multi-view diffusio…

    Submitted 9 July, 2024; originally announced July 2024.

  15. arXiv:2407.06904  [pdf, other]

    cs.AI

    Hypergraph based Understanding for Document Semantic Entity Recognition

    Authors: Qiwei Li, Zuchao Li, Ping Wang, Haojun Ai, Hai Zhao

    Abstract: Semantic entity recognition is an important task in the field of visually-rich document understanding. It distinguishes the semantic types of text by analyzing the position relationship between text nodes and the relation between text content. The existing document understanding models mainly focus on entity categories while ignoring the extraction of entity boundaries. We build a novel hypergraph…

    Submitted 9 July, 2024; originally announced July 2024.

  16. arXiv:2407.06043  [pdf, other]

    cs.CV

    Test-time adaptation for geospatial point cloud semantic segmentation with distinct domain shifts

    Authors: Puzuo Wang, Wei Yao, Jie Shao, Zhiyi He

    Abstract: Domain adaptation (DA) techniques help deep learning models generalize across data shifts for point cloud semantic segmentation (PCSS). Test-time adaptation (TTA) allows direct adaptation of a pre-trained model to unlabeled data during inference stage without access to source data or additional training, avoiding privacy issues and large computational resources. We address TTA for geospatial PCSS…

    Submitted 8 July, 2024; originally announced July 2024.

  17. arXiv:2407.05705  [pdf, other]

    cs.AI

    Fast and Continual Knowledge Graph Embedding via Incremental LoRA

    Authors: Jiajun Liu, Wenjun Ke, Peng Wang, Jiahao Wang, Jinhua Gao, Ziyu Shang, Guozheng Li, Zijie Xu, Ke Ji, Yining Li

    Abstract: Continual Knowledge Graph Embedding (CKGE) aims to efficiently learn new knowledge and simultaneously preserve old knowledge. Dominant approaches primarily focus on alleviating catastrophic forgetting of old knowledge but neglect efficient learning for the emergence of new knowledge. However, in real-world scenarios, knowledge graphs (KGs) are continuously growing, which brings a significant chall…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted by IJCAI2024

  18. arXiv:2407.02896  [pdf, other]

    cs.HC cs.CY

    Predicting and Understanding Turn-Taking Behavior in Open-Ended Group Activities in Virtual Reality

    Authors: Portia Wang, Eugy Han, Anna C. M. Queiroz, Cyan DeVeaux, Jeremy N. Bailenson

    Abstract: In networked virtual reality (VR), user behaviors, individual differences, and group dynamics can serve as important signals into future speech behaviors, such as who the next speaker will be and the timing of turn-taking behaviors. The ability to predict and understand these behaviors offers opportunities to provide adaptive and personalized assistance, for example helping users with varying sens…

    Submitted 3 July, 2024; originally announced July 2024.

  19. arXiv:2407.02873  [pdf, other]

    cs.RO

    Robot Shape and Location Retention in Video Generation Using Diffusion Models

    Authors: Peng Wang, Zhihao Guo, Abdul Latheef Sait, Minh Huy Pham

    Abstract: Diffusion models have marked a significant milestone in the enhancement of image and video generation technologies. However, generating videos that precisely retain the shape and location of moving objects such as robots remains a challenge. This paper presents diffusion models specifically tailored to generate videos that accurately maintain the shape and location of mobile robots. This developme…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 8 pages, 10 figures

  20. arXiv:2407.02408  [pdf, other]

    cs.CL cs.LG

    CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models

    Authors: Song Wang, Peng Wang, Tong Zhou, Yushun Dong, Zhen Tan, Jundong Li

    Abstract: As Large Language Models (LLMs) are increasingly deployed to handle various natural language processing (NLP) tasks, concerns regarding the potential negative societal impacts of LLM-generated content have also arisen. To evaluate the biases exhibited by LLMs, researchers have recently proposed a variety of datasets. However, existing bias evaluation efforts often focus on only a particular type o…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 37 pages, 32 figures

  21. arXiv:2407.02174  [pdf, other]

    cs.CV

    BeNeRF: Neural Radiance Fields from a Single Blurry Image and Event Stream

    Authors: Wenpu Li, Pian Wan, Peng Wang, Jinghang Li, Yi Zhou, Peidong Liu

    Abstract: Neural implicit representation of visual scenes has attracted a lot of attention in recent research of computer vision and graphics. Most prior methods focus on how to reconstruct 3D scene representation from a set of images. In this work, we demonstrate the possibility to recover the neural radiance fields (NeRF) from a single blurry image and its corresponding event stream. We model the camera m…

    Submitted 3 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  22. arXiv:2407.00898  [pdf, other]

    cs.RO

    Residual-MPPI: Online Policy Customization for Continuous Control

    Authors: Pengcheng Wang, Chenran Li, Catherine Weaver, Kenta Kawamoto, Masayoshi Tomizuka, Chen Tang, Wei Zhan

    Abstract: Policies learned through Reinforcement Learning (RL) and Imitation Learning (IL) have demonstrated significant potential in achieving advanced performance in continuous control tasks. However, in real-world environments, it is often necessary to further customize a trained policy when there are additional requirements that were unforeseen during the original training phase. It is possible to fine-…

    Submitted 11 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

  23. arXiv:2406.19959  [pdf, other]

    cs.SD eess.AS

    RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization

    Authors: Bing Yang, Changsheng Quan, Yabo Wang, Pengyu Wang, Yujie Yang, Ying Fang, Nian Shao, Hui Bu, Xin Xu, Xiaofei Li

    Abstract: The training of deep learning-based multichannel speech enhancement and source localization systems relies heavily on the simulation of room impulse response and multichannel diffuse noise, due to the lack of large-scale real-recorded datasets. However, the acoustic mismatch between simulated and real-world data could degrade the model performance when applying in real-world scenarios. To bridge t…

    Submitted 28 June, 2024; originally announced June 2024.

  24. arXiv:2406.19143  [pdf, other]

    cs.DB cs.DS

    QSketch: An Efficient Sketch for Weighted Cardinality Estimation in Streams

    Authors: Yiyan Qi, Rundong Li, Pinghui Wang, Yufang Sun, Rui Xing

    Abstract: Estimating cardinality, i.e., the number of distinct elements, of a data stream is a fundamental problem in areas like databases, computer networks, and information retrieval. This study delves into a broader scenario where each element carries a positive weight. Unlike traditional cardinality estimation, limited research exists on weighted cardinality, with current methods requiring substantial m…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 12 pages, 10 figures, accepted by KDD 2024

  25. arXiv:2406.16992  [pdf, other]

    cs.LG cs.AI

    Make Graph Neural Networks Great Again: A Generic Integration Paradigm of Topology-Free Patterns for Traffic Speed Prediction

    Authors: Yicheng Zhou, Pengfei Wang, Hao Dong, Denghui Zhang, Dingqi Yang, Yanjie Fu, Pengyang Wang

    Abstract: Urban traffic speed prediction aims to estimate the future traffic speed for improving urban transportation services. Enormous efforts have been made to exploit Graph Neural Networks (GNNs) for modeling spatial correlations and temporal dependencies of traffic speed evolving patterns, regularized by graph topology. While achieving promising results, current traffic speed prediction methods still su…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted to IJCAI 2024

  26. arXiv:2406.16633  [pdf, other]

    cs.CV

    MLAAN: Scaling Supervised Local Learning with Multilaminar Leap Augmented Auxiliary Network

    Authors: Yuming Zhang, Shouxin Zhang, Peizhe Wang, Feiyu Zhu, Dongzhi Guan, Jiabin Liu, Changpeng Cai

    Abstract: End-to-end (E2E) training approaches are commonly plagued by high memory consumption, reduced efficiency in training, challenges in model parallelization, and suboptimal biocompatibility. Local learning is considered a novel interactive training method that holds promise as an alternative to E2E. Nonetheless, conventional local learning methods fall short in achieving high model accuracy due to in…

    Submitted 24 June, 2024; originally announced June 2024.

  27. arXiv:2406.15885  [pdf, other]

    cs.SD cs.AI eess.AS

    The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models

    Authors: Jiajia Li, Lu Yang, Mingni Tang, Cong Chen, Zuchao Li, Ping Wang, Hai Zhao

    Abstract: Benchmark plays a pivotal role in assessing the advancements of large language models (LLMs). While numerous benchmarks have been proposed to evaluate LLMs' capabilities, there is a notable absence of a dedicated benchmark for assessing their musical abilities. To address this gap, we present ZIQI-Eval, a comprehensive and large-scale music benchmark specifically designed to evaluate the music-rel…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL-Findings 2024

  28. arXiv:2406.14653  [pdf, other]

    cs.RO cs.AI

    LLM Granularity for On-the-Fly Robot Control

    Authors: Peng Wang, Mattia Robbiani, Zhihao Guo

    Abstract: Assistive robots have attracted significant attention due to their potential to enhance the quality of life for vulnerable individuals like the elderly. The convergence of computer vision, large language models, and robotics has introduced the `visuolinguomotor' mode for assistive robots, where visuals and linguistics are incorporated into assistive robots to enable proactive and interactive assis…

    Submitted 20 June, 2024; originally announced June 2024.

  29. arXiv:2406.14024  [pdf, other]

    cs.CL

    LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback

    Authors: Bofei Gao, Zefan Cai, Runxin Xu, Peiyi Wang, Ce Zheng, Runji Lin, Keming Lu, Dayiheng Liu, Chang Zhou, Wen Xiao, Junjie Hu, Tianyu Liu, Baobao Chang

    Abstract: Mathematical verifier achieves success in mathematical reasoning tasks by validating the correctness of solutions. However, existing verifiers are trained with binary classification labels, which are not informative enough for the model to accurately assess the solutions. To mitigate the aforementioned insufficiency of binary labels, we introduce step-wise natural language feedbacks as rationale la…

    Submitted 8 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 9 pages

  30. arXiv:2406.12316  [pdf, other]

    cs.CV cs.AI cs.MM

    Enhancing Visible-Infrared Person Re-identification with Modality- and Instance-aware Visual Prompt Learning

    Authors: Ruiqi Wu, Bingliang Jiao, Wenxuan Wang, Meng Liu, Peng Wang

    Abstract: The Visible-Infrared Person Re-identification (VI ReID) aims to match visible and infrared images of the same pedestrians across non-overlapped camera views. These two input modalities contain both invariant information, such as shape, and modality-specific details, such as color. An ideal model should utilize valuable information from both modalities during training for enhanced representational…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by ACM International Conference on Multimedia Retrieval (ICMR'24)

    Journal ref: ICMR'24: Proceedings of the 2024 International Conference on Multimedia Retrieval (2024) 579 - 588

  31. arXiv:2406.11931  [pdf, other]

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen, et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe…

    Submitted 17 June, 2024; originally announced June 2024.

  32. arXiv:2406.10789  [pdf, other]

    cs.CV

    Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses

    Authors: Zhiwen Fan, Pu Wang, Yang Zhao, Yibo Zhao, Boris Ivanovic, Zhangyang Wang, Marco Pavone, Hao Frank Yang

    Abstract: The increasing rate of road accidents worldwide results not only in significant loss of life but also imposes billions financial burdens on societies. Current research in traffic crash frequency modeling and analysis has predominantly approached the problem as classification tasks, focusing mainly on learning-based classification or ensemble learning methods. These approaches often overlook the in…

    Submitted 15 June, 2024; originally announced June 2024.

  33. arXiv:2406.10708  [pdf, other]

    cs.CV cs.DB eess.SP

    MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception

    Authors: M. Mahbubur Rahman, Ryoma Yataka, Sorachi Kato, Pu Perry Wang, Peizhao Li, Adriano Cardace, Petros Boufounos

    Abstract: Compared with an extensive list of automotive radar datasets that support autonomous driving, indoor radar datasets are scarce at a smaller scale in the format of low-resolution radar point clouds and usually under an open-space single-room setting. In this paper, we scale up indoor radar data collection using multi-view high-resolution radar heatmap in a multi-day, multi-room, and multi-subject s…

    Submitted 17 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: 26 pages, 25 figures, 10 tables; See https://doi.org/10.5281/zenodo.12611978 to access the MMVR dataset

  34. arXiv:2406.10276  [pdf, other]

    cs.CL cs.SD eess.AS

    Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation

    Authors: Peidong Wang, Jian Xue, Jinyu Li, Junkun Chen, Aswin Shanmugam Subramanian

    Abstract: Language-agnostic many-to-one end-to-end speech translation models can convert audio signals from different source languages into text in a target language. These models do not need source language identification, which improves user experience. In some cases, the input language can be given or estimated. Our goal is to use this additional language information while preserving the quality of the o…

    Submitted 11 June, 2024; originally announced June 2024.

  35. arXiv:2406.09390  [pdf, other]

    cs.CV cs.LG

    LLAVIDAL: Benchmarking Large Language Vision Models for Daily Activities of Living

    Authors: Rajatsubhra Chakraborty, Arkaprava Sinha, Dominick Reilly, Manish Kumar Govind, Pu Wang, Francois Bremond, Srijan Das

    Abstract: Large Language Vision Models (LLVMs) have demonstrated effectiveness in processing internet videos, yet they struggle with the visually perplexing dynamics present in Activities of Daily Living (ADL) due to limited pertinent datasets and models tailored to relevant cues. To this end, we propose a framework for curating ADL multiview datasets to fine-tune LLVMs, resulting in the creation of ADL-X,…

    Submitted 13 June, 2024; originally announced June 2024.

  36. arXiv:2406.09161  [pdf, other]

    cs.SD eess.AS

    Complex Image-Generative Diffusion Transformer for Audio Denoising

    Authors: Junhui Li, Pu Wang, Jialu Li, Youshan Zhang

    Abstract: The audio denoising technique has captured widespread attention in the deep neural network field. Recently, the audio denoising problem has been converted into an image generation task, and deep learning-based approaches have been applied to tackle this problem. However, its performance is still limited, leaving room for further improvement. In order to enhance audio denoising performance, this pa…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024

  37. arXiv:2406.09154  [pdf, other]

    cs.SD cs.CL eess.AS

    Diffusion Gaussian Mixture Audio Denoise

    Authors: Pu Wang, Junhui Li, Jialu Li, Liangdong Guo, Youshan Zhang

    Abstract: Recent diffusion models have achieved promising performances in audio-denoising tasks. The unique property of the reverse process could recover clean signals. However, the distribution of real-world noises does not comply with a single Gaussian distribution and is even unknown. The sampling of Gaussian noise conditions limits its application scenarios. To overcome these challenges, we propose a Di…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024

  38. arXiv:2406.09031  [pdf, other]

    cs.LG cs.AI

    A Comprehensive Graph Pooling Benchmark: Effectiveness, Robustness and Generalizability

    Authors: Pengyun Wang, Junyu Luo, Yanxin Shen, Siyu Heng, Xiao Luo

    Abstract: Graph pooling has gained attention for its ability to obtain effective node and graph representations for various downstream tasks. Despite the recent surge in graph pooling approaches, there is a lack of standardized experimental settings and fair benchmarks to evaluate their performance. To address this issue, we have constructed a comprehensive benchmark that includes 15 graph pooling methods a…

    Submitted 16 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  39. arXiv:2406.07537  [pdf, other]

    cs.CV

    Autoregressive Pretraining with Mamba in Vision

    Authors: Sucheng Ren, Xianhang Li, Haoqin Tu, Feng Wang, Fangxun Shu, Lei Zhang, Jieru Mei, Linjie Yang, Peng Wang, Heng Wang, Alan Yuille, Cihang Xie

    Abstract: The vision community has started to build with the recently developed state space model, Mamba, as the new backbone for a range of tasks. This paper shows that Mamba's visual capability can be significantly enhanced through autoregressive pretraining, a direction not previously explored. Efficiency-wise, the autoregressive nature can well capitalize on the Mamba's unidirectional recurrent structur…

    Submitted 11 June, 2024; originally announced June 2024.

  40. arXiv:2406.07177  [pdf, other]

    cs.LG

    TernaryLLM: Ternarized Large Language Model

    Authors: Tianqi Chen, Zhe Li, Weixiang Xu, Zeyu Zhu, Dong Li, Lu Tian, Emad Barsoum, Peisong Wang, Jian Cheng

    Abstract: Large language models (LLMs) have achieved remarkable performance on Natural Language Processing (NLP) tasks, but they are hindered by high computational costs and memory requirements. Ternarization, an extreme form of quantization, offers a solution by reducing memory usage and enabling energy-efficient floating-point additions. However, applying ternarization to LLMs faces challenges stemming fr…

    Submitted 11 June, 2024; originally announced June 2024.

  41. arXiv:2406.07006  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results

    Authors: Xin Jin, Chunle Guo, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Ruoqi Li, Chang Liu, Ziyi Wang, Yao Du, Jingjing Yang, Long Bao, Heng Sun, Xiangyu Kong, Xiaoxia Xing, Jinlong Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, Jingfan Tan, et al. (17 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Few-shot RAW Image Denoising Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  42. arXiv:2406.06644  [pdf, other]

    cs.LG cs.AI

    Latent Diffusion Model-Enabled Real-Time Semantic Communication Considering Semantic Ambiguities and Channel Noises

    Authors: Jianhua Pei, Cheng Feng, Ping Wang, Hina Tabassum, Dongyuan Shi

    Abstract: Semantic communication (SemCom) has emerged as a new paradigm for 6G communication, with deep learning (DL) models being one of the key drives to shift from the accuracy of bit/symbol to the semantics and pragmatics of data. Nevertheless, DL-based SemCom systems often face performance bottlenecks due to overfitting, poor generalization, and sensitivity to outliers. Furthermore, the varying-fading…

    Submitted 24 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  43. arXiv:2406.05822  [pdf, other]

    cs.LG stat.ML

    Symmetric Matrix Completion with ReLU Sampling

    Authors: Huikang Liu, Peng Wang, Longxiu Huang, Qing Qu, Laura Balzano

    Abstract: We study the problem of symmetric positive semi-definite low-rank matrix completion (MC) with deterministic entry-dependent sampling. In particular, we consider rectified linear unit (ReLU) sampling, where only positive entries are observed, as well as a generalization to threshold-based sampling. We first empirically demonstrate that the landscape of this MC problem is not globally benign: Gradie…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: 39 pages, 9 figures; This work has been accepted for publication in the Proceedings of the 41st International Conference on Machine Learning (ICML 2024)

  44. arXiv:2406.05755  [pdf, other]

    cs.CV

    A DeNoising FPN With Transformer R-CNN for Tiny Object Detection

    Authors: Hou-I Liu, Yu-Wen Tseng, Kai-Cheng Chang, Pin-Jyun Wang, Hong-Han Shuai, Wen-Huang Cheng

    Abstract: Despite notable advancements in the field of computer vision, the precise detection of tiny objects continues to pose a significant challenge, largely owing to the minuscule pixel representation allocated to these objects in imagery data. This challenge resonates profoundly in the domain of geoscience and remote sensing, where high-fidelity detection of tiny objects can facilitate a myriad of appl…

    Submitted 15 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: The article is accepted by IEEE Transactions on Geoscience and Remote Sensing. Our code will be available at https://github.com/hoiliu-0801/DNTR

  45. arXiv:2406.05658  [pdf, other]

    cs.CV cs.AI

    Visual Prompt Tuning in Null Space for Continual Learning

    Authors: Yue Lu, Shizhou Zhang, De Cheng, Yinghui Xing, Nannan Wang, Peng Wang, Yanning Zhang

    Abstract: Existing prompt-tuning methods have demonstrated impressive performances in continual learning (CL), by selecting and updating relevant prompts in the vision-transformer models. On the contrary, this paper aims to learn each task by tuning the prompts in the direction orthogonal to the subspace spanned by previous tasks' features, so as to ensure no interference on tasks that have been learned to…

    Submitted 10 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: 20 pages, 10 figures

  46. arXiv:2406.04112  [pdf, other]

    cs.LG cs.AI eess.SP stat.ML

    Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation

    Authors: Can Yaras, Peng Wang, Laura Balzano, Qing Qu

    Abstract: While overparameterization in machine learning models offers great benefits in terms of optimization and generalization, it also leads to increased computational requirements as model sizes grow. In this work, we show that by leveraging the inherent low-dimensional structures of data and compressible dynamics within the model parameters, we can reap the benefits of overparameterization without the…

    Submitted 9 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted at ICML'24 (Oral)

  47. arXiv:2406.03505  [pdf, other]

    cs.LG cs.AI

    Dynamic and Adaptive Feature Generation with LLM

    Authors: Xinhao Zhang, Jinghan Zhang, Banafsheh Rekabdar, Yuanchun Zhou, Pengfei Wang, Kunpeng Liu

    Abstract: The representation of feature space is a crucial environment where data points get vectorized and embedded for upcoming modeling. Thus the efficacy of machine learning (ML) algorithms is closely related to the quality of feature engineering. As one of the most important techniques, feature generation transforms raw data into an optimized feature space conducive to model training and further refine…

    Submitted 4 June, 2024; originally announced June 2024.

  48. arXiv:2406.03019  [pdf, other]

    cs.CV

    Puzzle Pieces Picker: Deciphering Ancient Chinese Characters with Radical Reconstruction

    Authors: Pengjie Wang, Kaile Zhang, Xinyu Wang, Shengwei Han, Yongge Liu, Lianwen Jin, Xiang Bai, Yuliang Liu

    Abstract: Oracle Bone Inscriptions is one of the oldest existing forms of writing in the world. However, due to the great antiquity of the era, a large number of Oracle Bone Inscriptions (OBI) remain undeciphered, making it one of the global challenges in the field of paleography today. This paper introduces a novel approach, namely Puzzle Pieces Picker (P$^3$), to decipher these enigmatic characters throug…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: ICDAR 2024

  49. arXiv:2406.01909  [pdf, other]

    cs.LG

    A Global Geometric Analysis of Maximal Coding Rate Reduction

    Authors: Peng Wang, Huikang Liu, Druv Pai, Yaodong Yu, Zhihui Zhu, Qing Qu, Yi Ma

    Abstract: The maximal coding rate reduction (MCR$^2$) objective for learning structured and compact deep representations is drawing increasing attention, especially after its recent usage in the derivation of fully explainable and highly effective deep network architectures. However, it lacks a complete theoretical justification: only the properties of its global optima are known, and its global landscape h…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 43 pages, 9 figures. This work has been accepted for publication in the Proceedings of the 41st International Conference on Machine Learning (ICML 2024)

  50. arXiv:2406.01392  [pdf, other]

    cs.CL

    Sparsity-Accelerated Training for Large Language Models

    Authors: Da Ma, Lu Chen, Pengyu Wang, Hongshen Xu, Hanqi Li, Liangtai Sun, Su Zhu, Shuai Fan, Kai Yu

    Abstract: Large language models (LLMs) have demonstrated proficiency across various natural language processing (NLP) tasks but often require additional training, such as continual pre-training and supervised fine-tuning. However, the costs associated with this, primarily due to their large parameter count, remain high. This paper proposes leveraging \emph{sparsity} in pre-trained LLMs to expedite this trai…

    Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Findings