Showing 1–50 of 29,176 results for author: Yang

Searching in archive cs.
  1. arXiv:2408.17285  [pdf, other]

    cs.CR cs.LG

    Image-Perfect Imperfections: Safety, Bias, and Authenticity in the Shadow of Text-To-Image Model Evolution

    Authors: Yixin Wu, Yun Shen, Michael Backes, Yang Zhang

    Abstract: Text-to-image models, such as Stable Diffusion (SD), undergo iterative updates to improve image quality and address concerns such as safety. Improvements in image quality are straightforward to assess. However, how model updates resolve existing concerns and whether they raise new questions remain unexplored. This study takes an initial step in investigating the evolution of text-to-image models f…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: To Appear in the ACM Conference on Computer and Communications Security, October 14-18, 2024

  2. arXiv:2408.17267  [pdf, other]

    cs.CV cs.AI

    UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

    Authors: Baichuan Zhou, Haote Yang, Dairong Chen, Junyan Ye, Tianyi Bai, Jinhua Yu, Songyang Zhang, Dahua Lin, Conghui He, Weijia Li

    Abstract: Recent evaluations of Large Multimodal Models (LMMs) have explored their capabilities in various domains, with only a few benchmarks specifically focusing on urban environments. Moreover, existing urban benchmarks have been limited to evaluating LMMs with basic region-level urban tasks under singular views, leading to incomplete evaluations of LMMs' abilities in urban environments. To address these…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 9 pages, 6 figures

  3. arXiv:2408.17214  [pdf, other]

    cs.IR

    Efficient Multi-task Prompt Tuning for Recommendation

    Authors: Ting Bai, Le Huang, Yue Yu, Cheng Yang, Cheng Hou, Zhe Zhao, Chuan Shi

    Abstract: With the expansion of business scenarios, real recommender systems are facing challenges in dealing with the constantly emerging new tasks in multi-task learning frameworks. In this paper, we attempt to improve the generalization ability of multi-task recommendations when dealing with new tasks. We find that joint training will enhance the performance of the new task but always negatively impact e…

    Submitted 30 August, 2024; originally announced August 2024.

  4. arXiv:2408.17168  [pdf, other]

    cs.CV

    EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs

    Authors: Zhen Fan, Peng Dai, Zhuo Su, Xu Gao, Zheng Lv, Jiarui Zhang, Tianyuan Du, Guidong Wang, Yang Zhang

    Abstract: Egocentric human pose estimation (HPE) using wearable sensors is essential for VR/AR applications. Most methods rely solely on either egocentric-view images or sparse Inertial Measurement Unit (IMU) signals, leading to inaccuracies due to self-occlusion in images or the sparseness and drift of inertial sensors. Most importantly, the lack of real-world datasets containing both modalities is a major…

    Submitted 30 August, 2024; originally announced August 2024.

  5. arXiv:2408.17121  [pdf, other]

    cs.CR

    Traceable AI-driven Avatars Using Multi-factors of Physical World and Metaverse

    Authors: Kedi Yang, Zhenyong Zhang, Youliang Tian

    Abstract: Metaverse allows users to delegate their AI models to an AI engine, which builds corresponding AI-driven avatars to provide immersive experience for other users. Since current authentication methods mainly focus on human-driven avatars and ignore the traceability of AI-driven avatars, attackers may delegate the AI models of a target user to an AI proxy program to perform impersonation attacks with…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 15 pages, 21 figures

  6. arXiv:2408.17115  [pdf]

    cs.CV

    Multi-centric AI Model for Unruptured Intracranial Aneurysm Detection and Volumetric Segmentation in 3D TOF-MRI

    Authors: Ashraya K. Indrakanti, Jakob Wasserthal, Martin Segeroth, Shan Yang, Victor Schulze-Zachau, Joshy Cyriac, Michael Bach, Marios Psychogios, Matthias A. Mutke

    Abstract: Purpose: To develop an open-source nnU-Net-based AI model for combined detection and segmentation of unruptured intracranial aneurysms (UICA) in 3D TOF-MRI, and compare models trained on datasets with aneurysm-like differential diagnoses. Methods: This retrospective study (2020-2023) included 385 anonymized 3D TOF-MRI images from 364 patients (mean age 59 years, 60% female) at multiple centers plu…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 14 pages, 5 figures, 3 tables, 2 supplementary tables

    ACM Class: I.4.6

  7. arXiv:2408.17054  [pdf]

    cs.CV

    BTMuda: A Bi-level Multi-source unsupervised domain adaptation framework for breast cancer diagnosis

    Authors: Yuxiang Yang, Xinyi Zeng, Pinxian Zeng, Binyu Yan, Xi Wu, Jiliu Zhou, Yan Wang

    Abstract: Deep learning has revolutionized the early detection of breast cancer, resulting in a significant decrease in mortality rates. However, difficulties in obtaining annotations and huge variations in distribution between training sets and real scenes have limited their clinical applications. To address these limitations, unsupervised domain adaptation (UDA) methods have been used to transfer knowledg…

    Submitted 30 August, 2024; originally announced August 2024.

  8. arXiv:2408.17047  [pdf, other]

    cs.NI

    PIB: Prioritized Information Bottleneck Framework for Collaborative Edge Video Analytics

    Authors: Zhengru Fang, Senkang Hu, Liyan Yang, Yiqin Deng, Xianhao Chen, Yuguang Fang

    Abstract: Collaborative edge sensing systems, particularly in collaborative perception systems in autonomous driving, can significantly enhance tracking accuracy and reduce blind spots with multi-view sensing capabilities. However, their limited channel capacity and the redundancy in sensory data pose significant challenges, affecting the performance of collaborative inference tasks. To tackle these issues,…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: Accepted by Globecom 2024. Code will be available at https://github.com/fangzr/PIB-Prioritized-Information-Bottleneck-Framework

  9. arXiv:2408.17016  [pdf, other]

    cs.LG stat.AP stat.ML

    Error-controlled non-additive interaction discovery in machine learning models

    Authors: Winston Chen, Yifan Jiang, William Stafford Noble, Yang Young Lu

    Abstract: Machine learning (ML) models are powerful tools for detecting complex patterns within data, yet their "black box" nature limits their interpretability, hindering their use in critical domains like healthcare and finance. To address this challenge, interpretable ML methods have been developed to explain how features influence model predictions. However, these methods often focus on univariate featu…

    Submitted 30 August, 2024; originally announced August 2024.

  10. arXiv:2408.16975  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Technical Report of HelixFold3 for Biomolecular Structure Prediction

    Authors: Lihang Liu, Shanzhuo Zhang, Yang Xue, Xianbin Ye, Kunrui Zhu, Yuxin Li, Yang Liu, Xiaonan Zhang, Xiaomin Fang

    Abstract: The AlphaFold series has transformed protein structure prediction with remarkable accuracy, often matching experimental methods. AlphaFold2, AlphaFold-Multimer, and the latest AlphaFold3 represent significant strides in predicting single protein chains, protein complexes, and biomolecular structures. While AlphaFold2 and AlphaFold-Multimer are open-sourced, facilitating rapid and reliable predicti…

    Submitted 29 August, 2024; originally announced August 2024.

  11. arXiv:2408.16760  [pdf, other]

    cs.CV

    OmniRe: Omni Urban Scene Reconstruction

    Authors: Ziyu Chen, Jiawei Yang, Jiahui Huang, Riccardo de Lutio, Janick Martinez Esturo, Boris Ivanovic, Or Litany, Zan Gojcic, Sanja Fidler, Marco Pavone, Li Song, Yue Wang

    Abstract: We introduce OmniRe, a holistic approach for efficiently reconstructing high-fidelity dynamic urban scenes from on-device logs. Recent methods for modeling driving sequences using neural radiance fields or Gaussian Splatting have demonstrated the potential of reconstructing challenging dynamic scenes, but often overlook pedestrians and other non-vehicle dynamic actors, hindering a complete pipelin…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: See the project page for code, video results and demos: https://ziyc.github.io/omnire/

  12. arXiv:2408.16751  [pdf, other]

    cs.CL cs.LG stat.ML

    A Gradient Analysis Framework for Rewarding Good and Penalizing Bad Examples in Language Models

    Authors: Yi-Lin Tuan, William Yang Wang

    Abstract: Beyond maximum likelihood estimation (MLE), the standard objective of a language model (LM) that optimizes good examples probabilities, many studies have explored ways that also penalize bad examples for enhancing the quality of output distribution, including unlikelihood training, exponential maximizing average treatment effect (ExMATE), and direct preference optimization (DPO). To systematically…

    Submitted 29 August, 2024; originally announced August 2024.

  13. arXiv:2408.16633  [pdf]

    cs.RO cs.AI

    Optimizing Automated Picking Systems in Warehouse Robots Using Machine Learning

    Authors: Keqin Li, Jin Wang, Xubo Wu, Xirui Peng, Runmian Chang, Xiaoyu Deng, Yiwen Kang, Yue Yang, Fanghao Ni, Bo Hong

    Abstract: With the rapid growth of global e-commerce, the demand for automation in the logistics industry is increasing. This study focuses on automated picking systems in warehouses, utilizing deep learning and reinforcement learning technologies to enhance picking efficiency and accuracy while reducing system failure rates. Through empirical analysis, we demonstrate the effectiveness of these technologies…

    Submitted 29 August, 2024; originally announced August 2024.

  14. arXiv:2408.16577  [pdf, other]

    cs.LG cs.AI

    Seeking the Sufficiency and Necessity Causal Features in Multimodal Representation Learning

    Authors: Boyu Chen, Junjie Liu, Zhu Li, Mengyue Yang

    Abstract: Learning representations with a high Probability of Necessary and Sufficient Causes (PNS) has been shown to enhance deep learning models' ability. This task involves identifying causal features that are both sufficient (guaranteeing the outcome) and necessary (without which the outcome cannot occur). However, current research predominantly focuses on unimodal data, and extending PNS learning to mu…

    Submitted 29 August, 2024; originally announced August 2024.

  15. arXiv:2408.16564  [pdf, other]

    cs.MM cs.SD eess.AS

    Human-Inspired Audio-Visual Speech Recognition: Spike Activity, Cueing Interaction and Causal Processing

    Authors: Qianhui Liu, Jiadong Wang, Yang Wang, Xin Yang, Gang Pan, Haizhou Li

    Abstract: Humans naturally perform audiovisual speech recognition (AVSR), enhancing the accuracy and robustness by integrating auditory and visual information. Spiking neural networks (SNNs), which mimic the brain's information-processing mechanisms, are well-suited for emulating the human capability of AVSR. Despite their potential, research on SNNs for AVSR is scarce, with most existing audio-visual multi…

    Submitted 29 August, 2024; originally announced August 2024.

  16. arXiv:2408.16540  [pdf, other]

    cs.CV

    GRPose: Learning Graph Relations for Human Image Generation with Pose Priors

    Authors: Xiangchen Yin, Donglin Di, Lei Fan, Hao Li, Chen Wei, Xiaofei Gou, Yang Song, Xiao Sun, Xun Yang

    Abstract: Recent methods using diffusion models have made significant progress in human image generation with various additional controls such as pose priors. However, existing approaches still struggle to generate high-quality images with consistent pose alignment, resulting in unsatisfactory outputs. In this paper, we propose a framework delving into the graph relations of pose priors to provide control i…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: The code will be released at https://github.com/XiangchenYin/GRPose

  17. arXiv:2408.16532  [pdf, other]

    eess.AS cs.LG cs.MM cs.SD eess.SP

    WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

    Authors: Shengpeng Ji, Ziyue Jiang, Xize Cheng, Yifu Chen, Minghui Fang, Jialong Zuo, Qian Yang, Ruiqi Li, Ziang Zhang, Xiaoda Yang, Rongjie Huang, Yidi Jiang, Qian Chen, Siqi Zheng, Wen Wang, Zhou Zhao

    Abstract: Language models have been effectively applied to modeling natural signals, such as images, video, speech, and audio. A crucial component of these models is the codec tokenizer, which compresses high-dimensional natural signals into lower-dimensional discrete tokens. In this paper, we introduce WavTokenizer, which offers several advantages over previous SOTA acoustic codec models in the audio domai…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Work in progress. arXiv admin note: text overlap with arXiv:2402.12208

  18. arXiv:2408.16517  [pdf, other]

    cs.LG cs.AI

    Adaptive Variational Continual Learning via Task-Heuristic Modelling

    Authors: Fan Yang

    Abstract: Variational continual learning (VCL) is a turn-key learning algorithm that has state-of-the-art performance among the best continual learning models. In our work, we explore an extension of the generalized variational continual learning (GVCL) model, named AutoVCL, which combines task heuristics for informed learning and model optimization. We demonstrate that our model outperforms the standard GV…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 4 pages, 2 figures, 3 tables

  19. CanCal: Towards Real-time and Lightweight Ransomware Detection and Response in Industrial Environments

    Authors: Shenao Wang, Feng Dong, Hangfeng Yang, Jingheng Xu, Haoyu Wang

    Abstract: Ransomware attacks have emerged as one of the most significant cybersecurity threats. Despite numerous proposed detection and defense methods, existing approaches face two fundamental limitations in large-scale industrial applications: intolerable system overheads and notorious alert fatigue. To address these challenges, we propose CanCal, a real-time and lightweight ransomware detection system. S…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: To appear in the 2024 ACM SIGSAC Conference on Computer and Communications Security (CCS'24), October 14-18, 2024, Salt Lake City

  20. arXiv:2408.16506  [pdf, other]

    cs.CV

    Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation

    Authors: Xiaoyu Jin, Zunnan Xu, Mingwen Ou, Wenming Yang

    Abstract: Character animation is a transformative field in computer graphics and vision, enabling dynamic and realistic video animations from static images. Despite advancements, maintaining appearance consistency in animations remains a challenge. Our approach addresses this by introducing a training-free framework that ensures the generated video sequence preserves the reference image's subtleties, such a…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: CVG@ICML 2024

  21. arXiv:2408.16500  [pdf, other]

    cs.CV

    CogVLM2: Visual Language Models for Image and Video Understanding

    Authors: Wenyi Hong, Weihan Wang, Ming Ding, Wenmeng Yu, Qingsong Lv, Yan Wang, Yean Cheng, Shiyu Huang, Junhui Ji, Zhao Xue, Lei Zhao, Zhuoyi Yang, Xiaotao Gu, Xiaohan Zhang, Guanyu Feng, Da Yin, Zihan Wang, Ji Qi, Xixuan Song, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Yuxiao Dong, Jie Tang

    Abstract: Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications. Here we propose the CogVLM2 family, a new generation of visual language models for image and video understanding including CogVLM2, CogVLM2-Video and GLM-4V. As an image understanding model, CogVLM2…

    Submitted 29 August, 2024; originally announced August 2024.

  22. arXiv:2408.16478  [pdf, other]

    cs.CV

    MICDrop: Masking Image and Depth Features via Complementary Dropout for Domain-Adaptive Semantic Segmentation

    Authors: Linyan Yang, Lukas Hoyer, Mark Weber, Tobias Fischer, Dengxin Dai, Laura Leal-Taixé, Marc Pollefeys, Daniel Cremers, Luc Van Gool

    Abstract: Unsupervised Domain Adaptation (UDA) is the task of bridging the domain gap between a labeled source domain, e.g., synthetic data, and an unlabeled target domain. We observe that current UDA methods show inferior results on fine structures and tend to oversegment objects with ambiguous appearance. To address these shortcomings, we propose to leverage geometric information, i.e., depth predictions,…

    Submitted 29 August, 2024; originally announced August 2024.

  23. arXiv:2408.16469  [pdf, other]

    cs.CV

    Multi-source Domain Adaptation for Panoramic Semantic Segmentation

    Authors: Jing Jiang, Sicheng Zhao, Jiankun Zhu, Wenbo Tang, Zhaopan Xu, Jidong Yang, Pengfei Xu, Hongxun Yao

    Abstract: Panoramic semantic segmentation has received widespread attention recently due to its comprehensive 360° field of view. However, labeling such images demands greater resources compared to pinhole images. As a result, many unsupervised domain adaptation methods for panoramic semantic segmentation have emerged, utilizing real pinhole images or low-cost synthetic panoramic images. But, the segm…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 9 pages, 7 figures, 5 tables

  24. arXiv:2408.16431  [pdf, other]

    cs.CV

    Discriminative Spatial-Semantic VOS Solution: 1st Place Solution for 6th LSVOS

    Authors: Deshui Miao, Yameng Gu, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang

    Abstract: Video object segmentation (VOS) is a crucial task in computer vision, but current VOS methods struggle with complex scenes and prolonged object motions. To address these challenges, the MOSE dataset aims to enhance object recognition and differentiation in complex environments, while the LVOS dataset focuses on segmenting objects exhibiting long-term, intricate movements. This report introduces a…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 1st Place Solution for 6th LSVOS VOS Track. arXiv admin note: substantial text overlap with arXiv:2406.04600

  25. arXiv:2408.16375  [pdf, other]

    cs.RO

    EasyChauffeur: A Baseline Advancing Simplicity and Efficiency on Waymax

    Authors: Lingyu Xiao, Jiang-Jiang Liu, Xiaoqing Ye, Wankou Yang, Jingdong Wang

    Abstract: Recent advancements in deep-learning-based driving planners have primarily focused on elaborate network engineering, yielding limited improvements. This paper diverges from conventional approaches by exploring three fundamental yet underinvestigated aspects: training policy, data efficiency, and evaluation robustness. We introduce EasyChauffeur, a reproducible and effective planner for both imitat…

    Submitted 29 August, 2024; originally announced August 2024.

  26. arXiv:2408.16357  [pdf, other]

    cs.CV

    Law of Vision Representation in MLLMs

    Authors: Shijia Yang, Bohan Zhai, Quanzeng You, Jianbo Yuan, Hongxia Yang, Chenfeng Xu

    Abstract: We present the "Law of Vision Representation" in multimodal large language models (MLLMs). It reveals a strong correlation between the combination of cross-modal alignment, correspondence in vision representation, and MLLM performance. We quantify the two factors using the cross-modal Alignment and Correspondence score (AC score). Through extensive experiments involving thirteen different vision r…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: The code is available at https://github.com/bronyayang/Law_of_Vision_Representation_in_MLLMs

  27. arXiv:2408.16315  [pdf, other]

    cs.HC cs.LG eess.SP

    Passenger hazard perception based on EEG signals for highly automated driving vehicles

    Authors: Ashton Yu Xuan Tan, Yingkai Yang, Xiaofei Zhang, Bowen Li, Xiaorong Gao, Sifa Zheng, Jianqiang Wang, Xinyu Gu, Jun Li, Yang Zhao, Yuxin Zhang, Tania Stathaki

    Abstract: Enhancing the safety of autonomous vehicles is crucial, especially given recent accidents involving automated systems. As passengers in these vehicles, humans' sensory perception and decision-making can be integrated with autonomous systems to improve safety. This study explores neural mechanisms in passenger-vehicle interactions, leading to the development of a Passenger Cognitive Model (PCM) and…

    Submitted 29 August, 2024; originally announced August 2024.

  28. ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding

    Authors: Minghang Zheng, Jiahua Zhang, Qingchao Chen, Yuxin Peng, Yang Liu

    Abstract: Visual grounding aims to localize the object referred to in an image based on a natural language query. Although progress has been made recently, accurately localizing target objects within multiple-instance distractions (multiple objects of the same category as the target) remains a significant challenge. Existing methods demonstrate a significant performance drop when there are multiple distract…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM 2024

    ACM Class: I.2

  29. arXiv:2408.16308  [pdf, other]

    cs.SI

    AdaMotif: Graph Simplification via Adaptive Motif Design

    Authors: Hong Zhou, Peifeng Lai, Zhida Sun, Xiangyuan Chen, Yang Chen, Huisi Wu, Yong Wang

    Abstract: With the increase of graph size, it becomes difficult or even impossible to visualize graph structures clearly within the limited screen space. Consequently, it is crucial to design effective visual representations for large graphs. In this paper, we propose AdaMotif, a novel approach that can capture the essential structure patterns of large graphs and effectively reveal the overall structures vi…

    Submitted 29 August, 2024; originally announced August 2024.

  30. arXiv:2408.16236  [pdf, other]

    cs.CV

    Neural Spectral Decomposition for Dataset Distillation

    Authors: Shaolei Yang, Shen Cheng, Mingbo Hong, Haoqiang Fan, Xing Wei, Shuaicheng Liu

    Abstract: In this paper, we propose Neural Spectrum Decomposition, a generic decomposition framework for dataset distillation. Unlike previous methods, we consider the entire dataset as a high-dimensional observation that is low-rank across all dimensions. We aim to discover the low-rank representation of the entire dataset and perform distillation efficiently. Toward this end, we learn a set of spectrum te…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  31. arXiv:2408.16219  [pdf, other]

    cs.CV

    Training-free Video Temporal Grounding using Large-scale Pre-trained Models

    Authors: Minghang Zheng, Xinhao Cai, Qingchao Chen, Yuxin Peng, Yang Liu

    Abstract: Video temporal grounding aims to identify video segments within untrimmed videos that are most relevant to a given natural language query. Existing video temporal localization models rely on specific datasets for training and have high data collection costs, but they exhibit poor generalization capability under the across-dataset and out-of-distribution (OOD) settings. In this paper, we propose a…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV 2024

  32. arXiv:2408.16180  [pdf, other]

    eess.AS cs.CL cs.SD

    Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction

    Authors: Yuka Ko, Sheng Li, Chao-Han Huck Yang, Tatsuya Kawahara

    Abstract: With the strong representational power of large language models (LLMs), generative error correction (GER) for automatic speech recognition (ASR) aims to provide semantic and phonetic refinements to address ASR errors. This work explores how LLM-based GER can enhance and expand the capabilities of Japanese language processing, presenting the first GER benchmark for Japanese ASR with 0.9-2.6k text u…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: submitted to SLT2024

  33. arXiv:2408.16131  [pdf, other]

    cs.CL

    Evaluating Computational Representations of Character: An Austen Character Similarity Benchmark

    Authors: Funing Yang, Carolyn Jane Anderson

    Abstract: Several systems have been developed to extract information about characters to aid computational analysis of English literature. We propose character similarity grouping as a holistic evaluation task for these pipelines. We present AustenAlike, a benchmark suite of character similarities in Jane Austen's novels. Our benchmark draws on three notions of character similarity: a structurally defined n…

    Submitted 28 August, 2024; originally announced August 2024.

  34. arXiv:2408.15991  [pdf, other]

    cs.CV

    Distribution Backtracking Builds A Faster Convergence Trajectory for One-step Diffusion Distillation

    Authors: Shengyuan Zhang, Ling Yang, Zejian Li, An Zhao, Chenye Meng, Changyuan Yang, Guang Yang, Zhiyuan Yang, Lingyun Sun

    Abstract: Accelerating the sampling speed of diffusion models remains a significant challenge. Recent score distillation methods distill a heavy teacher model into a one-step student generator, which is optimized by calculating the difference between the two score functions on the samples generated by the student model. However, there is a score mismatch issue in the early stage of the distillation process…

    Submitted 28 August, 2024; originally announced August 2024.

  35. arXiv:2408.15919  [pdf, other]

    cs.RO

    DeMoBot: Deformable Mobile Manipulation with Vision-based Sub-goal Retrieval

    Authors: Yuying Zhang, Wenyan Yang, Joni Pajarinen

    Abstract: Imitation learning (IL) algorithms typically distill experience into parametric behavior policies to mimic expert demonstrations. Despite their effectiveness, previous methods often struggle with data efficiency and accurately aligning the current state with expert demonstrations, especially in deformable mobile manipulation tasks characterized by partial observations and dynamic object deformatio…

    Submitted 28 August, 2024; originally announced August 2024.

  36. arXiv:2408.15915  [pdf, other]

    cs.CV cs.AI cs.CL

    Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models

    Authors: Yuncheng Yang, Yulei Qin, Tong Wu, Zihan Xu, Gang Li, Pengcheng Guo, Hang Shao, Yucheng Shi, Ke Li, Xing Sun, Jie Yang, Yun Gu

    Abstract: The cultivation of expertise for large language models (LLMs) to solve tasks of specific areas often requires special-purpose tuning with calibrated behaviors on the expected stable outputs. To avoid huge cost brought by manual preparation of instruction datasets and training resources up to hundreds of hours, the exploitation of open knowledge including a wealth of low rank adaptation (LoRA) mode…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 28 pages, 12 tables, 10 figures

  37. arXiv:2408.15813  [pdf, other]

    cs.CV

    DQFormer: Towards Unified LiDAR Panoptic Segmentation with Decoupled Queries

    Authors: Yu Yang, Jianbiao Mei, Liang Liu, Siliang Du, Yilin Xiao, Jongwon Ra, Yong Liu, Xiao Xu, Huifeng Wu

    Abstract: LiDAR panoptic segmentation, which jointly performs instance and semantic segmentation for things and stuff classes, plays a fundamental role in LiDAR perception tasks. While most existing methods explicitly separate these two segmentation tasks and utilize different branches (i.e., semantic and instance branches), some recent methods have embraced the query-based paradigm to unify LiDAR panoptic…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 13 pages, 10 figures

  38. arXiv:2408.15777  [pdf, other]

    cs.CV

    A Survey on Facial Expression Recognition of Static and Dynamic Emotions

    Authors: Yan Wang, Shaoqi Yan, Yang Liu, Wei Song, Jing Liu, Yang Chang, Xinji Mai, Xiping Hu, Wenqiang Zhang, Zhongxue Gan

    Abstract: Facial expression recognition (FER) aims to analyze emotional states from static images and dynamic sequences, which is pivotal in enhancing anthropomorphic communication among humans, robots, and digital avatars by leveraging AI technologies. As the FER field evolves from controlled laboratory environments to more complex in-the-wild scenarios, advanced methods have been rapidly developed and new…

    Submitted 28 August, 2024; originally announced August 2024.

  39. arXiv:2408.15750  [pdf, other]

    cs.CV

    Str-L Pose: Integrating Point and Structured Line for Relative Pose Estimation in Dual-Graph

    Authors: Zherong Zhang, Chunyu Lin, Shujuan Huang, Shangrong Yang, Yao Zhao

    Abstract: Relative pose estimation is crucial for various computer vision applications, including robotics and autonomous driving. Current methods primarily depend on selecting and matching feature points prone to incorrect matches, leading to poor performance. Consequently, relying solely on point-matching relationships for pose estimation is a huge challenge. To overcome these limitations, we propose a Geo…

    Submitted 28 August, 2024; originally announced August 2024.

  40. arXiv:2408.15710  [pdf, other]

    cs.CL

    Conan-embedding: General Text Embedding with More and Better Negative Samples

    Authors: Shiyu Li, Yang Tang, Shizhe Chen, Xi Chen

    Abstract: With the growing popularity of RAG, the capabilities of embedding models are gaining increasing attention. Embedding models are primarily trained through contrastive loss learning, with negative examples being a key component. Previous work has proposed various hard negative mining strategies, but these strategies are typically employed as preprocessing steps. In this paper, we propose the conan-e…

    Submitted 29 August, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

  41. arXiv:2408.15708  [pdf, other]

    cs.CV

    Towards Realistic Example-based Modeling via 3D Gaussian Stitching

    Authors: Xinyu Gao, Ziyi Yang, Bingchen Gong, Xiaoguang Han, Sipeng Yang, Xiaogang Jin

    Abstract: Using parts of existing models to rebuild new models, commonly termed as example-based modeling, is a classical methodology in the realm of computer graphics. Previous works mostly focus on shape composition, making them very hard to use for realistic composition of 3D objects captured from real-world scenes. This leads to combining multiple NeRFs into a single 3D scene to achieve seamless appeara…

    Submitted 28 August, 2024; originally announced August 2024.

  42. arXiv:2408.15688  [pdf, other]

    cs.IR

    PDSR: A Privacy-Preserving Diversified Service Recommendation Method on Distributed Data

    Authors: Lina Wang, Huan Yang, Yiran Shen, Chao Liu, Lianyong Qi, Xiuzhen Cheng, Feng Li

    Abstract: The last decade has witnessed a tremendous growth of service computing, while efficient service recommendation methods are desired to recommend high-quality services to users. It is well known that collaborative filtering is one of the most popular methods for service recommendation based on QoS, and many existing proposals focus on improving recommendation accuracy, i.e., recommending high-qualit…

    Submitted 28 August, 2024; originally announced August 2024.

  43. arXiv:2408.15592  [pdf, ps, other]

    cs.IT

    $r$-Minimal Codes with Respect to Rank Metric

    Authors: Yang Xu, Haibin Kan, Guangyue Han

    Abstract: In this paper, we propose and study $r$-minimal codes, a natural extension of minimal codes which have been extensively studied with respect to Hamming metric, rank metric and sum-rank metric. We first propose $r$-minimal codes in a general setting where the ambient space is a finite dimensional left module over a division ring and is supported on a lattice. We characterize minimal subcodes and…

    Submitted 28 August, 2024; originally announced August 2024.

  44. arXiv:2408.15583  [pdf, other]

    cs.CE

    PointEMRay: A Novel Efficient SBR Framework on Point Based Geometry

    Authors: Kaiqiao Yang, Che Liu, Wenming Yu, Tie Jun Cui

    Abstract: The rapid computation of electromagnetic (EM) fields across various scenarios has long been a challenge, primarily due to the need for precise geometric models. The emergence of point cloud data offers a potential solution to this issue. However, the lack of electromagnetic simulation algorithms optimized for point-based models remains a significant limitation. In this study, we propose PointEMRay…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 14 pages, 13 figures, and 2 tables

  45. arXiv:2408.15549  [pdf, other]

    cs.CL

    WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback

    Authors: Taiwei Shi, Zhuoer Wang, Longqi Yang, Ying-Chun Lin, Zexue He, Mengting Wan, Pei Zhou, Sujay Jauhar, Xiaofeng Xu, Xia Song, Jennifer Neville

    Abstract: As large language models (LLMs) continue to advance, aligning these models with human preferences has emerged as a critical challenge. Traditional alignment methods, relying on human or LLM annotated datasets, are limited by their resource-intensive nature, inherent subjectivity, and the risk of feedback loops that amplify model biases. To overcome these limitations, we introduce WildFeedback, a n…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 24 pages

  46. arXiv:2408.15428  [pdf, other]

    cs.CV

    HEAD: A Bandwidth-Efficient Cooperative Perception Approach for Heterogeneous Connected and Autonomous Vehicles

    Authors: Deyuan Qu, Qi Chen, Yongqi Zhu, Yihao Zhu, Sergei S. Avedisov, Song Fu, Qing Yang

    Abstract: In cooperative perception studies, there is often a trade-off between communication bandwidth and perception performance. While current feature fusion solutions are known for their excellent object detection performance, transmitting the entire sets of intermediate feature maps requires substantial bandwidth. Furthermore, these fusion approaches are typically limited to vehicles that use identical…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV 2024 Workshop

  47. arXiv:2408.15281  [pdf]

    cs.CR cs.CV

    NeR-VCP: A Video Content Protection Method Based on Implicit Neural Representation

    Authors: Yangping Lin, Yan Ke, Ke Niu, Jia Liu, Xiaoyuan Yang

    Abstract: With the popularity of video applications, the security of video content has emerged as a pressing issue that demands urgent attention. Most video content protection methods mainly rely on encryption technology, which needs to be manually designed or implemented in an experience-based manner. To address this problem, we propose an automatic encryption technique for video content protection based o…

    Submitted 20 August, 2024; originally announced August 2024.

  48. arXiv:2408.15260  [pdf]

    cs.HC cs.CY stat.ME

    Artificial Data, Real Insights: Evaluating Opportunities and Risks of Expanding the Data Ecosystem with Synthetic Data

    Authors: Richard Timpone, Yongwei Yang

    Abstract: Synthetic Data is not new, but recent advances in Generative AI have raised interest in expanding the research toolbox, creating new opportunities and risks. This article provides a taxonomy of the full breadth of the Synthetic Data domain. We discuss its place in the research ecosystem by linking the advances in computational social science with the idea of the Fourth Paradigm of scientific disco…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 38 pages, 10 figures: originally prepared for the 2024 International Conference for Computational Social Science

  49. arXiv:2408.15257  [pdf]

    cs.CL cs.AI

    Text classification optimization algorithm based on graph neural network

    Authors: Erdi Gao, Haowei Yang, Dan Sun, Haohao Xia, Yuhan Ma, Yuanjing Zhu

    Abstract: In the field of natural language processing, text classification, as a basic task, has important research value and application prospects. Traditional text classification methods usually rely on feature representations such as the bag of words model or TF-IDF, which overlook the semantic connections between words and make it challenging to grasp the deep structural details of the text. Recently, G…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2405.17460 by other authors

  50. arXiv:2408.15241  [pdf, other]

    cs.CV

    GenRec: Unifying Video Generation and Recognition with Diffusion Models

    Authors: Zejia Weng, Xitong Yang, Zhen Xing, Zuxuan Wu, Yu-Gang Jiang

    Abstract: Video diffusion models are able to generate high-quality videos by learning strong spatial-temporal priors on large-scale datasets. In this paper, we aim to investigate whether such priors derived from a generative process are suitable for video recognition, and eventually joint optimization of generation and recognition. Building upon Stable Video Diffusion, we introduce GenRec, the first unified…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 17 pages, 6 figures, 7 tables