Showing 1–50 of 19,698 results for author: Wu

Searching in archive cs.
  1. arXiv:2409.08273  [pdf, other]

    cs.RO cs.AI cs.CV

    Hand-Object Interaction Pretraining from Videos

    Authors: Himanshu Gaurav Singh, Antonio Loquercio, Carmelo Sferrazza, Jane Wu, Haozhi Qi, Pieter Abbeel, Jitendra Malik

    Abstract: We present an approach to learn general robot manipulation priors from 3D hand-object interaction trajectories. We build a framework to use in-the-wild videos to generate sensorimotor robot trajectories. We do so by lifting both the human hand and the manipulated object in a shared 3D space and retargeting human motions to robot actions. Generative modeling on this data gives us a task-agnostic ba…

    Submitted 12 September, 2024; originally announced September 2024.

  2. arXiv:2409.08240  [pdf, other]

    cs.CV cs.AI

    IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation

    Authors: Yinwei Wu, Xianpan Zhou, Bing Ma, Xuefeng Su, Kai Ma, Xinchao Wang

    Abstract: While Text-to-Image (T2I) diffusion models excel at generating visually appealing images of individual instances, they struggle to accurately position and control the features generation of multiple instances. The Layout-to-Image (L2I) task was introduced to address the positioning challenges by incorporating bounding boxes as spatial control signals, but it still falls short in generating precise…

    Submitted 12 September, 2024; originally announced September 2024.

  3. arXiv:2409.08207  [pdf, other]

    cs.CV

    VI3DRM:Towards meticulous 3D Reconstruction from Sparse Views via Photo-Realistic Novel View Synthesis

    Authors: Hao Chen, Jiafu Wu, Ying Jin, Jinlong Peng, Xiaofeng Mao, Mingmin Chi, Mufeng Yao, Bo Peng, Jian Li, Yun Cao

    Abstract: Recently, methods like Zero-1-2-3 have focused on single-view based 3D reconstruction and have achieved remarkable success. However, their predictions for unseen areas heavily rely on the inductive bias of large-scale pretrained diffusion models. Although subsequent work, such as DreamComposer, attempts to make predictions more controllable by incorporating additional views, the results remain unr…

    Submitted 12 September, 2024; originally announced September 2024.

  4. arXiv:2409.08202  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    What Makes a Maze Look Like a Maze?

    Authors: Joy Hsu, Jiayuan Mao, Joshua B. Tenenbaum, Noah D. Goodman, Jiajun Wu

    Abstract: A unique aspect of human visual understanding is the ability to flexibly interpret abstract concepts: acquiring lifted rules explaining what they symbolize, grounding them across familiar and unfamiliar contexts, and making predictions or reasoning about them. While off-the-shelf vision-language models excel at making literal interpretations of images (e.g., recognizing object categories such as t…

    Submitted 12 September, 2024; originally announced September 2024.

  5. arXiv:2409.08114  [pdf, other]

    cs.IT

    Linear Complementary Dual Codes Constructed from Reinforcement Learning

    Authors: Yansheng Wu, Jin Ma, Shandong Yang

    Abstract: Recently, Linear Complementary Dual (LCD) codes have garnered substantial interest within coding theory research due to their diverse applications and favorable attributes. This paper directs its attention to the construction of binary and ternary LCD codes leveraging curiosity-driven reinforcement learning (RL). By establishing reward and devising well-reasoned mappings from actions to states, it…

    Submitted 12 September, 2024; originally announced September 2024.

    Comments: 17 pages, just accepted by JSSC

  6. arXiv:2409.07994  [pdf, other]

    cs.NI

    Directional WPT Charging for Routing-Asymmetric WRSNs with a Mobile Charger

    Authors: Zhenguo Gao, Qi Zhang, Qingyu Gao, Yunlong Zhao, Hsiao-Chun Wu

    Abstract: Mobile Charge Scheduling for wirelessly charging nodes in Wireless Rechargeable Sensor Networks (WRSNs) is a promising but still evolving research area. Existing research mostly assumes a symmetric environment, where the routing costs in opposite directions between two locations are considered identical. However, various factors such as terrain restrictions and wind or water flows may invalidate t…

    Submitted 12 September, 2024; originally announced September 2024.

    Comments: 15 pages, 5 figures

  7. arXiv:2409.07972  [pdf, other]

    cs.CV

    Deep Height Decoupling for Precise Vision-based 3D Occupancy Prediction

    Authors: Yuan Wu, Zhiqiang Yan, Zhengxue Wang, Xiang Li, Le Hui, Jian Yang

    Abstract: The task of vision-based 3D occupancy prediction aims to reconstruct 3D geometry and estimate its semantic classes from 2D color images, where the 2D-to-3D view transformation is an indispensable step. Most previous methods conduct forward projection, such as BEVPooling and VoxelPooling, both of which map the 2D image features into 3D grids. However, the current grid representing features within a…

    Submitted 12 September, 2024; originally announced September 2024.

  8. arXiv:2409.07966  [pdf, other]

    cs.CV cs.AI

    ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE

    Authors: Sichun Wu, Kazi Injamamul Haque, Zerrin Yumak

    Abstract: Audio-driven 3D facial animation synthesis has been an active field of research with attention from both academia and industry. While there are promising results in this area, recent approaches largely focus on lip-sync and identity control, neglecting the role of emotions and emotion control in the generative process. That is mainly due to the lack of emotionally rich facial animation data and al…

    Submitted 12 September, 2024; originally announced September 2024.

    Comments: 14 pages, 9 figures, 3 tables. Includes code. Accepted at ACM SIGGRAPH MIG 2024

  9. arXiv:2409.07964  [pdf, other]

    cs.NI cs.AI cs.LG

    WirelessAgent: Large Language Model Agents for Intelligent Wireless Networks

    Authors: Jingwen Tong, Jiawei Shao, Qiong Wu, Wei Guo, Zijian Li, Zehong Lin, Jun Zhang

    Abstract: Wireless networks are increasingly facing challenges due to their expanding scale and complexity. These challenges underscore the need for advanced AI-driven strategies, particularly in the upcoming 6G networks. In this article, we introduce WirelessAgent, a novel approach leveraging large language models (LLMs) to develop AI agents capable of managing complex tasks in wireless networks. It can ef…

    Submitted 12 September, 2024; originally announced September 2024.

  10. arXiv:2409.07946  [pdf, ps, other]

    cs.IR

    Collaborative Automatic Modulation Classification via Deep Edge Inference for Hierarchical Cognitive Radio Networks

    Authors: Chaowei He, Peihao Dong, Fuhui Zhou, Qihui Wu

    Abstract: In hierarchical cognitive radio networks, edge or cloud servers utilize the data collected by edge devices for modulation classification, which, however, is faced with problems of the transmission overhead, data privacy, and computation load. In this article, an edge learning (EL) based framework jointly mobilizing the edge device and the edge server for intelligent co-inference is proposed to rea…

    Submitted 12 September, 2024; originally announced September 2024.

    Comments: arXiv admin note: text overlap with arXiv:2407.20772

  11. arXiv:2409.07843  [pdf, other]

    cs.CV cs.RO

    Real-time Multi-view Omnidirectional Depth Estimation System for Robots and Autonomous Driving on Real Scenes

    Authors: Ming Li, Xiong Yang, Chaofan Wu, Jiaheng Li, Pinzhi Wang, Xuejiao Hu, Sidan Du, Yang Li

    Abstract: Omnidirectional Depth Estimation has broad application prospects in fields such as robotic navigation and autonomous driving. In this paper, we propose a robotic prototype system and corresponding algorithm designed to validate omnidirectional depth estimation for navigation and obstacle avoidance in real-world scenarios for both robots and vehicles. The proposed HexaMODE system captures 360…

    Submitted 12 September, 2024; originally announced September 2024.

  12. arXiv:2409.07832  [pdf, other]

    cs.AR cs.LG

    Efficient and Reliable Vector Similarity Search Using Asymmetric Encoding with NAND-Flash for Many-Class Few-Shot Learning

    Authors: Hao-Wei Chiang, Chi-Tse Huang, Hsiang-Yun Cheng, Po-Hao Tseng, Ming-Hsiu Lee, An-Yeu Wu

    Abstract: While memory-augmented neural networks (MANNs) offer an effective solution for few-shot learning (FSL) by integrating deep neural networks with external memory, the capacity requirements and energy overhead of data movement become enormous due to the large number of support vectors in many-class FSL scenarios. Various in-memory search solutions have emerged to improve the energy efficiency of MANN…

    Submitted 12 September, 2024; originally announced September 2024.

  13. arXiv:2409.07825  [pdf, other]

    cs.CV cs.AI cs.LG

    A Comprehensive Survey on Deep Multimodal Learning with Missing Modality

    Authors: Renjie Wu, Hu Wang, Hsiang-Ting Chen

    Abstract: During multimodal model training and reasoning, data samples may miss certain modalities and lead to compromised model performance due to sensor limitations, cost constraints, privacy concerns, data loss, and temporal and spatial factors. This survey provides an overview of recent progress in Multimodal Learning with Missing Modality (MLMM), focusing on deep learning techniques. It is the first co…

    Submitted 12 September, 2024; originally announced September 2024.

    Comments: Work in progress and welcome to discussion

  14. arXiv:2409.07488  [pdf, other]

    eess.SP cs.LG

    Contrastive Learning-based User Identification with Limited Data on Smart Textiles

    Authors: Yunkang Zhang, Ziyu Wu, Zhen Liang, Fangting Xie, Quan Wan, Mingjie Zhao, Xiaohui Cai

    Abstract: Pressure-sensitive smart textiles are widely applied in the fields of healthcare, sports monitoring, and intelligent homes. The integration of devices embedded with pressure sensing arrays is expected to enable comprehensive scene coverage and multi-device integration. However, the implementation of identity recognition, a fundamental function in this context, relies on extensive device-specific d…

    Submitted 6 September, 2024; originally announced September 2024.

  15. arXiv:2409.07454  [pdf, other]

    cs.CV cs.MM

    DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation

    Authors: Haibo Yang, Yang Chen, Yingwei Pan, Ting Yao, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Tao Mei

    Abstract: Learning radiance fields (NeRF) with powerful 2D diffusion models has garnered popularity for text-to-3D generation. Nevertheless, the implicit 3D representations of NeRF lack explicit modeling of meshes and textures over surfaces, and such surface-undefined way may suffer from the issues, e.g., noisy surfaces with ambiguous texture details or cross-view inconsistency. To alleviate this, we presen…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: ECCV 2024. Project page is available at https://dreammesh.github.io

  16. arXiv:2409.07434  [pdf, other]

    stat.ML cs.LG math.ST

    Asymptotics of Stochastic Gradient Descent with Dropout Regularization in Linear Models

    Authors: Jiaqi Li, Johannes Schmidt-Hieber, Wei Biao Wu

    Abstract: This paper proposes an asymptotic theory for online inference of the stochastic gradient descent (SGD) iterates with dropout regularization in linear regression. Specifically, we establish the geometric-moment contraction (GMC) for constant step-size SGD dropout iterates to show the existence of a unique stationary distribution of the dropout recursive function. By the GMC property, we provide que…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 77 pages, 5 figures, 4 tables

    MSC Class: 62E20; 62F12; 68W27

  17. arXiv:2409.07407  [pdf, other]

    cs.CR cs.AI

    CLNX: Bridging Code and Natural Language for C/C++ Vulnerability-Contributing Commits Identification

    Authors: Zeqing Qin, Yiwei Wu, Lansheng Han

    Abstract: Large Language Models (LLMs) have shown great promise in vulnerability identification. As C/C++ comprises half of the Open-Source Software (OSS) vulnerabilities over the past decade and updates in OSS mainly occur through commits, enhancing LLMs' ability to identify C/C++ Vulnerability-Contributing Commits (VCCs) is essential. However, current studies primarily focus on further pre-training LLMs o…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 8 pages, 2 figures, conference

    MSC Class: 68M25

  18. arXiv:2409.07276  [pdf, other]

    cs.IR

    STORE: Streamlining Semantic Tokenization and Generative Recommendation with A Single LLM

    Authors: Qijiong Liu, Jieming Zhu, Lu Fan, Zhou Zhao, Xiao-Ming Wu

    Abstract: Traditional recommendation models often rely on unique item identifiers (IDs) to distinguish between items, which can hinder their ability to effectively leverage item content information and generalize to long-tail or cold-start items. Recently, semantic tokenization has been proposed as a promising solution that aims to tokenize each item's semantic representation into a sequence of discrete tok…

    Submitted 11 September, 2024; originally announced September 2024.

  19. arXiv:2409.07268  [pdf, other]

    cs.LG

    Multi-Type Preference Learning: Empowering Preference-Based Reinforcement Learning with Equal Preferences

    Authors: Ziang Liu, Junjie Xu, Xingjiao Wu, Jing Yang, Liang He

    Abstract: Preference-Based reinforcement learning (PBRL) learns directly from the preferences of human teachers regarding agent behaviors without needing meticulously designed reward functions. However, existing PBRL methods often learn primarily from explicit preferences, neglecting the possibility that teachers may choose equal preferences. This neglect may hinder the understanding of the agent regarding…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 7 pages, 6 figures

  20. arXiv:2409.07226  [pdf, other]

    cs.SD eess.AS

    Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm

    Authors: Yuning Wu, Jiatong Shi, Yifeng Yu, Yuxun Tang, Tao Qian, Yueqian Lin, Jionghao Han, Xinyi Bai, Shinji Watanabe, Qin Jin

    Abstract: This research presents Muskits-ESPnet, a versatile toolkit that introduces new paradigms to Singing Voice Synthesis (SVS) through the application of pretrained audio models in both continuous and discrete approaches. Specifically, we explore discrete representations derived from SSL models and audio codecs and offer significant advantages in versatility and intelligence, supporting multi-format in…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: Accepted by ACMMM 2024 demo track

  21. arXiv:2409.07202  [pdf, other]

    cs.LG cs.AI

    Heterogeneity-Aware Coordination for Federated Learning via Stitching Pre-trained blocks

    Authors: Shichen Zhan, Yebo Wu, Chunlin Tian, Yan Zhao, Li Li

    Abstract: Federated learning (FL) coordinates multiple devices to collaboratively train a shared model while preserving data privacy. However, large memory footprint and high energy consumption during the training process excludes the low-end devices from contributing to the global model with their own data, which severely deteriorates the model performance in real-world scenarios. In this paper, we propose…

    Submitted 11 September, 2024; originally announced September 2024.

    Journal ref: 2024 IEEE/ACM International Symposium on Quality of Service (IWQoS)

  22. arXiv:2409.07167  [pdf, other]

    cs.CR

    H$_2$O$_2$RAM: A High-Performance Hierarchical Doubly Oblivious RAM

    Authors: Leqian Zheng, Zheng Zhang, Wentao Dong, Yao Zhang, Ye Wu, Cong Wang

    Abstract: The combination of Oblivious RAM (ORAM) with Trusted Execution Environments (TEE) has found numerous real-world applications due to their complementary nature. TEEs alleviate the performance bottlenecks of ORAM, such as network bandwidth and roundtrip latency, and ORAM provides general-purpose protection for TEE applications against attacks exploiting memory access patterns. The defining property…

    Submitted 11 September, 2024; originally announced September 2024.

  23. arXiv:2409.07098  [pdf, other]

    cs.CV cs.AI

    Redundancy-Aware Camera Selection for Indoor Scene Neural Rendering

    Authors: Zehao Wang, Han Zhou, Matthew B. Blaschko, Tinne Tuytelaars, Minye Wu

    Abstract: Novel view synthesis of indoor scenes can be achieved by capturing a monocular video sequence of the environment. However, redundant information caused by artificial movements in the input video data reduces the efficiency of scene modeling. In this work, we tackle this challenge from the perspective of camera selection. We begin by constructing a similarity matrix that incorporates both the spati…

    Submitted 11 September, 2024; originally announced September 2024.

  24. arXiv:2409.07055  [pdf, other]

    cs.CL cs.AI cs.CY

    Legal Fact Prediction: Task Definition and Dataset Construction

    Authors: Junkai Liu, Yujie Tong, Hui Huang, Shuyuan Zheng, Muyun Yang, Peicheng Wu, Makoto Onizuka, Chuan Xiao

    Abstract: Legal facts refer to the facts that can be proven by acknowledged evidence in a trial. They form the basis for the determination of court judgments. This paper introduces a novel NLP task: legal fact prediction, which aims to predict the legal fact based on a list of evidence. The predicted facts can instruct the parties and their lawyers involved in a trial to strengthen their submissions and opt…

    Submitted 11 September, 2024; originally announced September 2024.

  25. arXiv:2409.07045  [pdf, other]

    cs.CL cs.AI

    Beyond IID: Optimizing Instruction Learning from the Perspective of Instruction Interaction and Dependency

    Authors: Hanyu Zhao, Li Du, Yiming Ju, Chengwei Wu, Tengfei Pan

    Abstract: With the availability of various instruction datasets, a pivotal challenge is how to effectively select and integrate these instructions to fine-tune large language models (LLMs). Previous research mainly focuses on selecting individual high-quality instructions. However, these works overlooked the joint interactions and dependencies between different categories of instructions, leading to subopti…

    Submitted 11 September, 2024; originally announced September 2024.

  26. arXiv:2409.07020  [pdf, other]

    eess.IV cs.CV

    EVENet: Evidence-based Ensemble Learning for Uncertainty-aware Brain Parcellation Using Diffusion MRI

    Authors: Chenjun Li, Dian Yang, Shun Yao, Shuyue Wang, Ye Wu, Le Zhang, Qiannuo Li, Kang Ik Kevin Cho, Johanna Seitz-Holland, Lipeng Ning, Jon Haitz Legarreta, Yogesh Rathi, Carl-Fredrik Westin, Lauren J. O'Donnell, Nir A. Sochen, Ofer Pasternak, Fan Zhang

    Abstract: In this study, we developed an Evidence-based Ensemble Neural Network, namely EVENet, for anatomical brain parcellation using diffusion MRI. The key innovation of EVENet is the design of an evidential deep learning framework to quantify predictive uncertainty at each voxel during a single inference. Using EVENet, we obtained accurate parcellation and uncertainty estimates across different datasets…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 15 pages, 5 figures

  27. arXiv:2409.07014  [pdf, other]

    stat.ML cs.DB cs.LG

    A Practical Theory of Generalization in Selectivity Learning

    Authors: Peizhi Wu, Haoshu Xu, Ryan Marcus, Zachary G. Ives

    Abstract: Query-driven machine learning models have emerged as a promising estimation technique for query selectivities. Yet, surprisingly little is known about the efficacy of these techniques from a theoretical perspective, as there exist substantial gaps between practical solutions and state-of-the-art (SOTA) theory based on the Probably Approximately Correct (PAC) learning framework. In this paper, we a…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 14 pages

  28. arXiv:2409.06980  [pdf, other]

    cs.CV

    PanAdapter: Two-Stage Fine-Tuning with Spatial-Spectral Priors Injecting for Pansharpening

    Authors: RuoCheng Wu, ZiEn Zhang, ShangQi Deng, YuLe Duan, LiangJian Deng

    Abstract: Pansharpening is a challenging image fusion task that involves restoring images using two different modalities: low-resolution multispectral images (LRMS) and high-resolution panchromatic (PAN). Many end-to-end specialized models based on deep learning (DL) have been proposed, yet the scale and performance of these models are limited by the size of dataset. Given the superior parameter scales and…

    Submitted 10 September, 2024; originally announced September 2024.

  29. arXiv:2409.06936  [pdf]

    cs.DB cs.DL

    Intelligent Innovation Dataset on Scientific Research Outcomes and Patents

    Authors: Xinran Wu, Hui Zou, Yidan Xing, Jingjing Qu, Qiongxiu Li, Renxia Xue, Xiaoming Fu

    Abstract: Various stakeholders, such as researchers, government agencies, businesses, and laboratories require reliable scientific research outcomes and patent data to support their work. These data are crucial for advancing scientific research, conducting business evaluations, and policy analysis. However, collecting such data is often a time-consuming and laborious task. Consequently, many users turn to u…

    Submitted 10 September, 2024; originally announced September 2024.

  30. arXiv:2409.06863  [pdf, other]

    cs.LG cs.HC

    Towards Understanding Human Emotional Fluctuations with Sparse Check-In Data

    Authors: Sagar Paresh Shah, Ga Wu, Sean W. Kortschot, Samuel Daviau

    Abstract: Data sparsity is a key challenge limiting the power of AI tools across various domains. The problem is especially pronounced in domains that require active user input rather than measurements derived from automated sensors. It is a critical barrier to harnessing the full potential of AI in domains requiring active user engagement, such as self-reported mood check-ins, where capturing a continuous…

    Submitted 10 September, 2024; originally announced September 2024.

  31. arXiv:2409.06851  [pdf, other]

    cs.CV cs.AI

    LIME-M: Less Is More for Evaluation of MLLMs

    Authors: Kang Zhu, Qianbo Zang, Shian Jia, Siwei Wu, Feiteng Fang, Yizhi Li, Shuyue Guo, Tianyu Zheng, Bo Li, Haoning Wu, Xingwei Qu, Jian Yang, Zachary Liu, Xiang Yue, J. H. Liu, Chenghua Lin, Min Yang, Shiwen Ni, Wenhao Huang, Ge Zhang

    Abstract: With the remarkable success achieved by Multimodal Large Language Models (MLLMs), numerous benchmarks have been designed to assess MLLMs' ability to guide their development in image perception tasks (e.g., image captioning and visual question answering). However, the existence of numerous benchmarks results in a substantial computational burden when evaluating model performance across all of them.…

    Submitted 10 September, 2024; originally announced September 2024.

  32. arXiv:2409.06816  [pdf, other]

    cs.CR

    LLM-Enhanced Software Patch Localization

    Authors: Jinhong Yu, Yi Chen, Di Tang, Xiaozhong Liu, XiaoFeng Wang, Chen Wu, Haixu Tang

    Abstract: Open source software (OSS) is integral to modern product development, and any vulnerability within it potentially compromises numerous products. While developers strive to apply security patches, pinpointing these patches among extensive OSS updates remains a challenge. Security patch localization (SPL) recommendation methods are leading approaches to address this. However, existing SPL models oft…

    Submitted 10 September, 2024; originally announced September 2024.

  33. arXiv:2409.06669  [pdf, other]

    cs.LG

    DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models

    Authors: Maryam Akhavan Aghdam, Hongpeng Jin, Yanzhao Wu

    Abstract: Transformer-based Mixture-of-Experts (MoE) models have been driving several recent technological advancements in Natural Language Processing (NLP). These MoE models adopt a router mechanism to determine which experts to activate for routing input tokens. However, existing router mechanisms allocate a fixed number of experts to each token, which neglects the varying importance of different input to…

    Submitted 10 September, 2024; originally announced September 2024.

  34. arXiv:2409.06627  [pdf, other]

    cs.HC cs.CY cs.ET

    "The struggle is a part of the experience": Engaging Discontents in the Design of Family Meal Technologies

    Authors: Yuxing Wu, Andrew D Miller, Chia-Fang Chung, Elizabeth Kaziunas

    Abstract: Meals are a central (and messy) part of family life. Previous design framings for mealtime technologies have focused on supporting dietary needs or social and celebratory interactions at the dinner table; however, family meals involve the coordination of many activities and complicated family dynamics. In this paper, we report on findings from interviews and design sessions with 18 families from t…

    Submitted 10 September, 2024; originally announced September 2024.

    Journal ref: Proc. ACM Hum.-Comput. Interact 8, CSCW2, Article 477 (November 2024), 33 pages

  35. arXiv:2409.06624  [pdf, other]

    cs.CL cs.AI cs.LG

    A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio

    Authors: Ningyuan Xi, Yetao Wu, Kun Fan, Teng Chen, Qingqing Gu, Peng Yu, Jinxian Qu, Chenxi Liu, Zhonglin Jiang, Yong Chen, Luo Ji

    Abstract: Large Language Models (LLM) often needs to be Continual Pre-Trained (CPT) to obtain the unfamiliar language skill or adapt into new domains. The huge training cost of CPT often asks for cautious choice of key hyper-parameters such as the mixture ratio of extra language or domain corpus. However, there is no systematic study which bridge the gap between the optimal mixture ratio and the actual mode…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: 11 pages, 4 figures

  36. arXiv:2409.06601  [pdf, other]

    cs.CL cs.LG

    Alleviating Hallucinations in Large Language Models with Scepticism Modeling

    Authors: Yetao Wu, Yihong Wang, Teng Chen, Chenxi Liu, Ningyuan Xi, Qingqing Gu, Hongyang Lei, Zhonglin Jiang, Yong Chen, Luo Ji

    Abstract: Hallucinations is a major challenge for large language models (LLMs), prevents adoption in diverse fields. Uncertainty estimation could be used for alleviating the damages of hallucinations. The skeptical emotion of human could be useful for enhancing the ability of self estimation. Inspirited by this observation, we proposed a new approach called Skepticism Modeling (SM). This approach is formali…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: 11 pages, 6 figures

  37. arXiv:2409.06584  [pdf, other]

    cs.CV

    Transtreaming: Adaptive Delay-aware Transformer for Real-time Streaming Perception

    Authors: Xiang Zhang, Yufei Cui, Chenchen Fu, Weiwei Wu, Zihao Wang, Yuyang Sun, Xue Liu

    Abstract: Real-time object detection is critical for the decision-making process for many real-world applications, such as collision avoidance and path planning in autonomous driving. This work presents an innovative real-time streaming perception method, Transtreaming, which addresses the challenge of real-time object detection with dynamic computational delay. The core innovation of Transtreaming lies in…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: Submitted to AAAI 2025

  38. arXiv:2409.06580  [pdf, other]

    eess.AS cs.SD

    Exploring Differences between Human Perception and Model Inference in Audio Event Recognition

    Authors: Yizhou Tan, Yanru Wu, Yuanbo Hou, Xin Xu, Hui Bu, Shengchen Li, Dick Botteldooren, Mark D. Plumbley

    Abstract: Audio Event Recognition (AER) traditionally focuses on detecting and identifying audio events. Most existing AER models tend to detect all potential events without considering their varying significance across different contexts. This makes the AER results detected by existing models often have a large discrepancy with human auditory perception. Although this is a critical and significant issue, i…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: Dataset homepage: https://github.com/Voltmeter00/MAFAR

  39. arXiv:2409.06531  [pdf, other]

    cs.RO

    Multi-robot Task Allocation and Path Planning with Maximum Range Constraints

    Authors: Gang Xu, Yuchen Wu, Sheng Tao, Yifan Yang, Tao Liu, Tao Huang, Huifeng Wu, Yong Liu

    Abstract: This letter presents a novel multi-robot task allocation and path planning method that considers robots' maximum range constraints in large-sized workspaces, enabling robots to complete the assigned tasks within their range limits. Firstly, we developed a fast path planner to solve global paths efficiently. Subsequently, we propose an innovative auction-based approach that integrates our path plan…

    Submitted 10 September, 2024; originally announced September 2024.

  40. arXiv:2409.06381  [pdf, other]

    cs.CV

    A Cross-Font Image Retrieval Network for Recognizing Undeciphered Oracle Bone Inscriptions

    Authors: Zhicong Wu, Qifeng Su, Ke Gu, Xiaodong Shi

    Abstract: Oracle Bone Inscription (OBI) is the earliest mature writing system known in China to date, which represents a crucial stage in the development of hieroglyphs. Nevertheless, the substantial quantity of undeciphered OBI characters continues to pose a persistent challenge for scholars, while conventional methods of ancient script research are both time-consuming and labor-intensive. In this paper, w…

    Submitted 10 September, 2024; originally announced September 2024.

  41. arXiv:2409.06307  [pdf, other]

    cs.SD cs.AI eess.AS

    An End-to-End Approach for Chord-Conditioned Song Generation

    Authors: Shuochen Gao, Shun Lei, Fan Zhuo, Hangyu Liu, Feng Liu, Boshi Tang, Qiaochu Huang, Shiyin Kang, Zhiyong Wu

    Abstract: The Song Generation task aims to synthesize music composed of vocals and accompaniment from given lyrics. While the existing method, Jukebox, has explored this task, its constrained control over the generations often leads to deficiency in music performance. To mitigate the issue, we introduce an important concept from music composition, namely chords, to song generation networks. Chords form the…

    Submitted 10 September, 2024; originally announced September 2024.

  42. arXiv:2409.06259  [pdf, other]

    cs.CV

    ALSS-YOLO: An Adaptive Lightweight Channel Split and Shuffling Network for TIR Wildlife Detection in UAV Imagery

    Authors: Ang He, Xiaobo Li, Ximei Wu, Chengyue Su, Jing Chen, Sheng Xu, Xiaobin Guo

    Abstract: Unmanned aerial vehicles (UAVs) equipped with thermal infrared (TIR) cameras play a crucial role in combating nocturnal wildlife poaching. However, TIR images often face challenges such as jitter, and wildlife overlap, necessitating UAVs to possess the capability to identify blurred and overlapping small targets. Current traditional lightweight networks deployed on UAVs struggle to extract feature…

    Submitted 12 September, 2024; v1 submitted 10 September, 2024; originally announced September 2024.

  43. arXiv:2409.06237  [pdf, other]

    cs.SD eess.AS

    RobustSVC: HuBERT-based Melody Extractor and Adversarial Learning for Robust Singing Voice Conversion

    Authors: Wei Chen, Xintao Zhao, Jun Chen, Binzhu Sha, Zhiwei Lin, Zhiyong Wu

    Abstract: Singing voice conversion (SVC) is hindered by noise sensitivity due to the use of non-robust methods for extracting pitch and energy during the inference. As clean signals are key for the source audio in SVC, music source separation preprocessing offers a viable solution for handling noisy audio, like singing with background music (BGM). However, current separating methods struggle to fully remove…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: Accepted by ISCSLP 2024

  44. arXiv:2409.06206  [pdf, other]

    cs.CV

    AgileIR: Memory-Efficient Group Shifted Windows Attention for Agile Image Restoration

    Authors: Hongyi Cai, Mohammad Mahdinur Rahman, Mohammad Shahid Akhtar, Jie Li, Jingyu Wu, Zhili Fang

    Abstract: Image Transformers show a magnificent success in Image Restoration tasks. Nevertheless, most of transformer-based models are strictly bounded by exorbitant memory occupancy. Our goal is to reduce the memory consumption of Swin Transformer and at the same time speed up the model during training process. Thus, we introduce AgileIR, group shifted attention mechanism along with window attention, which…

    Submitted 10 September, 2024; originally announced September 2024.

  45. arXiv:2409.06201  [pdf, other]

    cs.GR math.NA physics.flu-dyn

    An Eulerian Vortex Method on Flow Maps

    Authors: Sinan Wang, Yitong Deng, Molin Deng, Hong-Xing Yu, Junwei Zhou, Duowen Chen, Taku Komura, Jiajun Wu, Bo Zhu

    Abstract: We present an Eulerian vortex method based on the theory of flow maps to simulate the complex vortical motions of incompressible fluids. Central to our method is the novel incorporation of the flow-map transport equations for line elements, which, in combination with a bi-directional marching scheme for flow maps, enables the high-fidelity Eulerian advection of vorticity variables. The fundamental…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: Accepted at ACM Transactions on Graphics (SIGGRAPH Asia 2024)

  46. arXiv:2409.06189  [pdf, other]

    cs.CV

    MyGo: Consistent and Controllable Multi-View Driving Video Generation with Camera Control

    Authors: Yining Yao, Xi Guo, Chenjing Ding, Wei Wu

    Abstract: High-quality driving video generation is crucial for providing training data for autonomous driving models. However, current generative models rarely focus on enhancing camera motion control under multi-view tasks, which is essential for driving video generation. Therefore, we propose MyGo, an end-to-end framework for video generation, introducing motion of onboard cameras as conditions to make pr…

    Submitted 11 September, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

    Comments: Project Page: https://metadrivescape.github.io/papers_project/MyGo/page.html

  47. arXiv:2409.06136  [pdf, other]

    cs.SD eess.AS

    DENSE: Dynamic Embedding Causal Target Speech Extraction

    Authors: Yiwen Wang, Zeyu Yuan, Xihong Wu

    Abstract: Target speech extraction (TSE) focuses on extracting the speech of a specific target speaker from a mixture of signals. Existing TSE models typically utilize static embeddings as conditions for extracting the target speaker's voice. However, the static embeddings often fail to capture the contextual information of the extracted speech signal, which may limit the model's performance. We propose a n…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: Submitted to ICASSP 2025

  48. arXiv:2409.06105  [pdf, other]

    cs.CV

    SGC-VQGAN: Towards Complex Scene Representation via Semantic Guided Clustering Codebook

    Authors: Chenjing Ding, Chiyu Wang, Boshi Liu, Xi Guo, Weixuan Tang, Wei Wu

    Abstract: Vector quantization (VQ) is a method for deterministically learning features through discrete codebook representations. Recent works have utilized visual tokenizers to discretize visual regions for self-supervised representation learning. However, a notable limitation of these tokenizers is lack of semantics, as they are derived solely from the pretext task of reconstructing raw image pixels in an…

    Submitted 9 September, 2024; originally announced September 2024.

  49. arXiv:2409.06029  [pdf, other]

    cs.SD cs.AI eess.AS

    SongCreator: Lyrics-based Universal Song Generation

    Authors: Shun Lei, Yixuan Zhou, Boshi Tang, Max W. Y. Lam, Feng Liu, Hangyu Liu, Jingcheng Wu, Shiyin Kang, Zhiyong Wu, Helen Meng

    Abstract: Music is an integral part of human culture, embodying human intelligence and creativity, of which songs compose an essential part. While various aspects of song generation have been explored by previous works, such as singing voice, vocal composition and instrumental arrangement, etc., generating songs with both vocals and accompaniment given lyrics remains a significant challenge, hindering the a…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: work in progress

  50. arXiv:2409.05929  [pdf, other]

    cs.LG cs.AI

    Alt-MoE: Multimodal Alignment via Alternating Optimization of Multi-directional MoE with Unimodal Models

    Authors: Hongyang Lei, Xiaolong Cheng, Dan Wang, Qi Qin, Huazhen Huang, Yetao Wu, Qingqing Gu, Zhonglin Jiang, Yong Chen, Luo Ji

    Abstract: Recent Large Multi-Modal Models (LMMs) have made significant advancements in multi-modal alignment by employing lightweight connection modules to facilitate the representation and fusion of knowledge from existing pre-trained uni-modal models. However, these methods still rely on modality-specific and direction-specific connectors, leading to compartmentalized knowledge representations and reduced…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: work in progress