Showing 1–50 of 3,639 results for author: Wang, L

Searching in archive cs.
  1. arXiv:2407.12711  [pdf, other]

    cs.RO

    Teleoperation in Robot-assisted MIS with Adaptive RCM via Admittance Control

    Authors: Ehsan Nasiri, Srikarran Sowrirajan, Long Wang

    Abstract: This paper presents the development and assessment of a teleoperation framework for robot-assisted minimally invasive surgery (MIS). The framework leverages our novel integration of an adaptive remote center of motion (RCM) using admittance control. This framework operates within a redundancy resolution method specifically designed for the RCM constraint. We introduce a compact, low-cost, and modu…

    Submitted 17 July, 2024; originally announced July 2024.

  2. arXiv:2407.12687  [pdf, other]

    cs.CY cs.AI cs.LG

    Towards Responsible Development of Generative AI for Education: An Evaluation-Driven Approach

    Authors: Irina Jurenka, Markus Kunesch, Kevin R. McKee, Daniel Gillick, Shaojian Zhu, Sara Wiltberger, Shubham Milind Phal, Katherine Hermann, Daniel Kasenberg, Avishkar Bhoopchand, Ankit Anand, Miruna Pîslar, Stephanie Chan, Lisa Wang, Jennifer She, Parsa Mahmoudieh, Aliya Rysbek, Wei-Jen Ko, Andrea Huber, Brett Wiltshire, Gal Elidan, Roni Rabin, Jasmin Rubinovitz, Amit Pitaru, Mac McAllister, et al. (49 additional authors not shown)

    Abstract: A major challenge facing the world is the provision of equitable and universal access to quality education. Recent advances in generative AI (gen AI) have created excitement about the potential of new technologies to offer a personal tutor for every learner and a teaching assistant for every teacher. The full extent of this dream, however, has not yet materialised. We argue that this is primarily…

    Submitted 21 May, 2024; originally announced July 2024.

  3. arXiv:2407.12505  [pdf, other]

    cs.LG cs.AI cs.RO

    Subequivariant Reinforcement Learning in 3D Multi-Entity Physical Environments

    Authors: Runfa Chen, Ling Wang, Yu Du, Tianrui Xue, Fuchun Sun, Jianwei Zhang, Wenbing Huang

    Abstract: Learning policies for multi-entity systems in 3D environments is far more complicated than in single-entity scenarios, due to the exponential expansion of the global state space as the number of entities increases. One potential solution for alleviating the exponential complexity is dividing the global space into independent local views that are invariant to transformations including translations a…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ICML 2024

  4. arXiv:2407.12425  [pdf, other]

    cs.CL

    Navigating the Noisy Crowd: Finding Key Information for Claim Verification

    Authors: Haisong Gong, Huanhuan Ma, Qiang Liu, Shu Wu, Liang Wang

    Abstract: Claim verification is a task that involves assessing the truthfulness of a given claim based on multiple evidence pieces. Using large language models (LLMs) for claim verification is a promising way. However, simply feeding all the evidence pieces to an LLM and asking if the claim is factual does not yield good results. The challenge lies in the noisy nature of both the evidence and the claim: evi…

    Submitted 17 July, 2024; originally announced July 2024.

  5. arXiv:2407.12380  [pdf, other]

    eess.AS cs.SD

    PCQ: Emotion Recognition in Speech via Progressive Channel Querying

    Authors: Xincheng Wang, Liejun Wang, Yinfeng Yu, Xinxin Jiao

    Abstract: In human-computer interaction (HCI), Speech Emotion Recognition (SER) is a key technology for understanding human intentions and emotions. Traditional SER methods struggle to effectively capture the long-term temporal correlations and dynamic variations in complex emotional expressions. To overcome these limitations, we introduce the PCQ method, a pioneering approach for SER via Progress…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted for publication by International Conference On Intelligent Computing 2024. For data and code, see https://github.com/ICIG/PCQ-Net

  6. arXiv:2407.12261  [pdf]

    cs.NE cs.ET cs.LG physics.app-ph

    Voltage-Controlled Magnetoelectric Devices for Neuromorphic Diffusion Process

    Authors: Yang Cheng, Qingyuan Shu, Albert Lee, Haoran He, Ivy Zhu, Haris Suhail, Minzhang Chen, Renhe Chen, Zirui Wang, Hantao Zhang, Chih-Yao Wang, Shan-Yi Yang, Yu-Chen Hsin, Cheng-Yi Shih, Hsin-Han Lee, Ran Cheng, Sudhakar Pamarti, Xufeng Kou, Kang L. Wang

    Abstract: Stochastic diffusion processes are pervasive in nature, from the seemingly erratic Brownian motion to the complex interactions of synaptically-coupled spiking neurons. Recently, drawing inspiration from Langevin dynamics, neuromorphic diffusion models were proposed and have become one of the major breakthroughs in the field of generative artificial intelligence. Unlike discriminative models that h…

    Submitted 16 July, 2024; originally announced July 2024.

  7. arXiv:2407.12031  [pdf]

    cs.CY cs.AI

    Evaluation of Bias Towards Medical Professionals in Large Language Models

    Authors: Xi Chen, Yang Xu, MingKe You, Li Wang, WeiZhi Liu, Jian Li

    Abstract: This study evaluates whether large language models (LLMs) exhibit biases towards medical professionals. Fictitious candidate resumes were created to control for identity factors while maintaining consistent qualifications. Three LLMs (GPT-4, Claude-3-haiku, and Mistral-Large) were tested using a standardized prompt to evaluate resumes for specific residency programs. Explicit bias was tested by ch…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 36 pages, 6 figures

  8. arXiv:2407.11946  [pdf, other]

    cs.CV

    Hierarchical Separable Video Transformer for Snapshot Compressive Imaging

    Authors: Ping Wang, Yulun Zhang, Lishun Wang, Xin Yuan

    Abstract: Transformers have achieved the state-of-the-art performance on solving the inverse problem of Snapshot Compressive Imaging (SCI) for video, whose ill-posedness is rooted in the mixed degradation of spatial masking and temporal aliasing. However, previous Transformers lack an insight into the degradation and thus have limited performance and efficiency. In this work, we tailor an efficient reconstr…

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  9. arXiv:2407.11433  [pdf, other]

    cs.CV

    CycleHOI: Improving Human-Object Interaction Detection with Cycle Consistency of Detection and Generation

    Authors: Yisen Wang, Yao Teng, Limin Wang

    Abstract: Recognition and generation are two fundamental tasks in computer vision, which are often investigated separately in the existing literature. However, these two tasks are highly correlated in essence as they both require understanding the underlying semantics of visual concepts. In this paper, we propose a new learning framework, coined as CycleHOI, to boost the performance of human-object interactio…

    Submitted 16 July, 2024; originally announced July 2024.

  10. arXiv:2407.11389  [pdf, ps, other]

    cs.NI eess.SP

    Spatial-spectral Cell-free Networks: A Large-scale Case Study

    Authors: Zesheng Zhu, Lifeng Wang, Xin Wang, Dongming Wang, Kai-Kit Wong

    Abstract: This paper studies the large-scale cell-free networks where dense distributed access points (APs) serve many users. As a promising next-generation network architecture, cell-free networks enable ultra-reliable connections and minimal fading/blockage, which are much favorable to the millimeter wave and Terahertz transmissions. However, conventional beam management with large phased arrays in a cell…

    Submitted 16 July, 2024; originally announced July 2024.

  11. arXiv:2407.11351  [pdf, other]

    cs.CV

    Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities

    Authors: Xu Zheng, Yuanhuiyi Lyu, Lin Wang

    Abstract: Image modality is not perfect as it often fails in certain conditions, e.g., night and fast motion. This significantly limits the robustness and versatility of existing multi-modal (i.e., Image+X) semantic segmentation methods when confronting modality absence or failure, as often occurred in real-world applications. Inspired by the open-world learning capability of multi-modal vision-language mod…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  12. arXiv:2407.11344  [pdf, other]

    cs.CV

    Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation

    Authors: Xu Zheng, Yuanhuiyi Lyu, Jiazhou Zhou, Lin Wang

    Abstract: Fusing an arbitrary number of modalities is vital for achieving robust multi-modal fusion of semantic segmentation yet remains less explored to date. Recent endeavors regard RGB modality as the center and the others as the auxiliary, yielding an asymmetric architecture with two branches. However, the RGB modality may struggle in certain circumstances, e.g., nighttime, while others, e.g., event dat…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  13. arXiv:2407.11335  [pdf, other]

    cs.CV

    LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction

    Authors: Penghui Du, Yu Wang, Yifan Sun, Luting Wang, Yue Liao, Gang Zhang, Errui Ding, Yan Wang, Jingdong Wang, Si Liu

    Abstract: Existing methods enhance open-vocabulary object detection by leveraging the robust open-vocabulary recognition capabilities of Vision-Language Models (VLMs), such as CLIP. However, two main challenges emerge: (1) A deficiency in concept representation, where the category names in CLIP's text space lack textual and visual knowledge. (2) An overfitting tendency towards base categories, with the open vo…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  14. arXiv:2407.11268  [pdf, other]

    stat.ML cs.CE cs.LG

    Heterogenous Multi-Source Data Fusion Through Input Mapping and Latent Variable Gaussian Process

    Authors: Yigitcan Comlek, Sandipp Krishnan Ravi, Piyush Pandita, Sayan Ghosh, Liping Wang, Wei Chen

    Abstract: Artificial intelligence and machine learning frameworks have served as computationally efficient mapping between inputs and outputs for engineering problems. These mappings have enabled optimization and analysis routines that have warranted superior designs, ingenious material systems and optimized manufacturing processes. A common occurrence in such modeling endeavors is the existence of multiple…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 20 pages, 9 figures. Data is available on request

  15. arXiv:2407.10937  [pdf, other]

    cs.CV

    IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation

    Authors: Yuanhao Zhai, Kevin Lin, Linjie Li, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, David Doermann, Junsong Yuan, Zicheng Liu, Lijuan Wang

    Abstract: Significant advances have been made in human-centric video generation, yet the joint video-depth generation problem remains underexplored. Most existing monocular depth estimation methods may not generalize well to synthesized images or videos, and multi-view-based methods have difficulty controlling the human appearance and motion. In this work, we present IDOL (unIfied Dual-mOdal Latent diffusio…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; project page: https://yhzhai.github.io/idol/

  16. arXiv:2407.10862  [pdf, other]

    cs.CV

    R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection

    Authors: Zheyuan Zhou, Le Wang, Naiyu Fang, Zili Wang, Lemiao Qiu, Shuyou Zhang

    Abstract: 3D anomaly detection plays a crucial role in monitoring parts for localized inherent defects in precision manufacturing. Embedding-based and reconstruction-based approaches are among the most popular and successful methods. However, there are two major challenges to the practical application of the current approaches: 1) the embedded models suffer the prohibitive computational and storage due to t…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  17. arXiv:2407.10636  [pdf, other]

    cs.CV

    Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction

    Authors: Lin Zhu, Yunlong Zheng, Yijun Zhang, Xiao Wang, Lizhi Wang, Hua Huang

    Abstract: Event-based video reconstruction has garnered increasing attention due to its advantages, such as high dynamic range and rapid motion capture capabilities. However, current methods often prioritize the extraction of temporal information from continuous event flow, leading to an overemphasis on low-frequency texture features in the scene, resulting in over-smoothing and blurry artifacts. Addressing…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  18. arXiv:2407.10223  [pdf, other]

    cs.LG cs.CR

    Practical Unlearning for Large Language Models

    Authors: Chongyang Gao, Lixu Wang, Chenkai Weng, Xiao Wang, Qi Zhu

    Abstract: While LLMs have demonstrated impressive performance across various domains and tasks, their security issues have become increasingly severe. Machine unlearning (MU) has emerged as a promising solution to address these issues by removing the influence of undesired data on the target model without compromising its utility in other aspects. MU typically assumes full access to the original training da…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 17 pages, 8 figures. The first two authors contribute equally and they are ordered alphabetically

  19. arXiv:2407.10181  [pdf, other]

    cs.CV

    Multiscale Sliced Wasserstein Distances as Perceptual Color Difference Measures

    Authors: Jiaqi He, Zhihua Wang, Leon Wang, Tsein-I Liu, Yuming Fang, Qilin Sun, Kede Ma

    Abstract: Contemporary color difference (CD) measures for photographic images typically operate by comparing co-located pixels, patches in a "perceptually uniform" color space, or features in a learned latent space. Consequently, these measures inadequately capture the human color perception of misaligned image pairs, which are prevalent in digital photography (e.g., the same scene captured by different s…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  20. arXiv:2407.09977  [pdf]

    physics.geo-ph cs.AI

    Mitigating Interpretation Bias in Rock Records with Large Language Models: Insights from Paleoenvironmental Analysis

    Authors: Luoqi Wang, Haipeng Li, Linshu Hu, Jiarui Cai, Zhenhong Du

    Abstract: The reconstruction of Earth's history faces significant challenges due to the nonunique interpretations often derived from rock records. The problem has long been recognized but there are no systematic solutions in practice. This study introduces an innovative approach that leverages Large Language Models (LLMs) along with retrieval augmented generation and real-time search capabilities to counter…

    Submitted 17 May, 2024; originally announced July 2024.

  21. arXiv:2407.08481  [pdf, other]

    eess.IV cs.CV

    SliceMamba for Medical Image Segmentation

    Authors: Chao Fan, Hongyuan Yu, Luo Wang, Yan Huang, Liang Wang, Xibin Jia

    Abstract: Despite the progress made in Mamba-based medical image segmentation models, current methods utilizing unidirectional or multi-directional feature scanning mechanisms fail to well model dependencies between neighboring positions in the image, hindering the effective modeling of local features. However, local features are crucial for medical image segmentation as they provide vital information about…

    Submitted 11 July, 2024; originally announced July 2024.

  22. arXiv:2407.07924  [pdf, other]

    math.OC cs.AI cs.CL cs.LG

    Solving General Natural-Language-Description Optimization Problems with Large Language Models

    Authors: Jihai Zhang, Wei Wang, Siyan Guo, Li Wang, Fangquan Lin, Cheng Yang, Wotao Yin

    Abstract: Optimization problems seek to find the best solution to an objective under a set of constraints, and have been widely investigated in real-world applications. Modeling and solving optimization problems in a specific domain typically require a combination of domain knowledge, mathematical skills, and programming ability, making it difficult for general users and even domain professionals. In this p…

    Submitted 9 July, 2024; originally announced July 2024.

  23. arXiv:2407.07674  [pdf, other]

    cs.LG

    Feasibility Study on Active Learning of Smart Surrogates for Scientific Simulations

    Authors: Pradeep Bajracharya, Javier Quetzalcóatl Toledo-Marín, Geoffrey Fox, Shantenu Jha, Linwei Wang

    Abstract: High-performance scientific simulations, important for comprehension of complex systems, encounter computational challenges especially when exploring extensive parameter spaces. There has been an increasing interest in developing deep neural networks (DNNs) as surrogate models capable of accelerating the simulations. However, existing approaches for training these DNN surrogates rely on extensive…

    Submitted 12 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: 17 pages, 9 figures, 1 table

  24. arXiv:2407.07506  [pdf, other]

    eess.SP cs.AI

    Generative AI for RF Sensing in IoT systems

    Authors: Li Wang, Chao Zhang, Qiyang Zhao, Hang Zou, Samson Lasaulce, Giuseppe Valenzise, Zhuo He, Merouane Debbah

    Abstract: The development of wireless sensing technologies, using signals such as Wi-Fi, infrared, and RF to gather environmental data, has significantly advanced within Internet of Things (IoT) systems. Among these, Radio Frequency (RF) sensing stands out for its cost-effective and non-intrusive monitoring of human activities and environmental changes. However, traditional RF sensing methods face significa…

    Submitted 10 July, 2024; originally announced July 2024.

  25. arXiv:2407.07345  [pdf, other]

    cs.CV

    Micro-Expression Recognition by Motion Feature Extraction based on Pre-training

    Authors: Ruolin Li, Lu Wang, Tingting Yang, Lisheng Xu, Bingyang Ma, Yongchun Li, Hongchao Wei

    Abstract: Micro-expressions (MEs) are spontaneous, unconscious facial expressions that have promising applications in various fields such as psychotherapy and national security. Thus, micro-expression recognition (MER) has attracted more and more attention from researchers. Although various MER methods have emerged especially with the development of deep learning techniques, the task still faces several cha…

    Submitted 9 July, 2024; originally announced July 2024.

  26. arXiv:2407.06491  [pdf, other]

    cs.CV

    VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model

    Authors: Xinhao Li, Zhenpeng Huang, Jing Wang, Kunchang Li, Limin Wang

    Abstract: With the growth of high-quality data and advancement in visual pre-training paradigms, Video Foundation Models (VFMs) have made significant progress recently, demonstrating their remarkable performance on traditional video understanding benchmarks. However, the existing benchmarks (e.g. Kinetics) and their evaluation protocols are often limited by relatively poor diversity, high evaluation costs,…

    Submitted 8 July, 2024; originally announced July 2024.

  27. arXiv:2407.06159  [pdf, other]

    cs.CV

    A Semantic-Aware and Multi-Guided Network for Infrared-Visible Image Fusion

    Authors: Xiaoli Zhang, Liying Wang, Libo Zhao, Xiongfei Li, Siwei Ma

    Abstract: Multi-modality image fusion aims at fusing specific-modality and shared-modality information from two source images. To tackle the problem of insufficient feature extraction and lack of semantic awareness for complex scenes, this paper focuses on how to model correlation-driven decomposing features and reason high-level graph representation by efficiently extracting complementary features and mult…

    Submitted 11 June, 2024; originally announced July 2024.

  28. arXiv:2407.05718  [pdf, other]

    cs.CL

    A Factuality and Diversity Reconciled Decoding Method for Knowledge-Grounded Dialogue Generation

    Authors: Chenxu Yang, Zheng Lin, Chong Tian, Liang Pang, Lanrui Wang, Zhengyang Tong, Qirong Ho, Yanan Cao, Weiping Wang

    Abstract: Grounding external knowledge can enhance the factuality of responses in dialogue generation. However, excessive emphasis on it might result in the lack of engaging and diverse expressions. Through the introduction of randomness in sampling, current approaches can increase the diversity. Nevertheless, such sampling method could undermine the factuality in dialogue generation. In this study, to disc…

    Submitted 8 July, 2024; originally announced July 2024.

  29. arXiv:2407.05547  [pdf, other]

    cs.CV

    LaSe-E2V: Towards Language-guided Semantic-Aware Event-to-Video Reconstruction

    Authors: Kanghao Chen, Hangyu Li, JiaZhou Zhou, Zeyu Wang, Lin Wang

    Abstract: Event cameras harness advantages such as low latency, high temporal resolution, and high dynamic range (HDR), compared to standard cameras. Due to the distinct imaging paradigm shift, a dominant line of research focuses on event-to-video (E2V) reconstruction to bridge event-based and standard computer vision. However, this task remains challenging due to its inherently ill-posed nature: event came…

    Submitted 17 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: Project Page: https://vlislab22.github.io/LaSe-E2V/

  30. arXiv:2407.05229  [pdf, other]

    cs.LG

    HiDe-PET: Continual Learning via Hierarchical Decomposition of Parameter-Efficient Tuning

    Authors: Liyuan Wang, Jingyi Xie, Xingxing Zhang, Hang Su, Jun Zhu

    Abstract: The deployment of pre-trained models (PTMs) has greatly advanced the field of continual learning (CL), enabling positive knowledge transfer and resilience to catastrophic forgetting. To sustain these advantages for sequentially arriving tasks, a promising direction involves keeping the pre-trained backbone frozen while employing parameter-efficient tuning (PET) techniques to instruct representatio…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: This is a generalized version of our HiDe-Prompt (NeurIPS 2023, Spotlight)

  31. arXiv:2407.05161  [pdf, other]

    cs.SI cs.IR

    A Survey of Datasets for Information Diffusion Tasks

    Authors: Fuxia Guo, Xiaowen Wang, Yanwei Xie, Zehao Wang, Jingqiu Li, Lanjun Wang

    Abstract: Information diffusion across various new media platforms gradually influences perceptions, decisions, and social behaviors of individual users. In communication studies, the famous Five W's of Communication model (5W Model) has displayed the process of information diffusion clearly. At present, although plenty of studies and corresponding datasets about information diffusion have emerged, a system…

    Submitted 6 July, 2024; originally announced July 2024.

  32. arXiv:2407.05119  [pdf, other]

    cs.CV

    Open-Event Procedure Planning in Instructional Videos

    Authors: Yilu Wu, Hanlin Wang, Jing Wang, Limin Wang

    Abstract: Given the current visual observations, the traditional procedure planning task in instructional videos requires a model to generate goal-directed plans within a given action space. All previous methods for this task conduct training and inference under the same action space, and they can only plan for pre-defined events in the training set. We argue this setting is not applicable for human assista…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: 9 pages (main text), 6 figures, 10 tables

  33. arXiv:2407.04603  [pdf, other]

    cs.CV

    AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation

    Authors: Yuhan Zhu, Yuyang Ji, Zhiyu Zhao, Gangshan Wu, Limin Wang

    Abstract: Pre-trained vision-language models (VLMs) have shown impressive results in various visual classification tasks. However, we often fail to fully unleash their potential when adapting them for new concept understanding due to limited information on new classes. To address this limitation, we introduce a novel adaptation framework, AWT (Augment, Weight, then Transport). AWT comprises three key compon…

    Submitted 5 July, 2024; originally announced July 2024.

  34. arXiv:2407.04460  [pdf, other]

    cs.LG

    Smart Sampling: Helping from Friendly Neighbors for Decentralized Federated Learning

    Authors: Lin Wang, Yang Chen, Yongxin Guo, Xiaoying Tang

    Abstract: Federated Learning (FL) is gaining widespread interest for its ability to share knowledge while preserving privacy and reducing communication costs. Unlike Centralized FL, Decentralized FL (DFL) employs a network architecture that eliminates the need for a central server, allowing direct communication among clients and leading to significant communication resource savings. However, due to data het…

    Submitted 5 July, 2024; originally announced July 2024.

  35. arXiv:2407.04066  [pdf, ps, other]

    cs.CV

    EMPL: A novel Efficient Meta Prompt Learning Framework for Few-shot Unsupervised Domain Adaptation

    Authors: Wanqi Yang, Haoran Wang, Lei Wang, Ge Song, Yang Gao

    Abstract: Few-shot unsupervised domain adaptation (FS-UDA) utilizes few-shot labeled source domain data to realize effective classification in unlabeled target domain. However, current FS-UDA methods still suffer from two issues: 1) the data from different domains cannot be effectively aligned by few-shot labeled data due to the large domain gaps, 2) it is unstable and time-consuming to generalize to n…

    Submitted 4 July, 2024; originally announced July 2024.

  36. arXiv:2407.03217  [pdf, other]

    cs.CV

    MHNet: Multi-view High-order Network for Diagnosing Neurodevelopmental Disorders Using Resting-state fMRI

    Authors: Yueyang Li, Weiming Zeng, Wenhao Dong, Luhui Cai, Lei Wang, Hongyu Chen, Hongjie Yan, Lingbin Bian, Nizhuan Wang

    Abstract: Background: Deep learning models have shown promise in diagnosing neurodevelopmental disorders (NDD) like ASD and ADHD. However, many models either use graph neural networks (GNN) to construct single-level brain functional networks (BFNs) or employ spatial convolution filtering for local information extraction from rs-fMRI data, often neglecting high-order features crucial for NDD classification.…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 18 pages

  37. arXiv:2407.03179  [pdf, other]

    cs.CV cs.AI cs.LG

    Motion meets Attention: Video Motion Prompts

    Authors: Qixiang Chen, Lei Wang, Piotr Koniusz, Tom Gedeon

    Abstract: Videos contain rich spatio-temporal information. Traditional methods for extracting motion, used in tasks such as action recognition, often rely on visual contents rather than precise motion features. This phenomenon is referred to as 'blind motion extraction' behavior, which proves inefficient in capturing motions of interest due to a lack of motion-guided cues. Recently, attention mechanisms hav…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Research report

  38. Representation learning with CGAN for casual inference

    Authors: Zhaotian Weng, Jianbo Hong, Lan Wang

    Abstract: Conditional Generative Adversarial Nets (CGAN) is often used to improve conditional image generation performance. However, there is little research on Representation learning with CGAN for causal inference. This paper proposes a new method for finding representation learning functions by adopting the adversarial idea. We apply the pattern of CGAN and theoretically demonstrate the feasibility of fin…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Proceedings of the 3rd International Conference on Signal Processing and Machine Learning

    ACM Class: I.2.6

    Journal ref: Applied and Computational Engineering, Vol. 6, 1585-1590 (2023)

  39. arXiv:2407.02315  [pdf, other]

    cs.CV cs.AI

    VFIMamba: Video Frame Interpolation with State Space Models

    Authors: Guozhen Zhang, Chunxu Liu, Yutao Cui, Xiaotong Zhao, Kai Ma, Limin Wang

    Abstract: Inter-frame modeling is pivotal in generating intermediate frames for video frame interpolation (VFI). Current approaches predominantly rely on convolution or attention-based models, which often either lack sufficient receptive fields or entail significant computational overheads. Recently, Selective State Space Models (S6) have emerged, tailored specifically for long sequence modeling, offering b…

    Submitted 2 July, 2024; originally announced July 2024.

  40. arXiv:2407.01942  [pdf, other]

    cs.AI cs.CL cs.CV

    Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness

    Authors: Khyathi Raghavi Chandu, Linjie Li, Anas Awadalla, Ximing Lu, Jae Sung Park, Jack Hessel, Lijuan Wang, Yejin Choi

    Abstract: The ability to acknowledge the inevitable uncertainty in their knowledge and reasoning is a prerequisite for AI systems to be truly truthful and reliable. In this paper, we present a taxonomy of uncertainty specific to vision-language AI systems, distinguishing between epistemic uncertainty (arising from a lack of information) and aleatoric uncertainty (due to inherent unpredictability), and furth…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 26 pages

  41. arXiv:2407.01884  [pdf, other]

    cs.CV cs.HC

    EIT-1M: One Million EEG-Image-Text Pairs for Human Visual-textual Recognition and More

    Authors: Xu Zheng, Ling Wang, Kanghao Chen, Yuanhuiyi Lyu, Jiazhou Zhou, Lin Wang

    Abstract: Recently, electroencephalography (EEG) signals have been actively incorporated to decode brain activity to visual or textual stimuli and achieve object recognition in multi-modal AI. Accordingly, endeavors have been focused on building EEG-based datasets from visual or textual single-modal stimuli. However, these datasets offer limited EEG epochs per category, and the complex semantics of stimuli…

    Submitted 1 July, 2024; originally announced July 2024.

  42. arXiv:2407.01842  [pdf, other]

    cs.CV

    CLIP the Divergence: Language-guided Unsupervised Domain Adaptation

    Authors: Jinjing Zhu, Yucheng Chen, Lin Wang

    Abstract: Unsupervised domain adaption (UDA) has emerged as a popular solution to tackle the divergence between the labeled source and unlabeled target domains. Recently, some research efforts have been made to leverage large vision-language models, such as CLIP, and then fine-tune or learn prompts from them for addressing the challenging UDA task. In this work, we shift the gear to a new direction by direc…

    Submitted 1 July, 2024; originally announced July 2024.

  43. arXiv:2407.01299  [pdf, other]

    cs.CV

    Preserving Full Degradation Details for Blind Image Super-Resolution

    Authors: Hongda Liu, Longguang Wang, Ye Zhang, Kaiwen Xue, Shunbo Zhou, Yulan Guo

    Abstract: The performance of image super-resolution relies heavily on the accuracy of degradation information, especially under blind settings. Due to absence of true degradation models in real-world scenarios, previous methods learn distinct representations by distinguishing different degradations in a batch. However, the most significant degradation differences may provide shortcuts for the learning of re…

    Submitted 2 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: 18 pages, 11 figures, 4 tables

  44. arXiv:2407.01247  [pdf, ps, other]

    cs.CV

    Multi-level Reliable Guidance for Unpaired Multi-view Clustering

    Authors: Like Xin, Wanqi Yang, Lei Wang, Ming Yang

    Abstract: In this paper, we address the challenging problem of unpaired multi-view clustering (UMC), aiming to perform effective joint clustering using unpaired observed samples across multiple views. Commonly, traditional incomplete multi-view clustering (IMC) methods often depend on paired samples to capture complementary information between views. However, the strategy becomes impractical in UMC due to t…

    Submitted 2 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  45. arXiv:2407.00743  [pdf, other]

    cs.MM cs.AI cs.CL eess.AS

    AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations

    Authors: Sheng Wu, Jiaxing Liu, Longbiao Wang, Dongxiao He, Xiaobao Wang, Jianwu Dang

    Abstract: Emotion Recognition in Conversations (ERC) is a popular task in natural language processing, which aims to recognize the emotional state of the speaker in conversations. While current research primarily emphasizes contextual modeling, there exists a dearth of investigation into effective multimodal fusion methods. We propose a novel framework called AIMDiT to solve the problem of multimodal fusion…

    Submitted 12 April, 2024; originally announced July 2024.

  46. arXiv:2407.00731  [pdf, other]

    cs.CL cs.AI cs.LG

    Large Language Models Struggle in Token-Level Clinical Named Entity Recognition

    Authors: Qiuhao Lu, Rui Li, Andrew Wen, Jinlian Wang, Liwei Wang, Hongfang Liu

    Abstract: Large Language Models (LLMs) have revolutionized various sectors, including healthcare where they are employed in diverse applications. Their utility is particularly significant in the context of rare diseases, where data scarcity, complexity, and specificity pose considerable challenges. In the clinical domain, Named Entity Recognition (NER) stands out as an essential task and it plays a crucial…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: AMIA 2024 Annual Symposium Proceedings

  47. Parametric Primitive Analysis of CAD Sketches with Vision Transformer

    Authors: Xiaogang Wang, Liang Wang, Hongyu Wu, Guoqiang Xiao, Kai Xu

    Abstract: The design and analysis of Computer-Aided Design (CAD) sketches play a crucial role in industrial product design, primarily involving CAD primitives and their inter-primitive constraints. To address challenges related to error accumulation in autoregressive models and the complexities associated with self-supervised model design for this task, we propose a two-stage network framework. This framewo…

    Submitted 29 June, 2024; originally announced July 2024.

  48. arXiv:2407.00088  [pdf, other]

    cs.DC cs.AI

    T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge

    Authors: Jianyu Wei, Shijie Cao, Ting Cao, Lingxiao Ma, Lei Wang, Yanyong Zhang, Mao Yang

    Abstract: The deployment of Large Language Models (LLMs) on edge devices is increasingly important to enhance on-device intelligence. Weight quantization is crucial for reducing the memory footprint of LLMs on devices. However, low-bit LLMs necessitate mixed precision matrix multiplication (mpGEMM) of low precision weights and high precision activations during inference. Existing systems, lacking native sup…

    Submitted 25 June, 2024; originally announced July 2024.

  49. arXiv:2407.00016  [pdf, other]

    cs.DC

    AdaBridge: Dynamic Data and Computation Reuse for Efficient Multi-task DNN Co-evolution in Edge Systems

    Authors: Lehao Wang, Zhiwen Yu, Sicong Liu, Chenshu Wu, Xiangrui Xu, Bin Guo

    Abstract: Running multi-task DNNs on mobiles is an emerging trend for various applications like autonomous driving and mobile NLP. Mobile DNNs are often compressed to fit the limited resources and thus suffer from degraded accuracy and generalizability due to data drift. DNN evolution, e.g., continuous learning and domain adaptation, has been demonstrated effective in overcoming these issues, mostly for sin…

    Submitted 2 May, 2024; originally announced July 2024.

    Comments: Accepted by NSDI'24 Poster

  50. arXiv:2407.00005  [pdf, other]

    cs.DC

    Dual-pronged deep learning preprocessing on heterogeneous platforms with CPU, GPU and CSD

    Authors: Jia Wei, Xingjun Zhang, Witold Pedrycz, Longxiang Wang, Jie Zhao

    Abstract: Most existing data preprocessing is done at the CPU. Although some studies use techniques such as multi-processing and double buffering to accelerate CPU preprocessing, CPU computational speed and storage bandwidth still limit the processing speed. Other studies try to use intelligent data storage devices, such as computational storage devices, to complete data preprocessing instead of CPUs. The c…

    Submitted 17 April, 2024; originally announced July 2024.