
Showing 1–50 of 1,354 results for author: wang, k

Searching in archive cs.
  1. arXiv:2408.16237  [pdf, other]

    cs.DB

    MQRLD: A Multimodal Data Retrieval Platform with Query-aware Feature Representation and Learned Index Based on Data Lake

    Authors: Ming Sheng, Shuliang Wang, Yong Zhang, Kaige Wang, Jingyi Wang, Yi Luo, Rui Hao

    Abstract: Multimodal data has become a crucial element in the realm of big data analytics, driving advancements in data exploration, data mining, and empowering artificial intelligence applications. To support high-quality retrieval for these cutting-edge applications, a robust data retrieval platform should meet the requirements for transparent data storage, rich hybrid queries, effective feature represent…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 36 pages, 28 figures

  2. arXiv:2408.15207  [pdf, other]

    cs.SE

    Investigating Coverage Criteria in Large Language Models: An In-Depth Study Through Jailbreak Attacks

    Authors: Shide Zhou, Tianlin Li, Kailong Wang, Yihao Huang, Ling Shi, Yang Liu, Haoyu Wang

    Abstract: The swift advancement of large language models (LLMs) has profoundly shaped the landscape of artificial intelligence; however, their deployment in sensitive domains raises grave concerns, particularly due to their susceptibility to malicious exploitation. This situation underscores the insufficiencies in pre-deployment testing, highlighting the urgent need for more rigorous and comprehensive evalu…

    Submitted 27 August, 2024; originally announced August 2024.

  3. arXiv:2408.15063  [pdf, other]

    cs.CV

    Adapting Segment Anything Model to Multi-modal Salient Object Detection with Semantic Feature Fusion Guidance

    Authors: Kunpeng Wang, Danying Lin, Chenglong Li, Zhengzheng Tu, Bin Luo

    Abstract: Although most existing multi-modal salient object detection (SOD) methods demonstrate effectiveness through training models from scratch, the limited multi-modal data hinders these methods from reaching optimality. In this paper, we propose a novel framework to explore and exploit the powerful feature representation and zero-shot generalization ability of the pre-trained Segment Anything Model (SA…

    Submitted 28 August, 2024; v1 submitted 27 August, 2024; originally announced August 2024.

    Comments: 10 pages, 9 figures

  4. arXiv:2408.14506  [pdf, other]

    cs.LG

    Distilling Long-tailed Datasets

    Authors: Zhenghao Zhao, Haoxuan Wang, Yuzhang Shang, Kai Wang, Yan Yan

    Abstract: Dataset distillation (DD) aims to distill a small, information-rich dataset from a larger one for efficient neural network training. However, existing DD methods struggle with long-tailed datasets, which are prevalent in real-world scenarios. By investigating the reasons behind this unexpected result, we identified two main causes: 1) Expert networks trained on imbalanced data develop biased gradi…

    Submitted 24 August, 2024; originally announced August 2024.

  5. arXiv:2408.13257  [pdf, other]

    cs.CV

    MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

    Authors: Yi-Fan Zhang, Huanyu Zhang, Haochen Tian, Chaoyou Fu, Shuangqing Zhang, Junfei Wu, Feng Li, Kun Wang, Qingsong Wen, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan

    Abstract: Comprehensive evaluation of Multimodal Large Language Models (MLLMs) has recently garnered widespread attention in the research community. However, we observe that existing benchmarks present several common barriers that make it difficult to measure the significant challenges that models face in the real world, including: 1) small data scale leads to a large performance variance; 2) reliance on mo…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Project Page: https://mme-realworld.github.io/

  6. arXiv:2408.12588  [pdf, other]

    cs.CV cs.DC

    Real-Time Video Generation with Pyramid Attention Broadcast

    Authors: Xuanlei Zhao, Xiaolong Jin, Kai Wang, Yang You

    Abstract: We present Pyramid Attention Broadcast (PAB), a real-time, high quality and training-free approach for DiT-based video generation. Our method is founded on the observation that attention difference in the diffusion process exhibits a U-shaped pattern, indicating significant redundancy. We mitigate this by broadcasting attention outputs to subsequent steps in a pyramid style. It applies different b…

    Submitted 22 August, 2024; originally announced August 2024.

  7. arXiv:2408.12245  [pdf, other]

    cs.CV

    Scalable Autoregressive Image Generation with Mamba

    Authors: Haopeng Li, Jinyue Yang, Kexin Wang, Xuerui Qiu, Yuhong Chou, Xin Li, Guoqi Li

    Abstract: We introduce AiM, an autoregressive (AR) image generative model based on Mamba architecture. AiM employs Mamba, a novel state-space model characterized by its exceptional performance for long-sequence modeling with linear time complexity, to supplant the commonly utilized Transformers in AR image generation models, aiming to achieve both superior generation quality and enhanced inference speed. Un…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 9 pages, 8 figures

  8. arXiv:2408.12035  [pdf]

    cs.SI cs.CL cs.LG cs.MM

    Let Community Rules Be Reflected in Online Content Moderation

    Authors: Wangjiaxuan Xin, Kanlun Wang, Zhe Fu, Lina Zhou

    Abstract: Content moderation is a widely used strategy to prevent the dissemination of irregular information on social media platforms. Despite extensive research on developing automated models to support decision-making in content moderation, there remains a notable scarcity of studies that integrate the rules of online communities into content moderation. This study addresses this gap by proposing a commu…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 10 pages, 3 figures

  9. arXiv:2408.12003  [pdf]

    cs.CL

    RAG-Optimized Tibetan Tourism LLMs: Enhancing Accuracy and Personalization

    Authors: Jinhu Qi, Shuai Yan, Yibo Zhang, Wentao Zhang, Rong Jin, Yuwei Hu, Ke Wang

    Abstract: With the development of the modern social economy, tourism has become an important way to meet people's spiritual needs, bringing development opportunities to the tourism industry. However, existing large language models (LLMs) face challenges in personalized recommendation capabilities and the generation of content that can sometimes produce hallucinations. This study proposes an optimization sch…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Accepted by AIPR 2024

    ACM Class: I.2.7

  10. arXiv:2408.11554  [pdf, other]

    cs.CL cs.AI

    Differentiating Choices via Commonality for Multiple-Choice Question Answering

    Authors: Wenqing Deng, Zhe Wang, Kewen Wang, Shirui Pan, Xiaowang Zhang, Zhiyong Feng

    Abstract: Multiple-choice question answering (MCQA) becomes particularly challenging when all choices are relevant to the question and are semantically similar. Yet this setting of MCQA can potentially provide valuable clues for choosing the right answer. Existing models often rank each choice separately, overlooking the context provided by other choices. Specifically, they fail to leverage the semantic com…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 9 pages, accepted to ECAI 2024

  11. arXiv:2408.11372  [pdf, other]

    cs.IR cs.AI

    Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation

    Authors: Hao Wang, Yongqiang Han, Kefan Wang, Kai Cheng, Zhen Wang, Wei Guo, Yong Liu, Defu Lian, Enhong Chen

    Abstract: In the realm of recommendation systems, users exhibit a diverse array of behaviors when interacting with items. This phenomenon has spurred research into learning the implicit semantic relationships between these behaviors to enhance recommendation performance. However, these methods often entail high computational complexity. To address concerns regarding efficiency, pre-training presents a viabl…

    Submitted 21 August, 2024; originally announced August 2024.

  12. arXiv:2408.10511  [pdf, other]

    cs.LG cs.AI q-bio.GN

    Single-cell Curriculum Learning-based Deep Graph Embedding Clustering

    Authors: Huifa Li, Jie Fu, Xinpeng Ling, Zhiyu Sun, Kuncan Wang, Zhili Chen

    Abstract: The swift advancement of single-cell RNA sequencing (scRNA-seq) technologies enables the investigation of cellular-level tissue heterogeneity. Cell annotation significantly contributes to the extensive downstream analysis of scRNA-seq data. However, the analysis of scRNA-seq for biological inference presents challenges owing to its intricate and indeterminate data distribution, characterized by a…

    Submitted 19 August, 2024; originally announced August 2024.

  13. arXiv:2408.09554  [pdf, other]

    q-bio.QM cs.CV eess.IV

    Screen Them All: High-Throughput Pan-Cancer Genetic and Phenotypic Biomarker Screening from H&E Whole Slide Images

    Authors: Yi Kan Wang, Ludmila Tydlitatova, Jeremy D. Kunz, Gerard Oakley, Ran A. Godrich, Matthew C. H. Lee, Chad Vanderbilt, Razik Yousfi, Thomas Fuchs, David S. Klimstra, Siqi Liu

    Abstract: Many molecular alterations serve as clinically prognostic or therapy-predictive biomarkers, typically detected using single or multi-gene molecular assays. However, these assays are expensive, tissue destructive and often take weeks to complete. Using AI on routine H&E WSIs offers a fast and economical approach to screen for multiple molecular biomarkers. We present a high-throughput AI-based syst…

    Submitted 20 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

  14. arXiv:2408.09431  [pdf, other]

    cs.CV

    Adversarial Attacked Teacher for Unsupervised Domain Adaptive Object Detection

    Authors: Kaiwen Wang, Yinzhe Shen, Martin Lauer

    Abstract: Object detectors encounter challenges in handling domain shifts. Cutting-edge domain adaptive object detection methods use the teacher-student framework and domain adversarial learning to generate domain-invariant pseudo-labels for self-training. However, the pseudo-labels generated by the teacher model tend to be biased towards the majority class and often mistakenly include overconfident false p…

    Submitted 18 August, 2024; originally announced August 2024.

  15. arXiv:2408.07259  [pdf, other]

    cs.CV cs.AI

    GRIF-DM: Generation of Rich Impression Fonts using Diffusion Models

    Authors: Lei Kang, Fei Yang, Kai Wang, Mohamed Ali Souibgui, Lluis Gomez, Alicia Fornés, Ernest Valveny, Dimosthenis Karatzas

    Abstract: Fonts are integral to creative endeavors, design processes, and artistic productions. The appropriate selection of a font can significantly enhance artwork and endow advertisements with a higher level of expressivity. Despite the availability of numerous diverse font designs online, traditional retrieval-based methods for font selection are increasingly being supplanted by generation-based approac…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Accepted to ECAI 2024

  16. arXiv:2408.06150  [pdf, other]

    cs.CL physics.chem-ph q-bio.BM

    LipidBERT: A Lipid Language Model Pre-trained on METiS de novo Lipid Library

    Authors: Tianhao Yu, Cai Yao, Zhuorui Sun, Feng Shi, Lin Zhang, Kangjie Lyu, Xuan Bai, Andong Liu, Xicheng Zhang, Jiali Zou, Wenshou Wang, Chris Lai, Kai Wang

    Abstract: In this study, we generate and maintain a database of 10 million virtual lipids through METiS's in-house de novo lipid generation algorithms and lipid virtual screening techniques. These virtual lipids serve as a corpus for pre-training, lipid representation learning, and downstream task knowledge transfer, culminating in state-of-the-art LNP property prediction performance. We propose LipidBERT,…

    Submitted 19 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

  17. arXiv:2408.06027  [pdf, other]

    eess.SP cs.LG

    A Comprehensive Survey on EEG-Based Emotion Recognition: A Graph-Based Perspective

    Authors: Chenyu Liu, Xinliang Zhou, Yihao Wu, Yi Ding, Liming Zhai, Kun Wang, Ziyu Jia, Yang Liu

    Abstract: Compared to other modalities, electroencephalogram (EEG) based emotion recognition can intuitively respond to emotional patterns in the human brain and, therefore, has become one of the most focused tasks in affective computing. The nature of emotions is a physiological and psychological state change in response to brain region connectivity, making emotion recognition focus more on the dependency…

    Submitted 13 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

  18. arXiv:2408.05358  [pdf, other]

    eess.SP cs.CV cs.HC cs.LG

    GesturePrint: Enabling User Identification for mmWave-based Gesture Recognition Systems

    Authors: Lilin Xu, Keyi Wang, Chaojie Gu, Xiuzhen Guo, Shibo He, Jiming Chen

    Abstract: The millimeter-wave (mmWave) radar has been exploited for gesture recognition. However, existing mmWave-based gesture recognition methods cannot identify different users, which is important for ubiquitous gesture interaction in many applications. In this paper, we propose GesturePrint, which is the first to achieve gesture recognition and gesture-based user identification using a commodity mmWave…

    Submitted 25 July, 2024; originally announced August 2024.

    Comments: Accepted to the 44th IEEE International Conference on Distributed Computing Systems (ICDCS 2024)

  19. arXiv:2408.04905  [pdf, other]

    cs.CL cs.AI

    GlitchProber: Advancing Effective Detection and Mitigation of Glitch Tokens in Large Language Models

    Authors: Zhibo Zhang, Wuxia Bai, Yuxi Li, Mark Huasong Meng, Kailong Wang, Ling Shi, Li Li, Jun Wang, Haoyu Wang

    Abstract: Large language models (LLMs) have achieved unprecedented success in the field of natural language processing. However, the black-box nature of their internal mechanisms has brought many concerns about their trustworthiness and interpretability. Recent research has discovered a class of abnormal tokens in the model's vocabulary space and named them "glitch tokens". Those tokens, once included in th…

    Submitted 9 August, 2024; originally announced August 2024.

  20. arXiv:2408.04600  [pdf, other]

    cs.CV

    Improving Network Interpretability via Explanation Consistency Evaluation

    Authors: Hefeng Wu, Hao Jiang, Keze Wang, Ziyi Tang, Xianghuan He, Liang Lin

    Abstract: While deep neural networks have achieved remarkable performance, they tend to lack transparency in prediction. The pursuit of greater interpretability in neural networks often results in a degradation of their original performance. Some works strive to improve both interpretability and performance, but they primarily depend on meticulously imposed conditions. In this paper, we propose a simple yet…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: To appear in IEEE Transactions on Multimedia

  21. arXiv:2408.04344  [pdf, other]

    cs.SE

    Semantic-Enhanced Indirect Call Analysis with Large Language Models

    Authors: Baijun Cheng, Cen Zhang, Kailong Wang, Ling Shi, Yang Liu, Haoyu Wang, Yao Guo, Xiangqun Chen

    Abstract: In contemporary software development, the widespread use of indirect calls to achieve dynamic features poses challenges in constructing precise control flow graphs (CFGs), which further impacts the performance of downstream static analysis tasks. To tackle this issue, various types of indirect call analyzers have been proposed. However, they do not fully leverage the semantic information of the pr…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted by ASE'24

  22. arXiv:2408.03360  [pdf, other]

    cs.LG cs.AI

    Prioritize Alignment in Dataset Distillation

    Authors: Zekai Li, Ziyao Guo, Wangbo Zhao, Tianle Zhang, Zhi-Qi Cheng, Samir Khaki, Kaipeng Zhang, Ahmad Sajedi, Konstantinos N Plataniotis, Kai Wang, Yang You

    Abstract: Dataset Distillation aims to compress a large dataset into a significantly more compact, synthetic one without compromising the performance of the trained models. To achieve this, existing methods use the agent model to extract information from the target dataset and embed it into the distilled dataset. Consequently, the quality of extracted and embedded information determines the quality of the d…

    Submitted 13 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

    Comments: 18 pages, 9 figures

  23. arXiv:2408.03353  [pdf, other]

    cs.LG cs.AI cs.HC

    Adversarial Domain Adaptation for Cross-user Activity Recognition Using Diffusion-based Noise-centred Learning

    Authors: Xiaozhou Ye, Kevin I-Kai Wang

    Abstract: Human Activity Recognition (HAR) plays a crucial role in various applications such as human-computer interaction and healthcare monitoring. However, challenges persist in HAR models due to the data distribution differences between training and real-world data distributions, particularly evident in cross-user scenarios. This paper introduces a novel framework, termed Diffusion-based Noise-centered…

    Submitted 6 August, 2024; originally announced August 2024.

  24. arXiv:2408.03284  [pdf, other]

    cs.CV cs.GR cs.MM

    ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer

    Authors: Jiazhi Guan, Zhiliang Xu, Hang Zhou, Kaisiyuan Wang, Shengyi He, Zhanwang Zhang, Borong Liang, Haocheng Feng, Errui Ding, Jingtuo Liu, Jingdong Wang, Youjian Zhao, Ziwei Liu

    Abstract: Lip-syncing videos with given audio is the foundation for various applications including the creation of virtual presenters or performers. While recent studies explore high-fidelity lip-sync with different techniques, their task-orientated models either require long-term videos for clip-specific training or retain visible artifacts. In this paper, we propose a unified and effective framework ReSyn…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted to European Conference on Computer Vision (ECCV), 2024. Project page: https://guanjz20.github.io/projects/ReSyncer

  25. arXiv:2408.03096  [pdf, other]

    cs.SI

    Enhancing Twitter Bot Detection via Multimodal Invariant Representations

    Authors: Jibing Gong, Jiquan Peng, Jin Qu, ShuYing Du, Kaiyu Wang

    Abstract: Detecting Twitter Bots is crucial for maintaining the integrity of online discourse, safeguarding democratic processes, and preventing the spread of malicious propaganda. However, advanced Twitter Bots today often employ sophisticated feature manipulation and account farming techniques to blend seamlessly with genuine user interactions, posing significant challenges to existing detection models. I…

    Submitted 6 August, 2024; originally announced August 2024.

  26. arXiv:2408.02268  [pdf, other]

    cs.HC

    CHORDination: Evaluating Visual Design Choices in Chord Diagrams for Network Data

    Authors: Kai Wang, Shuqi He, Wenlu Wang, Jinbei Yu, Yu Liu, Lingyun Yu

    Abstract: Chord diagrams are widely used for visualizing data connectivity and flow between nodes in a network. They are effective for representing complex structures through an intuitive and visually appealing circular layout. While previous work has focused on improving aesthetics and interactivity, the influence of fundamental design elements on user perception and information retrieval remains under-exp…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 12 pages, 4 pages of appendix, 8 figures, VINCI 2024

  27. arXiv:2408.02214  [pdf, other]

    cs.CV

    More Than Positive and Negative: Communicating Fine Granularity in Medical Diagnosis

    Authors: Xiangyu Peng, Kai Wang, Jianfei Yang, Yingying Zhu, Yang You

    Abstract: With the advance of deep learning, much progress has been made in building powerful artificial intelligence (AI) systems for automatic Chest X-ray (CXR) analysis. Most existing AI models are trained to be a binary classifier with the aim of distinguishing positive and negative cases. However, a large gap exists between the simple binary setting and complicated real-world medical scenarios. In this…

    Submitted 4 August, 2024; originally announced August 2024.

  28. arXiv:2408.01835  [pdf, other]

    cs.CV

    TS-SAM: Fine-Tuning Segment-Anything Model for Downstream Tasks

    Authors: Yang Yu, Chen Xu, Kai Wang

    Abstract: Adapter based fine-tuning has been studied for improving the performance of SAM on downstream tasks. However, there is still a significant performance gap between fine-tuned SAMs and domain-specific models. To reduce the gap, we propose Two-Stream SAM (TS-SAM). On the one hand, inspired by the side network in Parameter-Efficient Fine-Tuning (PEFT), we designed a lightweight Convolutional Side Adap…

    Submitted 3 August, 2024; originally announced August 2024.

  29. arXiv:2408.01678  [pdf, other]

    cs.CV

    iControl3D: An Interactive System for Controllable 3D Scene Generation

    Authors: Xingyi Li, Yizheng Wu, Jun Cen, Juewen Peng, Kewei Wang, Ke Xian, Zhe Wang, Zhiguo Cao, Guosheng Lin

    Abstract: 3D content creation has long been a complex and time-consuming process, often requiring specialized skills and resources. While recent advancements have allowed for text-guided 3D object and scene generation, they still fall short of providing sufficient control over the generation process, leading to a gap between the user's creative vision and the generated results. In this paper, we present iCo…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM 2024

  30. arXiv:2408.01415  [pdf, other]

    cs.AI cs.LG

    Conditional LoRA Parameter Generation

    Authors: Xiaolong Jin, Kai Wang, Dongwen Tang, Wangbo Zhao, Yukun Zhou, Junshu Tang, Yang You

    Abstract: Generative models have achieved remarkable success in image, video, and text domains. Inspired by this, researchers have explored utilizing generative models to generate neural network parameters. However, these efforts have been limited by the parameter size and the practicality of generating high-performance parameters. In this paper, we propose COND P-DIFF, a novel approach that demonstrates th…

    Submitted 2 August, 2024; originally announced August 2024.

  31. arXiv:2408.01003  [pdf, other]

    cs.AI

    Piculet: Specialized Models-Guided Hallucination Decrease for MultiModal Large Language Models

    Authors: Kohou Wang, Xiang Liu, Zhaoxiang Liu, Kai Wang, Shiguo Lian

    Abstract: Multimodal Large Language Models (MLLMs) have made significant progress in bridging the gap between visual and language modalities. However, hallucinations in MLLMs, where the generated text does not align with image content, continue to be a major challenge. Existing methods for addressing hallucinations often rely on instruction-tuning, which requires retraining the model with specific data, whi…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 14 pages, 5 figures

  32. arXiv:2408.00799  [pdf, other]

    cs.IR cs.LG stat.ML

    Deep Uncertainty-Based Explore for Index Construction and Retrieval in Recommendation System

    Authors: Xin Jiang, Kaiqiang Wang, Yinlong Wang, Fengchang Lv, Taiyang Peng, Shuai Yang, Xianteng Wu, Pengye Zhang, Shuo Yuan, Yifan Zeng

    Abstract: In recommendation systems, the relevance and novelty of the final results are selected through a cascade system of Matching -> Ranking -> Strategy. The matching model serves as the starting point of the pipeline and determines the upper bound of the subsequent stages. Balancing the relevance and novelty of matching results is a crucial step in the design and optimization of recommendation systems,…

    Submitted 5 August, 2024; v1 submitted 21 July, 2024; originally announced August 2024.

    Comments: Accepted by CIKM 2024

  33. arXiv:2408.00788  [pdf, other]

    cs.NE cs.LG

    SpikeVoice: High-Quality Text-to-Speech Via Efficient Spiking Neural Network

    Authors: Kexin Wang, Jiahong Zhang, Yong Ren, Man Yao, Di Shang, Bo Xu, Guoqi Li

    Abstract: Brain-inspired Spiking Neural Network (SNN) has demonstrated its effectiveness and efficiency in vision, natural language, and speech understanding tasks, indicating their capacity to "see", "listen", and "read". In this paper, we design SpikeVoice, which performs high-quality Text-To-Speech (TTS) via SNN, to explore the potential of SNN to "speak". A major obstacle to using SNN for such…

    Submitted 17 July, 2024; originally announced August 2024.

    Comments: 9 pages

  34. arXiv:2408.00486  [pdf, other]

    cs.RO

    SF-TIM: A Simple Framework for Enhancing Quadrupedal Robot Jumping Agility by Combining Terrain Imagination and Measurement

    Authors: Ze Wang, Yang Li, Long Xu, Hao Shi, Zunwang Ma, Zhen Chu, Chao Li, Fei Gao, Kailun Yang, Kaiwei Wang

    Abstract: Dynamic jumping on high platforms and over gaps differentiates legged robots from wheeled counterparts. Compared to walking on rough terrains, dynamic locomotion on abrupt surfaces requires fusing proprioceptive and exteroceptive perception for explosive movements. In this paper, we propose SF-TIM (Simple Framework combining Terrain Imagination and Measurement), a single-policy method that enhance…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: A demo video has been made available at https://flysoaryun.github.io/SF-TIM

  35. arXiv:2407.21465  [pdf, other]

    cs.CV

    MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection

    Authors: Kuo Wang, Lechao Cheng, Weikai Chen, Pingping Zhang, Liang Lin, Fan Zhou, Guanbin Li

    Abstract: Learning from pseudo-labels generated with VLMs (Vision Language Models) has been shown to be a promising solution to assist open vocabulary detection (OVD) in recent studies. However, due to the domain gap between VLM and vision-detection tasks, pseudo-labels produced by the VLMs are prone to be noisy, while the training design of the detector further amplifies the bias. In this work, we invest…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Codes are available at https://github.com/wkfdb/MarvelOVD

  36. arXiv:2407.21300  [pdf]

    cs.IR cs.AI

    Implementing Streaming algorithm and k-means clusters to RAG

    Authors: Haoyu Kang, Yuzhou Zhu, Yukun Zhong, Ke Wang

    Abstract: Retrieval-augmented generation (RAG) has achieved great success in information retrieval to assist large language models because it builds an external knowledge database. However, it also has many problems: it consumes a lot of memory because of the huge database. When faced with massive streaming data, it is unable to update the established index database in time. To save the memory of building t…

    Submitted 4 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

  37. arXiv:2407.20281  [pdf, other]

    cs.LG cs.SE

    NeuSemSlice: Towards Effective DNN Model Maintenance via Neuron-level Semantic Slicing

    Authors: Shide Zhou, Tianlin Li, Yihao Huang, Ling Shi, Kailong Wang, Yang Liu, Haoyu Wang

    Abstract: Deep Neural networks (DNNs), extensively applied across diverse disciplines, are characterized by their integrated and monolithic architectures, setting them apart from conventional software systems. This architectural difference introduces particular challenges to maintenance tasks, such as model restructuring (e.g., model compression), re-adaptation (e.g., fitting new samples), and incremental d…

    Submitted 25 July, 2024; originally announced July 2024.

  38. arXiv:2407.19468  [pdf, other]

    cs.CV cs.MM

    MVPbev: Multi-view Perspective Image Generation from BEV with Test-time Controllability and Generalizability

    Authors: Buyu Liu, Kai Wang, Yansong Liu, Jun Bao, Tingting Han, Jun Yu

    Abstract: This work aims to address the multi-view perspective RGB generation from text prompts given Bird-Eye-View (BEV) semantics. Unlike prior methods that neglect layout consistency, lack the ability to handle detailed text prompts, or are incapable of generalizing to unseen view points, MVPbev simultaneously generates cross-view consistent images of different perspective views with a two-stage design, a…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM24

  39. arXiv:2407.19294  [pdf, other]

    cs.CV

    Rethinking Attention Module Design for Point Cloud Analysis

    Authors: Chengzhi Wu, Kaige Wang, Zeyun Zhong, Hao Fu, Junwei Zheng, Jiaming Zhang, Julius Pfrommer, Jürgen Beyerer

    Abstract: In recent years, there have been significant advancements in applying attention mechanisms to point cloud analysis. However, attention module variants featured in various research papers often operate under diverse settings and tasks, incorporating potential training strategies. This heterogeneity poses challenges in establishing a fair comparison among these attention module variants. In this pap…

    Submitted 27 July, 2024; originally announced July 2024.

  40. arXiv:2407.18512  [pdf, other]

    cs.SE

    SPOLRE: Semantic Preserving Object Layout Reconstruction for Image Captioning System Testing

    Authors: Yi Liu, Guanyu Wang, Xinyi Zheng, Gelei Deng, Kailong Wang, Yang Liu, Haoyu Wang

    Abstract: Image captioning (IC) systems, such as Microsoft Azure Cognitive Service, translate image content into descriptive language but can generate inaccuracies leading to misinterpretations. Advanced testing techniques like MetaIC and ROME aim to address these issues but face significant challenges. These methods require intensive manual labor for detailed annotations and often produce unrealistic image…

    Submitted 26 July, 2024; originally announced July 2024.

  41. arXiv:2407.15762  [pdf, other]

    cs.LG cs.AI cs.CL

    Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning

    Authors: Kaiwen Wang, Rahul Kidambi, Ryan Sullivan, Alekh Agarwal, Christoph Dann, Andrea Michi, Marco Gelmi, Yunxuan Li, Raghav Gupta, Avinava Dubey, Alexandre Ramé, Johan Ferret, Geoffrey Cideron, Le Hou, Hongkun Yu, Amr Ahmed, Aranyak Mehta, Léonard Hussenot, Olivier Bachem, Edouard Leurent

    Abstract: Reward-based finetuning is crucial for aligning language policies with intended behaviors (e.g., creativity and safety). A key challenge here is to develop steerable language models that trade-off multiple (conflicting) objectives in a flexible and efficient manner. This paper presents Conditioned Language Policy (CLP), a general framework for finetuning language models on multiple objectives. Bui…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 40 pages

  42. arXiv:2407.15356  [pdf, other]

    cs.CV cs.AI

    X-Recon: Learning-based Patient-specific High-Resolution CT Reconstruction from Orthogonal X-Ray Images

    Authors: Yunpeng Wang, Kang Wang, Yaoyao Zhuo, Weiya Shi, Fei Shan, Lei Liu

    Abstract: Rapid and accurate diagnosis of pneumothorax, utilizing chest X-ray and computed tomography (CT), is crucial for assisted diagnosis. Chest X-ray is commonly used for initial localization of pneumothorax, while CT ensures accurate quantification. However, CT scans involve high radiation doses and can be costly. To achieve precise quantitative diagnosis while minimizing radiation exposure, we propos…

    Submitted 21 July, 2024; originally announced July 2024.

  43. arXiv:2407.14878  [pdf, other]

    cs.CL

    Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment

    Authors: Yongxin Huang, Kexin Wang, Goran Glavaš, Iryna Gurevych

    Abstract: Multilingual sentence encoders are commonly obtained by training multilingual language models to map sentences from different languages into a shared semantic space. As such, they are subject to the curse of multilinguality, a loss of monolingual representational accuracy due to parameter sharing. Another limitation of multilingual sentence encoders is the trade-off between monolingual and cross-lingu…

    Submitted 20 July, 2024; originally announced July 2024.

  44. arXiv:2407.14500  [pdf, other]

    cs.CV

    ViLLa: Video Reasoning Segmentation with Large Language Model

    Authors: Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao

    Abstract: Although video perception models have made remarkable advancements in recent years, they still heavily rely on explicit text descriptions or pre-defined categories to identify target instances before executing video perception tasks. These models, however, fail to proactively comprehend and reason the user's intentions via textual input. Even though previous works attempt to investigate solutions…

    Submitted 29 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: 15 pages, 6 figures

  45. arXiv:2407.14100  [pdf, other]

    cs.GR cs.AI cs.LG

    ParamsDrag: Interactive Parameter Space Exploration via Image-Space Dragging

    Authors: Guan Li, Yang Liu, Guihua Shan, Shiyu Cheng, Weiqun Cao, Junpeng Wang, Ko-Chih Wang

    Abstract: Numerical simulation serves as a cornerstone in scientific modeling, yet the process of fine-tuning simulation parameters poses significant challenges. Conventionally, parameter adjustment relies on extensive numerical simulations, data analysis, and expert insights, resulting in substantial computational costs and low efficiency. The emergence of deep learning in recent years has provided promisi…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: To be published in Proc. IEEE VIS 2024

  46. arXiv:2407.13796  [pdf, other]

    cs.CR cs.AI cs.CL

    Continuous Embedding Attacks via Clipped Inputs in Jailbreaking Large Language Models

    Authors: Zihao Xu, Yi Liu, Gelei Deng, Kailong Wang, Yuekang Li, Ling Shi, Stjepan Picek

    Abstract: Security concerns for large language models (LLMs) have recently escalated, focusing on thwarting jailbreaking attempts in discrete prompts. However, the exploration of jailbreak vulnerabilities arising from continuous embeddings has been limited, as prior approaches primarily involved appending discrete or continuous suffixes to inputs. Our study presents a novel channel for conducting direct att…

    Submitted 16 July, 2024; originally announced July 2024.

  47. arXiv:2407.13561  [pdf, other]

    cs.CL

    Research on Tibetan Tourism Viewpoints information generation system based on LLM

    Authors: Jinhu Qi, Shuai Yan, Wentao Zhang, Yibo Zhang, Zirui Liu, Ke Wang

    Abstract: Tibet, ensconced within China's territorial expanse, is distinguished by its labyrinthine and heterogeneous topography, a testament to its profound historical heritage, and the cradle of a unique religious ethos. The very essence of these attributes, however, has impeded the advancement of Tibet's tourism service infrastructure, rendering existing smart tourism services inadequate for the region's…

    Submitted 18 July, 2024; originally announced July 2024.

    Journal ref: ICWOC 2024

  48. arXiv:2407.13201  [pdf, other]

    cs.SE

    $μ$Drive: User-Controlled Autonomous Driving

    Authors: Kun Wang, Christopher M. Poskitt, Yang Sun, Jun Sun, Jingyi Wang, Peng Cheng, Jiming Chen

    Abstract: Autonomous Vehicles (AVs) rely on sophisticated Autonomous Driving Systems (ADSs) to provide passengers a satisfying and safe journey. The individual preferences of riders play a crucial role in shaping the perception of safety and comfort while they are in the car. Existing ADSs, however, lack mechanisms to systematically capture and integrate rider preferences into their planning modules. To br…

    Submitted 18 July, 2024; originally announced July 2024.

  49. arXiv:2407.12538  [pdf, other]

    eess.IV cs.CV

    High Frequency Matters: Uncertainty Guided Image Compression with Wavelet Diffusion

    Authors: Juan Song, Jiaxiang He, Mingtao Feng, Keyan Wang, Yunsong Li, Ajmal Mian

    Abstract: Diffusion probabilistic models have recently achieved remarkable success in generating high-quality images. However, balancing high perceptual quality and low distortion remains challenging in image compression applications. To address this issue, we propose an efficient Uncertainty-Guided image compression approach with wavelet Diffusion (UGDiff). Our approach focuses on high frequency compressio…

    Submitted 17 July, 2024; originally announced July 2024.

  50. arXiv:2407.12470  [pdf, other]

    cs.CL

    Continual Learning for Temporal-Sensitive Question Answering

    Authors: Wanqi Yang, Yunqiu Xu, Yanda Li, Kunze Wang, Binbin Huang, Ling Chen

    Abstract: In this study, we explore an emerging research area of Continual Learning for Temporal Sensitive Question Answering (CLTSQA). Previous research has primarily focused on Temporal Sensitive Question Answering (TSQA), often overlooking the unpredictable nature of future events. In real-world applications, it's crucial for models to continually acquire knowledge over time, rather than relying on a sta…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by IJCNN 2024