
Showing 1–50 of 1,160 results for author: Liu, B

  1. arXiv:2407.12851  [pdf]

    cs.CL

    ISPO: An Integrated Ontology of Symptom Phenotypes for Semantic Integration of Traditional Chinese Medical Data

    Authors: Zixin Shu, Rui Hua, Dengying Yan, Chenxia Lu, Ning Xu, Jun Li, Hui Zhu, Jia Zhang, Dan Zhao, Chenyang Hui, Junqiu Ye, Chu Liao, Qi Hao, Wen Ye, Cheng Luo, Xinyan Wang, Chuang Cheng, Xiaodong Li, Baoyan Liu, Xiaji Zhou, Runshun Zhang, Min Xu, Xuezhong Zhou

    Abstract: Symptom phenotypes are one of the key types of manifestations for diagnosis and treatment of various disease conditions. However, the diversity of symptom terminologies is one of the major obstacles hindering the analysis and knowledge sharing of various types of symptom-related medical data particularly in the fields of Traditional Chinese Medicine (TCM). Objective: This study aimed to construct…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 39 pages, 6 figures, 6 tables

  2. arXiv:2407.12395  [pdf, other]

    cs.CV

    Efficient Depth-Guided Urban View Synthesis

    Authors: Sheng Miao, Jiaxin Huang, Dongfeng Bai, Weichao Qiu, Bingbing Liu, Andreas Geiger, Yiyi Liao

    Abstract: Recent advances in implicit scene representation enable high-fidelity street view novel view synthesis. However, existing methods optimize a neural radiance field for each scene, relying heavily on dense training images and extensive computation resources. To mitigate this shortcoming, we introduce a new method called Efficient Depth-Guided Urban View Synthesis (EDUS) for fast feed-forward inferen…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV2024, Project page: https://xdimlab.github.io/EDUS/

  3. arXiv:2407.12274  [pdf, other]

    cs.CV

    MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics

    Authors: Cong Cai, Shan Liang, Xuefei Liu, Kang Zhu, Zhengqi Wen, Jianhua Tao, Heng Xie, Jizhou Cui, Yiming Ma, Zhenhua Cheng, Hanzhe Xu, Ruibo Fu, Bin Liu, Yongwei Li

    Abstract: Deception detection has garnered increasing attention in recent years due to the significant growth of digital media and heightened ethical and security concerns. It has been extensively studied using multimodal methods, including video, audio, and text. In addition, individual differences in deception production and detection are believed to play a crucial role. Although some studies have utilized…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Code and data are available; Submitted to NeurIPS 2024 Datasets and Benchmarks Track

  4. arXiv:2407.10688  [pdf, other]

    cs.LG

    Probability Passing for Graph Neural Networks: Graph Structure and Representations Joint Learning

    Authors: Ziyan Wang, YaXuan He, Bin Liu

    Abstract: Graph Neural Networks (GNNs) have achieved notable success in the analysis of non-Euclidean data across a wide range of domains. However, their applicability is constrained by the dependence on the observed graph structure. To solve this problem, Latent Graph Inference (LGI) is proposed to infer a task-specific latent structure by computing similarity or edge probability of node features and then…

    Submitted 15 July, 2024; originally announced July 2024.

  5. arXiv:2407.09697  [pdf, other]

    cs.CV

    Uplifting Range-View-based 3D Semantic Segmentation in Real-Time with Multi-Sensor Fusion

    Authors: Shiqi Tan, Hamidreza Fazlali, Yixuan Xu, Yuan Ren, Bingbing Liu

    Abstract: Range-View(RV)-based 3D point cloud segmentation is widely adopted due to its compact data form. However, RV-based methods fall short in providing robust segmentation for the occluded points and suffer from distortion of projected RGB images due to the sparse nature of 3D point clouds. To alleviate these problems, we propose a new LiDAR and Camera Range-view-based 3D point cloud semantic segmentat…

    Submitted 12 July, 2024; originally announced July 2024.

  6. arXiv:2407.07653  [pdf, other]

    cs.HC

    AffectGPT: Dataset and Framework for Explainable Multimodal Emotion Recognition

    Authors: Zheng Lian, Haiyang Sun, Licai Sun, Jiangyan Yi, Bin Liu, Jianhua Tao

    Abstract: Explainable Multimodal Emotion Recognition (EMER) is an emerging task that aims to achieve reliable and accurate emotion recognition. However, due to the high annotation cost, the existing dataset (denoted as EMER-Fine) is small, making it difficult to perform supervised training. To reduce the annotation cost and expand the dataset size, this paper reviews the previous dataset construction proces…

    Submitted 10 July, 2024; originally announced July 2024.

  7. arXiv:2407.06516  [pdf, other]

    cs.CV

    VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving

    Authors: Yibo Liu, Zheyuan Yang, Guile Wu, Yuan Ren, Kejian Lin, Bingbing Liu, Yang Liu, Jinjun Shan

    Abstract: Generating 3D vehicle assets from in-the-wild observations is crucial to autonomous driving. Existing image-to-3D methods cannot well address this problem because they learn generation merely from image RGB information without a deeper understanding of in-the-wild vehicles (such as car models, manufacturers, etc.). This leads to their poor zero-shot prediction capability to handle real-world obser…

    Submitted 10 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  8. arXiv:2407.06512  [pdf]

    cs.CV cs.AI

    LuSNAR:A Lunar Segmentation, Navigation and Reconstruction Dataset based on Muti-sensor for Autonomous Exploration

    Authors: Jiayi Liu, Qianyu Zhang, Xue Wan, Shengyang Zhang, Yaolin Tian, Haodong Han, Yutao Zhao, Baichuan Liu, Zeyuan Zhao, Xubo Luo

    Abstract: With the complexity of lunar exploration missions, the moon needs to have a higher level of autonomy. Environmental perception and navigation algorithms are the foundation for lunar rovers to achieve autonomous exploration. The development and verification of algorithms require highly reliable data support. Most of the existing lunar datasets are targeted at a single task, lacking diverse scenes a…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 22 pages, 11 figures, 9 tables

  9. arXiv:2407.06380  [pdf, other]

    cs.CL

    Data, Data Everywhere: A Guide for Pretraining Dataset Construction

    Authors: Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, Bo Liu, Aastha Jhunjhunwala, Zhilin Wang, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro

    Abstract: The impressive capabilities of recent language models can be largely attributed to the multi-trillion token pretraining datasets that they are trained on. However, model developers fail to disclose their construction methodology, which has led to a lack of open information on how to develop effective pretraining sets. To address this issue, we perform the first systematic study across the entire p…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Preprint. Under review

  10. arXiv:2407.06309  [pdf, other]

    cs.CY cs.AI

    Multimodal Chain-of-Thought Reasoning via ChatGPT to Protect Children from Age-Inappropriate Apps

    Authors: Chuanbo Hu, Bin Liu, Minglei Yin, Yilu Zhou, Xin Li

    Abstract: Mobile applications (Apps) could expose children to inappropriate themes such as sexual content, violence, and drug use. Maturity rating offers a quick and effective method for potential users, particularly guardians, to assess the maturity levels of apps. Determining accurate maturity ratings for mobile apps is essential to protect children's health in today's saturated digital marketplace. Exist…

    Submitted 8 July, 2024; originally announced July 2024.

  11. arXiv:2407.04118  [pdf, other]

    cs.CL cs.AI

    MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization

    Authors: Yuyan Chen, Zhihao Wen, Ge Fan, Zhengyu Chen, Wei Wu, Dayiheng Liu, Zhixu Li, Bang Liu, Yanghua Xiao

    Abstract: Prompt engineering, as an efficient and effective way to leverage Large Language Models (LLM), has drawn a lot of attention from the research community. The existing research primarily emphasizes the importance of adapting prompts to specific tasks, rather than specific LLMs. However, a good prompt is not solely defined by its wording, but also binds to the nature of the LLM in question. In this w…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted to EMNLP 2023 (Findings)

  12. arXiv:2407.04105  [pdf, other]

    cs.CL cs.AI

    Can Pre-trained Language Models Understand Chinese Humor?

    Authors: Yuyan Chen, Zhixu Li, Jiaqing Liang, Yanghua Xiao, Bang Liu, Yunwen Chen

    Abstract: Humor understanding is an important and challenging research topic in natural language processing. With the growing popularity of pre-trained language models (PLMs), some recent work makes preliminary attempts to adopt PLMs for humor recognition and generation. However, these simple attempts do not substantially answer the question: whether PLMs are capable of humor understanding? This paper is the first wo…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted to WSDM 2022

  13. arXiv:2407.02598  [pdf, other]

    cs.CV cs.AI

    AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction

    Authors: Mustafa Khan, Hamidreza Fazlali, Dhruv Sharma, Tongtong Cao, Dongfeng Bai, Yuan Ren, Bingbing Liu

    Abstract: Realistic scene reconstruction and view synthesis are essential for advancing autonomous driving systems by simulating safety-critical scenarios. 3D Gaussian Splatting excels in real-time rendering and static scene reconstructions but struggles with modeling driving scenarios due to complex backgrounds, dynamic objects, and sparse views. We propose AutoSplat, a framework employing Gaussian splatti…

    Submitted 3 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  14. arXiv:2407.01956  [pdf, other]

    eess.SY cs.RO

    Cloud-Edge-Terminal Collaborative AIGC for Autonomous Driving

    Authors: Jianan Zhang, Zhiwei Wei, Boxun Liu, Xiayi Wang, Yong Yu, Rongqing Zhang

    Abstract: In dynamic autonomous driving environments, Artificial Intelligence-Generated Content (AIGC) technology can supplement vehicle perception and decision making by leveraging models' generative and predictive capabilities, and has the potential to enhance motion planning, trajectory prediction and traffic simulation. This article proposes a cloud-edge-terminal collaborative architecture to support AIG…

    Submitted 2 July, 2024; originally announced July 2024.

  15. arXiv:2407.01541  [pdf]

    cs.NI

    Integration of Computer Networks and Artificial Neural Networks for an AI-based Network Operator

    Authors: Binbin Wu, Jingyu Xu, Yifan Zhang, Bo Liu, Yulu Gong, Jiaxin Huang

    Abstract: This paper proposes an integrated approach combining computer networks and artificial neural networks to construct an intelligent network operator, functioning as an AI model. State information from computer networks is transformed into embedded vectors, enabling the operator to efficiently recognize different pieces of information and accurately output appropriate operations for the computer netw…

    Submitted 9 April, 2024; originally announced July 2024.

  16. arXiv:2407.01251  [pdf, other]

    cs.CR cs.AI

    QUEEN: Query Unlearning against Model Extraction

    Authors: Huajie Chen, Tianqing Zhu, Lefeng Zhang, Bo Liu, Derui Wang, Wanlei Zhou, Minhui Xue

    Abstract: Model extraction attacks currently pose a non-negligible threat to the security and privacy of deep learning models. By querying the model with a small dataset and using the query results as the ground-truth labels, an adversary can steal a piracy model with performance comparable to the original model. Two key issues that cause the threat are, on the one hand, accurate and unlimited queries can be…

    Submitted 1 July, 2024; originally announced July 2024.

  17. arXiv:2407.00737  [pdf, other]

    cs.CV

    LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation

    Authors: Mushui Liu, Yuhang Ma, Xinfeng Zhang, Yang Zhen, Zeng Zhao, Zhipeng Hu, Bai Liu, Changjie Fan

    Abstract: Diffusion Models have exhibited substantial success in text-to-image generation. However, they often encounter challenges when dealing with complex and dense prompts that involve multiple objects, attribute binding, and long descriptions. This paper proposes a framework called LLM4GEN, which enhances the semantic understanding ability of text-to-image diffusion models by leveraging the se…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 11 pages, 13 figures

  18. arXiv:2406.19736  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment

    Authors: Jihao Liu, Xin Huang, Jinliang Zheng, Boxiao Liu, Jia Wang, Osamu Yoshie, Yu Liu, Hongsheng Li

    Abstract: This paper introduces MM-Instruct, a large-scale dataset of diverse and high-quality visual instruction data designed to enhance the instruction-following capabilities of large multimodal models (LMMs). While existing visual instruction datasets often focus on question-answering, they struggle to generalize to broader application scenarios such as creative writing, summarization, or image analysis…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Dataset and models are available at https://github.com/jihaonew/MM-Instruct

  19. arXiv:2406.19583  [pdf, other]

    cs.IT

    Interference Cancellation Information Geometry Approach for Massive MIMO Channel Estimation

    Authors: An-An Lu, Bingyan Liu, Xiqi Gao

    Abstract: In this paper, the interference cancellation information geometry approaches (IC-IGAs) for massive MIMO channel estimation are proposed. The proposed algorithms are low-complexity approximations of the minimum mean square error (MMSE) estimation. To illustrate the proposed algorithms, a unified framework of the information geometry approach for channel estimation and its geometric explanation are…

    Submitted 1 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: 38 pages, 9 figures

  20. arXiv:2406.18931  [pdf, other]

    cs.LG

    Semi-adaptive Synergetic Two-way Pseudoinverse Learning System

    Authors: Binghong Liu, Ziqi Zhao, Shupan Li, Ke Wang

    Abstract: Deep learning has become a crucial technology for making breakthroughs in many fields. Nevertheless, it still faces two important challenges in theoretical and applied aspects. The first lies in the shortcomings of gradient descent based learning schemes which are time-consuming and difficult to determine the learning control hyperparameters. Next, the architectural design of the model is usually…

    Submitted 6 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  21. arXiv:2406.18873  [pdf, other]

    cs.AR

    LayoutCopilot: An LLM-powered Multi-agent Collaborative Framework for Interactive Analog Layout Design

    Authors: Bingyang Liu, Haoyi Zhang, Xiaohan Gao, Zichen Kong, Xiyuan Tang, Yibo Lin, Runsheng Wang, Ru Huang

    Abstract: Analog layout design heavily involves interactive processes between humans and design tools. The tools are usually designed to use scripting commands or visualized buttons for manipulation, especially for those interactive automation functionalities, which have a steep learning curve and cumbersome user experience, creating a notable barrier to their adoption by designers. Aiming to address such a u…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 8 pages, 8 figures

  22. arXiv:2406.13150  [pdf]

    eess.IV cs.CV

    MCAD: Multi-modal Conditioned Adversarial Diffusion Model for High-Quality PET Image Reconstruction

    Authors: Jiaqi Cui, Xinyi Zeng, Pinxian Zeng, Bo Liu, Xi Wu, Jiliu Zhou, Yan Wang

    Abstract: Radiation hazards associated with standard-dose positron emission tomography (SPET) images remain a concern, whereas the quality of low-dose PET (LPET) images fails to meet clinical requirements. Therefore, there is great interest in reconstructing SPET images from LPET images. However, prior studies focus solely on image data, neglecting vital complementary information from other modalities, e.g.…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Early accepted by MICCAI2024

  23. arXiv:2406.12707  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction

    Authors: Haoqiu Yan, Yongxin Zhu, Kai Zheng, Bing Liu, Haoyu Cao, Deqiang Jiang, Linli Xu

    Abstract: Large Language Model (LLM)-enhanced agents become increasingly prevalent in Human-AI communication, offering vast potential from entertainment to professional domains. However, current multi-modal dialogue systems overlook the acoustic information present in speech, which is crucial for understanding human communication nuances. This oversight can lead to misinterpretations of speakers' intentions…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 9 pages, 3 figures, ACL24 accepted

  24. arXiv:2406.11249  [pdf, other]

    cs.AI cs.LG

    Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective

    Authors: Yang Chen, Cong Fang, Zhouchen Lin, Bing Liu

    Abstract: Foundation Models (FMs) have demonstrated remarkable insights into the relational dynamics of the world, leading to the crucial question: how do these models acquire an understanding of world hybrid relations? Traditional statistical learning, particularly for prediction problems, may overlook the rich and inherently structured information from the data, especially regarding the relationships betw…

    Submitted 17 June, 2024; originally announced June 2024.

  25. arXiv:2406.09958  [pdf, other]

    cs.LG

    H-Fac: Memory-Efficient Optimization with Factorized Hamiltonian Descent

    Authors: Son Nguyen, Lizhang Chen, Bo Liu, Qiang Liu

    Abstract: In this study, we introduce a novel adaptive optimizer, H-Fac, which incorporates a factorized approach to momentum and scaling parameters. Our algorithm demonstrates competitive performances on both ResNets and Vision Transformers, while achieving sublinear memory costs through the use of rank-1 parameterizations for moment estimators. We develop our algorithms based on principles derived from Ha…

    Submitted 17 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 21 pages, 4 figures

  26. arXiv:2406.07973  [pdf, other]

    cs.CR

    Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey

    Authors: Shang Wang, Tianqing Zhu, Bo Liu, Ming Ding, Xu Guo, Dayong Ye, Wanlei Zhou, Philip S. Yu

    Abstract: With the rapid development of artificial intelligence, large language models (LLMs) have made remarkable advancements in natural language processing. These models are trained on vast datasets to exhibit powerful language understanding and generation capabilities across various applications, including machine translation, chatbots, and agents. However, LLMs have revealed a variety of privacy and se…

    Submitted 18 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  27. arXiv:2406.07824  [pdf, other]

    quant-ph cs.CR

    Efficient Arbitrated Quantum Digital Signature with Multi-Receiver Verification

    Authors: Siyu Xiong, Bangying Tang, Hui Han, Jinquan Huang, Mingqiang Bai, Fangzhao Li, Wanrong Yu, Zhiwen Mo, Bo Liu

    Abstract: Quantum digital signature is used to authenticate the identity of the signer with information theoretical security, while providing non-forgery and non-repudiation services. In traditional multi-receiver quantum digital signature schemes without an arbitrator, the transferability of one-to-one signature is always required to achieve unforgeability, with complicated implementation and heavy key con…

    Submitted 11 June, 2024; originally announced June 2024.

  28. arXiv:2406.06462  [pdf, other]

    cs.CV cs.LG

    VCR: Visual Caption Restoration

    Authors: Tianyu Zhang, Suyuchen Wang, Lu Li, Ge Zhang, Perouz Taslakian, Sai Rajeswar, Jie Fu, Bang Liu, Yoshua Bengio

    Abstract: We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially obscured texts using pixel-level hints within images. This task stems from the observation that text embedded in images is intrinsically different from common visual elements and natural language due to the need to align the modalities of vision, text, and text embedde…

    Submitted 24 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 17 pages, 2 figures

  29. arXiv:2406.05359  [pdf, other]

    eess.AS cs.SD

    Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization

    Authors: Bei Liu, Haoyu Wang, Yanmin Qian

    Abstract: Modern speaker verification (SV) systems typically demand expensive storage and computing resources, thereby hindering their deployment on mobile devices. In this paper, we explore adaptive neural network quantization for lightweight speaker verification. Firstly, we propose a novel adaptive uniform precision quantization method which enables the dynamic generation of quantization centroids custom…

    Submitted 18 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: submitted to IEEE/ACM Transactions on Audio Speech and Language Processing (Under Review)

  30. arXiv:2406.04598  [pdf, other]

    cs.AI

    OCDB: Revisiting Causal Discovery with a Comprehensive Benchmark and Evaluation Framework

    Authors: Wei Zhou, Hong Huang, Guowen Zhang, Ruize Shi, Kehan Yin, Yuanyuan Lin, Bang Liu

    Abstract: Large language models (LLMs) have excelled in various natural language processing tasks, but challenges in interpretability and trustworthiness persist, limiting their use in high-stakes fields. Causal discovery offers a promising approach to improve transparency and reliability. However, current evaluations are often one-sided and lack assessments focused on interpretability performance. Addition…

    Submitted 6 June, 2024; originally announced June 2024.

  31. arXiv:2406.03143  [pdf, other]

    cs.CV cs.CR

    ZeroPur: Succinct Training-Free Adversarial Purification

    Authors: Xiuli Bi, Zonglin Yang, Bo Liu, Xiaodong Cun, Chi-Man Pun, Pietro Lio, Bin Xiao

    Abstract: Adversarial purification is a kind of defense technique that can defend various unseen adversarial attacks without modifying the victim classifier. Existing methods often depend on external generative models or cooperation between auxiliary functions and victim classifiers. However, retraining generative models, auxiliary functions, or victim classifiers relies on the domain of the fine-tuned data…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 16 pages, 5 figures, under review

  32. arXiv:2406.02134  [pdf, other]

    cs.CL

    The current status of large language models in summarizing radiology report impressions

    Authors: Danqing Hu, Shanyuan Zhang, Qing Liu, Xiaofeng Zhu, Bing Liu

    Abstract: Large language models (LLMs) like ChatGPT show excellent capabilities in various natural language processing tasks, especially for text generation. The effectiveness of LLMs in summarizing radiology report impressions remains unclear. In this study, we explore the capability of eight LLMs on the radiology report impression summarization. Three types of radiology reports, i.e., CT, PET-CT, and Ultr…

    Submitted 4 June, 2024; originally announced June 2024.

  33. arXiv:2406.01333  [pdf, other]

    cs.CL cs.AI

    Probing Language Models for Pre-training Data Detection

    Authors: Zhenhua Liu, Tong Zhu, Chuanyuan Tan, Haonan Lu, Bing Liu, Wenliang Chen

    Abstract: Large Language Models (LLMs) have shown their impressive capabilities, while also raising concerns about the data contamination problems due to privacy issues and leakage of benchmark datasets in the pre-training phase. Therefore, it is vital to detect the contamination by checking whether an LLM has been pre-trained on the target texts. Recent studies focus on the generated texts and compute perp…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL-2024 main conference

  34. arXiv:2406.00714  [pdf, other]

    cs.CV

    A Survey of Deep Learning Based Radar and Vision Fusion for 3D Object Detection in Autonomous Driving

    Authors: Di Wu, Feng Yang, Benlian Xu, Pan Liao, Bo Liu

    Abstract: With the rapid advancement of autonomous driving technology, there is a growing need for enhanced safety and efficiency in the automatic environmental perception of vehicles during their operation. In modern vehicle setups, cameras and mmWave radar (radar), being the most extensively employed sensors, demonstrate complementary characteristics, inherently rendering them conducive to fusion and faci…

    Submitted 2 June, 2024; originally announced June 2024.

  35. arXiv:2406.00114  [pdf, other]

    cs.RO cs.NE

    Dynamic Multi-Objective Lion Swarm Optimization with Multi-strategy Fusion: An application in 6R robot trajectory planning

    Authors: Bao Liu, Tianbao Liu, Zhongshuo Hu, Fei Ye, Lei Gao

    Abstract: The advancement of industrialization has spurred the development of innovative swarm intelligence algorithms, with Lion Swarm Optimization (LSO) notable for its robustness, parallelism, simplicity, and efficiency. While LSO excels in single-objective optimization, its multi-objective variants face challenges such as poor initialization, local optima entrapment, and so on. This study proposes Dynam…

    Submitted 7 June, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

  36. arXiv:2405.19414  [pdf, other]

    cs.LG

    Safety through Permissibility: Shield Construction for Fast and Safe Reinforcement Learning

    Authors: Alexander Politowicz, Sahisnu Mazumder, Bing Liu

    Abstract: Designing Reinforcement Learning (RL) solutions for real-life problems remains a significant challenge. A major area of concern is safety. "Shielding" is a popular technique to enforce safety in RL by turning user-defined safety specifications into safe agent behavior. However, these methods either suffer from extreme learning delays, demand extensive human effort in designing models and safe doma…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 9 pages, 3 figures

  37. arXiv:2405.19009  [pdf, other]

    cs.CV

    Enhancing Vision-Language Model with Unmasked Token Alignment

    Authors: Jihao Liu, Jinliang Zheng, Boxiao Liu, Yu Liu, Hongsheng Li

    Abstract: Contrastive pre-training on image-text pairs, exemplified by CLIP, becomes a standard technique for learning multi-modal visual-language representations. Although CLIP has demonstrated remarkable performance, training it from scratch on noisy web-scale datasets is computationally demanding. On the other hand, mask-then-predict pre-training approaches, like Masked Image Modeling (MIM), offer effici…

    Submitted 14 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by TMLR; Code and models are available at https://github.com/jihaonew/UTA

  38. arXiv:2405.18991  [pdf, other]

    cs.CV cs.CL cs.MM

    EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

    Authors: Jiaqi Xu, Xinyi Zou, Kunzhe Huang, Yunkuo Chen, Bo Liu, MengLi Cheng, Xing Shi, Jun Huang

    Abstract: This paper presents EasyAnimate, an advanced method for video generation that leverages the power of transformer architecture for high-performance outcomes. We have expanded the DiT framework originally designed for 2D image synthesis to accommodate the complexities of 3D video generation by incorporating a motion module block. It is used to capture temporal dynamics, thereby ensuring the producti…

    Submitted 5 July, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: 8 pages, 6 figures

  39. arXiv:2405.17233  [pdf, other]

    cs.LG

    CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs

    Authors: Haoyu Wang, Bei Liu, Hang Shao, Bo Xiao, Ke Zeng, Guanglu Wan, Yanmin Qian

    Abstract: Parameter quantization for Large Language Models (LLMs) has attracted increasing attention recently in reducing memory costs and improving computational efficiency. Early approaches have been widely adopted. However, the existing methods suffer from poor performance in low-bit (such as 2 to 3 bits) scenarios. In this paper, we present a novel and effective Column-Level Adaptive weight Quantizatio…

    Submitted 2 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  40. arXiv:2405.16436  [pdf, other]

    cs.LG cs.AI stat.ML

    Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer

    Authors: Zhihan Liu, Miao Lu, Shenao Zhang, Boyi Liu, Hongyi Guo, Yingxiang Yang, Jose Blanchet, Zhaoran Wang

    Abstract: Aligning generative models with human preference via RLHF typically suffers from overoptimization, where an imperfectly learned reward model can misguide the generative model to output undesired responses. We investigate this problem in a principled manner by identifying the source of the misalignment as a form of distributional shift and uncertainty in learning human preferences. To mitigate over…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 27 pages, 7 figures

  41. arXiv:2405.16270  [pdf, other]

    cs.CC

    Complexity of Multiple-Hamiltonicity in Graphs of Bounded Degree

    Authors: Brian Liu, Nathan S. Sheffield, Alek Westover

    Abstract: We study the following generalization of the Hamiltonian cycle problem: Given integers $a,b$ and graph $G$, does there exist a closed walk in $G$ that visits every vertex at least $a$ times and at most $b$ times? Equivalently, does there exist a connected $[2a,2b]$ factor of $2b \cdot G$ with all degrees even? This problem is NP-hard for any constants $1 \leq a \leq b$. However, the graphs produce…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 16 pages

  42. arXiv:2405.16178  [pdf, other]

    cs.CL

    Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection

    Authors: Yun Zhu, Jia-Chen Gu, Caitlin Sikora, Ho Ko, Yinxiao Liu, Chu-Cheng Lin, Lei Shu, Liangchen Luo, Lei Meng, Bang Liu, Jindong Chen

    Abstract: Large language models (LLMs) augmented with retrieval exhibit robust performance and extensive versatility by incorporating external contexts. However, the input length grows linearly in the number of retrieved documents, causing a dramatic increase in latency. In this paper, we propose a novel paradigm named Sparse RAG, which seeks to cut computation costs through sparsity. Specifically, Sparse R…

    Submitted 25 May, 2024; originally announced May 2024.

  43. arXiv:2405.14333  [pdf, other]

    cs.AI

    DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

    Authors: Huajian Xin, Daya Guo, Zhihong Shao, Zhizhou Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, Xiaodan Liang

    Abstract: Proof assistants like Lean have revolutionized mathematical proof verification, ensuring high accuracy and reliability. Although large language models (LLMs) show promise in mathematical reasoning, their advancement in formal theorem proving is hindered by a lack of training data. To address this issue, we introduce an approach to generate extensive Lean 4 proof data derived from high-school and u…

    Submitted 23 May, 2024; originally announced May 2024.

  44. arXiv:2405.13057  [pdf, other]

    cs.SE cs.AI

    Can Github issues be solved with Tree Of Thoughts?

    Authors: Ricardo La Rosa, Corey Hulse, Bangdi Liu

    Abstract: While there have been extensive studies in code generation by large language models (LLM), where benchmarks like HumanEval have been surpassed with an impressive 96.3% success rate, these benchmarks predominantly judge a model's performance on basic function-level code generation and lack the critical thinking and concept of scope required of real-world scenarios such as solving GitHub issues. Thi…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 8 pages, 2 figures, 7 tables

  45. arXiv:2405.12540  [pdf, other]

    cs.CV cs.MM

    Context-Enhanced Video Moment Retrieval with Large Language Models

    Authors: Weijia Liu, Bo Miao, Jiuxin Cao, Xuelin Zhu, Bo Liu, Mehwish Nasim, Ajmal Mian

    Abstract: Current methods for Video Moment Retrieval (VMR) struggle to align complex situations involving specific environmental details, character descriptions, and action narratives. To tackle this issue, we propose a Large Language Model-guided Moment Retrieval (LMR) approach that employs the extensive knowledge of Large Language Models (LLMs) to improve video context representation as well as cross-moda…

    Submitted 21 May, 2024; originally announced May 2024.

  46. arXiv:2405.12487  [pdf, other]

    cs.CV eess.IV

    3DSS-Mamba: 3D-Spectral-Spatial Mamba for Hyperspectral Image Classification

    Authors: Yan He, Bing Tu, Bo Liu, Jun Li, Antonio Plaza

    Abstract: Hyperspectral image (HSI) classification constitutes the fundamental research in remote sensing fields. Convolutional Neural Networks (CNNs) and Transformers have demonstrated impressive capability in capturing spectral-spatial contextual dependencies. However, these architectures suffer from limited receptive fields and quadratic computational complexity, respectively. Fortunately, recent Mamba a…

    Submitted 21 May, 2024; originally announced May 2024.

  47. arXiv:2405.12120  [pdf, other]

    cs.DC cs.NI

    EdgeLoc: A Communication-Adaptive Parallel System for Real-Time Localization in Infrastructure-Assisted Autonomous Driving

    Authors: Boyi Liu, Jingwen Tong, Yufan Zhuang

    Abstract: This paper presents EdgeLoc, an infrastructure-assisted, real-time localization system for autonomous driving that addresses the incompatibility between traditional localization methods and deep learning approaches. The system is built on top of the Robot Operating System (ROS) and combines the real-time performance of traditional methods with the high accuracy of deep learning approaches. The sys…

    Submitted 8 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  48. arXiv:2405.10497  [pdf, other]

    cs.MM cs.AI cs.CV cs.SI

    SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge

    Authors: Bo Wu, Peiye Liu, Wen-Huang Cheng, Bei Liu, Zhaoyang Zeng, Jia Wang, Qiushi Huang, Jiebo Luo

    Abstract: Social Media Popularity Prediction (SMPP) is a crucial task that involves automatically predicting future popularity values of online posts, leveraging vast amounts of multimodal data available on social media platforms. Studying and investigating social media popularity becomes central to various online applications and requires novel methods of comprehensive analysis, multimodal comprehension, a…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: ACM Multimedia. arXiv admin note: text overlap with arXiv:1910.01795

  49. arXiv:2405.09298  [pdf]

    eess.IV cs.CV

    Deep Blur Multi-Model (DeepBlurMM) -- a strategy to mitigate the impact of image blur on deep learning model performance in histopathology image analysis

    Authors: Yujie Xiang, Bojing Liu, Mattias Rantalainen

    Abstract: AI-based analysis of histopathology whole slide images (WSIs) is central in computational pathology. However, image quality, including unsharp areas of WSIs, impacts model performance. We investigate the impact of blur and propose a multi-model approach to mitigate negative impact of unsharp image areas. In this study, we use a simulation approach, evaluating model performance under varying levels…

    Submitted 23 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    ACM Class: I.4; J.3

  50. Financial Table Extraction in Image Documents

    Authors: William Watson, Bo Liu

    Abstract: Table extraction has long been a pervasive problem in financial services. This is more challenging in the image domain, where content is locked behind cumbersome pixel format. Luckily, advances in deep learning for image segmentation, OCR, and sequence modeling provide the necessary heavy lifting to achieve impressive results. This paper presents an end-to-end pipeline for identifying, extracting…

    Submitted 18 March, 2024; originally announced May 2024.