Showing 1–50 of 160 results for author: Wen, L

Searching in archive cs.
  1. arXiv:2407.10173  [pdf, other]

    cs.DC

    StatuScale: Status-aware and Elastic Scaling Strategy for Microservice Applications

    Authors: Linfeng Wen, Minxian Xu, Sukhpal Singh Gill, Muhammad Hafizhuddin Hilman, Satish Narayana Srirama, Kejiang Ye, Chengzhong Xu

    Abstract: Microservice architecture has transformed traditional monolithic applications into lightweight components. Scaling these lightweight microservices is more efficient than scaling servers. However, scaling microservices still faces challenges resulting from unexpected spikes or bursts of requests, which are difficult to detect and can degrade performance instantaneously. To address this chall…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 26 pages

    Journal ref: ACM Transactions on Autonomous and Adaptive Systems, 2024

  2. arXiv:2407.05688  [pdf]

    cs.CV cs.AI

    Learning with Alignments: Tackling the Inter- and Intra-domain Shifts for Cross-multidomain Facial Expression Recognition

    Authors: Yuxiang Yang, Lu Wen, Xinyi Zeng, Yuanyuan Xu, Xi Wu, Jiliu Zhou, Yan Wang

    Abstract: Facial Expression Recognition (FER) holds significant importance in human-computer interactions. Existing cross-domain FER methods often transfer knowledge solely from a single labeled source domain to an unlabeled target domain, neglecting the comprehensive information across multiple sources. Nevertheless, cross-multidomain FER (CMFER) is very challenging for (i) the inherent inter-domain shifts…

    Submitted 8 July, 2024; originally announced July 2024.

  3. arXiv:2406.10484  [pdf, other]

    cs.CV

    Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model

    Authors: Lu Xu, Sijie Zhu, Chunyuan Li, Chia-Wen Kuo, Fan Chen, Xinyao Wang, Guang Chen, Dawei Du, Ye Yuan, Longyin Wen

    Abstract: The emerging video LMMs (Large Multimodal Models) have achieved significant improvements on generic video understanding in the form of VQA (Visual Question Answering), where the raw videos are captured by cameras. However, a large portion of videos in real-world applications are edited videos, e.g., users usually cut and add effects/modifications to the raw video before publishing it on s…

    Submitted 14 June, 2024; originally announced June 2024.

  4. arXiv:2406.08418  [pdf, other]

    cs.CV cs.AI

    OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

    Authors: Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, Zhenjiang Jin, Guanzhou Chen, Yinan He, Zhangwei Gao, Erfei Cui, Jiashuo Yu, Hao Tian, Jiasheng Zhou, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Zhenxiang Li, Pei Chu, Yi Wang, et al. (15 additional authors not shown)

    Abstract: Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data aids multimodal in-context learning and maintains the capabilities of large language models during multimodal fine-tuning. However, the limited scale an…

    Submitted 12 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  5. arXiv:2406.07444  [pdf, other]

    cs.CL

    On the Robustness of Document-Level Relation Extraction Models to Entity Name Variations

    Authors: Shiao Meng, Xuming Hu, Aiwei Liu, Fukun Ma, Yawen Yang, Shuang Li, Lijie Wen

    Abstract: Driven by the demand for cross-sentence and large-scale relation extraction, document-level relation extraction (DocRE) has attracted increasing research interest. Despite the continuous improvement in performance, we find that existing DocRE models which initially perform well may make more mistakes when merely changing the entity names in the document, hindering the generalization to novel entit…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Findings

    MSC Class: 68T50 ACM Class: I.2.7

  6. arXiv:2406.00415  [pdf, other]

    cs.AI

    Neural Combinatorial Optimization Algorithms for Solving Vehicle Routing Problems: A Comprehensive Survey with Perspectives

    Authors: Xuan Wu, Di Wang, Lijie Wen, Yubin Xiao, Chunguo Wu, Yuesong Wu, Chaoyu Yu, Douglas L. Maskell, You Zhou

    Abstract: Although several surveys on Neural Combinatorial Optimization (NCO) solvers specifically designed to solve Vehicle Routing Problems (VRPs) have been conducted, these existing surveys did not cover the state-of-the-art (SOTA) NCO solvers that emerged recently. More importantly, to provide a comprehensive taxonomy of NCO solvers with up-to-date coverage, based on our thorough review of relevant publicati…

    Submitted 1 June, 2024; originally announced June 2024.

  7. arXiv:2405.15324  [pdf, other]

    cs.RO cs.AI cs.CV

    Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving

    Authors: Jianbiao Mei, Yukai Ma, Xuemeng Yang, Licheng Wen, Xinyu Cai, Xin Li, Daocheng Fu, Bo Zhang, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yong Liu, Yu Qiao

    Abstract: Autonomous driving has advanced significantly due to sensors, machine learning, and artificial intelligence improvements. However, prevailing methods struggle with intricate scenarios and causal relationships, hindering adaptability and interpretability in varied environments. To address the above problems, we introduce LeapAD, a novel paradigm for autonomous driving inspired by the human cognitiv…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 23 pages, 16 figures

  8. arXiv:2405.12635  [pdf, other]

    cs.DC

    TempoScale: A Cloud Workloads Prediction Approach Integrating Short-Term and Long-Term Information

    Authors: Linfeng Wen, Minxian Xu, Adel N. Toosi, Kejiang Ye

    Abstract: Cloud native solutions are widely applied in various fields, placing higher demands on the efficient management and utilization of resource platforms. To achieve the efficiency, load forecasting and elastic scaling have become crucial technologies for dynamically adjusting cloud resources to meet user demands and minimizing resource waste. However, existing prediction-based methods lack comprehens…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 11 pages, 11 figures, 4 tables

    Journal ref: In proceedings of IEEE CLOUD 2024

  9. arXiv:2405.10051  [pdf, other]

    cs.CR cs.CL

    MarkLLM: An Open-Source Toolkit for LLM Watermarking

    Authors: Leyi Pan, Aiwei Liu, Zhiwei He, Zitian Gao, Xuandong Zhao, Yijian Lu, Binglin Zhou, Shuliang Liu, Xuming Hu, Lijie Wen, Irwin King

    Abstract: LLM watermarking, which embeds imperceptible yet algorithmically detectable signals in model outputs to identify LLM-generated text, has become crucial in mitigating the potential misuse of large language models. However, the abundance of LLM watermarking algorithms, their intricate mechanisms, and the complex evaluation procedures and perspectives pose challenges for researchers and the community…

    Submitted 24 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: 16 pages, 5 figures, 6 tables

    MSC Class: 68T50 ACM Class: I.2.7

  10. arXiv:2405.05949  [pdf, other]

    cs.CV

    CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

    Authors: Jiachen Li, Xinyao Wang, Sijie Zhu, Chia-Wen Kuo, Lu Xu, Fan Chen, Jitesh Jain, Humphrey Shi, Longyin Wen

    Abstract: Recent advancements in Multimodal Large Language Models (LLMs) have focused primarily on scaling by increasing text-image pair data and enhancing LLMs to improve performance on multimodal tasks. However, these scaling approaches are computationally expensive and overlook the significance of improving model capabilities from the vision side. Inspired by the successful applications of Mixture-of-Exp…

    Submitted 9 May, 2024; originally announced May 2024.

  11. arXiv:2404.16821  [pdf, other]

    cs.CV

    How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

    Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, et al. (10 additional authors not shown)

    Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Technical report

  12. arXiv:2404.14696  [pdf]

    cs.CV

    Adaptive Prompt Learning with Negative Textual Semantics and Uncertainty Modeling for Universal Multi-Source Domain Adaptation

    Authors: Yuxiang Yang, Lu Wen, Yuanyuan Xu, Jiliu Zhou, Yan Wang

    Abstract: Universal Multi-source Domain Adaptation (UniMDA) transfers knowledge from multiple labeled source domains to an unlabeled target domain under domain shifts (different data distribution) and class shifts (unknown target classes). Existing solutions focus on excavating image features to detect unknown samples, ignoring abundant information contained in textual semantics. In this paper, we propose a…

    Submitted 23 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted by ICME2024

  13. arXiv:2404.12753  [pdf, other]

    cs.CL cs.AI

    AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation

    Authors: Wenhao Huang, Chenghao Peng, Zhixu Li, Jiaqing Liang, Yanghua Xiao, Liqian Wen, Zulong Chen

    Abstract: Web automation is a significant technique that accomplishes complicated web tasks by automating common web actions, enhancing operational efficiency, and reducing the need for manual intervention. Traditional methods, such as wrappers, suffer from limited adaptability and scalability when faced with a new website. On the other hand, generative agents empowered by large language models (LLMs) exhib…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 18 pages, 5 figures

  14. arXiv:2404.12683  [pdf, other]

    cs.RO

    A Containerized Microservice Architecture for a ROS 2 Autonomous Driving Software: An End-to-End Latency Evaluation

    Authors: Tobias Betz, Long Wen, Fengjunjie Pan, Gemb Kaljavesi, Alexander Zuepke, Andrea Bastoni, Marco Caccamo, Alois Knoll, Johannes Betz

    Abstract: The automotive industry is transitioning from traditional ECU-based systems to software-defined vehicles. A central role of this revolution is played by containers, lightweight virtualization technologies that enable the flexible consolidation of complex software applications on a common hardware platform. Despite their widespread adoption, the impact of containerization on fundamental real-time m…

    Submitted 19 April, 2024; originally announced April 2024.

  15. arXiv:2403.20026  [pdf, other]

    cs.CV cs.CL

    FSMR: A Feature Swapping Multi-modal Reasoning Approach with Joint Textual and Visual Clues

    Authors: Shuang Li, Jiahua Wang, Lijie Wen

    Abstract: Multi-modal reasoning plays a vital role in bridging the gap between textual and visual information, enabling a deeper understanding of the context. This paper presents the Feature Swapping Multi-modal Reasoning (FSMR) model, designed to enhance multi-modal reasoning through feature swapping. FSMR leverages a pre-trained visual-language model as an encoder, accommodating both text and image inputs…

    Submitted 29 March, 2024; originally announced March 2024.

  16. arXiv:2403.19078  [pdf, other]

    cs.CV cs.AI

    MVEB: Self-Supervised Learning with Multi-View Entropy Bottleneck

    Authors: Liangjian Wen, Xiasi Wang, Jianzhuang Liu, Zenglin Xu

    Abstract: Self-supervised learning aims to learn representation that can be effectively generalized to downstream tasks. Many self-supervised approaches regard two views of an image as both the input and the self-supervised signals, assuming that either view contains the same task-relevant information and the shared information is (approximately) sufficient for predicting downstream tasks. Recent studies sh…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted by TPAMI

  17. arXiv:2403.16048  [pdf, other]

    cs.CV

    Edit3K: Universal Representation Learning for Video Editing Components

    Authors: Xin Gu, Libo Zhang, Fan Chen, Longyin Wen, Yufei Wang, Tiejian Luo, Sijie Zhu

    Abstract: This paper focuses on understanding the predominant video creation pipeline, i.e., compositional video editing with six main types of editing components, including video effects, animation, transition, filter, sticker, and text. In contrast to existing visual representation learning of visual materials (i.e., images/videos), we aim to learn visual representations of editing actions/components that…

    Submitted 24 March, 2024; originally announced March 2024.

  18. arXiv:2403.12370  [pdf, other]

    cs.CV

    XPose: eXplainable Human Pose Estimation

    Authors: Luyu Qiu, Jianing Li, Lei Wen, Chi Su, Fei Hao, Chen Jason Zhang, Lei Chen

    Abstract: Current approaches in pose estimation primarily concentrate on enhancing model architectures, often overlooking the importance of comprehensively understanding the rationale behind model decisions. In this paper, we propose XPose, a novel framework that incorporates Explainable AI (XAI) principles into pose estimation. This integration aims to elucidate the individual contribution of each keypoint…

    Submitted 18 March, 2024; originally announced March 2024.

  19. arXiv:2403.12077  [pdf, other]

    cs.CL cs.AI cs.IR

    Evaluating Robustness of Generative Search Engine on Adversarial Factual Questions

    Authors: Xuming Hu, Xiaochuan Li, Junzhe Chen, Yinghui Li, Yangning Li, Xiaoguang Li, Yasheng Wang, Qun Liu, Lijie Wen, Philip S. Yu, Zhijiang Guo

    Abstract: Generative search engines have the potential to transform how people seek information online, but generated responses from existing large language models (LLMs)-backed generative search engines may not always be accurate. Nonetheless, retrieval-augmented generation exacerbates safety concerns, since adversaries may successfully evade the entire system by subtly manipulating the most vulnerable par…

    Submitted 25 February, 2024; originally announced March 2024.

    Comments: 21 pages, 7 figures, 4 tables

  20. Dcl-Net: Dual Contrastive Learning Network for Semi-Supervised Multi-Organ Segmentation

    Authors: Lu Wen, Zhenghao Feng, Yun Hou, Peng Wang, Xi Wu, Jiliu Zhou, Yan Wang

    Abstract: Semi-supervised learning is a sound measure to relieve the strict demand of abundant annotated datasets, especially for challenging multi-organ segmentation (MoS). However, most existing SSL methods predict pixels in a single image independently, ignoring the relations among images and categories. In this paper, we propose a two-stage Dual Contrastive Learning Network for semi-supervised MoS, which uti…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Published at ICASSP 2024

  21. arXiv:2403.02574  [pdf, other]

    cs.IR cs.AI cs.CL

    ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary

    Authors: Yutong Li, Lu Chen, Aiwei Liu, Kai Yu, Lijie Wen

    Abstract: The literature review is an indispensable step in the research process. It provides the benefit of comprehending the research problem and understanding the current research situation while conducting a comparative analysis of prior works. However, literature summary is challenging and time consuming. The previous LLM-based studies on literature review mainly focused on the complete process, includ…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 18 pages, 5 figures

    MSC Class: 68T50 ACM Class: I.2.7

  22. arXiv:2403.00869  [pdf, other]

    cs.LG stat.ML

    Enhancing Multivariate Time Series Forecasting with Mutual Information-driven Cross-Variable and Temporal Modeling

    Authors: Shiyi Qi, Liangjian Wen, Yiduo Li, Yuanhang Yang, Zhe Li, Zhongwen Rao, Lujia Pan, Zenglin Xu

    Abstract: Recent advancements have underscored the impact of deep learning techniques on multivariate time series forecasting (MTSF). Generally, these techniques are bifurcated into two categories: Channel-independence and Channel-mixing approaches. Although Channel-independence methods typically yield better results, Channel-mixing could theoretically offer improvements by leveraging inter-variable correla…

    Submitted 29 February, 2024; originally announced March 2024.

  23. arXiv:2403.00510  [pdf, other]

    cs.CL cs.AI

    ROME: Memorization Insights from Text, Logits and Representation

    Authors: Bo Li, Qinghua Zhao, Lijie Wen

    Abstract: Previous works have evaluated memorization by comparing model outputs with training corpora, examining how factors such as data duplication, model size, and prompt length influence memorization. However, analyzing these extensive training corpora is highly time-consuming. To address this challenge, this paper proposes an innovative approach named ROME that bypasses direct processing of the trainin…

    Submitted 16 June, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: Submitted to EMNLP, 2024

  24. arXiv:2402.18946  [pdf, other]

    cs.LG eess.SY

    Real-Time Adaptive Safety-Critical Control with Gaussian Processes in High-Order Uncertain Models

    Authors: Yu Zhang, Long Wen, Xiangtong Yao, Zhenshan Bing, Linghuan Kong, Wei He, Alois Knoll

    Abstract: This paper presents an adaptive online learning framework for systems with uncertain parameters to ensure safety-critical control in non-stationary environments. Our approach consists of two phases. The initial phase is centered on a novel sparse Gaussian process (GP) framework. We first integrate a forgetting factor to refine a variational sparse GP algorithm, thus enhancing its adaptability. Sub…

    Submitted 5 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  25. arXiv:2402.16913  [pdf, other]

    cs.LG

    PDETime: Rethinking Long-Term Multivariate Time Series Forecasting from the perspective of partial differential equations

    Authors: Shiyi Qi, Zenglin Xu, Yiduo Li, Liangjian Wen, Qingsong Wen, Qifan Wang, Yuan Qi

    Abstract: Recent advancements in deep learning have led to the development of various models for long-term multivariate time-series forecasting (LMTF), many of which have shown promising results. Generally, the focus has been on historical-value-based models, which rely on past observations to predict future series. Notably, a new trend has emerged with time-index-based models, offering a more nuanced under…

    Submitted 25 February, 2024; originally announced February 2024.

  26. arXiv:2402.16499  [pdf, other]

    cs.CL

    LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments

    Authors: Junzhe Chen, Xuming Hu, Shuodi Liu, Shiyu Huang, Wei-Wei Tu, Zhaofeng He, Lijie Wen

    Abstract: Recent advancements in large language models (LLMs) have revealed their potential for achieving autonomous agents possessing human-level intelligence. However, existing benchmarks for evaluating LLM Agents either use static datasets, potentially leading to data leakage or focus only on single-agent scenarios, overlooking the complexities of multi-agent interactions. There is a lack of a benchmark…

    Submitted 26 February, 2024; originally announced February 2024.

  27. arXiv:2402.16449  [pdf, other]

    cs.RO cs.AI

    Online Efficient Safety-Critical Control for Mobile Robots in Unknown Dynamic Multi-Obstacle Environments

    Authors: Yu Zhang, Guangyao Tian, Long Wen, Xiangtong Yao, Liding Zhang, Zhenshan Bing, Wei He, Alois Knoll

    Abstract: This paper proposes a LiDAR-based goal-seeking and exploration framework, addressing the efficiency of online obstacle avoidance in unstructured environments populated with static and moving obstacles. This framework addresses two significant challenges associated with traditional dynamic control barrier functions (D-CBFs): their online construction and the diminished real-time performance caused…

    Submitted 26 February, 2024; originally announced February 2024.

  28. arXiv:2402.16299  [pdf, other]

    cs.IR cs.LG

    Against Filter Bubbles: Diversified Music Recommendation via Weighted Hypergraph Embedding Learning

    Authors: Chaoguang Luo, Liuying Wen, Yong Qin, Liangwei Yang, Zhineng Hu, Philip S. Yu

    Abstract: Recommender systems serve a dual purpose for users: sifting out inappropriate or mismatched information while accurately identifying items that align with their preferences. Numerous recommendation algorithms are designed to provide users with a personalized array of information tailored to their preferences. Nevertheless, excessive personalization can confine users within a "filter bubble". Conse…

    Submitted 25 February, 2024; originally announced February 2024.

  29. arXiv:2402.11907  [pdf, other]

    cs.CL

    Direct Large Language Model Alignment Through Self-Rewarding Contrastive Prompt Distillation

    Authors: Aiwei Liu, Haoping Bai, Zhiyun Lu, Xiang Kong, Simon Wang, Jiulong Shan, Meng Cao, Lijie Wen

    Abstract: Aligning large language models (LLMs) with human expectations without human-annotated preference data is an important problem. In this paper, we propose a method to evaluate the response preference by using the output probabilities of response pairs under contrastive prompt pairs, which could achieve better performance on LLaMA2-7B and LLaMA2-13B compared to RLAIF. Based on this, we propose an aut…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 24 pages, 5 pages

    MSC Class: 68T50 ACM Class: I.2.7

  30. arXiv:2402.08917  [pdf, other]

    cs.DC

    An Interference-aware Approach for Co-located Container Orchestration with Novel Metric

    Authors: Xiang Li, Linfeng Wen, Minxian Xu, Kejiang Ye

    Abstract: Container orchestration technologies are widely employed in cloud computing, facilitating the co-location of online and offline services on the same infrastructure. Online services demand rapid responsiveness and high availability, whereas offline services require extensive computational resources. However, this mixed deployment can lead to resource contention, adversely affecting the performance…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: 8 pages

    Journal ref: In the Proceedings of IEEE SmartData 2023

  31. arXiv:2402.08221  [pdf, other]

    cs.RO cs.CV

    MetaTra: Meta-Learning for Generalized Trajectory Prediction in Unseen Domain

    Authors: Xiaohe Li, Feilong Huang, Zide Fan, Fangli Mou, Yingyan Hou, Chen Qian, Lijie Wen

    Abstract: Trajectory prediction has garnered widespread attention in different fields, such as autonomous driving and robotic navigation. However, due to the significant variations in trajectory patterns across different scenarios, models trained in known environments often falter in unseen ones. To learn a generalized model that can directly handle unseen domains without requiring any model updating, we pr…

    Submitted 13 February, 2024; originally announced February 2024.

  32. arXiv:2402.04566  [pdf]

    eess.IV cs.CV

    Triplet-constraint Transformer with Multi-scale Refinement for Dose Prediction in Radiotherapy

    Authors: Lu Wen, Qihun Zhang, Zhenghao Feng, Yuanyuan Xu, Xiao Chen, Jiliu Zhou, Yan Wang

    Abstract: Radiotherapy is a primary treatment for cancers with the aim of applying sufficient radiation dose to the planning target volume (PTV) while minimizing dose hazards to the organs at risk (OARs). Convolutional neural networks (CNNs) have automated the radiotherapy plan-making by predicting the dose maps. However, current CNN-based methods ignore the remarkable dose difference in the dose map, i.e.,…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: accepted by 2024 IEEE ISBI

  33. arXiv:2402.03830  [pdf, other]

    cs.CV

    OASim: an Open and Adaptive Simulator based on Neural Rendering for Autonomous Driving

    Authors: Guohang Yan, Jiahao Pi, Jianfei Guo, Zhaotong Luo, Min Dou, Nianchen Deng, Qiusheng Huang, Daocheng Fu, Licheng Wen, Pinlong Cai, Xing Gao, Xinyu Cai, Bo Zhang, Xuemeng Yang, Yeqi Bai, Hongbin Zhou, Botian Shi

    Abstract: With deep learning and computer vision technology development, autonomous driving provides new solutions to improve traffic safety and efficiency. The importance of building high-quality datasets is self-evident, especially with the rise of end-to-end autonomous driving algorithms in recent years. Data plays a core role in the algorithm closed-loop system. However, collecting real-world data is ex…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 10 pages, 9 figures

  34. arXiv:2402.01246  [pdf, other]

    cs.RO eess.SY

    LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving

    Authors: Daocheng Fu, Wenjie Lei, Licheng Wen, Pinlong Cai, Song Mao, Min Dou, Botian Shi, Yu Qiao

    Abstract: The emergence of Multimodal Large Language Models ((M)LLMs) has ushered in new avenues in artificial intelligence, particularly for autonomous driving by offering enhanced understanding and reasoning capabilities. This paper introduces LimSim++, an extended version of LimSim designed for the application of (M)LLMs in autonomous driving. Acknowledging the limitations of existing simulation platform…

    Submitted 12 April, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: Accepted by 35th IEEE Intelligent Vehicles Symposium (IV 2024)

  35. Image2Points: A 3D Point-based Context Clusters GAN for High-Quality PET Image Reconstruction

    Authors: Jiaqi Cui, Yan Wang, Lu Wen, Pinxian Zeng, Xi Wu, Jiliu Zhou, Dinggang Shen

    Abstract: To obtain high-quality Positron emission tomography (PET) images while minimizing radiation exposure, numerous methods have been proposed to reconstruct standard-dose PET (SPET) images from the corresponding low-dose PET (LPET) images. However, these methods heavily rely on voxel-based representations, which fall short of adequately accounting for the precise structure and fine-grained context, le…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: Accepted by ICASSP 2024

  36. arXiv:2312.07913  [pdf, other]

    cs.CL

    A Survey of Text Watermarking in the Era of Large Language Models

    Authors: Aiwei Liu, Leyi Pan, Yijian Lu, Jingjing Li, Xuming Hu, Xi Zhang, Lijie Wen, Irwin King, Hui Xiong, Philip S. Yu

    Abstract: Text watermarking algorithms play a crucial role in the copyright protection of textual content, yet their capabilities and application scenarios have been limited historically. The recent developments in large language models (LLMs) have opened new opportunities for the advancement of text watermarking techniques. LLMs not only enhance the capabilities of text watermarking algorithms through thei…

    Submitted 23 January, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: 35 pages, 7 figures

    MSC Class: 68T50 ACM Class: I.2.7

  37. arXiv:2312.04316  [pdf, other]

    cs.RO cs.AI cs.CV

    Towards Knowledge-driven Autonomous Driving

    Authors: Xin Li, Yeqi Bai, Pinlong Cai, Licheng Wen, Daocheng Fu, Bo Zhang, Xuemeng Yang, Xinyu Cai, Tao Ma, Jianfei Guo, Xing Gao, Min Dou, Yikang Li, Botian Shi, Yong Liu, Liang He, Yu Qiao

    Abstract: This paper explores the emerging knowledge-driven autonomous driving technologies. Our investigation highlights the limitations of current autonomous driving systems, in particular their sensitivity to data bias, difficulty in handling long-tail scenarios, and lack of interpretability. Conversely, knowledge-driven methods with the abilities of cognition, generalization and life-long learning emerg…

    Submitted 27 December, 2023; v1 submitted 7 December, 2023; originally announced December 2023.

  38. arXiv:2311.06673  [pdf, other]

    cs.LG cs.AI cs.RO

    Dream to Adapt: Meta Reinforcement Learning by Latent Context Imagination and MDP Imagination

    Authors: Lu Wen, Songan Zhang, H. Eric Tseng, Huei Peng

    Abstract: Meta reinforcement learning (Meta RL) has been amply explored to quickly learn an unseen task by transferring previously learned knowledge from similar tasks. However, most state-of-the-art algorithms require the meta-training tasks to have a dense coverage on the task distribution and a great amount of data for each of them. In this paper, we propose MetaDreamer, a context-based Meta RL algorithm…

    Submitted 11 November, 2023; originally announced November 2023.

  39. arXiv:2311.05332  [pdf, other]

    cs.CV cs.AI cs.CL cs.RO

    On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving

    Authors: Licheng Wen, Xuemeng Yang, Daocheng Fu, Xiaofeng Wang, Pinlong Cai, Xin Li, Tao Ma, Yingxuan Li, Linran Xu, Dengke Shang, Zheng Zhu, Shaoyan Sun, Yeqi Bai, Xinyu Cai, Min Dou, Shuanglu Hu, Botian Shi, Yu Qiao

    Abstract: The pursuit of autonomous driving technology hinges on the sophisticated integration of perception, decision-making, and control systems. Traditional approaches, both data-driven and rule-based, have been hindered by their inability to grasp the nuance of complex driving environments and the intentions of other road users. This has been a significant bottleneck, particularly in the development of…

    Submitted 28 November, 2023; v1 submitted 9 November, 2023; originally announced November 2023.

  40. arXiv:2311.02991  [pdf]

    cs.CV

    Diffusion-based Radiotherapy Dose Prediction Guided by Inter-slice Aware Structure Encoding

    Authors: Zhenghao Feng, Lu Wen, Jianghong Xiao, Yuanyuan Xu, Xi Wu, Jiliu Zhou, Xingchen Peng, Yan Wang

    Abstract: Deep learning (DL) has successfully automated dose distribution prediction in radiotherapy planning, enhancing both efficiency and quality. However, existing methods suffer from the over-smoothing problem for their commonly used L1 or L2 loss with posterior average calculations. To alleviate this limitation, we propose a diffusion model-based method (DiffDose) for predicting the radiotherapy dose…

    Submitted 9 July, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

  41. arXiv:2310.16822  [pdf, other]

    cs.CL cs.AI cs.MM

    Prompt Me Up: Unleashing the Power of Alignments for Multimodal Entity and Relation Extraction

    Authors: Xuming Hu, Junzhe Chen, Aiwei Liu, Shiao Meng, Lijie Wen, Philip S. Yu

    Abstract: How can we better extract entities and relations from text? Using multimodal extraction with images and text obtains more signals for entities and relations, and aligns them through graphs or hierarchical fusion, aiding in extraction. Despite attempts at various fusions, previous works have overlooked many unlabeled image-caption pairs, such as NewsCLIPing. This paper proposes innovative pre-train…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: Accepted to ACM Multimedia 2023

  42. arXiv:2310.15743  [pdf, other]

    cs.CL

    RAPL: A Relation-Aware Prototype Learning Approach for Few-Shot Document-Level Relation Extraction

    Authors: Shiao Meng, Xuming Hu, Aiwei Liu, Shu'ang Li, Fukun Ma, Yawen Yang, Lijie Wen

    Abstract: How to identify semantic relations among entities in a document when only a few labeled documents are available? Few-shot document-level relation extraction (FSDLRE) is crucial for addressing the pervasive data scarcity problem in real-world scenarios. Metric-based meta-learning is an effective framework widely adopted for FSDLRE, which constructs class prototypes for classification. However, exis…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023

    MSC Class: 68T50 ACM Class: I.2.7

  43. arXiv:2310.06356  [pdf, other]

    cs.CR cs.CL

    A Semantic Invariant Robust Watermark for Large Language Models

    Authors: Aiwei Liu, Leyi Pan, Xuming Hu, Shiao Meng, Lijie Wen

    Abstract: Watermark algorithms for large language models (LLMs) have achieved extremely high accuracy in detecting text generated by LLMs. Such algorithms typically involve adding extra watermark logits to the LLM's logits at each generation step. However, prior algorithms face a trade-off between attack robustness and security robustness. This is because the watermark logits for a token are determined by a…

    Submitted 19 May, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: ICLR2024, 21 pages, 10 figures, 6 tables

    MSC Class: 68T50 ACM Class: I.2.7

  44. arXiv:2310.05177  [pdf, other]

    cs.CL

    Do Large Language Models Know about Facts?

    Authors: Xuming Hu, Junzhe Chen, Xiaochuan Li, Yufei Guo, Lijie Wen, Philip S. Yu, Zhijiang Guo

    Abstract: Large language models (LLMs) have recently driven striking performance improvements across a range of natural language processing tasks. The factual knowledge acquired during pretraining and instruction tuning can be useful in various downstream tasks, such as question answering, and language generation. Unlike conventional Knowledge Bases (KBs) that explicitly store factual knowledge, LLMs implic…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: 20 pages, 8 figures

  45. arXiv:2309.16292  [pdf, other]

    cs.RO cs.CL

    DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models

    Authors: Licheng Wen, Daocheng Fu, Xin Li, Xinyu Cai, Tao Ma, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yu Qiao

    Abstract: Recent advancements in autonomous driving have relied on data-driven approaches, which are widely adopted but face challenges including dataset bias, overfitting, and uninterpretability. Drawing inspiration from the knowledge-driven nature of human driving, we explore the question of how to instill similar capabilities into autonomous driving systems and summarize a paradigm that integrates an int…

    Submitted 21 February, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: Published as a conference paper at ICLR 2024

  46. arXiv:2309.12867  [pdf, other]

    cs.CV cs.AI

    Accurate and Fast Compressed Video Captioning

    Authors: Yaojie Shen, Xin Gu, Kai Xu, Heng Fan, Longyin Wen, Libo Zhang

    Abstract: Existing video captioning approaches typically require to first sample video frames from a decoded video and then conduct a subsequent process (e.g., feature extraction and/or captioning model learning). In this pipeline, manual frame sampling may ignore key information in videos and thus degrade performance. Additionally, redundant information in the sampled frames may result in low efficiency in…

    Submitted 3 January, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

    Comments: ICCV 2023

  47. arXiv:2308.15315  [pdf]

    cs.DC

    Practice of Alibaba Cloud on Elastic Resource Provisioning for Large-scale Microservices Cluster

    Authors: Minxian Xu, Lei Yang, Yang Wang, Chengxi Gao, Linfeng Wen, Guoyao Xu, Liping Zhang, Kejiang Ye, Chengzhong Xu

    Abstract: Cloud-native architecture is becoming increasingly crucial for today's cloud computing environments due to the need for speed and flexibility in developing applications. It utilizes microservice technology to break down traditional monolithic applications into light-weight and self-contained microservice components. However, as microservices grow in scale and have dynamic inter-dependencies, they…

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: 19 pages

    Journal ref: Software: Practice and Experience, 2023

  48. arXiv:2308.12797  [pdf, other]

    cs.RO cs.MA eess.SY

    TrafficMCTS: A Closed-Loop Traffic Flow Generation Framework with Group-Based Monte Carlo Tree Search

    Authors: Licheng Wen, Ze Fu, Pinlong Cai, Daocheng Fu, Song Mao, Botian Shi

    Abstract: Digital twins for intelligent transportation systems are currently attracting great interests, in which generating realistic, diverse, and human-like traffic flow in simulations is a formidable challenge. Current approaches often hinge on predefined driver models, objective optimization, or reliance on pre-recorded driving datasets, imposing limitations on their scalability, versatility, and adapt…

    Submitted 31 August, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

  49. arXiv:2307.16230  [pdf, other]

    cs.CL

    An Unforgeable Publicly Verifiable Watermark for Large Language Models

    Authors: Aiwei Liu, Leyi Pan, Xuming Hu, Shu'ang Li, Lijie Wen, Irwin King, Philip S. Yu

    Abstract: Recently, text watermarking algorithms for large language models (LLMs) have been proposed to mitigate the potential harms of text generated by LLMs, including fake news and copyright issues. However, current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection. To add…

    Submitted 26 May, 2024; v1 submitted 30 July, 2023; originally announced July 2023.

    Comments: ICLR2024, 17 pages, 5 figures, 8 tables

    MSC Class: 68T50 ACM Class: I.2.7

  50. arXiv:2307.09794  [pdf]

    eess.IV cs.CV physics.med-ph

    DiffDP: Radiotherapy Dose Prediction via a Diffusion Model

    Authors: Zhenghao Feng, Lu Wen, Peng Wang, Binyu Yan, Xi Wu, Jiliu Zhou, Yan Wang

    Abstract: Currently, deep learning (DL) has achieved the automatic prediction of dose distribution in radiotherapy planning, enhancing its efficiency and quality. However, existing methods suffer from the over-smoothing problem for their commonly used L_1 or L_2 loss with posterior average calculations. To alleviate this limitation, we innovatively introduce a diffusion-based dose prediction (DiffDP) model…

    Submitted 19 July, 2023; originally announced July 2023.

    Comments: to be published in MICCAI 2023