
Showing 1–50 of 1,148 results for author: Li, P

Searching in archive cs.
  1. arXiv:2407.13553  [pdf, other]

    cs.CV

    SAM-Driven Weakly Supervised Nodule Segmentation with Uncertainty-Aware Cross Teaching

    Authors: Xingyue Zhao, Peiqi Li, Xiangde Luo, Meng Yang, Shi Chang, Zhongyu Li

    Abstract: Automated nodule segmentation is essential for computer-assisted diagnosis in ultrasound images. Nevertheless, most existing methods depend on precise pixel-level annotations by medical professionals, a process that is both costly and labor-intensive. Recently, segmentation foundation models like SAM have shown impressive generalizability on natural images, suggesting their potential as pseudo-lab…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ISBI 2024 Oral

  2. arXiv:2407.13284  [pdf, other]

    cs.IR

    Semantic-aware Representation Learning for Homography Estimation

    Authors: Yuhan Liu, Qianxin Huang, Siqi Hui, Jingwen Fu, Sanping Zhou, Kangyi Wu, Pengna Li, Jinjun Wang

    Abstract: Homography estimation is the task of determining the transformation from an image pair. Our approach focuses on employing detector-free feature matching methods to address this issue. Previous work has underscored the importance of incorporating semantic information, however there still lacks an efficient way to utilize semantic information. Previous methods suffer from treating the semantics as a…

    Submitted 18 July, 2024; originally announced July 2024.

  3. arXiv:2407.12504  [pdf, other]

    cs.CL

    Case2Code: Learning Inductive Reasoning with Synthetic Data

    Authors: Yunfan Shao, Linyang Li, Yichuan Ma, Peiji Li, Demin Song, Qinyuan Cheng, Shimin Li, Xiaonan Li, Pengyu Wang, Qipeng Guo, Hang Yan, Xipeng Qiu, Xuanjing Huang, Dahua Lin

    Abstract: Complex reasoning is an impressive ability shown by large language models (LLMs). Most LLMs are skilled in deductive reasoning, such as chain-of-thought prompting or iterative tool-using to solve challenging tasks step-by-step. In this paper, we hope to focus on evaluating and teaching LLMs to conduct inductive reasoning, that is, LLMs are supposed to infer underlying rules by observing examples o…

    Submitted 17 July, 2024; originally announced July 2024.

  4. arXiv:2407.11522  [pdf, other]

    cs.CV

    FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models

    Authors: Pengxiang Li, Zhi Gao, Bofei Zhang, Tao Yuan, Yuwei Wu, Mehrtash Harandi, Yunde Jia, Song-Chun Zhu, Qing Li

    Abstract: Vision language models (VLMs) have achieved impressive progress in diverse applications, becoming a prevalent research direction. In this paper, we build FIRE, a feedback-refinement dataset, consisting of 1.1M multi-turn conversations that are derived from 27 source datasets, empowering VLMs to spontaneously refine their responses based on user feedback across diverse tasks. To scale up the data c…

    Submitted 16 July, 2024; originally announced July 2024.

  5. Incremental high average-utility itemset mining: survey and challenges

    Authors: Jing Chen, Shengyi Yang, Weiping Ding, Peng Li, Aijun Liu, Hongjun Zhang, Tian Li

    Abstract: The High Average Utility Itemset Mining (HAUIM) technique, a variation of High Utility Itemset Mining (HUIM), uses the average utility of the itemsets. Historically, most HAUIM algorithms were designed for static databases. However, practical applications like market basket analysis and business decision-making necessitate regular updates of the database with new transactions. As a result, researc…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 25 pages, 23 figures

  6. arXiv:2407.09057  [pdf, other]

    cs.CV

    PersonificationNet: Making customized subject act like a person

    Authors: Tianchu Guo, Pengyu Li, Biao Wang, Xiansheng Hua

    Abstract: Recently customized generation has significant potential, which uses as few as 3-5 user-provided images to train a model to synthesize new images of a specified subject. Though subsequent applications enhance the flexibility and diversity of customized generation, fine-grained control over the given subject acting like the person's pose is still lack of study. In this paper, we propose a Personifi…

    Submitted 12 July, 2024; originally announced July 2024.

  7. arXiv:2407.08109  [pdf, other]

    cs.CV cs.AI cs.LG

    Urban Waterlogging Detection: A Challenging Benchmark and Large-Small Model Co-Adapter

    Authors: Suqi Song, Chenxu Zhang, Peng Zhang, Pengkun Li, Fenglong Song, Lei Zhang

    Abstract: Urban waterlogging poses a major risk to public safety and infrastructure. Conventional methods using water-level sensors need high-maintenance to hardly achieve full coverage. Recent advances employ surveillance camera imagery and deep learning for detection, yet these struggle amidst scarce data and adverse environmental conditions. In this paper, we establish a challenging Urban Waterlogging Be…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  8. arXiv:2407.06394  [pdf, other]

    cs.RO cs.MA

    Modeling and Analysis of Multi-Line Orders in Multi-Tote Storage and Retrieval Autonomous Mobile Robot Systems

    Authors: Xiaotao Shan, Yichao Jin, Peizheng Li, Koichi Kondo

    Abstract: As warehouses are emphasizing space utilization and the ability to handle multi-line orders, multi-tote storage and retrieval (MTSR) autonomous mobile robot systems, where robots directly retrieve totes from high shelves, are becoming increasingly popular. This paper presents a novel shared-token, multi-class, semi-open queueing network model to account for multi-line orders with general distribut…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 8 pages, 5 figures. This paper has been accepted for publication in IEEE 20th International Conference on Automation Science and Engineering (IEEE CASE 2024)

  9. arXiv:2407.06196  [pdf, other]

    cs.CV cs.AI

    Poetry2Image: An Iterative Correction Framework for Images Generated from Chinese Classical Poetry

    Authors: Jing Jiang, Yiran Ling, Binzhu Li, Pengxiang Li, Junming Piao, Yu Zhang

    Abstract: Text-to-image generation models often struggle with key element loss or semantic confusion in tasks involving Chinese classical poetry. Addressing this issue through fine-tuning models needs considerable training costs. Additionally, manual prompts for re-diffusion adjustments need professional knowledge. To solve this problem, we propose Poetry2Image, an iterative correction framework for images g…

    Submitted 15 June, 2024; originally announced July 2024.

    Comments: 13 pages, 7 figures

  10. arXiv:2407.06128  [pdf]

    cs.CV

    Towards SAR Automatic Target Recognition MultiCategory SAR Image Classification Based on Light Weight Vision Transformer

    Authors: Guibin Zhao, Pengfei Li, Zhibo Zhang, Fusen Guo, Xueting Huang, Wei Xu, Jinyin Wang, Jianlong Chen

    Abstract: Synthetic Aperture Radar has been extensively used in numerous fields and can gather a wealth of information about the area of interest. This large scene data intensive technology puts a high value on automatic target recognition which can free the utilizers and boost the efficiency. Recent advances in artificial intelligence have made it possible to create a deep learning based SAR ATR that can a…

    Submitted 9 July, 2024; v1 submitted 18 May, 2024; originally announced July 2024.

  11. Neural Garment Dynamics via Manifold-Aware Transformers

    Authors: Peizhuo Li, Tuanfeng Y. Wang, Timur Levent Kesdogan, Duygu Ceylan, Olga Sorkine-Hornung

    Abstract: Data driven and learning based solutions for modeling dynamic garments have significantly advanced, especially in the context of digital humans. However, existing approaches often focus on modeling garments with respect to a fixed parametric human body model and are limited to garment geometries that were seen during training. In this work, we take a different approach and model the dynamics of a…

    Submitted 13 May, 2024; originally announced July 2024.

    Comments: EUROGRAPHICS 2024. Project page: https://peizhuoli.github.io/manifold-aware-transformers/ Video: https://www.youtube.com/watch?v=v6FCTHmjyqI

  12. arXiv:2407.05176  [pdf, ps, other]

    cs.CY cs.AI cs.LG

    Towards Socially and Environmentally Responsible AI

    Authors: Pengfei Li, Yejia Liu, Jianyi Yang, Shaolei Ren

    Abstract: The sharply increasing sizes of artificial intelligence (AI) models come with significant energy consumption and environmental footprints, which can disproportionately impact certain (often marginalized) regions and hence create environmental inequity concerns. Moreover, concerns with social inequity have also emerged, as AI computing resources may not be equitably distributed across the globe and…

    Submitted 22 April, 2024; originally announced July 2024.

    Comments: Presented at HotEthics 2024 (co-located with ASPLOS '24)

  13. arXiv:2407.03580  [pdf]

    cs.IR

    Deep Pareto Reinforcement Learning for Multi-Objective Recommender Systems

    Authors: Pan Li, Alexander Tuzhilin

    Abstract: Optimizing multiple objectives simultaneously is an important task for recommendation platforms to improve their performance. However, this task is particularly challenging since the relationships between different objectives are heterogeneous across different consumers and dynamically fluctuating according to different contexts. Especially in those cases when objectives become conflicting with ea…

    Submitted 9 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  14. arXiv:2407.01897  [pdf, other]

    cs.CL

    Proposal Report for the 2nd SciCAP Competition 2024

    Authors: Pengpeng Li, Tingmin Li, Jingyuan Wang, Boyuan Wang, Yang Yang

    Abstract: In this paper, we propose a method for document summarization using auxiliary information. This approach effectively summarizes descriptions related to specific images, tables, and appendices within lengthy texts. Our experiments demonstrate that leveraging high-quality OCR data and initially extracted information from the original text enables efficient summarization of the content related to des…

    Submitted 1 July, 2024; originally announced July 2024.

  15. arXiv:2407.01702  [pdf, other]

    cs.CV cs.RO

    SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving

    Authors: Qingwen Zhang, Yi Yang, Peizheng Li, Olov Andersson, Patric Jensfelt

    Abstract: Scene flow estimation predicts the 3D motion at each point in successive LiDAR scans. This detailed, point-level, information can help autonomous vehicles to accurately predict and understand dynamic changes in their surroundings. Current state-of-the-art methods require annotated data to train scene flow networks and the expense of labeling inherently limits their scalability. Self-supervised app…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 25 pages (14 main pages + 11 supp material), 5 figures

  16. arXiv:2407.00849  [pdf, other]

    cs.LG

    Towards Understanding Sensitive and Decisive Patterns in Explainable AI: A Case Study of Model Interpretation in Geometric Deep Learning

    Authors: Jiajun Zhu, Siqi Miao, Rex Ying, Pan Li

    Abstract: The interpretability of machine learning models has gained increasing attention, particularly in scientific domains where high precision and accountability are crucial. This research focuses on distinguishing between two critical data patterns -- sensitive patterns (model-related) and decisive patterns (task-related) -- which are commonly used as model interpretations but often lead to confusion.…

    Submitted 30 June, 2024; originally announced July 2024.

  17. Generative Iris Prior Embedded Transformer for Iris Restoration

    Authors: Yubo Huang, Jia Wang, Peipei Li, Liuyu Xiang, Peigang Li, Zhaofeng He

    Abstract: Iris restoration from complexly degraded iris images, aiming to improve iris recognition performance, is a challenging problem. Due to the complex degradation, directly training a convolutional neural network (CNN) without prior cannot yield satisfactory results. In this work, we propose a generative iris prior embedded Transformer model (Gformer), in which we build a hierarchical encoder-decoder…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Our code is available at https://github.com/sawyercharlton/Gformer

    Journal ref: 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia, 2023, pp. 510-515

  18. arXiv:2407.00077  [pdf, other]

    cs.IR cs.CR cs.LG

    Differentially Private Graph Diffusion with Applications in Personalized PageRanks

    Authors: Rongzhe Wei, Eli Chien, Pan Li

    Abstract: Graph diffusion, which iteratively propagates real-valued substances among the graph, is used in numerous graph/network-involved applications. However, releasing diffusion vectors may reveal sensitive linking information in the data such as transaction information in financial network data. However, protecting the privacy of graph data is challenging due to its interconnected nature. This work pro…

    Submitted 1 July, 2024; v1 submitted 22 June, 2024; originally announced July 2024.

  19. arXiv:2406.20030  [pdf, other]

    cs.CL

    LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models

    Authors: Renzhi Wang, Piji Li

    Abstract: Large language models (LLMs) require continual knowledge updates to stay abreast of the ever-changing world facts, prompting the formulation of lifelong model editing task. While recent years have witnessed the development of various techniques for single and batch editing, these methods either fail to apply or perform sub-optimally when faced with lifelong editing. In this paper, we introduce LEM…

    Submitted 28 June, 2024; originally announced June 2024.

  20. arXiv:2406.18770  [pdf, other]

    cs.LG

    ADO-LLM: Analog Design Bayesian Optimization with In-Context Learning of Large Language Models

    Authors: Yuxuan Yin, Yu Wang, Boxun Xu, Peng Li

    Abstract: Analog circuit design requires substantial human expertise and involvement, which is a significant roadblock to design productivity. Bayesian Optimization (BO), a popular machine learning based optimization strategy, has been leveraged to automate analog design given its applicability across various circuit topologies and technologies. Traditional BO methods employ black box Gaussian Process surro…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 8 pages, 3 figures

  21. arXiv:2406.18536  [pdf, other]

    eess.SY cs.AI cs.AR

    Reliable Interval Prediction of Minimum Operating Voltage Based on On-chip Monitors via Conformalized Quantile Regression

    Authors: Yuxuan Yin, Xiaoxiao Wang, Rebecca Chen, Chen He, Peng Li

    Abstract: Predicting the minimum operating voltage ($V_{min}$) of chips is one of the important techniques for improving the manufacturing testing flow, as well as ensuring the long-term reliability and safety of in-field systems. Current $V_{min}$ prediction methods often provide only point estimates, necessitating additional techniques for constructing prediction confidence intervals to cover uncertaintie…

    Submitted 3 May, 2024; originally announced June 2024.

    Comments: Accepted by DATE 2024. Camera-ready version

  22. arXiv:2406.13719  [pdf, other]

    cs.CV

    GUI Action Narrator: Where and When Did That Action Take Place?

    Authors: Qinchen Wu, Difei Gao, Kevin Qinghong Lin, Zhuoyu Wu, Xiangwu Guo, Peiran Li, Weichen Zhang, Hengxu Wang, Mike Zheng Shou

    Abstract: The advent of Multimodal LLMs has significantly enhanced image OCR recognition capabilities, making GUI automation a viable reality for increasing efficiency in digital tasks. One fundamental aspect of developing a GUI automation system is understanding primitive GUI actions. This comprehension is crucial as it enables agents to learn from user demonstrations, an essential element of automation. T…

    Submitted 19 June, 2024; originally announced June 2024.

  23. arXiv:2406.13450  [pdf, other]

    cs.AI

    Federating to Grow Transformers with Constrained Resources without Model Sharing

    Authors: Shikun Shen, Yifei Zou, Yuan Yuan, Yanwei Zheng, Peng Li, Xiuzhen Cheng, Dongxiao Yu

    Abstract: The high resource consumption of large-scale models discourages resource-constrained users from developing their customized transformers. To this end, this paper considers a federated framework named Fed-Grow for multiple participants to cooperatively scale a transformer from their pre-trained small models. Under the Fed-Grow, a Dual-LiGO (Dual Linear Growth Operator) architecture is designed to h…

    Submitted 19 June, 2024; originally announced June 2024.

  24. arXiv:2406.13351  [pdf, other]

    cs.LG cs.AI cs.DC

    A Resource-Adaptive Approach for Federated Learning under Resource-Constrained Environments

    Authors: Ruirui Zhang, Xingze Wu, Yifei Zou, Zhenzhen Xie, Peng Li, Xiuzhen Cheng, Dongxiao Yu

    Abstract: The paper studies a fundamental federated learning (FL) problem involving multiple clients with heterogeneous constrained resources. Compared with the numerous training parameters, the computing and communication resources of clients are insufficient for fast local training and real-time knowledge sharing. Besides, training on clients with heterogeneous resources may result in the straggler proble…

    Submitted 19 June, 2024; originally announced June 2024.

  25. arXiv:2406.12527  [pdf, other]

    cs.CL

    FuseGen: PLM Fusion for Data-generation based Zero-shot Learning

    Authors: Tianyuan Zou, Yang Liu, Peng Li, Jianqing Zhang, Jingjing Liu, Ya-Qin Zhang

    Abstract: Data generation-based zero-shot learning, although effective in training Small Task-specific Models (STMs) via synthetic datasets generated by Pre-trained Language Models (PLMs), is often limited by the low quality of such synthetic datasets. Previous solutions have primarily focused on single PLM settings, where synthetic datasets are typically restricted to specific sub-spaces and often deviate…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 17 pages, 8 figures, 12 tables

  26. arXiv:2406.12292  [pdf, other]

    cs.SD cs.AI eess.AS

    JEN-1 DreamStyler: Customized Musical Concept Learning via Pivotal Parameters Tuning

    Authors: Boyu Chen, Peike Li, Yao Yao, Alex Wang

    Abstract: Large models for text-to-music generation have achieved significant progress, facilitating the creation of high-quality and varied musical compositions from provided text prompts. However, input text prompts may not precisely capture user requirements, particularly when the objective is to generate music that embodies a specific concept derived from a designated reference collection. In this paper…

    Submitted 18 June, 2024; originally announced June 2024.

  27. arXiv:2406.12224  [pdf, other]

    cs.RO

    Leveraging Large Language Model for Heterogeneous Ad Hoc Teamwork Collaboration

    Authors: Xinzhu Liu, Peiyan Li, Wenju Yang, Di Guo, Huaping Liu

    Abstract: Compared with the widely investigated homogeneous multi-robot collaboration, heterogeneous robots with different capabilities can provide a more efficient and flexible collaboration for more complex tasks. In this paper, we consider a more challenging heterogeneous ad hoc teamwork collaboration problem where an ad hoc robot joins an existing heterogeneous team for a shared goal. Specifically, the…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 20 pages

  28. arXiv:2406.12199  [pdf, other]

    cs.LG cs.AI

    Time Series Modeling for Heart Rate Prediction: From ARIMA to Transformers

    Authors: Haowei Ni, Shuchen Meng, Xieming Geng, Panfeng Li, Zhuoying Li, Xupeng Chen, Xiaotong Wang, Shiyao Zhang

    Abstract: Cardiovascular disease (CVD) is a leading cause of death globally, necessitating precise forecasting models for monitoring vital signs like heart rate, blood pressure, and ECG. Traditional models, such as ARIMA and Prophet, are limited by their need for manual parameter tuning and challenges in handling noisy, sparse, and highly variable medical data. This study investigates advanced deep learning…

    Submitted 27 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by 2024 6th International Conference on Electronic Engineering and Informatics

  29. arXiv:2406.11809  [pdf, other]

    cs.LG cs.RO eess.SY

    Physics-Constrained Learning for PDE Systems with Uncertainty Quantified Port-Hamiltonian Models

    Authors: Kaiyuan Tan, Peilun Li, Thomas Beckers

    Abstract: Modeling the dynamics of flexible objects has become an emerging topic in the community as these objects become more present in many applications, e.g., soft robotics. Due to the properties of flexible materials, the movements of soft objects are often highly nonlinear and, thus, complex to predict. Data-driven approaches seem promising for modeling those complex dynamics but often neglect basic p…

    Submitted 17 June, 2024; originally announced June 2024.

  30. arXiv:2406.10708  [pdf, other]

    cs.CV cs.DB eess.SP

    MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception

    Authors: M. Mahbubur Rahman, Ryoma Yataka, Sorachi Kato, Pu Perry Wang, Peizhao Li, Adriano Cardace, Petros Boufounos

    Abstract: Compared with an extensive list of automotive radar datasets that support autonomous driving, indoor radar datasets are scarce at a smaller scale in the format of low-resolution radar point clouds and usually under an open-space single-room setting. In this paper, we scale up indoor radar data collection using multi-view high-resolution radar heatmap in a multi-day, multi-room, and multi-subject s…

    Submitted 17 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: 26 pages, 25 figures, 10 tables; See https://doi.org/10.5281/zenodo.12611978 to access the MMVR dataset

  31. arXiv:2406.10605  [pdf, other]

    cs.LG cs.GT

    Last-iterate Convergence Separation between Extra-gradient and Optimism in Constrained Periodic Games

    Authors: Yi Feng, Ping Li, Ioannis Panageas, Xiao Wang

    Abstract: Last-iterate behaviors of learning algorithms in repeated two-player zero-sum games have been extensively studied due to their wide applications in machine learning and related tasks. Typical algorithms that exhibit the last-iterate convergence property include optimistic and extra-gradient methods. However, most existing results establish these properties under the assumption that the game is tim…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted for UAI 2024

  32. arXiv:2406.09509  [pdf, other]

    cs.AI cs.LG cs.RO

    CleanDiffuser: An Easy-to-use Modularized Library for Diffusion Models in Decision Making

    Authors: Zibin Dong, Yifu Yuan, Jianye Hao, Fei Ni, Yi Ma, Pengyi Li, Yan Zheng

    Abstract: Leveraging the powerful generative capability of diffusion models (DMs) to build decision-making agents has achieved extensive success. However, there is still a demand for an easy-to-use and modularized open-source library that offers customized and efficient development for DM-based decision-making algorithms. In this work, we introduce CleanDiffuser, the first DM library specifically designed f…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: The first two authors contribute equally to this work. Code and documentation: https://github.com/CleanDiffuserTeam/CleanDiffuser

  33. arXiv:2406.08772  [pdf, other]

    cs.CV cs.CL

    MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs

    Authors: Xuannan Liu, Zekun Li, Peipei Li, Shuhan Xia, Xing Cui, Linzhi Huang, Huaibo Huang, Weihong Deng, Zhaofeng He

    Abstract: Current multimodal misinformation detection (MMD) methods often assume a single source and type of forgery for each sample, which is insufficient for real-world scenarios where multiple forgery sources coexist. The lack of a benchmark for mixed-source misinformation has hindered progress in this field. To address this, we introduce MMFakeBench, the first comprehensive benchmark for mixed-source MM…

    Submitted 12 June, 2024; originally announced June 2024.

  34. arXiv:2406.08155  [pdf, other]

    cs.LG cs.AI cs.CL

    Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark

    Authors: Pingzhi Li, Xiaolong Jin, Yu Cheng, Tianlong Chen

    Abstract: Large Language Models (LLMs) have become foundational in the realm of natural language processing, demonstrating performance improvements as model sizes increase. The Mixture-of-Experts (MoE) approach offers a promising way to scale LLMs more efficiently by using fewer computational FLOPs through sparse activation. However, it suffers from significant memory overheads, necessitating model compress…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Our code for reproducing all our experiments is provided at https://github.com/UNITES-Lab/moe-quantization

  35. arXiv:2406.07648  [pdf, other]

    cs.CV

    M-LRM: Multi-view Large Reconstruction Model

    Authors: Mengfei Li, Xiaoxiao Long, Yixun Liang, Weiyu Li, Yuan Liu, Peng Li, Xiaowei Chi, Xingqun Qi, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo

    Abstract: Despite recent advancements in the Large Reconstruction Model (LRM) demonstrating impressive results, when extending its input from single image to multiple images, it exhibits inefficiencies, subpar geometric and texture quality, as well as slower convergence speed than expected. It is attributed to that, LRM formulates 3D reconstruction as a naive images-to-3D translation problem, ignoring the…

    Submitted 11 June, 2024; originally announced June 2024.

  36. arXiv:2406.06498  [pdf, other]

    cs.RO cs.HC

    Demonstrating HumanTHOR: A Simulation Platform and Benchmark for Human-Robot Collaboration in a Shared Workspace

    Authors: Chenxu Wang, Boyuan Du, Jiaxin Xu, Peiyan Li, Di Guo, Huaping Liu

    Abstract: Human-robot collaboration (HRC) in a shared workspace has become a common pattern in real-world robot applications and has garnered significant research interest. However, most existing studies for human-in-the-loop (HITL) collaboration with robots in a shared workspace evaluate in either simplified game environments or physical platforms, falling short in limited realistic significance or limited…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: In RSS 2024

  37. arXiv:2406.05815  [pdf, other]

    cs.LG

    What Can We Learn from State Space Models for Machine Learning on Graphs?

    Authors: Yinan Huang, Siqi Miao, Pan Li

    Abstract: Machine learning on graphs has recently found extensive applications across domains. However, the commonly used Message Passing Neural Networks (MPNNs) suffer from limited expressive power and struggle to capture long-range dependencies. Graph transformers offer a strong alternative due to their global attention mechanism, but they come with great computational overheads, especially for large grap…

    Submitted 9 June, 2024; originally announced June 2024.

  38. arXiv:2406.05534  [pdf, other]

    cs.AI cs.CL cs.LG

    Online DPO: Online Direct Preference Optimization with Fast-Slow Chasing

    Authors: Biqing Qi, Pengfei Li, Fangyuan Li, Junqi Gao, Kaiyan Zhang, Bowen Zhou

    Abstract: Direct Preference Optimization (DPO) improves the alignment of large language models (LLMs) with human values by training directly on human preference datasets, eliminating the need for reward models. However, due to the presence of cross-domain human preferences, direct continual training can lead to catastrophic forgetting, limiting DPO's performance and efficiency. Inspired by intraspecific com…

    Submitted 8 June, 2024; originally announced June 2024.

  39. arXiv:2406.05532  [pdf, other]

    cs.LG cs.AI

    Exploring Adversarial Robustness of Deep State Space Models

    Authors: Biqing Qi, Yang Luo, Junqi Gao, Pengfei Li, Kai Tian, Zhiyuan Ma, Bowen Zhou

    Abstract: Deep State Space Models (SSMs) have proven effective in numerous task scenarios but face significant security challenges due to Adversarial Perturbations (APs) in real-world deployments. Adversarial Training (AT) is a mainstream approach to enhancing Adversarial Robustness (AR) and has been validated on various traditional DNN architectures. However, its effectiveness in improving the AR of SSMs r…

    Submitted 8 June, 2024; originally announced June 2024.

  40. arXiv:2406.05347  [pdf, other]

    q-bio.BM cs.AI cs.LG

    MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training

    Authors: Bo Chen, Zhilei Bei, Xingyi Cheng, Pan Li, Jie Tang, Le Song

    Abstract: Multiple Sequence Alignment (MSA) plays a pivotal role in unveiling the evolutionary trajectories of protein families. The accuracy of protein structure predictions is often compromised for protein sequences that lack sufficient homologous information to construct high quality MSA. Although various methods have been proposed to generate virtual MSA under these conditions, they fall short in compre…

    Submitted 10 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  41. arXiv:2406.03879  [pdf, other]

    cs.LG cs.CV

    Decay Pruning Method: Smooth Pruning With a Self-Rectifying Procedure

    Authors: Minghao Yang, Linlin Gao, Pengyuan Li, Wenbo Li, Yihong Dong, Zhiying Cui

    Abstract: Current structured pruning methods often result in considerable accuracy drops due to abrupt network changes and loss of information from pruned structures. To address these issues, we introduce the Decay Pruning Method (DPM), a novel smooth pruning approach with a self-rectifying mechanism. DPM consists of two key components: (i) Smooth Pruning: It converts conventional single-step pruning into m…

    Submitted 6 June, 2024; originally announced June 2024.

  42. arXiv:2406.02790  [pdf, other]

    cs.LG cs.CY

    Building Socially-Equitable Public Models

    Authors: Yejia Liu, Jianyi Yang, Pengfei Li, Tongxin Li, Shaolei Ren

    Abstract: Public models offer predictions to a variety of downstream tasks and have played a crucial role in various AI applications, showcasing their proficiency in accurate predictions. However, the exclusive emphasis on prediction accuracy may not align with the diverse end objectives of downstream agents. Recognizing the public model's predictions as a service, we advocate for integrating the objectives…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by the ICML 2024

  43. arXiv:2406.02287  [pdf, other]

    cs.CV

    Optimised ProPainter for Video Diminished Reality Inpainting

    Authors: Pengze Li, Lihao Liu, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

    Abstract: In this paper, part of the DREAMING Challenge - Diminished Reality for Emerging Applications in Medicine through Inpainting, we introduce a refined video inpainting technique optimised from the ProPainter method to meet the specialised demands of medical imaging, specifically in the context of oral and maxillofacial surgery. Our enhanced algorithm employs the zero-shot ProPainter, featuring optimi…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ISBI 2024

  44. arXiv:2406.01956  [pdf, other]

    cs.CV

    Enhance Image-to-Image Generation with LLaVA Prompt and Negative Prompt

    Authors: Zhicheng Ding, Panfeng Li, Qikai Yang, Siyang Li

    Abstract: This paper presents a novel approach to enhance image-to-image generation by leveraging the multimodal capabilities of the Large Language and Vision Assistant (LLaVA). We propose a framework where LLaVA analyzes input images and generates textual descriptions, hereinafter LLaVA-generated prompts. These prompts, along with the original image, are fed into the image-to-image generation pipeline. Thi…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by 2024 5th International Conference on Information Science, Parallel and Distributed Systems

  45. arXiv:2406.01587  [pdf, other]

    cs.RO

    PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning

    Authors: Yupeng Zheng, Zebin Xing, Qichao Zhang, Bu Jin, Pengfei Li, Yuhang Zheng, Zhongpu Xia, Kun Zhan, Xianpeng Lang, Yaran Chen, Dongbin Zhao

    Abstract: Vehicle motion planning is an essential component of autonomous driving technology. Current rule-based vehicle motion planning methods perform satisfactorily in common scenarios but struggle to generalize to long-tailed situations. Meanwhile, learning-based methods have yet to achieve superior performance over rule-based approaches in large-scale closed-loop scenarios. To address these issues, we…

    Submitted 4 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  46. arXiv:2406.01027  [pdf, other]

    cs.DB cs.LG

    PRICE: A Pretrained Model for Cross-Database Cardinality Estimation

    Authors: Tianjing Zeng, Junwei Lan, Jiahong Ma, Wenqing Wei, Rong Zhu, Pengfei Li, Bolin Ding, Defu Lian, Zhewei Wei, Jingren Zhou

    Abstract: Cardinality estimation (CardEst) is essential for optimizing query execution plans. Recent ML-based CardEst methods achieve high accuracy but face deployment challenges due to high preparation costs and lack of transferability across databases. In this paper, we propose PRICE, a PRetrained multI-table CardEst model, which addresses these limitations. PRICE takes low-level but transferable features…

    Submitted 3 June, 2024; originally announced June 2024.

  47. arXiv:2406.00779  [pdf, other]

    cs.LG

    Differentiation of Multi-objective Data-driven Decision Pipeline

    Authors: Peng Li, Lixia Wu, Chaoqun Feng, Haoyuan Hu, Lei Fu, Jieping Ye

    Abstract: Real-world scenarios frequently involve multi-objective data-driven optimization problems, characterized by unknown problem coefficients and multiple conflicting objectives. Traditional two-stage methods independently apply a machine learning model to estimate problem coefficients, followed by invoking a solver to tackle the predicted optimization problem. The independent use of optimization solve…

    Submitted 2 June, 2024; originally announced June 2024.

  48. arXiv:2406.00432  [pdf, other]

    cs.CV

    Localize, Understand, Collaborate: Semantic-Aware Dragging via Intention Reasoner

    Authors: Xing Cui, Peipei Li, Zekun Li, Xuannan Liu, Yueying Zou, Zhaofeng He

    Abstract: Flexible and accurate drag-based editing is a challenging task that has recently garnered significant attention. Current methods typically model this problem as automatically learning "how to drag" through point dragging and often produce one deterministic estimation, which presents two key limitations: 1) Overlooking the inherently ill-posed nature of drag-based editing, where multiple results…

    Submitted 1 June, 2024; originally announced June 2024.

  49. arXiv:2405.20044  [pdf, other]

    cs.CV

    A Point-Neighborhood Learning Framework for Nasal Endoscope Image Segmentation

    Authors: Pengyu Jie, Wanquan Liu, Chenqiang Gao, Yihui Wen, Rui He, Pengcheng Li, Jintao Zhang, Deyu Meng

    Abstract: The lesion segmentation on endoscopic images is challenging due to its complex and ambiguous features. Fully-supervised deep learning segmentation methods can receive good performance based on entirely pixel-level labeled dataset but greatly increase experts' labeling burden. Semi-supervised and weakly supervised methods can ease labeling burden, but heavily strengthen the learning difficulty. To…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 10 pages, 10 figures

  50. arXiv:2405.19086  [pdf, other]

    cs.CL

    MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors

    Authors: Renzhi Wang, Piji Li

    Abstract: Model editing aims to efficiently alter the behavior of Large Language Models (LLMs) within a desired scope, while ensuring no adverse impact on other inputs. Recent years have witnessed various model editing methods been proposed. However, these methods either exhibit poor overall performance or struggle to strike a balance between generalization and locality. We propose MEMoE, a model editing ad…

    Submitted 1 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.