Showing 1–50 of 408 results for author: Xue, P

  1. arXiv:2408.16469  [pdf, other]

    cs.CV

    Multi-source Domain Adaptation for Panoramic Semantic Segmentation

    Authors: Jing Jiang, Sicheng Zhao, Jiankun Zhu, Wenbo Tang, Zhaopan Xu, Jidong Yang, Pengfei Xu, Hongxun Yao

    Abstract: Panoramic semantic segmentation has received widespread attention recently due to its comprehensive 360° field of view. However, labeling such images demands greater resources compared to pinhole images. As a result, many unsupervised domain adaptation methods for panoramic semantic segmentation have emerged, utilizing real pinhole images or low-cost synthetic panoramic images. But, the segm…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 9 pages, 7 figures, 5 tables

  2. arXiv:2408.15740  [pdf]

    cs.CV

    MambaPlace:Text-to-Point-Cloud Cross-Modal Place Recognition with Attention Mamba Mechanisms

    Authors: Tianyi Shang, Zhenyu Li, Wenhao Pei, Pengjie Xu, ZhaoJun Deng, Fanchen Kong

    Abstract: Vision Language Place Recognition (VLVPR) enhances robot localization performance by incorporating natural language descriptions from images. By utilizing language information, VLVPR directs robot place matching, overcoming the constraint of solely depending on vision. The essence of multimodal fusion lies in mining the complementary information between different modalities. However, general fusio…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 8 pages

  3. arXiv:2408.12355  [pdf, other]

    cs.CV cs.AI

    Class-balanced Open-set Semi-supervised Object Detection for Medical Images

    Authors: Zhanyun Lu, Renshu Gu, Huimin Cheng, Siyu Pang, Mingyu Xu, Peifang Xu, Yaqi Wang, Yuichiro Kinoshita, Juan Ye, Gangyong Jia, Qing Wu

    Abstract: Medical image datasets in the real world are often unlabeled and imbalanced, and Semi-Supervised Object Detection (SSOD) can utilize unlabeled data to improve an object detector. However, existing approaches predominantly assumed that the unlabeled data and test data do not contain out-of-distribution (OOD) classes. The few open-set semi-supervised object detection methods have two weaknesses: fir…

    Submitted 22 August, 2024; originally announced August 2024.

  4. arXiv:2408.12333  [pdf, other]

    cs.AI

    Graph Retrieval Augmented Trustworthiness Reasoning

    Authors: Ying Zhu, Shengchang Li, Ziqian Kong, Peilan Xu

    Abstract: Trustworthiness reasoning is crucial in multiplayer games with incomplete information, enabling agents to identify potential allies and adversaries, thereby enhancing reasoning and decision-making processes. Traditional approaches relying on pre-trained models necessitate extensive domain-specific data and considerable reward feedback, with their lack of real-time adaptability hindering their effe…

    Submitted 22 August, 2024; originally announced August 2024.

  5. arXiv:2408.09380   

    cs.AI cs.IR

    ELASTIC: Efficient Linear Attention for Sequential Interest Compression

    Authors: Jiaxin Deng, Shiyao Wang, Song Lu, Yinfeng Li, Xinchen Luo, Yuanjun Liu, Peixing Xu, Guorui Zhou

    Abstract: State-of-the-art sequential recommendation models heavily rely on transformer's attention mechanism. However, the quadratic computational and memory complexities of self attention have limited its scalability for modeling users' long range behaviour sequences. To address this problem, we propose ELASTIC, an Efficient Linear Attention for SequenTial Interest Compression, requiring only linear time…

    Submitted 20 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: We hereby withdraw this paper from arXiv due to incomplete experiments. Upon further review, we have determined that additional experimental work is necessary to fully validate our findings and conclusions

  6. arXiv:2408.07422  [pdf, other]

    cs.CV cs.AI

    LLMI3D: Empowering LLM with 3D Perception from a Single 2D Image

    Authors: Fan Yang, Sicheng Zhao, Yanhao Zhang, Haoxiang Chen, Hui Chen, Wenbo Tang, Haonan Lu, Pengfei Xu, Zhenyu Yang, Jungong Han, Guiguang Ding

    Abstract: Recent advancements in autonomous driving, augmented reality, robotics, and embodied intelligence have necessitated 3D perception algorithms. However, current 3D perception methods, particularly small models, struggle with processing logical reasoning, question-answering, and handling open scenario categories. On the other hand, generative multimodal large language models (MLLMs) excel in general…

    Submitted 14 August, 2024; originally announced August 2024.

  7. arXiv:2408.06099  [pdf, other]

    cs.LG cs.CY

    Approximating Discrimination Within Models When Faced With Several Non-Binary Sensitive Attributes

    Authors: Yijun Bian, Yujie Luo, Ping Xu

    Abstract: Discrimination mitigation with machine learning (ML) models could be complicated because multiple factors may interweave with each other including hierarchically and historically. Yet few existing fairness measures are able to capture the discrimination level within ML models in the face of multiple sensitive attributes. To bridge this gap, we propose a fairness measure based on distances between…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: The first two authors contributed equally, listed in alphabetical order. arXiv admin note: substantial text overlap with arXiv:2405.09251

    MSC Class: 68T01; 68T09; 68T20 ACM Class: I.2; I.2.6; I.2.0; K.4.2

  8. arXiv:2408.03906  [pdf, other]

    cs.RO

    Achieving Human Level Competitive Robot Table Tennis

    Authors: David B. D'Ambrosio, Saminda Abeyruwan, Laura Graesser, Atil Iscen, Heni Ben Amor, Alex Bewley, Barney J. Reed, Krista Reymann, Leila Takayama, Yuval Tassa, Krzysztof Choromanski, Erwin Coumans, Deepali Jain, Navdeep Jaitly, Natasha Jaques, Satoshi Kataoka, Yuheng Kuang, Nevena Lazic, Reza Mahjourian, Sherry Moore, Kenneth Oslund, Anish Shankar, Vikas Sindhwani, Vincent Vanhoucke, Grace Vesom, et al. (2 additional authors not shown)

    Abstract: Achieving human-level speed and performance on real world tasks is a north star for the robotics research community. This work takes a step towards that goal and presents the first learned robot agent that reaches amateur human-level performance in competitive table tennis. Table tennis is a physically demanding sport which requires human players to undergo years of training to achieve an advanced…

    Submitted 9 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

    Comments: v2, 29 pages, 19 main paper, 10 references + appendix, adding an additional 9 references

  9. arXiv:2408.01402  [pdf, other]

    cs.LG cs.AI cs.CL

    Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer

    Authors: Yu Yang, Pan Xu

    Abstract: Decision Transformer (DT) has emerged as a promising class of algorithms in offline reinforcement learning (RL) tasks, leveraging pre-collected datasets and Transformer's capability to model long sequences. Recent works have demonstrated that using parts of trajectories from training tasks as prompts in DT enhances its performance on unseen tasks, giving rise to Prompt-DT methods. However, collect…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 2 figures, 8 tables. Accepted by the Training Agents with Foundation Models Workshop at RLC 2024

  10. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  11. arXiv:2407.20519  [pdf, other]

    cs.HC cs.AI

    DuA: Dual Attentive Transformer in Long-Term Continuous EEG Emotion Analysis

    Authors: Yue Pan, Qile Liu, Qing Liu, Li Zhang, Gan Huang, Xin Chen, Fali Li, Peng Xu, Zhen Liang

    Abstract: Affective brain-computer interfaces (aBCIs) are increasingly recognized for their potential in monitoring and interpreting emotional states through electroencephalography (EEG) signals. Current EEG-based emotion recognition methods perform well with short segments of EEG data. However, these methods encounter significant challenges in real-life scenarios where emotional states evolve over extended…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures

  12. arXiv:2407.14482  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities

    Authors: Peng Xu, Wei Ping, Xianchao Wu, Zihan Liu, Mohammad Shoeybi, Bryan Catanzaro

    Abstract: In this work, we introduce ChatQA 2, a Llama3-based model designed to bridge the gap between open-access LLMs and leading proprietary models (e.g., GPT-4-Turbo) in long-context understanding and retrieval-augmented generation (RAG) capabilities. These two capabilities are essential for LLMs to process large volumes of information that cannot fit into a single prompt and are complementary to each o…

    Submitted 19 July, 2024; originally announced July 2024.

  13. arXiv:2407.11062  [pdf, other]

    cs.LG cs.AI cs.CL

    EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

    Authors: Mengzhao Chen, Wenqi Shao, Peng Xu, Jiahao Wang, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo

    Abstract: Large language models (LLMs) are integral to modern natural language processing and artificial intelligence. However, they face challenges in managing their significant memory requirements. Although quantization-aware training (QAT) offers a solution by reducing memory consumption through low-bit representations with minimal accuracy loss, it demands substantial training resources to optimize mode…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: An efficient and effective quantization technical to improve the performance of low-bits LMMs and LVLMs

  14. arXiv:2407.10687  [pdf, other]

    cs.CV cs.GR

    FRI-Net: Floorplan Reconstruction via Room-wise Implicit Representation

    Authors: Honghao Xu, Juzhan Xu, Zeyu Huang, Pengfei Xu, Hui Huang, Ruizhen Hu

    Abstract: In this paper, we introduce a novel method called FRI-Net for 2D floorplan reconstruction from 3D point cloud. Existing methods typically rely on corner regression or box regression, which lack consideration for the global shapes of rooms. To address these issues, we propose a novel approach using a room-wise implicit representation with structural regularization to characterize the shapes of room…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  15. arXiv:2407.07775  [pdf, other]

    cs.RO cs.AI

    Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs

    Authors: Hao-Tien Lewis Chiang, Zhuo Xu, Zipeng Fu, Mithun George Jacob, Tingnan Zhang, Tsang-Wei Edward Lee, Wenhao Yu, Connor Schenck, David Rendleman, Dhruv Shah, Fei Xia, Jasmine Hsu, Jonathan Hoech, Pete Florence, Sean Kirmani, Sumeet Singh, Vikas Sindhwani, Carolina Parada, Chelsea Finn, Peng Xu, Sergey Levine, Jie Tan

    Abstract: An elusive goal in navigation research is to build an intelligent agent that can understand multimodal instructions including natural language and image, and perform useful navigation. To achieve this, we study a widely useful category of navigation tasks we call Multimodal Instruction Navigation with demonstration Tours (MINT), in which the environment prior is provided through a previously recor…

    Submitted 12 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  16. arXiv:2407.00032  [pdf, other]

    cs.DC cs.AI

    Design a Win-Win Strategy That Is Fair to Both Service Providers and Tasks When Rejection Is Not an Option

    Authors: Yohai Trabelsi, Pan Xu, Sarit Kraus

    Abstract: Assigning tasks to service providers is a frequent procedure across various applications. Often the tasks arrive dynamically while the service providers remain static. Preventing task rejection caused by service provider overload is of utmost significance. To ensure a positive experience in relevant applications for both service providers and tasks, fairness must be considered. To address the issu…

    Submitted 22 May, 2024; originally announced July 2024.

  17. arXiv:2406.18543  [pdf, ps, other]

    cs.CV

    A Set-based Approach for Feature Extraction of 3D CAD Models

    Authors: Peng Xu, Qi Gao, Ying-Jie Wu

    Abstract: Feature extraction is a critical technology to realize the automatic transmission of feature information throughout product life cycles. As CAD models primarily capture the 3D geometry of products, feature extraction heavily relies on geometric information. However, existing feature extraction methods often yield inaccurate outcomes due to the diverse interpretations of geometric information. This…

    Submitted 22 May, 2024; originally announced June 2024.

    Comments: 13 pages

  18. arXiv:2406.18045  [pdf, other]

    cs.CL cs.AI

    PharmaGPT: Domain-Specific Large Language Models for Bio-Pharmaceutical and Chemistry

    Authors: Linqing Chen, Weilei Wang, Zilong Bai, Peng Xu, Yan Fang, Jie Fang, Wentao Wu, Lizhi Zhou, Ruiji Zhang, Yubin Xia, Chaobo Xu, Ran Hu, Licong Xu, Qijun Cai, Haoran Hua, Jing Sun, Jin Liu, Tian Qiu, Haowen Liu, Meng Hu, Xiuwen Li, Fei Gao, Yufu Wang, Lin Tie, Chaochao Wang, et al. (11 additional authors not shown)

    Abstract: Large language models (LLMs) have revolutionized Natural Language Processing (NLP) by minimizing the need for complex feature engineering. However, the application of LLMs in specialized domains like biopharmaceuticals and chemistry remains largely unexplored. These fields are characterized by intricate terminologies, specialized knowledge, and a high demand for precision areas where general purpo…

    Submitted 9 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  19. arXiv:2406.15093  [pdf, other]

    cs.CR cs.CV eess.IV

    ECLIPSE: Expunging Clean-label Indiscriminate Poisons via Sparse Diffusion Purification

    Authors: Xianlong Wang, Shengshan Hu, Yechao Zhang, Ziqi Zhou, Leo Yu Zhang, Peng Xu, Wei Wan, Hai Jin

    Abstract: Clean-label indiscriminate poisoning attacks add invisible perturbations to correctly labeled training images, thus dramatically reducing the generalization capability of the victim models. Recently, some defense mechanisms have been proposed such as adversarial training, image transformation techniques, and image purification. However, these schemes are either susceptible to adaptive attacks, bui…

    Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted by ESORICS 2024

  20. arXiv:2406.13514  [pdf, other]

    cs.CV

    Locally orderless networks

    Authors: Jon Sporring, Peidi Xu, Jiahao Lu, François Lauze, Sune Darkner

    Abstract: We present Locally Orderless Networks (LON) and its theoretic foundation which links it to Convolutional Neural Networks (CNN), to Scale-space histograms, and measurement theory. The key elements are a regular sampling of the bias and the derivative of the activation function. We compare LON, CNN, and Scale-space histograms on prototypical single-layer networks. We show how LON and CNN can emulate…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 12 pages, 6 figures

  21. arXiv:2406.13114  [pdf, other]

    cs.CL cs.AI

    Multi-Stage Balanced Distillation: Addressing Long-Tail Challenges in Sequence-Level Knowledge Distillation

    Authors: Yuhang Zhou, Jing Zhu, Paiheng Xu, Xiaoyu Liu, Xiyao Wang, Danai Koutra, Wei Ai, Furong Huang

    Abstract: Large language models (LLMs) have significantly advanced various natural language processing tasks, but deploying them remains computationally expensive. Knowledge distillation (KD) is a promising solution, enabling the transfer of capabilities from larger teacher LLMs to more compact student models. Particularly, sequence-level KD, which distills rationale-based reasoning processes instead of mer…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: preprint

  22. arXiv:2406.12241  [pdf, other]

    cs.LG cs.AI

    More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling

    Authors: Haque Ishfaq, Yixin Tan, Yu Yang, Qingfeng Lan, Jianfeng Lu, A. Rupam Mahmood, Doina Precup, Pan Xu

    Abstract: Thompson sampling (TS) is one of the most popular exploration techniques in reinforcement learning (RL). However, most TS algorithms with theoretical guarantees are difficult to implement and not generalizable to Deep RL. While the emerging approximate sampling-based exploration schemes are promising, most existing algorithms are specific to linear Markov Decision Processes (MDP) with suboptimal r…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: First two authors contributed equally. Accepted to the Reinforcement Learning Conference (RLC) 2024

  23. arXiv:2406.10125  [pdf, other]

    cs.CV

    MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report

    Authors: Zhongyu Yang, Mai Liu, Jinluo Xie, Yueming Zhang, Chen Shen, Wei Shao, Jichao Jiao, Tengfei Xing, Runbo Hu, Pengfei Xu

    Abstract: Autonomous driving without high-definition (HD) maps demands a higher level of active scene understanding. In this competition, the organizers provided the multi-perspective camera images and standard-definition (SD) maps to explore the boundaries of scene reasoning capabilities. We found that most existing algorithms construct Bird's Eye View (BEV) features from these multi-perspective images and…

    Submitted 14 June, 2024; originally announced June 2024.

  24. arXiv:2406.09771  [pdf, other]

    cs.DS

    Block Coordinate Descent Methods for Optimization under J-Orthogonality Constraints with Applications

    Authors: Di He, Ganzhao Yuan, Xiao Wang, Pengxiang Xu

    Abstract: The J-orthogonal matrix, also referred to as the hyperbolic orthogonal matrix, is a class of special orthogonal matrix in hyperbolic space, notable for its advantageous properties. These matrices are integral to optimization under J-orthogonal constraints, which have widespread applications in statistical learning and data science. However, addressing these problems is generally challenging due to…

    Submitted 14 June, 2024; originally announced June 2024.

  25. arXiv:2406.04137  [pdf, other]

    cs.LG math.ST stat.ML

    Optimal Batched Linear Bandits

    Authors: Xuanfei Ren, Tianyuan Jin, Pan Xu

    Abstract: We introduce the E$^4$ algorithm for the batched linear bandit problem, incorporating an Explore-Estimate-Eliminate-Exploit framework. With a proper choice of exploration rate, we prove E$^4$ achieves the finite-time minimax optimal regret with only $O(\log\log T)$ batches, and the asymptotically optimal regret with only $3$ batches as $T\rightarrow\infty$, where $T$ is the time horizon. We furthe…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 26 pages, 6 figures, 4 tables. To appear in the proceedings of the 41st International Conference on Machine Learning (ICML 2024)

  26. arXiv:2405.17710  [pdf, other]

    cs.SI cs.CL

    Does Geo-co-location Matter? A Case Study of Public Health Conversations during COVID-19

    Authors: Paiheng Xu, Louiqa Raschid, Vanessa Frias-Martinez

    Abstract: Social media platforms like Twitter (now X) have been pivotal in information dissemination and public engagement, especially during COVID-19. A key goal for public health experts was to encourage prosocial behavior that could impact local outcomes such as masking and social distancing. Given the importance of local news and guidance during COVID-19, the objective of our research is to analyze the…

    Submitted 28 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  27. arXiv:2405.16378  [pdf, other]

    cs.NI cs.DC cs.PF

    FPsPIN: An FPGA-based Open-Hardware Research Platform for Processing in the Network

    Authors: Timo Schneider, Pengcheng Xu, Torsten Hoefler

    Abstract: In the era of post-Moore computing, network offload emerges as a solution to two challenges: the imperative for low-latency communication and the push towards hardware specialisation. Various methods have been employed to offload protocol- and data-processing onto network interface cards (NICs), from firmware modification to running full Linux on NICs for application execution. The sPIN project en…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 11 pages

  28. arXiv:2405.12490  [pdf, other]

    cs.CV

    Customize Your Own Paired Data via Few-shot Way

    Authors: Jinshu Chen, Bingchuan Li, Miao Hua, Panpan Xu, Qian He

    Abstract: Existing solutions to image editing tasks suffer from several issues. Though achieving remarkably satisfying generated results, some supervised methods require huge amounts of paired training data, which greatly limits their usages. The other unsupervised methods take full advantage of large-scale pre-trained priors, thus being strictly restricted to the domains where the priors are trained on and…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Accepted by AI4CC CVPR2024 WorkShop

  29. arXiv:2405.02759  [pdf, other]

    cs.GR

    Region-Aware Color Smudging

    Authors: Ying Jiang, Pengfei Xu, Congyi Zhang, Hongbo Fu, Henry Lau, Wenping Wang

    Abstract: Color smudge operations from digital painting software enable users to create natural shading effects in high-fidelity paintings by interactively mixing colors. To precisely control results in traditional painting software, users tend to organize flat-filled color regions in multiple layers and smudge them to generate different color gradients. However, the requirement to carefully deal with regio…

    Submitted 4 May, 2024; originally announced May 2024.

  30. arXiv:2405.00749  [pdf, other]

    cs.CV cs.LG

    More is Better: Deep Domain Adaptation with Multiple Sources

    Authors: Sicheng Zhao, Hui Chen, Hu Huang, Pengfei Xu, Guiguang Ding

    Abstract: In many practical applications, it is often difficult and expensive to obtain large-scale labeled data to train state-of-the-art deep neural networks. Therefore, transferring the learned knowledge from a separate, labeled source domain to an unlabeled or sparsely labeled target domain becomes an appealing alternative. However, direct transfer often results in significant performance decay due to d…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024. arXiv admin note: text overlap with arXiv:2002.12169

  31. arXiv:2404.18255  [pdf, other]

    cs.CL cs.AI

    PatentGPT: A Large Language Model for Intellectual Property

    Authors: Zilong Bai, Ruiji Zhang, Linqing Chen, Qijun Cai, Yuan Zhong, Cong Wang, Yan Fang, Jie Fang, Jing Sun, Weikuan Wang, Lizhi Zhou, Haoran Hua, Tian Qiu, Chaochao Wang, Cheng Sun, Jianping Lu, Yixin Wang, Yubin Xia, Meng Hu, Haowen Liu, Peng Xu, Licong Xu, Fu Bian, Xiaolong Gu, Lisha Zhang, et al. (2 additional authors not shown)

    Abstract: In recent years, large language models(LLMs) have attracted significant attention due to their exceptional performance across a multitude of natural language process tasks, and have been widely applied in various fields. However, the application of large language models in the Intellectual Property (IP) domain is challenging due to the strong need for specialized knowledge, privacy protection, pro…

    Submitted 4 June, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: 19 pages, 9 figures

    ACM Class: I.2.7

  32. arXiv:2404.16006  [pdf, other]

    cs.CV

    MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

    Authors: Kaining Ying, Fanqing Meng, Jin Wang, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Runjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao

    Abstract: Large Vision-Language Models (LVLMs) show significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation. However, existing multimodal evaluation benchmarks cover a limited number of multimodal tasks testing rudimentary capabilities, falling short in tracking LVLM development. In this study, we present MMT-Bench, a comprehensive benchmark designed to…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 77 pages, 41 figures

  33. arXiv:2404.14824  [pdf, other]

    cs.SE

    Automated Commit Message Generation with Large Language Models: An Empirical Study and Beyond

    Authors: Pengyu Xue, Linhao Wu, Zhongxing Yu, Zhi Jin, Zhen Yang, Xinyi Li, Zhenyu Yang, Yue Tan

    Abstract: Commit Message Generation (CMG) approaches aim to automatically generate commit messages based on given code diffs, which facilitate collaboration among developers and play a critical role in Open-Source Software (OSS). Very recently, Large Language Models (LLMs) have demonstrated extensive applicability in diverse code-related task. But few studies systematically explored their effectiveness usin…

    Submitted 23 April, 2024; originally announced April 2024.

  34. arXiv:2404.10728  [pdf, other]

    cs.LG stat.ML

    Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning

    Authors: Hao-Lun Hsu, Weixin Wang, Miroslav Pajic, Pan Xu

    Abstract: We present the first study on provably efficient randomized exploration in cooperative multi-agent reinforcement learning (MARL). We propose a unified algorithm framework for randomized exploration in parallel Markov Decision Processes (MDPs), and two Thompson Sampling (TS)-type algorithms, CoopTS-PHE and CoopTS-LMC, incorporating the perturbed-history exploration (PHE) strategy and the Langevin M…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 80 pages, 14 figures, 1 table. Hao-Lun Hsu and Weixin Wang contributed equally to this work

  35. arXiv:2404.06167  [pdf, other]

    cs.LG cs.AI q-bio.GN

    scCDCG: Efficient Deep Structural Clustering for single-cell RNA-seq via Deep Cut-informed Graph Embedding

    Authors: Ping Xu, Zhiyuan Ning, Meng Xiao, Guihai Feng, Xin Li, Yuanchun Zhou, Pengfei Wang

    Abstract: Single-cell RNA sequencing (scRNA-seq) is essential for unraveling cellular heterogeneity and diversity, offering invaluable insights for bioinformatics advancements. Despite its potential, traditional clustering methods in scRNA-seq data analysis often neglect the structural information embedded in gene expression profiles, crucial for understanding cellular correlations and dependencies. Existin…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Accepted as a long paper for the research track at DASFAA 2024

  36. arXiv:2404.02444  [pdf, other]

    cs.CL cs.AI

    The Promises and Pitfalls of Using Language Models to Measure Instruction Quality in Education

    Authors: Paiheng Xu, Jing Liu, Nathan Jones, Julie Cohen, Wei Ai

    Abstract: Assessing instruction quality is a fundamental component of any improvement efforts in the education system. However, traditional manual assessments are expensive, subjective, and heavily dependent on observers' expertise and idiosyncratic factors, preventing teachers from getting timely and frequent feedback. Different from prior research that mostly focuses on low-inference instructional practic…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: NAACL 2024

  37. arXiv:2404.00292  [pdf, other]

    cs.CV

    LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion

    Authors: Pancheng Zhao, Peng Xu, Pengda Qin, Deng-Ping Fan, Zhicheng Zhang, Guoli Jia, Bowen Zhou, Jufeng Yang

    Abstract: Camouflaged vision perception is an important vision task with numerous practical applications. Due to the expensive collection and labeling costs, this community struggles with a major bottleneck that the species category of its datasets is limited to a small number of object species. However, the existing camouflaged generation methods require specifying the background manually, thus failing to…

    Submitted 12 July, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024, Fig.2 and Equation 4 revised

  38. arXiv:2403.18660  [pdf, other]

    cs.GR cs.CV

    InstructBrush: Learning Attention-based Instruction Optimization for Image Editing

    Authors: Ruoyu Zhao, Qingnan Fan, Fei Kou, Shuai Qin, Hong Gu, Wei Wu, Pengcheng Xu, Mingrui Zhu, Nannan Wang, Xinbo Gao

    Abstract: In recent years, instruction-based image editing methods have garnered significant attention in image editing. However, despite encompassing a wide range of editing priors, these methods are helpless when handling editing tasks that are challenging to accurately describe through language. We propose InstructBrush, an inversion method for instruction-based image editing methods to bridge this gap.…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Project Page: https://royzhao926.github.io/InstructBrush/

  39. arXiv:2403.15137  [pdf, other]

    cs.AI cs.CL cs.MA

    CACA Agent: Capability Collaboration based AI Agent

    Authors: Peng Xu, Haoran Wang, Chuang Wang, Xu Liu

    Abstract: As AI Agents based on Large Language Models (LLMs) have shown potential in practical applications across various fields, how to quickly deploy an AI agent and how to conveniently expand the application scenario of AI agents has become a challenge. Previous studies mainly focused on implementing all the reasoning capabilities of AI agents within a single LLM, which often makes the model more comple…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 4 pages, 5 figures

  40. The Power of Bamboo: On the Post-Compromise Security for Searchable Symmetric Encryption

    Authors: Tianyang Chen, Peng Xu, Stjepan Picek, Bo Luo, Willy Susilo, Hai Jin, Kaitai Liang

    Abstract: Dynamic searchable symmetric encryption (DSSE) enables users to delegate the keyword search over dynamically updated encrypted databases to an honest-but-curious server without losing keyword privacy. This paper studies a new and practical security risk to DSSE, namely, secret key compromise (e.g., a user's secret key is leaked or stolen), which threatens all the security guarantees offered by exi…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: This is a full version paper that includes the security proof. The paper with the same name has been published by NDSS 2023

    Journal ref: NDSS 2023

  41. arXiv:2403.14354  [pdf, other]

    cs.CV

    LDTR: Transformer-based Lane Detection with Anchor-chain Representation

    Authors: Zhongyu Yang, Chen Shen, Wei Shao, Tengfei Xing, Runbo Hu, Pengfei Xu, Hua Chai, Ruini Xue

    Abstract: Despite recent advances in lane detection methods, scenarios with limited- or no-visual-clue of lanes due to factors such as lighting conditions and occlusion remain challenging and crucial for automated driving. Moreover, current lane representations require complex post-processing and struggle with specific instances. Inspired by the DETR architecture, we propose LDTR, a transformer-based model…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted by CVM 2024 and CVMJ. 16 pages, 14 figures

  42. arXiv:2403.11081  [pdf, other]

    cs.IT cs.NI eess.SP

    Enhanced Index Modulation Aided Non-Orthogonal Multiple Access via Constellation Rotation

    Authors: Ronglan Huang, Fei Ji, Zeng Hu, Dehuan Wan, Pengcheng Xu, Yun Liu

    Abstract: Non-orthogonal multiple access (NOMA) has been widely nominated as an emerging spectral efficiency (SE) multiple access technique for the next generation of wireless communication network. To meet the growing demands in massive connectivity and huge data in transmission, a novel index modulation aided NOMA with the rotation of signal constellation of low power users (IM-NOMA-RC) is developed to th…

    Submitted 17 March, 2024; originally announced March 2024.

  43. arXiv:2403.10830  [pdf, other]

    cs.CV

    View-Centric Multi-Object Tracking with Homographic Matching in Moving UAV

    Authors: Deyi Ji, Siqi Gao, Lanyun Zhu, Qi Zhu, Yiru Zhao, Peng Xu, Hongtao Lu, Feng Zhao, Jieping Ye

    Abstract: In this paper, we address the challenge of multi-object tracking (MOT) in moving Unmanned Aerial Vehicle (UAV) scenarios, where irregular flight trajectories, such as hovering, turning left/right, and moving up/down, lead to significantly greater complexity compared to fixed-camera MOT. Specifically, changes in the scene background not only render traditional frame-to-frame object IOU association…

    Submitted 14 May, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

  44. arXiv:2403.09621  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning

    Authors: Zhishuai Liu, Pan Xu

    Abstract: Distributionally robust offline reinforcement learning (RL), which seeks robust policy training against environment perturbation by modeling dynamics uncertainty, calls for function approximations when facing large state-action spaces. However, the consideration of dynamics uncertainty introduces essential nonlinearity and computational burden, posing unique challenges for analyzing and practicall…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 53 pages, 1 figure, 1 table

  45. arXiv:2403.09606  [pdf, ps, other]

    cs.CL cs.AI

    Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey

    Authors: Xiaoyu Liu, Paiheng Xu, Junda Wu, Jiaxin Yuan, Yifan Yang, Yuhang Zhou, Fuxiao Liu, Tianrui Guan, Haoliang Wang, Tong Yu, Julian McAuley, Wei Ai, Furong Huang

    Abstract: Causal inference has shown potential in enhancing the predictive accuracy, fairness, robustness, and explainability of Natural Language Processing (NLP) models by capturing causal relationships among variables. The emergence of generative Large Language Models (LLMs) has significantly impacted various NLP domains, particularly through their advanced reasoning capabilities. This survey focuses on e…

    Submitted 14 March, 2024; originally announced March 2024.

  46. arXiv:2403.08193  [pdf, other]

    cs.LG cs.AR cs.ET

    Learning-driven Physically-aware Large-scale Circuit Gate Sizing

    Authors: Yuyang Ye, Peng Xu, Lizheng Ren, Tinghuan Chen, Hao Yan, Bei Yu, Longxing Shi

    Abstract: Gate sizing plays an important role in timing optimization after physical design. Existing machine learning-based gate sizing works cannot optimize timing on multiple timing paths simultaneously and neglect the physical constraint on layouts. They cause sub-optimal sizing solutions and low-efficiency issues when compared with commercial gate sizing tools. In this work, we propose a learning-driven…

    Submitted 12 March, 2024; originally announced March 2024.

  47. arXiv:2403.02709  [pdf, other]

    cs.RO

    RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches

    Authors: Priya Sundaresan, Quan Vuong, Jiayuan Gu, Peng Xu, Ted Xiao, Sean Kirmani, Tianhe Yu, Michael Stark, Ajinkya Jain, Karol Hausman, Dorsa Sadigh, Jeannette Bohg, Stefan Schaal

    Abstract: Natural language and images are commonly used as goal representations in goal-conditioned imitation learning (IL). However, natural language can be ambiguous and images can be over-specified. In this work, we propose hand-drawn sketches as a modality for goal specification in visual imitation learning. Sketches are easy for users to provide on the fly like language, but similar to images they can…

    Submitted 5 March, 2024; originally announced March 2024.

  48. Map-aided annotation for pole base detection

    Authors: Benjamin Missaoui, Maxime Noizet, Philippe Xu

    Abstract: For autonomous navigation, high definition maps are a widely used source of information. Pole-like features encoded in HD maps such as traffic signs, traffic lights or street lights can be used as landmarks for localization. For this purpose, they first need to be detected by the vehicle using its embedded sensors. While geometric models can be used to process 3D point clouds retrieved by lidar se…

    Submitted 4 March, 2024; originally announced March 2024.

    Journal ref: 35th IEEE Intelligent Vehicles Symposium (IV 2023), Jun 2023, Anchorage, AK, United States

  49. arXiv:2403.01182  [pdf, other]

    cs.CR

    d-DSE: Distinct Dynamic Searchable Encryption Resisting Volume Leakage in Encrypted Databases

    Authors: Dongli Liu, Wei Wang, Peng Xu, Laurence T. Yang, Bo Luo, Kaitai Liang

    Abstract: Dynamic Searchable Encryption (DSE) has emerged as a solution to efficiently handle and protect large-scale data storage in encrypted databases (EDBs). Volume leakage poses a significant threat, as it enables adversaries to reconstruct search queries and potentially compromise the security and privacy of data. Padding strategies are common countermeasures for the leakage, but they significantly in…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: 23 pages, 13 figures, will be published in USENIX Security'24

  50. arXiv:2403.01155  [pdf, other]

    cs.CR

    Query Recovery from Easy to Hard: Jigsaw Attack against SSE

    Authors: Hao Nie, Wei Wang, Peng Xu, Xianglong Zhang, Laurence T. Yang, Kaitai Liang

    Abstract: Searchable symmetric encryption schemes often unintentionally disclose certain sensitive information, such as access, volume, and search patterns. Attackers can exploit such leakages and other available knowledge related to the user's database to recover queries. We find that the effectiveness of query recovery attacks depends on the volume/frequency distribution of keywords. Queries containing ke…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: 21 pages, accepted in USENIX Security 2024