Showing 1–50 of 898 results for author: Tao, D

  1. arXiv:2408.16520  [pdf, other]

    cs.CV

    Towards Modality-agnostic Label-efficient Segmentation with Entropy-Regularized Distribution Alignment

    Authors: Liyao Tang, Zhe Chen, Shanshan Zhao, Chaoyue Wang, Dacheng Tao

    Abstract: Label-efficient segmentation aims to perform effective segmentation on input data using only sparse and limited ground-truth labels for training. This topic is widely studied in 3D point cloud segmentation due to the difficulty of annotating point clouds densely, while it is also essential for cost-effective segmentation on 2D images. Until recently, pseudo-labels have been widely employed to faci…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Extended version of arXiv:2305.15832; Code at https://github.com/LiyaoTang/ERDA

  2. arXiv:2408.15621  [pdf, other]

    cs.LG cs.CR

    Convergent Differential Privacy Analysis for General Federated Learning: the f-DP Perspective

    Authors: Yan Sun, Li Shen, Dacheng Tao

    Abstract: Federated learning (FL) is an efficient collaborative training paradigm extensively developed with a focus on local privacy protection, and differential privacy (DP) is a classical approach to capture and ensure the reliability of local privacy. The powerful cooperation of FL and DP provides a promising learning framework for large-scale private clients, juggling both privacy securing and trustwor…

    Submitted 28 August, 2024; originally announced August 2024.

  3. arXiv:2408.15556  [pdf, other]

    cs.CV

    Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models

    Authors: Wenbin Wang, Liang Ding, Minyan Zeng, Xiabin Zhou, Li Shen, Yong Luo, Dacheng Tao

    Abstract: Multimodal large language models (MLLMs) have experienced significant advancements recently, but still struggle to recognize and interpret intricate details in high-resolution (HR) images effectively. While state-of-the-art (SOTA) MLLMs claim to process images at 4K resolution, existing MLLM benchmarks only support up to 2K, leaving the capabilities of SOTA models on true HR images largely unteste…

    Submitted 28 August, 2024; originally announced August 2024.

  4. arXiv:2408.12199  [pdf, other]

    quant-ph cs.LG

    Efficient Learning for Linear Properties of Bounded-Gate Quantum Circuits

    Authors: Yuxuan Du, Min-Hsiu Hsieh, Dacheng Tao

    Abstract: The vast and complicated large-qubit state space forbids us to comprehensively capture the dynamics of modern quantum computers via classical simulations or quantum tomography. However, recent progress in quantum learning theory invokes a crucial question: given a quantum circuit containing d tunable RZ gates and G-d Clifford gates, can a learner perform purely classical inference to efficiently p…

    Submitted 22 August, 2024; originally announced August 2024.

  5. arXiv:2408.10504  [pdf, other]

    cs.AI

    QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning

    Authors: Yilun Kong, Hangyu Mao, Qi Zhao, Bin Zhang, Jingqing Ruan, Li Shen, Yongzhe Chang, Xueqian Wang, Rui Zhao, Dacheng Tao

    Abstract: Prompt engineering has demonstrated remarkable success in enhancing the performance of large language models (LLMs) across diverse tasks. However, most existing prompt optimization methods only focus on the task-level performance, overlooking the importance of query-preferred prompts, which leads to suboptimal performances. Additionally, these methods rely heavily on frequent interactions with LLM…

    Submitted 19 August, 2024; originally announced August 2024.

  6. arXiv:2408.10174  [pdf, other]

    cs.LG cs.AI

    SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models

    Authors: Anke Tang, Li Shen, Yong Luo, Shuai Xie, Han Hu, Lefei Zhang, Bo Du, Dacheng Tao

    Abstract: Deep model training on extensive datasets is increasingly becoming cost-prohibitive, prompting the widespread adoption of deep model fusion techniques to leverage knowledge from pre-existing models. From simple weight averaging to more sophisticated methods like AdaMerging, model fusion effectively improves model performance and accelerates the development of new models. However, potential interfe…

    Submitted 26 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

    Comments: Code is available at https://github.com/tanganke/fusion_bench

  7. arXiv:2408.09937  [pdf, other]

    quant-ph cs.LG

    The curse of random quantum data

    Authors: Kaining Zhang, Junyu Liu, Liu Liu, Liang Jiang, Min-Hsiu Hsieh, Dacheng Tao

    Abstract: Quantum machine learning, which involves running machine learning algorithms on quantum devices, may be one of the most significant flagship applications for these devices. Unlike its classical counterparts, the role of data in quantum machine learning has not been fully understood. In this work, we quantify the performances of quantum machine learning in the landscape of quantum data. Provided th…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 40 pages, 8 figures

  8. arXiv:2408.07666  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities

    Authors: Enneng Yang, Li Shen, Guibing Guo, Xingwei Wang, Xiaochun Cao, Jie Zhang, Dacheng Tao

    Abstract: Model merging is an efficient empowerment technique in the machine learning community that does not require the collection of raw training data and does not require expensive computation. As model merging becomes increasingly prevalent across various fields, it is crucial to understand the available model merging techniques comprehensively. However, there is a significant gap in the literature reg…

    Submitted 21 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

  9. arXiv:2408.07009  [pdf, other]

    cs.CV

    Imagen 3

    Authors: Imagen-Team-Google: Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Kelvin Chan, Yichang Chen, Sander Dieleman, Yuqing Du, Zach Eaton-Rosen, Hongliang Fei, Nando de Freitas, Yilin Gao, Evgeny Gladchenko, Sergio Gómez Colmenarejo, Mandy Guo, Alex Haig, Will Hawkins, Hexiang Hu, Huilian Huang, Tobenna Peter Igwe, Christos Kaplanis, Siavash Khodadadeh, et al. (227 additional authors not shown)

    Abstract: We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.

    Submitted 13 August, 2024; originally announced August 2024.

  10. arXiv:2408.04879  [pdf, other]

    cs.CV

    On the Element-Wise Representation and Reasoning in Zero-Shot Image Recognition: A Systematic Survey

    Authors: Jingcai Guo, Zhijie Rao, Zhi Chen, Song Guo, Jingren Zhou, Dacheng Tao

    Abstract: Zero-shot image recognition (ZSIR) aims at empowering models to recognize and reason in unseen domains via learning generalized knowledge from limited data in the seen domain. The gist for ZSIR is to execute element-wise representation and reasoning from the input visual space to the target semantic space, which is a bottom-up modeling paradigm inspired by the process by which humans observe the w…

    Submitted 22 August, 2024; v1 submitted 9 August, 2024; originally announced August 2024.

    Comments: 23 pages, 7 figures, and 3 tables

  11. arXiv:2408.02882  [pdf, other]

    cs.AI cs.CR cs.LG

    Compromising Embodied Agents with Contextual Backdoor Attacks

    Authors: Aishan Liu, Yuguang Zhou, Xianglong Liu, Tianyuan Zhang, Siyuan Liang, Jiakai Wang, Yanjun Pu, Tianlin Li, Junqi Zhang, Wenbo Zhou, Qing Guo, Dacheng Tao

    Abstract: Large language models (LLMs) have transformed the development of embodied intelligence. By providing a few contextual demonstrations, developers can utilize the extensive internal knowledge of LLMs to effortlessly translate complex tasks described in abstract language into sequences of code snippets, which will serve as the execution logic for embodied agents. However, this paper uncovers a signif…

    Submitted 5 August, 2024; originally announced August 2024.

  12. arXiv:2407.19547  [pdf, other]

    cs.CV

    Temporal Feature Matters: A Framework for Diffusion Model Quantization

    Authors: Yushi Huang, Ruihao Gong, Xianglong Liu, Jing Liu, Yuhang Li, Jiwen Lu, Dacheng Tao

    Abstract: Diffusion models, widely used for image generation, face significant challenges related to their broad applicability due to prolonged inference times and high memory demands. Efficient Post-Training Quantization (PTQ) is crucial to address these issues. However, unlike traditional models, diffusion models critically rely on the time-step for the multi-round denoising. Typically, each time-step…

    Submitted 7 August, 2024; v1 submitted 28 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2311.16503

  13. arXiv:2407.07111  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    Diffusion Model-Based Video Editing: A Survey

    Authors: Wenhao Sun, Rong-Cheng Tu, Jingyi Liao, Dacheng Tao

    Abstract: The rapid development of diffusion models (DMs) has significantly advanced image and video applications, making "what you want is what you see" a reality. Among these, video editing has gained substantial attention and seen a swift rise in research activity, necessitating a comprehensive and systematic review of the existing literature. This paper reviews diffusion model-based video editing techni…

    Submitted 26 June, 2024; originally announced July 2024.

    Comments: 23 pages, 12 figures, a project related to this paper can be found at https://github.com/wenhao728/awesome-diffusion-v2v

  14. arXiv:2407.06087  [pdf, other]

    cs.LG cs.CV

    Analytic Convolutional Layer: A Step to Analytic Neural Network

    Authors: Jingmao Cui, Donglai Tao, Linmi Tao, Ruiyang Liu, Yu Cheng

    Abstract: The prevailing approach to embedding prior knowledge within convolutional layers typically includes the design of steerable kernels or their modulation using designated kernel banks. In this study, we introduce the Analytic Convolutional Layer (ACL), an innovative model-driven convolutional layer, which is a mosaic of analytical convolution kernels (ACKs) and traditional convolution kernels. ACKs…

    Submitted 3 July, 2024; originally announced July 2024.

  15. arXiv:2407.04272  [pdf, other]

    cs.LG cs.DC

    Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression

    Authors: Hao Feng, Boyuan Zhang, Fanjiang Ye, Min Si, Ching-Hsiang Chu, Jiannan Tian, Chunxing Yin, Summer Deng, Yuchen Hao, Pavan Balaji, Tong Geng, Dingwen Tao

    Abstract: DLRM is a state-of-the-art recommendation system model that has gained widespread adoption across various industry applications. The large size of DLRM models, however, necessitates the use of multiple devices/GPUs for efficient training. A significant bottleneck in this process is the time-consuming all-to-all communication required to collect embedding data from all devices. To mitigate this, we…

    Submitted 25 August, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: camera-ready version for SC '24

  16. arXiv:2407.04267  [pdf, other]

    cs.DC

    A High-Quality Workflow for Multi-Resolution Scientific Data Reduction and Visualization

    Authors: Daoce Wang, Pascal Grosset, Jesus Pulido, Tushar M. Athawale, Jiannan Tian, Kai Zhao, Zarija Lukić, Axel Huebl, Zhe Wang, James Ahrens, Dingwen Tao

    Abstract: Multi-resolution methods such as Adaptive Mesh Refinement (AMR) can enhance storage efficiency for HPC applications generating vast volumes of data. However, their applicability is limited and cannot be universally deployed across all applications. Furthermore, integrating lossy compression with multi-resolution techniques to further boost storage efficiency encounters significant barriers. To thi…

    Submitted 25 August, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: camera-ready version for SC '24

  17. arXiv:2407.02301  [pdf, other]

    cs.CL

    CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models

    Authors: Ying Nie, Binwei Yan, Tianyu Guo, Hao Liu, Haoyu Wang, Wei He, Binfan Zheng, Weihao Wang, Qiang Li, Weijian Sun, Yunhe Wang, Dacheng Tao

    Abstract: Large language models (LLMs) have achieved remarkable performance on various NLP tasks, yet their potential in more challenging and domain-specific tasks, such as finance, has not been fully explored. In this paper, we present CFinBench: a meticulously crafted evaluation benchmark, the most comprehensive to date, for assessing the financial knowledge of LLMs in the Chinese context. In practice, to b…

    Submitted 2 July, 2024; originally announced July 2024.

  18. arXiv:2407.01445  [pdf, other]

    cs.LG cs.CV

    FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources

    Authors: Xiyuan Wei, Fanjiang Ye, Ori Yonay, Xingyu Chen, Baixi Sun, Dingwen Tao, Tianbao Yang

    Abstract: Existing studies of training state-of-the-art Contrastive Language-Image Pretraining (CLIP) models on large-scale data involve hundreds of or even thousands of GPUs due to the requirement of a large batch size. However, such a large amount of resources is not accessible to most people. While advanced compositional optimization techniques for optimizing global contrastive losses have been demonstra…

    Submitted 29 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: 24 pages

  19. arXiv:2407.00717  [pdf, other]

    cs.LG cs.AI eess.SY

    Learning System Dynamics without Forgetting

    Authors: Xikun Zhang, Dongjin Song, Yushan Jiang, Yixin Chen, Dacheng Tao

    Abstract: Predicting the trajectories of systems with unknown dynamics (i.e., the governing rules) is crucial in various research fields, including physics and biology. This challenge has gathered significant attention from diverse communities. Most existing works focus on learning fixed system dynamics within one single system. However, real-world applications often involve multiple systems with di…

    Submitted 30 June, 2024; originally announced July 2024.

  20. arXiv:2407.00600  [pdf, other]

    cs.CV cs.AI

    GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing

    Authors: Yisong Xiao, Aishan Liu, QianJia Cheng, Zhenfei Yin, Siyuan Liang, Jiapeng Li, Jing Shao, Xianglong Liu, Dacheng Tao

    Abstract: Large Vision-Language Models (LVLMs) have been widely adopted in various applications; however, they exhibit significant gender biases. Existing benchmarks primarily evaluate gender bias at the demographic group level, neglecting individual fairness, which emphasizes equal treatment of similar individuals. This research gap limits the detection of discriminatory behaviors, as individual fairness o…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 9 pages, 4 figures

  21. arXiv:2407.00341  [pdf, other]

    cs.CL

    Iterative Data Augmentation with Large Language Models for Aspect-based Sentiment Analysis

    Authors: Haiyun Li, Qihuang Zhong, Ke Zhu, Juhua Liu, Bo Du, Dacheng Tao

    Abstract: Aspect-based Sentiment Analysis (ABSA) is an important sentiment analysis task, which aims to determine the sentiment polarity towards an aspect in a sentence. Due to the expensive and limited labeled data, data augmentation (DA) has become the standard for improving the performance of ABSA. However, current DA methods usually have some shortcomings: 1) poor fluency and coherence, 2) lack of diver…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Work in progress

  22. arXiv:2406.14555  [pdf, other]

    cs.CV

    A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models

    Authors: Xincheng Shuai, Henghui Ding, Xingjun Ma, Rongcheng Tu, Yu-Gang Jiang, Dacheng Tao

    Abstract: Image editing aims to edit a given synthetic or real image to meet specific user requirements. It has been widely studied in recent years as a promising and challenging field of Artificial Intelligence Generative Content (AIGC). Recent significant advances in this field are based on the development of text-to-image (T2I) diffusion models, which generate images according to text prompts. Th…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Project Page: https://github.com/xinchengshuai/Awesome-Image-Editing

  23. arXiv:2406.14367  [pdf, other]

    cs.CV cs.AI

    PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions

    Authors: Sihan Ma, Jing Zhang, Qiong Cao, Dacheng Tao

    Abstract: Pose estimation aims to accurately identify anatomical keypoints in humans and animals using monocular images, which is crucial for various applications such as human-machine interaction, embodied AI, and autonomous driving. While current models show promising results, they are typically trained and tested on clean data, potentially overlooking the corruption during real-world deployment and thus…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Technical report. Project page: https://xymsh.github.io/PoseBench/

  24. arXiv:2406.12800  [pdf, other]

    cs.CR

    Supporting Human Raters with the Detection of Harmful Content using Large Language Models

    Authors: Kurt Thomas, Patrick Gage Kelley, David Tao, Sarah Meiklejohn, Owen Vallis, Shunwen Tan, Blaž Bratanič, Felipe Tiengo Ferreira, Vijay Kumar Eranti, Elie Bursztein

    Abstract: In this paper, we explore the feasibility of leveraging large language models (LLMs) to automate or otherwise assist human raters with identifying harmful content including hate speech, harassment, violent extremism, and election misinformation. Using a dataset of 50,000 comments, we demonstrate that LLMs can achieve 90% accuracy when compared to human verdicts. We explore how to best leverage the…

    Submitted 18 June, 2024; originally announced June 2024.

  25. arXiv:2406.11519  [pdf, other]

    cs.CV eess.IV

    HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

    Authors: Di Wang, Meiqi Hu, Yao Jin, Yuchun Miao, Jiaqi Yang, Yichu Xu, Xiaolei Qin, Jiaqi Ma, Lingyu Sun, Chenxing Li, Chuan Fu, Hongruixuan Chen, Chengxi Han, Naoto Yokoya, Jing Zhang, Minqiang Xu, Lin Liu, Lefei Zhang, Chen Wu, Bo Du, Dacheng Tao, Liangpei Zhang

    Abstract: Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA,…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA

  26. arXiv:2406.11190  [pdf, other]

    cs.CL cs.AI

    Aligning Large Language Models from Self-Reference AI Feedback with one General Principle

    Authors: Rong Bao, Rui Zheng, Shihan Dou, Xiao Wang, Enyu Zhou, Bo Wang, Qi Zhang, Liang Ding, Dacheng Tao

    Abstract: In aligning large language models (LLMs), utilizing feedback from existing advanced AI rather than humans is an important method to scale supervisory signals. However, it is highly challenging for AI to understand human intentions and societal values, and provide accurate preference feedback based on these. Current AI feedback methods rely on powerful LLMs, carefully designed specific principles t…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures

  27. arXiv:2406.08989  [pdf, other]

    eess.AS cs.SD

    ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis

    Authors: Dehua Tao, Daxin Tan, Yu Ting Yeung, Xiao Chen, Tan Lee

    Abstract: Representing speech as discretized units has numerous benefits in supporting downstream spoken language processing tasks. However, the approach has been less explored in speech synthesis of tonal languages like Mandarin Chinese. Our preliminary experiments on Chinese speech synthesis reveal the issue of "tone shift", where a synthesized speech utterance contains correct base syllables but incorrec…

    Submitted 13 June, 2024; originally announced June 2024.

  28. arXiv:2406.06302  [pdf, other]

    cs.CR cs.CV

    Unveiling the Safety of GPT-4o: An Empirical Study using Jailbreak Attacks

    Authors: Zonghao Ying, Aishan Liu, Xianglong Liu, Dacheng Tao

    Abstract: The recent release of GPT-4o has garnered widespread attention due to its powerful general capabilities. While its impressive performance is widely acknowledged, its safety aspects have not been sufficiently explored. Given the potential societal impact of risky content generated by advanced generative AI such as GPT-4o, it is crucial to rigorously evaluate its safety. In response to this question…

    Submitted 2 July, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  29. arXiv:2406.04854  [pdf, other]

    cs.CL

    Uncertainty Aware Learning for Language Model Alignment

    Authors: Yikun Wang, Rui Zheng, Liang Ding, Qi Zhang, Dahua Lin, Dacheng Tao

    Abstract: As instruction-tuned large language models (LLMs) evolve, aligning pretrained foundation models presents increasing challenges. Existing alignment strategies, which typically leverage diverse and high-quality data sources, often overlook the intrinsic uncertainty of tasks, learning all data samples equally. This may lead to suboptimal data efficiency and model performance. In response, we propose…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: ACL 2024

  30. arXiv:2406.04836  [pdf, other]

    cs.CL cs.AI

    Revisiting Catastrophic Forgetting in Large Language Model Tuning

    Authors: Hongyu Li, Liang Ding, Meng Fang, Dacheng Tao

    Abstract: Catastrophic Forgetting (CF) means that models forget previously acquired knowledge when learning new data. It compromises the effectiveness of large language models (LLMs) during fine-tuning, yet the underlying causes have not been thoroughly investigated. This paper takes the first step to reveal the direct link between the flatness of the model loss landscape and the extent of CF in the field of…

    Submitted 7 June, 2024; originally announced June 2024.

  31. arXiv:2406.04031  [pdf, other]

    cs.CV cs.CR

    Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt

    Authors: Zonghao Ying, Aishan Liu, Tianyuan Zhang, Zhengmin Yu, Siyuan Liang, Xianglong Liu, Dacheng Tao

    Abstract: In the realm of large vision language models (LVLMs), jailbreak attacks serve as a red-teaming approach to bypass guardrails and uncover safety implications. Existing jailbreaks predominantly focus on the visual modality, perturbing solely visual inputs in the prompt for attacks. However, they fall short when confronted with aligned models that fuse visual and textual features simultaneously for g…

    Submitted 1 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  32. arXiv:2406.03280  [pdf, other]

    cs.LG cs.AI cs.CL

    FusionBench: A Comprehensive Benchmark of Deep Model Fusion

    Authors: Anke Tang, Li Shen, Yong Luo, Han Hu, Bo Du, Dacheng Tao

    Abstract: Deep model fusion is an emerging technique that unifies the predictions or parameters of several deep neural networks into a single model in a cost-effective and data-efficient manner. This enables the unified model to take advantage of the original models' strengths, potentially exceeding their performance. Although a variety of deep model fusion techniques have been introduced, their evaluations…

    Submitted 14 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Project homepage: https://github.com/tanganke/fusion_bench

  33. arXiv:2406.00934  [pdf, other]

    cs.CV

    LanEvil: Benchmarking the Robustness of Lane Detection to Environmental Illusions

    Authors: Tianyuan Zhang, Lu Wang, Hainan Li, Yisong Xiao, Siyuan Liang, Aishan Liu, Xianglong Liu, Dacheng Tao

    Abstract: Lane detection (LD) is an essential component of autonomous driving systems, providing fundamental functionalities like adaptive cruise control and automated lane centering. Existing LD benchmarks primarily focus on evaluating common cases, neglecting the robustness of LD models against environmental illusions such as shadows and tire marks on the road. This research gap poses significant safety c…

    Submitted 16 July, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

    Comments: Accepted by ACM MM 2024

  34. arXiv:2405.19684  [pdf, other]

    cs.CV

    A Comprehensive Survey on Underwater Image Enhancement Based on Deep Learning

    Authors: Xiaofeng Cong, Yu Zhao, Jie Gui, Junming Hou, Dacheng Tao

    Abstract: Underwater image enhancement (UIE) presents a significant challenge within computer vision research. Despite the development of numerous UIE algorithms, a thorough and systematic review is still absent. To foster future advancements, we provide a detailed overview of the UIE task from several perspectives. Firstly, we introduce the physical models, data construction processes, evaluation metrics,…

    Submitted 25 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: A survey on the underwater image enhancement task

  35. arXiv:2405.18080  [pdf, other]

    cs.LG

    HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning

    Authors: Shengchao Hu, Ziqing Fan, Li Shen, Ya Zhang, Yanfeng Wang, Dacheng Tao

    Abstract: The purpose of offline multi-task reinforcement learning (MTRL) is to develop a unified policy applicable to diverse tasks without the need for online environmental interaction. Recent advancements approach this through sequence modeling, leveraging the Transformer architecture's scalability and the benefits of parameter sharing to exploit task similarities. However, variations in task content and…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Published at ICML 2024

  36. arXiv:2405.17098  [pdf, other]

    cs.LG

    Q-value Regularized Transformer for Offline Reinforcement Learning

    Authors: Shengchao Hu, Ziqing Fan, Chaoqin Huang, Li Shen, Ya Zhang, Yanfeng Wang, Dacheng Tao

    Abstract: Recent advancements in offline reinforcement learning (RL) have underscored the capabilities of Conditional Sequence Modeling (CSM), a paradigm that learns the action distribution based on history trajectory and target returns for each state. However, these methods often struggle with stitching together optimal trajectories from sub-optimal ones due to the inconsistency between the sampled returns…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Published at ICML 2024

  37. arXiv:2405.16560  [pdf, other]

    cs.LG

    Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models

    Authors: Yongxian Wei, Zixuan Hu, Li Shen, Zhenyi Wang, Yu Li, Chun Yuan, Dacheng Tao

    Abstract: Data-Free Meta-Learning (DFML) aims to derive knowledge from a collection of pre-trained models without accessing their original data, enabling the rapid adaptation to new unseen tasks. Current methods often overlook the heterogeneity among pre-trained models, which leads to performance degradation due to task conflicts. In this paper, we empirically and theoretically identify and analyze the mode…

    Submitted 26 May, 2024; originally announced May 2024.

  38. arXiv:2405.10642  [pdf, other]

    cs.LG

    Hi-GMAE: Hierarchical Graph Masked Autoencoders

    Authors: Chuang Liu, Zelin Yao, Yibing Zhan, Xueqi Ma, Dapeng Tao, Jia Wu, Wenbin Hu, Shirui Pan, Bo Du

    Abstract: Graph Masked Autoencoders (GMAEs) have emerged as a notable self-supervised learning approach for graph-structured data. Existing GMAE models primarily focus on reconstructing node-level information, categorizing them as single-scale GMAEs. This methodology, while effective in certain contexts, tends to overlook the complex hierarchical structures inherent in many real-world graphs. For instance,…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures, 3 tables

  39. arXiv:2405.09757  [pdf, other]

    cs.CR

    Give and Take: An End-To-End Investigation of Giveaway Scam Conversion Rates

    Authors: Enze Liu, George Kappos, Eric Mugnier, Luca Invernizzi, Stefan Savage, David Tao, Kurt Thomas, Geoffrey M. Voelker, Sarah Meiklejohn

    Abstract: Scams -- fraudulent schemes designed to swindle money from victims -- have existed for as long as recorded history. However, the Internet's combination of low communication cost, global reach, and functional anonymity has allowed scam volumes to reach new heights. Designing effective interventions requires first understanding the context: how scammers reach potential victims, the earnings they mak…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Under review

  40. arXiv:2405.08550  [pdf, other]

    cs.LG

    Learning Multi-Agent Communication from Graph Modeling Perspective

    Authors: Shengchao Hu, Li Shen, Ya Zhang, Dacheng Tao

    Abstract: In numerous artificial intelligence applications, the collaborative efforts of multiple intelligent agents are imperative for the successful attainment of target objectives. To enhance coordination among these agents, a distributed communication framework is often employed. However, information sharing among all agents proves to be resource-intensive, while the adoption of a manually pre-defined c…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Published at ICLR 2024

  41. arXiv:2405.07226  [pdf, other]

    quant-ph cs.AI cs.LG

    Separable Power of Classical and Quantum Learning Protocols Through the Lens of No-Free-Lunch Theorem

    Authors: Xinbiao Wang, Yuxuan Du, Kecheng Liu, Yong Luo, Bo Du, Dacheng Tao

    Abstract: The No-Free-Lunch (NFL) theorem, which quantifies problem- and data-independent generalization errors regardless of the optimization process, provides a foundational framework for comprehending diverse learning protocols' potential. Despite its significance, the establishment of the NFL theorem for quantum machine learning models remains largely unexplored, thereby overlooking broader insights int…

    Submitted 12 May, 2024; originally announced May 2024.

  42. arXiv:2405.06001  [pdf, other]

    cs.LG cs.AI cs.CL

    LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit

    Authors: Ruihao Gong, Yang Yong, Shiqiao Gu, Yushi Huang, Chentao Lv, Yunchen Zhang, Xianglong Liu, Dacheng Tao

    Abstract: Recent advancements in large language models (LLMs) are propelling us toward artificial general intelligence with their remarkable emergent abilities and reasoning capabilities. However, the substantial computational and memory requirements limit the widespread adoption. Quantization, a key compression technique, can effectively mitigate these demands by compressing and accelerating LLMs, albeit w…

    Submitted 20 July, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  43. arXiv:2405.04940  [pdf, other]

    cs.CV

    Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID

    Authors: Wentao Tan, Changxing Ding, Jiayu Jiang, Fei Wang, Yibing Zhan, Dapeng Tao

    Abstract: Text-to-image person re-identification (ReID) retrieves pedestrian images according to textual descriptions. Manually annotating textual descriptions is time-consuming, restricting the scale of existing datasets and therefore the generalization ability of ReID models. As a result, we study the transferable text-to-image ReID problem, where we train a model on our proposed large-scale database and…

    Submitted 30 June, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  44. arXiv:2405.01649  [pdf, other]

    cs.CL

    Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning

    Authors: Tianle Xia, Liang Ding, Guojia Wan, Yibing Zhan, Bo Du, Dacheng Tao

    Abstract: Answering complex queries over incomplete knowledge graphs (KGs) is a challenging job. Most previous works have focused on learning entity/relation embeddings and simulating first-order logic operators with various neural networks. However, they are bottlenecked by the inability to share world knowledge to improve logical reasoning, thus resulting in suboptimal performance. In this paper, we propo…

    Submitted 8 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  45. arXiv:2405.00984  [pdf, other]

    cs.LG cs.CV

    FREE: Faster and Better Data-Free Meta-Learning

    Authors: Yongxian Wei, Zixuan Hu, Zhenyi Wang, Li Shen, Chun Yuan, Dacheng Tao

    Abstract: Data-Free Meta-Learning (DFML) aims to extract knowledge from a collection of pre-trained models without requiring the original data, presenting practical benefits in contexts constrained by data privacy concerns. Current DFML methods primarily focus on the data recovery from these pre-trained models. However, they suffer from slow recovery speed and overlook gaps inherent in heterogeneous pre-tra…

    Submitted 1 May, 2024; originally announced May 2024.

  46. arXiv:2404.18413  [pdf, other]

    cs.CV cs.AI

    3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset

    Authors: Xinyu Ma, Xuebo Liu, Derek F. Wong, Jun Rao, Bei Li, Liang Ding, Lidia S. Chao, Dacheng Tao, Min Zhang

    Abstract: Multimodal machine translation (MMT) is a challenging task that seeks to improve translation quality by incorporating visual information. However, recent studies have indicated that the visual information provided by existing MMT datasets is insufficient, causing models to disregard it and overestimate their capabilities. This issue presents a significant obstacle to the development of MMT researc…

    Submitted 29 April, 2024; originally announced April 2024.

  47. arXiv:2404.15806  [pdf, other]

    cs.LG

    Where to Mask: Structure-Guided Masking for Graph Masked Autoencoders

    Authors: Chuang Liu, Yuyao Wang, Yibing Zhan, Xueqi Ma, Dapeng Tao, Jia Wu, Wenbin Hu

    Abstract: Graph masked autoencoders (GMAE) have emerged as a significant advancement in self-supervised pre-training for graph-structured data. Previous GMAE models primarily utilize a straightforward random masking strategy for nodes or edges during training. However, this strategy fails to consider the varying significance of different nodes within the graph structure. In this paper, we investigate the po…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 9 pages, 3 Figures. Accepted by IJCAI 2024

  48. arXiv:2404.15598  [pdf, other]

    cs.LG cs.CR

    Federated Learning with Only Positive Labels by Exploring Label Correlations

    Authors: Xuming An, Dui Wang, Li Shen, Yong Luo, Han Hu, Bo Du, Yonggang Wen, Dacheng Tao

    Abstract: Federated learning aims to collaboratively learn a model by using the data from multiple users under privacy constraints. In this paper, we study the multi-label classification problem under the federated learning setting, where trivial solution and extremely poor performance may be obtained, especially when only positive data w.r.t. a single class label are provided for each client. This issue ca…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: To be published in IEEE Transactions on Neural Networks and Learning Systems

  49. arXiv:2404.14963  [pdf, other]

    cs.CL cs.AI

    Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems

    Authors: Qihuang Zhong, Kang Wang, Ziyang Xu, Juhua Liu, Liang Ding, Bo Du, Dacheng Tao

    Abstract: Chain-of-Thought (CoT) prompting has enhanced the performance of Large Language Models (LLMs) across various reasoning tasks. However, CoT still falls short in dealing with complex math word problems, as it usually suffers from three pitfalls: semantic misunderstanding errors, calculation errors and step-missing errors. Prior studies involve addressing the calculation errors and step-missing error…

    Submitted 29 May, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Work in progress

  50. arXiv:2404.14387  [pdf, other]

    cs.CL cs.AI

    A Survey on Self-Evolution of Large Language Models

    Authors: Zhengwei Tao, Ting-En Lin, Xiancai Chen, Hangyu Li, Yuchuan Wu, Yongbin Li, Zhi Jin, Fei Huang, Dacheng Tao, Jingren Zhou

    Abstract: Large language models (LLMs) have significantly advanced in various fields and intelligent agent applications. However, current LLMs that learn from human or external model supervision are costly and may face performance ceilings as task complexity and diversity increase. To address this issue, self-evolution approaches that enable LLM to autonomously acquire, refine, and learn from experiences ge…

    Submitted 3 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.