Showing 1–50 of 1,243 results for author: he, X

Searching in archive cs. Search in all archives.
  1. arXiv:2408.14211  [pdf, other]

    cs.CV cs.AI

    MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement

    Authors: Xu He, Xiaoyu Li, Di Kang, Jiangnan Ye, Chaopeng Zhang, Liyang Chen, Xiangjun Gao, Han Zhang, Zhiyong Wu, Haolin Zhuang

    Abstract: Existing works in single-image human reconstruction suffer from weak generalizability due to insufficient training data or 3D inconsistencies for a lack of comprehensive multi-view knowledge. In this paper, we introduce MagicMan, a human-specific multi-view diffusion model designed to generate high-quality novel view images from a single reference image. As its core, we leverage a pre-trained 2D d…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Project Page: https://thuhcsi.github.io/MagicMan

  2. arXiv:2408.13741  [pdf, other]

    cs.CR

    CAMH: Advancing Model Hijacking Attack in Machine Learning

    Authors: Xing He, Jiahao Chen, Yuwen Pu, Qingming Li, Chunyi Zhou, Yingcai Wu, Jinbao Li, Shouling Ji

    Abstract: In the burgeoning domain of machine learning, the reliance on third-party services for model training and the adoption of pre-trained models have surged. However, this reliance introduces vulnerabilities to model hijacking attacks, where adversaries manipulate models to perform unintended tasks, leading to significant security and ethical concerns, like turning an ordinary image classifier into a…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 9 pages

  3. arXiv:2408.13534  [pdf, other]

    cs.CL

    Cultural Adaptation of Menus: A Fine-Grained Approach

    Authors: Zhonghe Zhang, Xiaoyu He, Vivek Iyer, Alexandra Birch

    Abstract: Machine Translation of Culture-Specific Items (CSIs) poses significant challenges. Recent work on CSI translation has shown some success using Large Language Models (LLMs) to adapt to different languages and cultures; however, a deeper analysis is needed to examine the benefits and pitfalls of each method. In this paper, we introduce the ChineseMenuCSI dataset, the largest for Chinese-English menu…

    Submitted 24 August, 2024; originally announced August 2024.

  4. arXiv:2408.12984  [pdf, other]

    cond-mat.mtrl-sci cs.AI

    Zeoformer: Coarse-Grained Periodic Graph Transformer for OSDA-Zeolite Affinity Prediction

    Authors: Xiangxiang Shen, Zheng Wan, Lingfeng Wen, Licheng Sun, Ou Yang Ming Jie, Xuan Tang, Xian Zeng, Mingsong Chen, Xiao He, Xian Wei

    Abstract: To date, the International Zeolite Association Structure Commission (IZA-SC) has cataloged merely 255 distinct zeolite structures, with millions of theoretically possible structures yet to be discovered. The synthesis of a specific zeolite typically necessitates the use of an organic structure-directing agent (OSDA), since the selectivity for a particular zeolite is largely determined by the affin…

    Submitted 25 August, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

    Comments: 7 pages, 5 figures

  5. arXiv:2408.11787  [pdf, other]

    eess.IV cs.CV

    NuSegDG: Integration of Heterogeneous Space and Gaussian Kernel for Domain-Generalized Nuclei Segmentation

    Authors: Zhenye Lou, Qing Xu, Zekun Jiang, Xiangjian He, Zhen Chen, Yi Wang, Chenxin Li, Maggie M. He, Wenting Duan

    Abstract: Domain-generalized nuclei segmentation refers to the generalizability of models to unseen domains based on knowledge learned from source domains and is challenged by various image conditions, cell types, and stain strategies. Recently, the Segment Anything Model (SAM) has made great success in universal image segmentation by interactive prompt modes (e.g., point and box). Despite its strengths, th…

    Submitted 24 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: Under Review

  6. arXiv:2408.11623  [pdf, other]

    cs.IR cs.LG

    End-to-End Cost-Effective Incentive Recommendation under Budget Constraint with Uplift Modeling

    Authors: Zexu Sun, Hao Yang, Dugang Liu, Yunpeng Weng, Xing Tang, Xiuqiang He

    Abstract: In modern online platforms, incentives are essential factors that enhance user engagement and increase platform revenue. Over recent years, uplift modeling has been introduced as a strategic approach to assign incentives to individual customers. Especially in many real-world applications, online platforms can only incentivize customers with specific budget constraints. This problem can be reformul…

    Submitted 24 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: Accepted by RecSys 2024

  7. arXiv:2408.11429  [pdf, other]

    cs.RO cs.AI

    Long-Range Vision-Based UAV-assisted Localization for Unmanned Surface Vehicles

    Authors: Waseem Akram, Siyuan Yang, Hailiang Kuang, Xiaoyu He, Muhayy Ud Din, Yihao Dong, Defu Lin, Lakmal Seneviratne, Shaoming He, Irfan Hussain

    Abstract: The global positioning system (GPS) has become an indispensable navigation method for field operations with unmanned surface vehicles (USVs) in marine environments. However, GPS may not always be available outdoors because it is vulnerable to natural interference and malicious jamming attacks. Thus, an alternative navigation system is required when the use of GPS is restricted or prohibited. To th…

    Submitted 21 August, 2024; originally announced August 2024.

  8. arXiv:2408.11338  [pdf, other]

    cs.AI cs.LG

    Automatic Dataset Construction (ADC): Sample Collection, Data Curation, and Beyond

    Authors: Minghao Liu, Zonglin Di, Jiaheng Wei, Zhongruo Wang, Hengxiang Zhang, Ruixuan Xiao, Haoyu Wang, Jinlong Pang, Hao Chen, Ankit Shah, Hongxin Wei, Xinlei He, Zhaowei Zhao, Haobo Wang, Lei Feng, Jindong Wang, James Davis, Yang Liu

    Abstract: Large-scale data collection is essential for developing personalized training data, mitigating the shortage of training data, and fine-tuning specialized models. However, creating high-quality datasets quickly and accurately remains a challenge due to annotation errors, the substantial time and costs associated with human labor. To address these issues, we propose Automatic Dataset Construction (A…

    Submitted 21 August, 2024; originally announced August 2024.

  9. arXiv:2408.10541  [pdf, other]

    cs.CV

    The Instance-centric Transformer for the RVOS Track of LSVOS Challenge: 3rd Place Solution

    Authors: Bin Cao, Yisi Zhang, Hanyi Wang, Xingjian He, Jing Liu

    Abstract: Referring Video Object Segmentation is an emerging multi-modal task that aims to segment objects in the video given a natural language expression. In this work, we build two instance-centric models and fuse predicted results from frame-level and instance-level. First, we introduce instance mask into the DETR-based model for query initialization to achieve temporal enhancement and employ SAM for sp…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2406.13939

  10. arXiv:2408.10159  [pdf, other]

    cs.IR cs.AI

    Customizing Language Models with Instance-wise LoRA for Sequential Recommendation

    Authors: Xiaoyu Kong, Jiancan Wu, An Zhang, Leheng Sheng, Hui Lin, Xiang Wang, Xiangnan He

    Abstract: Sequential recommendation systems predict a user's next item of interest by analyzing past interactions, aligning recommendations with individual preferences. Leveraging the strengths of Large Language Models (LLMs) in knowledge comprehension and reasoning, recent approaches have applied LLMs to sequential recommendation through language generation paradigms. These methods convert user behavior se…

    Submitted 19 August, 2024; originally announced August 2024.

  11. arXiv:2408.09916  [pdf, other]

    cs.CV cs.CL

    Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit

    Authors: Qizhou Chen, Taolin Zhang, Chengyu Wang, Xiaofeng He, Dakan Wang, Tingting Liu

    Abstract: Model editing aims to correct outdated or erroneous knowledge in large models without costly retraining. Recent research discovered that the mid-layer representation of the subject's final token in a prompt has a strong influence on factual predictions, and developed Large Language Model (LLM) editing techniques based on this observation. However, for Vision-LLMs (VLLMs), how visual representation…

    Submitted 19 August, 2024; originally announced August 2024.

  12. arXiv:2408.08585  [pdf, other]

    cs.IR cs.LG

    OptDist: Learning Optimal Distribution for Customer Lifetime Value Prediction

    Authors: Yunpeng Weng, Xing Tang, Zhenhao Xu, Fuyuan Lyu, Dugang Liu, Zexu Sun, Xiuqiang He

    Abstract: Customer Lifetime Value (CLTV) prediction is a critical task in business applications. Accurately predicting CLTV is challenging in real-world business scenarios, as the distribution of CLTV is complex and mutable. Firstly, there is a large number of users without any consumption consisting of a long-tailed part that is too complex to fit. Secondly, the small set of high-value users spent orders o…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: CIKM 2024

  13. arXiv:2408.08003  [pdf, other]

    cs.CL

    Leveraging Web-Crawled Data for High-Quality Fine-Tuning

    Authors: Jing Zhou, Chenglin Jiang, Wei Shen, Xiao Zhou, Xiaonan He

    Abstract: Most large language models are fine-tuned using either expensive human-annotated data or GPT-4 generated data which cannot guarantee performance in certain domains. We argue that although the web-crawled data often has formatting errors causing semantic inaccuracies, it can still serve as a valuable source for high-quality supervised fine-tuning in specific domains without relying on advanced mode…

    Submitted 15 August, 2024; originally announced August 2024.

  14. arXiv:2408.07490  [pdf, other]

    cs.CV

    Attention-Guided Perturbation for Unsupervised Image Anomaly Detection

    Authors: Tingfeng Huang, Yuxuan Cheng, Jingbo Xia, Rui Yu, Yuxuan Cai, Jinhai Xiang, Xinwei He, Xiang Bai

    Abstract: Reconstruction-based methods have significantly advanced modern unsupervised anomaly detection. However, the strong capacity of neural networks often violates the underlying assumptions by reconstructing abnormal samples well. To alleviate this issue, we present a simple yet effective reconstruction framework named Attention-Guided Perturbation Network (AGPNet), which learns to add perturbation nois…

    Submitted 14 August, 2024; originally announced August 2024.

  15. arXiv:2408.07476  [pdf, other]

    cs.CV

    One Step Diffusion-based Super-Resolution with Time-Aware Distillation

    Authors: Xiao He, Huaao Tang, Zhijun Tu, Junchao Zhang, Kun Cheng, Hanting Chen, Yong Guo, Mingrui Zhu, Nannan Wang, Xinbo Gao, Jie Hu

    Abstract: Diffusion-based image super-resolution (SR) methods have shown promise in reconstructing high-resolution images with fine details from low-resolution counterparts. However, these approaches typically require tens or even hundreds of iterative samplings, resulting in significant latency. Recently, techniques have been devised to enhance the sampling efficiency of diffusion-based SR models via knowl…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 18 pages

  16. arXiv:2408.07369  [pdf, other]

    cs.SI

    ProCom: A Few-shot Targeted Community Detection Algorithm

    Authors: Xixi Wu, Kaiyu Xiong, Yun Xiong, Xiaoxin He, Yao Zhang, Yizhu Jiao, Jiawei Zhang

    Abstract: Targeted community detection aims to distinguish a particular type of community in the network. This is an important task with a lot of real-world applications, e.g., identifying fraud groups in transaction networks. Traditional community detection methods fail to capture the specific features of the targeted community and detect all types of communities indiscriminately. Semi-supervised community…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Accepted by SIGKDD'2024

  17. arXiv:2408.06825  [pdf, other]

    cs.CR cs.CV

    Membership Inference Attack Against Masked Image Modeling

    Authors: Zheng Li, Xinlei He, Ning Yu, Yang Zhang

    Abstract: Masked Image Modeling (MIM) has achieved significant success in the realm of self-supervised learning (SSL) for visual recognition. The image encoder pre-trained through MIM, involving the masking and subsequent reconstruction of input images, attains state-of-the-art performance in various downstream vision tasks. However, most existing works focus on improving the performance of MIM. In this work…

    Submitted 13 August, 2024; originally announced August 2024.

  18. arXiv:2408.06653  [pdf, other]

    cs.IR cs.AI

    Hierarchical Structured Neural Network for Retrieval

    Authors: Kaushik Rangadurai, Siyang Yuan, Minhui Huang, Yiqun Liu, Golnaz Ghasemiesfeh, Yunchen Pu, Xinfeng Xie, Xingfeng He, Fangzhou Xu, Andrew Cui, Vidhoon Viswanathan, Yan Dong, Liang Xiong, Lin Yang, Liang Wang, Jiyan Yang, Chonglin Sun

    Abstract: Embedding Based Retrieval (EBR) is a crucial component of the retrieval stage in (Ads) Recommendation System that utilizes Two Tower or Siamese Networks to learn embeddings for both users and items (ads). It then employs an Approximate Nearest Neighbor Search (ANN) to efficiently retrieve the most relevant ads for a specific user. Despite the recent rise to popularity in the industry, they have a…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 9 pages

  19. arXiv:2408.04600  [pdf, other]

    cs.CV

    Improving Network Interpretability via Explanation Consistency Evaluation

    Authors: Hefeng Wu, Hao Jiang, Keze Wang, Ziyi Tang, Xianghuan He, Liang Lin

    Abstract: While deep neural networks have achieved remarkable performance, they tend to lack transparency in prediction. The pursuit of greater interpretability in neural networks often results in a degradation of their original performance. Some works strive to improve both interpretability and performance, but they primarily depend on meticulously imposed conditions. In this paper, we propose a simple yet…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: To appear in IEEE Transactions on Multimedia

  20. arXiv:2408.04590  [pdf, other]

    cs.LG

    Learn To Learn More Precisely

    Authors: Runxi Cheng, Yongxian Wei, Xianglong He, Wanyun Zhu, Songsong Huang, Fei Richard Yu, Fei Ma, Chun Yuan

    Abstract: Meta-learning has been extensively applied in the domains of few-shot learning and fast adaptation, achieving remarkable performance. While Meta-learning methods like Model-Agnostic Meta-Learning (MAML) and its variants provide a good set of initial parameters for the model, the model still tends to learn shortcut features, which leads to poor generalization. In this paper, we propose the formal c…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 10 pages, 4 figures, meta learning

  21. arXiv:2408.03220  [pdf, other]

    cs.LG cs.DC

    Masked Random Noise for Communication Efficient Federated Learning

    Authors: Shiwei Li, Yingyi Cheng, Haozhao Wang, Xing Tang, Shijie Xu, Weihong Luo, Yuhua Li, Dugang Liu, Xiuqiang He, Ruixuan Li

    Abstract: Federated learning is a promising distributed training paradigm that effectively safeguards data privacy. However, it may involve significant communication costs, which hinders training efficiency. In this paper, we aim to enhance communication efficiency from a new perspective. Specifically, we request the distributed clients to find optimal model updates relative to global model parameters withi…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted by MM 2024

  22. arXiv:2408.03215  [pdf, other]

    cs.LG cs.DC

    FedBAT: Communication-Efficient Federated Learning via Learnable Binarization

    Authors: Shiwei Li, Wenchao Xu, Haozhao Wang, Xing Tang, Yining Qi, Shijie Xu, Weihong Luo, Yuhua Li, Xiuqiang He, Ruixuan Li

    Abstract: Federated learning is a promising distributed machine learning paradigm that can effectively exploit large-scale data without exposing users' privacy. However, it may incur significant communication overhead, thereby potentially impairing the training efficiency. To address this challenge, numerous studies suggest binarizing the model updates. Nonetheless, traditional methods usually binarize mode…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted by ICML 2024

  23. arXiv:2408.01952  [pdf, other]

    cs.CV

    CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization

    Authors: Xiang He, Xiangxi Liu, Yang Li, Dongcheng Zhao, Guobin Shen, Qingqun Kong, Xin Yang, Yi Zeng

    Abstract: The audio-visual event localization task requires identifying concurrent visual and auditory events from unconstrained videos within a network model, locating them, and classifying their category. The efficient extraction and integration of audio and visual modal information have always been challenging in this field. In this paper, we introduce CACE-Net, which differs from most existing methods t…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM 2024. Code is available at https://github.com/Brain-Cog-Lab/CACE-Net

  24. arXiv:2408.01191  [pdf, other]

    cs.CV

    A Weakly Supervised and Globally Explainable Learning Framework for Brain Tumor Segmentation

    Authors: Ruitao Xie, Limai Jiang, Xiaoxi He, Yi Pan, Yunpeng Cai

    Abstract: Machine-based brain tumor segmentation can help doctors make better diagnoses. However, the complex structure of brain tumors and expensive pixel-level annotations present challenges for automatic tumor segmentation. In this paper, we propose a counterfactual generation framework that not only achieves exceptional brain tumor segmentation performance without the need for pixel-level annotations, b…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 2024 IEEE International Conference on Multimedia and Expo

  25. arXiv:2407.20060  [pdf, other]

    cs.LG cs.AI cs.DB

    RelBench: A Benchmark for Deep Learning on Relational Databases

    Authors: Joshua Robinson, Rishabh Ranjan, Weihua Hu, Kexin Huang, Jiaqi Han, Alejandro Dobles, Matthias Fey, Jan E. Lenssen, Yiwen Yuan, Zecheng Zhang, Xinwei He, Jure Leskovec

    Abstract: We present RelBench, a public benchmark for solving predictive tasks over relational databases with graph neural networks. RelBench provides databases and tasks spanning diverse domains and scales, and is intended to be a foundational infrastructure for future research. We use RelBench to conduct the first comprehensive study of Relational Deep Learning (RDL) (Fey et al., 2024), which combines gra…

    Submitted 29 July, 2024; originally announced July 2024.

  26. arXiv:2407.19726  [pdf, other]

    cs.CL cs.HC

    Do Text-to-Vis Benchmarks Test Real Use of Visualisations?

    Authors: Hy Nguyen, Xuefei He, Andrew Reeson, Cecile Paris, Josiah Poon, Jonathan K. Kummerfeld

    Abstract: Large language models are able to generate code for visualisations in response to user requests. This is a useful application, and an appealing one for NLP research because plots of data provide grounding for language. However, there are relatively few benchmarks, and it is unknown whether those that exist are representative of what people do in practice. This paper aims to answer that question th…

    Submitted 15 August, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

    Comments: ARR AE score of 4

  27. arXiv:2407.19208  [pdf, other]

    cs.GR

    WindPoly: Polygonal Mesh Reconstruction via Winding Numbers

    Authors: Xin He, Chenlei Lv, Pengdi Huang, Hui Huang

    Abstract: Polygonal mesh reconstruction of a raw point cloud is a valuable topic in the field of computer graphics and 3D vision. Especially to 3D architectural models, polygonal mesh provides concise expressions for fundamental geometric structures while effectively reducing data volume. However, there are some limitations of traditional reconstruction methods: normal vector dependency, noisy points and de…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: European Conference on Computer Vision (Proceedings of ECCV 2024)

  28. arXiv:2407.17155  [pdf, other]

    cs.CV

    FIIH: Fully Invertible Image Hiding for Secure and Robust

    Authors: Lang Huang, Lin Huo, Zheng Gan, Xinrong He

    Abstract: Image hiding is the study of techniques for covert storage and transmission, which embeds a secret image into a container image and generates stego image to make it similar in appearance to a normal image. However, existing image hiding methods have a serious problem that the hiding and revealing process cannot be fully invertible, which results in the revealing network not being able to recover t…

    Submitted 24 July, 2024; originally announced July 2024.

  29. arXiv:2407.17115  [pdf, other]

    cs.IR

    Reinforced Prompt Personalization for Recommendation with Large Language Models

    Authors: Wenyu Mao, Jiancan Wu, Weijian Chen, Chongming Gao, Xiang Wang, Xiangnan He

    Abstract: Designing effective prompts can empower LLMs to understand user preferences and provide recommendations by leveraging LLMs' intent comprehension and knowledge utilization capabilities. However, existing research predominantly concentrates on task-wise prompting, developing fixed prompt templates composed of four patterns (i.e., role-playing, history records, reasoning guidance, and output format)…

    Submitted 24 July, 2024; originally announced July 2024.

  30. arXiv:2407.14872  [pdf, other]

    cs.CV cs.RO

    Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts

    Authors: Yanting Yang, Minghao Chen, Qibo Qiu, Jiahao Wu, Wenxiao Wang, Binbin Lin, Ziyu Guan, Xiaofei He

    Abstract: For a general-purpose robot to operate in reality, executing a broad range of instructions across various environments is imperative. Central to the reinforcement learning and planning for such robotic agents is a generalizable reward function. Recent advances in vision-language models, such as CLIP, have shown remarkable performance in the domain of deep learning, paving the way for open-domain v…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: ECCV 2024 camera-ready

  31. arXiv:2407.14491  [pdf, other]

    cs.CV

    PD-TPE: Parallel Decoder with Text-guided Position Encoding for 3D Visual Grounding

    Authors: Chenshu Hou, Liang Peng, Xiaopei Wu, Wenxiao Wang, Xiaofei He

    Abstract: 3D visual grounding aims to locate the target object mentioned by free-formed natural language descriptions in 3D point cloud scenes. Most previous work requires the encoder-decoder to simultaneously align the attribute information of the target object and its relational information with the surrounding environment across modalities. This causes the queries' attention to be dispersed, potentially…

    Submitted 19 July, 2024; originally announced July 2024.

  32. arXiv:2407.14153  [pdf, other]

    eess.IV cs.CV

    ESP-MedSAM: Efficient Self-Prompting SAM for Universal Domain-Generalized Medical Image Segmentation

    Authors: Qing Xu, Jiaxuan Li, Xiangjian He, Ziyu Liu, Zhen Chen, Wenting Duan, Chenxin Li, Maggie M. He, Fiseha B. Tesema, Wooi P. Cheah, Yi Wang, Rong Qu, Jonathan M. Garibaldi

    Abstract: The universality of deep neural networks across different modalities and their generalization capabilities to unseen domains play an essential role in medical image segmentation. The recent Segment Anything Model (SAM) has demonstrated its potential in both settings. However, the huge computational costs, demand for manual annotations as prompts and conflict-prone decoding process of SAM degrade i…

    Submitted 17 August, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

    Comments: Under Review

  33. arXiv:2407.13571  [pdf]

    cs.CV cs.CL

    New Capability to Look Up an ASL Sign from a Video Example

    Authors: Carol Neidle, Augustine Opoku, Carey Ballard, Yang Zhou, Xiaoxiao He, Gregory Dimitriadis, Dimitris Metaxas

    Abstract: Looking up an unknown sign in an ASL dictionary can be difficult. Most ASL dictionaries are organized based on English glosses, despite the fact that (1) there is no convention for assigning English-based glosses to ASL signs; and (2) there is no 1-1 correspondence between ASL signs and English words. Furthermore, what if the user does not know either the meaning of the target sign or its possible…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 11 pages, 10 figures

  34. arXiv:2407.13181  [pdf, other]

    cs.CV

    Training-Free Large Model Priors for Multiple-in-One Image Restoration

    Authors: Xuanhua He, Lang Li, Yingying Wang, Hui Zheng, Ke Cao, Keyu Yan, Rui Li, Chengjun Xie, Jie Zhang, Man Zhou

    Abstract: Image restoration aims to reconstruct the latent clear images from their degraded versions. Despite the notable achievement, existing methods predominantly focus on handling specific degradation types and thus require specialized models, impeding real-world applications in dynamic degradation scenarios. To address this issue, we propose Large Model Driven Image Restoration framework (LMDIR), a nov…

    Submitted 18 July, 2024; originally announced July 2024.

  35. arXiv:2407.12705  [pdf, other]

    cs.CV

    IMAGDressing-v1: Customizable Virtual Dressing

    Authors: Fei Shen, Xin Jiang, Xin He, Hu Ye, Cong Wang, Xiaoyu Du, Zechao Li, Jinhui Tang

    Abstract: Latest advances have achieved realistic virtual try-on (VTON) through localized garment inpainting using latent diffusion models, significantly enhancing consumers' online shopping experience. However, existing VTON technologies neglect the need for merchants to showcase garments comprehensively, including flexible control over garments, optional faces, poses, and scenes. To address this issue, we…

    Submitted 6 August, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

  36. arXiv:2407.12489  [pdf, other]

    cs.CV

    Dual-level Adaptive Self-Labeling for Novel Class Discovery in Point Cloud Segmentation

    Authors: Ruijie Xu, Chuyu Zhang, Hui Ren, Xuming He

    Abstract: We tackle the novel class discovery in point cloud segmentation, which discovers novel classes based on the semantic knowledge of seen classes. Existing work proposes an online point-wise clustering method with a simplified equal class-size constraint on the novel classes to avoid degenerate solutions. However, the inherent imbalanced distribution of novel classes in point clouds typically violate…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  37. arXiv:2407.12053  [pdf, other]

    cs.LG cs.AI q-bio.QM

    Improving AlphaFlow for Efficient Protein Ensembles Generation

    Authors: Shaoning Li, Mingyu Li, Yusong Wang, Xinheng He, Nanning Zheng, Jian Zhang, Pheng-Ann Heng

    Abstract: Investigating conformational landscapes of proteins is a crucial way to understand their biological functions and properties. AlphaFlow stands out as a sequence-conditioned generative model that introduces flexibility into structure prediction models by fine-tuning AlphaFold under the flow-matching framework. Despite the advantages of efficient sampling afforded by flow-matching, AlphaFlow still r…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted by ICML 2024 AI4Science workshop

  38. arXiv:2407.10953  [pdf, other]

    cs.CL

    MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models

    Authors: Chengguang Gan, Qingyu Yin, Xinyang He, Hanjun Wei, Yunhao Liang, Younghun Lim, Shijian Wang, Hexiang Huang, Qinghao Zhang, Shiwen Ni, Tatsunori Mori

    Abstract: The Mutual Reinforcement Effect (MRE) represents a promising avenue in information extraction and multitasking research. Nevertheless, its applicability has been constrained due to the exclusive availability of MRE mix datasets in Japanese, thereby limiting comprehensive exploration by the global research community. To address this limitation, we introduce a Multilingual MRE mix dataset (MMM) that…

    Submitted 17 August, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: Under Review. 11 pages, 5 figures

  39. arXiv:2407.10499  [pdf, other]

    cs.CL

    CIBench: Evaluating Your LLMs with a Code Interpreter Plugin

    Authors: Songyang Zhang, Chuyu Zhang, Yingfan Hu, Haowen Shen, Kuikun Liu, Zerun Ma, Fengzhe Zhou, Wenwei Zhang, Xuming He, Dahua Lin, Kai Chen

    Abstract: While LLM-Based agents, which use external tools to solve complex problems, have made significant progress, benchmarking their ability is challenging, thereby hindering a clear understanding of their limitations. In this paper, we propose an interactive evaluation framework, named CIBench, to comprehensively assess LLMs' ability to utilize code interpreters for data science tasks. Our evaluation f…

    Submitted 25 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: Under review. The first three authors contribute equally, and Songyang Zhang is the project leader

  40. arXiv:2407.10196  [pdf, other]

    cs.LG cs.AI

    A3S: A General Active Clustering Method with Pairwise Constraints

    Authors: Xun Deng, Junlong Liu, Han Zhong, Fuli Feng, Chen Shen, Xiangnan He, Jieping Ye, Zheng Wang

    Abstract: Active clustering aims to boost the clustering performance by integrating human-annotated pairwise constraints through strategic querying. Conventional approaches with semi-supervised clustering schemes encounter high query costs when applied to large datasets with numerous classes. To address these limitations, we propose a novel Adaptive Active Aggregation and Splitting (A3S) framework, falling…

    Submitted 14 July, 2024; originally announced July 2024.

  41. arXiv:2407.09003  [pdf, other]

    cs.AI

    Enhancing Few-Shot Stock Trend Prediction with Large Language Models

    Authors: Yiqi Deng, Xingwei He, Jiahao Hu, Siu-Ming Yiu

    Abstract: The goal of stock trend prediction is to forecast future market movements for informed investment decisions. Existing methods mostly focus on predicting stock trends with supervised models trained on extensive annotated data. However, human annotation can be resource-intensive and the annotated data are not readily available. Inspired by the impressive few-shot capability of Large Language Models…

    Submitted 12 July, 2024; originally announced July 2024.

  42. arXiv:2407.08941  [pdf, other]

    cs.IT

    Two Classes of Optimal Multi-Input Structures for Node Computations in Message Passing Algorithms

    Authors: Teng Lu, Xuan He, Xiaohu Tang

    Abstract: In this paper, we delve into the computations performed at a node within a message-passing algorithm. We investigate low complexity/latency multi-input structures that can be adopted by the node for computing outgoing messages $y = (y_1, y_2, \ldots, y_n)$ from incoming messages $x = (x_1, x_2, \ldots, x_n)$, where each $y_j$, $j = 1, 2, \ldots, n$, is computed via a multi-way tree with leaves $x$ excluding $x_j$. S…

    Submitted 11 July, 2024; originally announced July 2024.

  43. arXiv:2407.08919  [pdf, other]

    cs.NI cs.ET eess.SP

    Redefinition of Digital Twin and its Situation Awareness Framework Designing Towards Fourth Paradigm for Energy Internet of Things

    Authors: Xing He, Yuezhong Tang, Shuyan Ma, Qian Ai, Fei Tao, Robert Qiu

    Abstract: Traditional knowledge-based situation awareness (SA) modes struggle to adapt to the escalating complexity of today's Energy Internet of Things (EIoT), necessitating a pivotal paradigm shift. In response, this work introduces a pioneering data-driven SA framework, termed digital twin-based situation awareness (DT-SA), aiming to bridge existing gaps between data and demands, and further to enhance S…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 16 pages, 15 figures. Accepted by IEEE Transactions on Systems, Man and Cybernetics: Systems

  44. arXiv:2407.08639  [pdf, other]

    cs.AI cs.LG

    $β$-DPO: Direct Preference Optimization with Dynamic $β$

    Authors: Junkang Wu, Yuexiang Xie, Zhengyi Yang, Jiancan Wu, Jinyang Gao, Bolin Ding, Xiang Wang, Xiangnan He

    Abstract: Direct Preference Optimization (DPO) has emerged as a compelling approach for training Large Language Models (LLMs) to adhere to human preferences. However, the performance of DPO is sensitive to the fine-tuning of its trade-off parameter $β$, as well as to the quality of the preference data. We analyze the impact of $β$ and data quality on DPO, uncovering that optimal $β$ values vary with the inf…

    Submitted 11 July, 2024; originally announced July 2024.

  45. arXiv:2407.07880  [pdf, other]

    cs.LG cs.AI cs.CL

    Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

    Authors: Junkang Wu, Yuexiang Xie, Zhengyi Yang, Jiancan Wu, Jiawei Chen, Jinyang Gao, Bolin Ding, Xiang Wang, Xiangnan He

    Abstract: This study addresses the challenge of noise in training datasets for Direct Preference Optimization (DPO), a method for aligning Large Language Models (LLMs) with human preferences. We categorize noise into pointwise noise, which includes low-quality data points, and pairwise noise, which encompasses erroneous data pair associations that affect preference rankings. Utilizing Distributionally Robus…

    Submitted 10 July, 2024; originally announced July 2024.

  46. arXiv:2407.06564  [pdf, other]

    cs.CL cs.AI

    Combining Knowledge Graphs and Large Language Models

    Authors: Amanda Kau, Xuzeng He, Aishwarya Nambissan, Aland Astudillo, Hui Yin, Amir Aryani

    Abstract: In recent years, Natural Language Processing (NLP) has played a significant role in various Artificial Intelligence (AI) applications such as chatbots, text generation, and language translation. The emergence of large language models (LLMs) has greatly improved the performance of these applications, showing astonishing results in language understanding and generation. However, they still show some…

    Submitted 9 July, 2024; originally announced July 2024.

  47. arXiv:2407.04794  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    On Evaluating The Performance of Watermarked Machine-Generated Texts Under Adversarial Attacks

    Authors: Zesen Liu, Tianshuo Cong, Xinlei He, Qi Li

    Abstract: Large Language Models (LLMs) excel in various applications, including text generation and complex tasks. However, the misuse of LLMs raises concerns about the authenticity and ethical implications of the content they produce, such as deepfake news, academic fraud, and copyright infringement. Watermarking techniques, which embed identifiable markers in machine-generated text, offer a promising solu…

    Submitted 5 July, 2024; originally announced July 2024.

  48. arXiv:2407.04295  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    Jailbreak Attacks and Defenses Against Large Language Models: A Survey

    Authors: Sibo Yi, Yule Liu, Zhen Sun, Tianshuo Cong, Xinlei He, Jiaxing Song, Ke Xu, Qi Li

    Abstract: Large Language Models (LLMs) have performed exceptionally in various text-generative tasks, including question answering, translation, code completion, etc. However, the over-assistance of LLMs has raised the challenge of "jailbreaking", which induces the model to generate malicious responses against the usage policy and society by designing adversarial prompts. With the emergence of jailbreak att…

    Submitted 30 August, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  49. arXiv:2407.04179  [pdf, other]

    cs.CL

    Defense Against Syntactic Textual Backdoor Attacks with Token Substitution

    Authors: Xinglin Li, Xianwen He, Yao Li, Minhao Cheng

    Abstract: Textual backdoor attacks present a substantial security risk to Large Language Models (LLMs). They embed carefully chosen triggers into a victim model at the training stage and make the model erroneously predict inputs containing the same triggers as a certain class. Prior backdoor defense methods primarily target special token-based triggers, leaving syntax-based triggers insufficiently addressed…

    Submitted 4 July, 2024; originally announced July 2024.

  50. arXiv:2407.04153  [pdf, other]

    cs.LG cs.AI

    Mixture of A Million Experts

    Authors: Xu Owen He

    Abstract: The feedforward (FFW) layers in standard transformer architectures incur a linear increase in computational costs and activation memory as the hidden layer width grows. Sparse mixture-of-experts (MoE) architectures have emerged as a viable approach to address this issue by decoupling model size from computational cost. The recent discovery of the fine-grained MoE scaling law shows that higher gran…

    Submitted 4 July, 2024; originally announced July 2024.