Showing 1–50 of 291 results for author: He, Q

Searching in archive cs.
  1. arXiv:2407.11183  [pdf, other]

    cs.LG

    Differentiable Neural-Integrated Meshfree Method for Forward and Inverse Modeling of Finite Strain Hyperelasticity

    Authors: Honghui Du, Binyao Guo, QiZhi He

    Abstract: The present study aims to extend the novel physics-informed machine learning approach, specifically the neural-integrated meshfree (NIM) method, to model finite-strain problems characterized by nonlinear elasticity and large deformations. To this end, the hyperelastic material models are integrated into the loss function of the NIM method by employing a consistent local variational formulation. Th…

    Submitted 15 July, 2024; originally announced July 2024.

  2. arXiv:2407.10485  [pdf, other]

    cs.CV

    Effective Motion Modeling for UAV-platform Multiple Object Tracking with Re-Margin Loss

    Authors: Mufeng Yao, Jinlong Peng, Qingdong He, Bo Peng, Hao Chen, Mingmin Chi, Chao Liu, Jon Atli Benediktsson

    Abstract: Multiple object tracking (MOT) from unmanned aerial vehicle (UAV) platforms requires efficient motion modeling. This is because UAV-MOT faces tracking difficulties caused by large and irregular motion, and insufficient training due to the long-tailed motion distribution of current UAV-MOT datasets. Previous UAV-MOT methods either extract motion and detection features redundantly or supervise motio…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2308.07207

  3. arXiv:2407.05368  [pdf, other]

    cs.SD cs.AI cs.IR eess.AS

    Music Era Recognition Using Supervised Contrastive Learning and Artist Information

    Authors: Qiqi He, Xuchen Song, Weituo Hao, Ju-Chiang Wang, Wei-Tsung Lu, Wei Li

    Abstract: Does popular music from the 60s sound different than that of the 90s? Prior study has shown that there would exist some variations of patterns and regularities related to instrumentation changes and growing loudness across multi-decadal trends. This indicates that perceiving the era of a song from musical features such as audio and artist information is possible. Music era information can be an im…

    Submitted 7 July, 2024; originally announced July 2024.

  4. arXiv:2407.00294  [pdf, other]

    math.NA cs.LG physics.comp-ph

    Deep Neural Networks with Symplectic Preservation Properties

    Authors: Qing He, Wei Cai

    Abstract: We propose a deep neural network architecture designed such that its output forms an invertible symplectomorphism of the input. This design draws an analogy to the real-valued non-volume-preserving (real NVP) method used in normalizing flow techniques. Utilizing this neural network type allows for learning tasks on unknown Hamiltonian systems without breaking the inherent symplectic structure of t…

    Submitted 28 June, 2024; originally announced July 2024.

    MSC Class: 37J11; 70H15; 68T07

  5. arXiv:2406.19859  [pdf, other]

    cs.AI cs.HC cs.MM

    MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis

    Authors: Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Qi He, Wangmeng Xiang, Hanyuan Chen, Jin-Peng Lan, Xianhui Lin, Kang Zhu, Bin Luo, Yifeng Geng, Xuansong Xie, Alexander G. Hauptmann

    Abstract: MetaDesigner revolutionizes artistic typography synthesis by leveraging the strengths of Large Language Models (LLMs) to drive a design paradigm centered around user engagement. At the core of this framework lies a multi-agent system comprising the Pipeline, Glyph, and Texture agents, which collectively enable the creation of customized WordArt, ranging from semantic enhancements to the imposition…

    Submitted 4 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: 18 pages, 16 figures, Project: https://modelscope.cn/studios/WordArt/WordArt

  6. arXiv:2406.10902  [pdf, other]

    cs.CV cs.CL

    Light Up the Shadows: Enhance Long-Tailed Entity Grounding with Concept-Guided Vision-Language Models

    Authors: Yikai Zhang, Qianyu He, Xintao Wang, Siyu Yuan, Jiaqing Liang, Yanghua Xiao

    Abstract: Multi-Modal Knowledge Graphs (MMKGs) have proven valuable for various downstream tasks. However, scaling them up is challenging because building large-scale MMKGs often introduces mismatched images (i.e., noise). Most entities in KGs belong to the long tail, meaning there are few images of them available online. This scarcity makes it difficult to determine whether a found image matches the entity…

    Submitted 16 June, 2024; originally announced June 2024.

  7. arXiv:2406.10517  [pdf, other]

    cs.IR cs.AI cs.LG

    ADSNet: Cross-Domain LTV Prediction with an Adaptive Siamese Network in Advertising

    Authors: Ruize Wang, Hui Xu, Ying Cheng, Qi He, Xing Zhou, Rui Feng, Wei Xu, Lei Huang, Jie Jiang

    Abstract: Advertising platforms have evolved in estimating Lifetime Value (LTV) to better align with advertisers' true performance metric. However, the sparsity of real-world LTV data presents a significant challenge to LTV predictive models (i.e., pLTV), severely limiting their capabilities. Therefore, we propose to utilize external data, in addition to the internal data of advertising platform, to expan…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted to KDD 2024

  8. arXiv:2406.09422  [pdf, other]

    cs.DC cs.AI cs.CE cs.CR

    LooPIN: A PinFi protocol for decentralized computing

    Authors: Yunwei Mao, Qi He, Ju Li

    Abstract: Networked computing power is a critical utility in the era of artificial intelligence. This paper presents a novel Physical Infrastructure Finance (PinFi) protocol designed to facilitate the distribution of computing power within networks in a decentralized manner. Addressing the core challenges of coordination, pricing, and liquidity in decentralized physical infrastructure networks (DePIN), the…

    Submitted 29 March, 2024; originally announced June 2024.

  9. arXiv:2406.08122  [pdf]

    eess.AS cs.SD

    Fully Few-shot Class-incremental Audio Classification Using Expandable Dual-embedding Extractor

    Authors: Yongjie Si, Yanxiong Li, Jialong Li, Jiaxin Tan, Qianhua He

    Abstract: It's assumed that training data is sufficient in base session of few-shot class-incremental audio classification. However, it's difficult to collect abundant samples for model training in base session in some practical scenarios due to the data scarcity of some classes. This paper explores a new problem of fully few-shot class-incremental audio classification with few training samples in all sessi…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted for publication on Interspeech 2024. 5 pages, 3 figures, 5 tables

  10. arXiv:2406.08119  [pdf]

    eess.AS cs.SD

    Low-Complexity Acoustic Scene Classification Using Parallel Attention-Convolution Network

    Authors: Yanxiong Li, Jiaxin Tan, Guoqing Chen, Jialong Li, Yongjie Si, Qianhua He

    Abstract: This work is an improved system that we submitted to task 1 of DCASE2023 challenge. We propose a method of low-complexity acoustic scene classification by a parallel attention-convolution network which consists of four modules, including pre-processing, fusion, global and local contextual information extraction. The proposed network is computationally efficient to capture global and local contextu…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted for publication on Interspeech 2024. 5 pages, 4 figures, 3 tables

  11. arXiv:2406.06464  [pdf, other]

    cs.AI cs.CL

    Transforming Wearable Data into Health Insights using Large Language Model Agents

    Authors: Mike A. Merrill, Akshay Paruchuri, Naghmeh Rezaei, Geza Kovacs, Javier Perez, Yun Liu, Erik Schenck, Nova Hammerquist, Jake Sunshine, Shyam Tailor, Kumar Ayush, Hao-Wei Su, Qian He, Cory Y. McLean, Mark Malhotra, Shwetak Patel, Jiening Zhan, Tim Althoff, Daniel McDuff, Xin Liu

    Abstract: Despite the proliferation of wearable health trackers and the importance of sleep and exercise to health, deriving actionable personalized insights from wearable data remains a challenge because doing so requires non-trivial open-ended analysis of these data. The recent rise of large language model (LLM) agents, which can use tools to reason about and interact with the world, presents a promising…

    Submitted 11 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 38 pages

  12. arXiv:2406.03262  [pdf, other]

    cs.CV

    ADer: A Comprehensive Benchmark for Multi-class Visual Anomaly Detection

    Authors: Jiangning Zhang, Haoyang He, Zhenye Gan, Qingdong He, Yuxuan Cai, Zhucun Xue, Yabiao Wang, Chengjie Wang, Lei Xie, Yong Liu

    Abstract: Visual anomaly detection aims to identify anomalous regions in images through unsupervised learning paradigms, with increasing application demand and value in fields such as industrial inspection and medical lesion detection. Despite significant progress in recent years, there is a lack of comprehensive benchmarks to adequately evaluate the performance of various mainstream methods across differen…

    Submitted 6 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  13. arXiv:2406.01103  [pdf, other]

    cs.AI cs.HC cs.LG

    Advancing DRL Agents in Commercial Fighting Games: Training, Integration, and Agent-Human Alignment

    Authors: Chen Zhang, Qiang He, Zhou Yuan, Elvis S. Liu, Hong Wang, Jian Zhao, Yang Wang

    Abstract: Deep Reinforcement Learning (DRL) agents have demonstrated impressive success in a wide range of game genres. However, existing research primarily focuses on optimizing DRL competence rather than addressing the challenge of prolonged player interaction. In this paper, we propose a practical DRL agent system for fighting games named Shūkai, which has been successfully deployed to Naruto Mobile, a p…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted at ICML 2024

  14. arXiv:2405.20081  [pdf, other]

    cs.CV cs.AI

    NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models

    Authors: Kai Wu, Boyuan Jiang, Zhengkai Jiang, Qingdong He, Donghao Luo, Shengzhi Wang, Qingwen Liu, Chengjie Wang

    Abstract: Multimodal large language models (MLLMs) contribute a powerful mechanism to understanding visual information building on large language models. However, MLLMs are notorious for suffering from hallucinations, especially when generating lengthy, detailed descriptions for images. Our analysis reveals that hallucinations stem from the inherent summarization mechanism of large language models, leading…

    Submitted 31 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: 14 pages, 5 figures with supplementary material

  15. arXiv:2405.17741  [pdf, other]

    cs.AI

    LoRA-Switch: Boosting the Efficiency of Dynamic LLM Adapters via System-Algorithm Co-design

    Authors: Rui Kong, Qiyang Li, Xinyu Fang, Qingtian Feng, Qingfeng He, Yazhu Dong, Weijun Wang, Yuanchun Li, Linghe Kong, Yunxin Liu

    Abstract: Recent literature has found that an effective method to customize or further improve large language models (LLMs) is to add dynamic adapters, such as low-rank adapters (LoRA) with Mixture-of-Experts (MoE) structures. Though such dynamic adapters incur modest computational complexity, they surprisingly lead to huge inference latency overhead, slowing down the decoding speed by 2.5+ times. In this p…

    Submitted 27 May, 2024; originally announced May 2024.

  16. arXiv:2405.17718  [pdf, other]

    cs.CV cs.LG

    AdapNet: Adaptive Noise-Based Network for Low-Quality Image Retrieval

    Authors: Sihe Zhang, Qingdong He, Jinlong Peng, Yuxi Li, Zhengkai Jiang, Jiafu Wu, Mingmin Chi, Yabiao Wang, Chengjie Wang

    Abstract: Image retrieval aims to identify visually similar images within a database using a given query image. Traditional methods typically employ both global and local features extracted from images for matching, and may also apply re-ranking techniques to enhance accuracy. However, these methods often fail to account for the noise present in query images, which can stem from natural or human-induced fac…

    Submitted 27 May, 2024; originally announced May 2024.

  17. arXiv:2405.16265  [pdf, other]

    cs.LG

    MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time

    Authors: Jikun Kang, Xin Zhe Li, Xi Chen, Amirreza Kazemi, Qianyi Sun, Boxing Chen, Dong Li, Xu He, Quan He, Feng Wen, Jianye Hao, Jun Yao

    Abstract: Although Large Language Models (LLMs) achieve remarkable performance across various tasks, they often struggle with complex reasoning tasks, such as answering mathematical questions. Recent efforts to address this issue have primarily focused on leveraging mathematical datasets through supervised fine-tuning or self-improvement techniques. However, these methods often depend on high-quality datase…

    Submitted 26 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  18. arXiv:2405.15580  [pdf, other]

    cs.CV

    Open-Vocabulary SAM3D: Understand Any 3D Scene

    Authors: Hanchen Tai, Qingdong He, Jiangning Zhang, Yijie Qian, Zhenyu Zhang, Xiaobin Hu, Yabiao Wang, Yong Liu

    Abstract: Open-vocabulary 3D scene understanding presents a significant challenge in the field. Recent advancements have sought to transfer knowledge embedded in vision language models from the 2D domain to 3D domain. However, these approaches often require learning prior knowledge from specific 3D scene datasets, which limits their applicability in open-world scenarios. The Segment Anything Model (SAM) has…

    Submitted 21 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: Project page: https://hithqd.github.io/projects/OV-SAM3D

  19. arXiv:2405.15214  [pdf, other]

    cs.CV

    PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning

    Authors: Qingdong He, Jiangning Zhang, Jinlong Peng, Haoyang He, Yabiao Wang, Chengjie Wang

    Abstract: Transformers have revolutionized the point cloud learning task, but the quadratic complexity hinders its extension to long sequence and makes a burden on limited computational resources. The recent advent of RWKV, a fresh breed of deep sequence models, has shown immense potential for sequence modeling in NLP tasks. In this paper, we present PointRWKV, a model of linear complexity derived from the…

    Submitted 24 May, 2024; originally announced May 2024.

  20. arXiv:2405.14210  [pdf, other]

    cs.CV eess.IV

    Eidos: Efficient, Imperceptible Adversarial 3D Point Clouds

    Authors: Hanwei Zhang, Luo Cheng, Qisong He, Wei Huang, Renjue Li, Ronan Sicre, Xiaowei Huang, Holger Hermanns, Lijun Zhang

    Abstract: Classification of 3D point clouds is a challenging machine learning (ML) task with important real-world applications in a spectrum from autonomous driving and robot-assisted surgery to earth observation from low orbit. As with other ML tasks, classification models are notoriously brittle in the presence of adversarial attacks. These are rooted in imperceptible changes to inputs with the effect tha…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Preprint

  21. arXiv:2405.13902  [pdf, other]

    cs.LG cs.AI

    LOGIN: A Large Language Model Consulted Graph Neural Network Training Framework

    Authors: Yiran Qiao, Xiang Ao, Yang Liu, Jiarong Xu, Xiaoqian Sun, Qing He

    Abstract: Recent prevailing works on graph machine learning typically follow a similar methodology that involves designing advanced variants of graph neural networks (GNNs) to maintain the superior performance of GNNs on different graphs. In this paper, we aim to streamline the GNN design process and leverage the advantages of Large Language Models (LLMs) to improve the performance of GNNs on downstream tas…

    Submitted 6 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  22. arXiv:2405.12490  [pdf, other]

    cs.CV

    Customize Your Own Paired Data via Few-shot Way

    Authors: Jinshu Chen, Bingchuan Li, Miao Hua, Panpan Xu, Qian He

    Abstract: Existing solutions to image editing tasks suffer from several issues. Though achieving remarkably satisfying generated results, some supervised methods require huge amounts of paired training data, which greatly limits their usages. The other unsupervised methods take full advantage of large-scale pre-trained priors, thus being strictly restricted to the domains where the priors are trained on and…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Accepted by AI4CC CVPR2024 WorkShop

  23. arXiv:2405.04828  [pdf, other]

    cs.CL

    ChuXin: 1.6B Technical Report

    Authors: Xiaomin Zhuang, Yufan Jiang, Qiaozhi He, Zhihua Wu

    Abstract: In this report, we present ChuXin, an entirely open-source language model with a size of 1.6 billion parameters. Unlike the majority of works that only open-sourced the model weights and architecture, we have made everything needed to train a model available, including the training data, the training process, and the evaluation code. Our goal is to empower and strengthen the open research communit…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Technical Report

  24. arXiv:2405.03349  [pdf, other]

    cs.CV

    Retinexmamba: Retinex-based Mamba for Low-light Image Enhancement

    Authors: Jiesong Bai, Yuhao Yin, Qiyuan He, Yuanxian Li, Xiaofeng Zhang

    Abstract: In the field of low-light image enhancement, both traditional Retinex methods and advanced deep learning techniques such as Retinexformer have shown distinct advantages and limitations. Traditional Retinex methods, designed to mimic the human eye's perception of brightness and color, decompose images into illumination and reflection components but struggle with noise management and detail preserva…

    Submitted 19 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  25. arXiv:2405.00236  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    STT: Stateful Tracking with Transformers for Autonomous Driving

    Authors: Longlong Jing, Ruichi Yu, Xu Chen, Zhengli Zhao, Shiwei Sheng, Colin Graber, Qi Chen, Qinru Li, Shangxuan Wu, Han Deng, Sangjin Lee, Chris Sweeney, Qiurui He, Wei-Chih Hung, Tong He, Xingyi Zhou, Farshid Moussavi, Zijian Guo, Yin Zhou, Mingxing Tan, Weilong Yang, Congcong Li

    Abstract: Tracking objects in three-dimensional space is critical for autonomous driving. To ensure safety while driving, the tracker must be able to reliably track objects across frames and accurately estimate their states such as velocity and acceleration in the present. Existing works frequently focus on the association task while either neglecting the model performance on state estimation or deploying c…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: ICRA 2024

  26. arXiv:2404.18057  [pdf, other]

    cs.CL

    Efficient LLM Inference with Kcache

    Authors: Qiaozhi He, Zhihua Wu

    Abstract: Large Language Models (LLMs) have had a profound impact on AI applications, particularly in the domains of long-text comprehension and generation. KV Cache technology is one of the most widely used techniques in the industry. It ensures efficient sequence generation by caching previously computed KV states. However, it also introduces significant memory overhead. We discovered that KV Cache is not…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Technical Report, 8 pages

  27. arXiv:2404.16022  [pdf, other]

    cs.CV

    PuLID: Pure and Lightning ID Customization via Contrastive Alignment

    Authors: Zinan Guo, Yanze Wu, Zhuowei Chen, Lang Chen, Qian He

    Abstract: We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation. By incorporating a Lightning T2I branch with a standard diffusion one, PuLID introduces both contrastive alignment loss and accurate ID loss, minimizing disruption to the original model and ensuring high ID fidelity. Experiments show that PuLID achieves superior perform…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Tech Report. Codes and models will be available at https://github.com/ToTheBeginning/PuLID

  28. arXiv:2404.15846  [pdf, other]

    cs.CL

    From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large Language Models

    Authors: Qianyu He, Jie Zeng, Qianxi He, Jiaqing Liang, Yanghua Xiao

    Abstract: It is imperative for Large language models (LLMs) to follow instructions with elaborate requirements (i.e. Complex Instructions Following). Yet, it remains under-explored how to enhance the ability of LLMs to follow complex instructions with multiple constraints. To bridge the gap, we initially study what training data is effective in enhancing complex constraints following abilities. We found tha…

    Submitted 18 June, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  29. arXiv:2404.14705  [pdf, other]

    cs.CV

    Think-Program-reCtify: 3D Situated Reasoning with Large Language Models

    Authors: Qingrong He, Kejun Lin, Shizhe Chen, Anwen Hu, Qin Jin

    Abstract: This work addresses the 3D situated reasoning task which aims to answer questions given egocentric observations in a 3D environment. The task remains challenging as it requires comprehensive 3D perception and complex reasoning skills. End-to-end models trained on supervised data for 3D situated reasoning suffer from data scarcity and generalization ability. Inspired by the recent success of levera…

    Submitted 22 April, 2024; originally announced April 2024.

  30. arXiv:2404.12754  [pdf, other]

    cs.LG cs.AI

    Adaptive Regularization of Representation Rank as an Implicit Constraint of Bellman Equation

    Authors: Qiang He, Tianyi Zhou, Meng Fang, Setareh Maghsudi

    Abstract: Representation rank is an important concept for understanding the role of Neural Networks (NNs) in Deep Reinforcement learning (DRL), which measures the expressive capacity of value networks. Existing studies focus on unboundedly maximizing this rank; nevertheless, that approach would introduce overly complex models in the learning, thus undermining performance. Hence, fine-tuning representation r…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR23; Code: https://github.com/sweetice/BEER-ICLR2024

  31. arXiv:2404.11326  [pdf, other]

    cs.CV

    Single-temporal Supervised Remote Change Detection for Domain Generalization

    Authors: Qiangang Du, Jinlong Peng, Xu Chen, Qingdong He, Liren He, Qiang Nie, Wenbing Zhu, Mingmin Chi, Yabiao Wang, Chengjie Wang

    Abstract: Change detection is widely applied in remote sensing image analysis. Existing methods require training models separately for each dataset, which leads to poor domain generalization. Moreover, these methods rely heavily on large amounts of high-quality pair-labelled data for training, which is expensive and impractical. In this paper, we propose a multimodal contrastive learning (ChangeCLIP) based…

    Submitted 23 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  32. arXiv:2404.11318  [pdf, other]

    cs.CV

    Leveraging Fine-Grained Information and Noise Decoupling for Remote Sensing Change Detection

    Authors: Qiangang Du, Jinlong Peng, Changan Wang, Xu Chen, Qingdong He, Wenbing Zhu, Mingmin Chi, Yabiao Wang, Chengjie Wang

    Abstract: Change detection aims to identify remote sense object changes by analyzing data between bitemporal image pairs. Due to the large temporal and spatial span of data collection in change detection image pairs, there is often a significant amount of task-specific and task-agnostic noise. Previous effort has focused excessively on denoising, with this goes a great deal of loss of fine-grained informat…

    Submitted 21 June, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  33. arXiv:2404.10253  [pdf, other]

    cs.DC

    Kilometer-Level Coupled Modeling Using 40 Million Cores: An Eight-Year Journey of Model Development

    Authors: Xiaohui Duan, Yuxuan Li, Zhao Liu, Bin Yang, Juepeng Zheng, Haohuan Fu, Shaoqing Zhang, Shiming Xu, Yang Gao, Wei Xue, Di Wei, Xiaojing Lv, Lifeng Yan, Haopeng Huang, Haitian Lu, Lingfeng Wan, Haoran Lin, Qixin Chang, Chenlin Li, Quanjie He, Zeyu Song, Xuantong Wang, Yangyang Yu, Xilong Fan, Zhaopeng Qu, et al. (16 additional authors not shown)

    Abstract: With current and future leading systems adopting heterogeneous architectures, adapting existing models for heterogeneous supercomputers is of urgent need for improving model resolution and reducing modeling uncertainty. This paper presents our three-week effort on porting a complex earth system model, CESM 2.2, to a 40-million-core Sunway supercomputer. Taking a non-intrusive approach that tries t…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 18 pages, 13 figures

  34. arXiv:2404.08681  [pdf, other]

    cs.CL

    EFSA: Towards Event-Level Financial Sentiment Analysis

    Authors: Tianyu Chen, Yiming Zhang, Guoxin Yu, Dapeng Zhang, Li Zeng, Qing He, Xiang Ao

    Abstract: In this paper, we extend financial sentiment analysis (FSA) to event-level since events usually serve as the subject of the sentiment in financial text. Though extracting events from the financial text may be conducive to accurate sentiment predictions, it has specialized challenges due to the length and discontinuity of events in a financial text. To this end, we reconceptualize the event extrac…

    Submitted 8 April, 2024; originally announced April 2024.

  35. arXiv:2404.06564  [pdf, other]

    cs.CV

    MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection

    Authors: Haoyang He, Yuhu Bai, Jiangning Zhang, Qingdong He, Hongxu Chen, Zhenye Gan, Chengjie Wang, Xiangtai Li, Guanzhong Tian, Lei Xie

    Abstract: Recent advancements in anomaly detection have seen the efficacy of CNN- and transformer-based approaches. However, CNNs struggle with long-range dependencies, while transformers are burdened by quadratic computational complexity. Mamba-based models, with their superior long-range modeling and linear efficiency, have garnered substantial attention. This study pioneers the application of Mamba to mu…

    Submitted 14 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  36. arXiv:2404.04293  [pdf, other]

    cs.CL cs.AI

    Reason from Fallacy: Enhancing Large Language Models' Logical Reasoning through Logical Fallacy Understanding

    Authors: Yanda Li, Dixuan Wang, Jiaqing Liang, Guochao Jiang, Qianyu He, Yanghua Xiao, Deqing Yang

    Abstract: Large Language Models (LLMs) have demonstrated good performance in many reasoning tasks, but they still struggle with some complicated reasoning tasks including logical reasoning. One non-negligible reason for LLMs' suboptimal performance on logical reasoning is their overlooking of understanding logical fallacies correctly. To evaluate LLMs' capability of logical fallacy understanding (LFU), we p…

    Submitted 4 April, 2024; originally announced April 2024.

  37. arXiv:2404.02174  [pdf, other]

    cs.GT cs.AI cs.CE

    Bounds of Block Rewards in Honest PinFi Systems

    Authors: Qi He, Yunwei Mao, Ju Li

    Abstract: PinFi is a class of novel protocols for decentralized pricing of dissipative assets, whose value naturally declines over time. Central to the protocol's functionality and its market efficiency is the role of liquidity providers (LPs). This study addresses critical stability and sustainability challenges within the protocol, namely: the propensity of LPs to prefer selling in external markets over p…

    Submitted 31 March, 2024; originally announced April 2024.

  38. arXiv:2403.19121  [pdf, other]

    cs.CL

    Code Comparison Tuning for Code Large Language Models

    Authors: Yufan Jiang, Qiaozhi He, Xiaomin Zhuang, Zhihua Wu

    Abstract: We present Code Comparison Tuning (CCT), a simple and effective tuning method for code large language models (Code LLMs) to better handle subtle code errors. Specifically, we integrate the concept of comparison into instruction tuning, both at the token and sequence levels, enabling the model to discern even the slightest deviations in code. To compare the original code with an erroneous version c…

    Submitted 5 June, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: Preprint

  39. arXiv:2403.17924  [pdf, other]

    cs.CV cs.AI

    AID: Attention Interpolation of Text-to-Image Diffusion

    Authors: Qiyuan He, Jinghao Wang, Ziwei Liu, Angela Yao

    Abstract: Conditional diffusion models can create unseen images in various settings, aiding image interpolation. Interpolation in latent spaces is well-studied, but interpolation with specific conditions like text or poses is less understood. Simple approaches, such as linear interpolation in the space of conditions, often result in images that lack consistency, smoothness, and fidelity. To that end, we int…

    Submitted 18 April, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  40. arXiv:2403.16396  [pdf, other]

    cs.CL

    Is There a One-Model-Fits-All Approach to Information Extraction? Revisiting Task Definition Biases

    Authors: Wenhao Huang, Qianyu He, Zhixu Li, Jiaqing Liang, Yanghua Xiao

    Abstract: Definition bias is a negative phenomenon that can mislead models. Definition bias in information extraction appears not only across datasets from different domains but also within datasets sharing the same domain. We identify two types of definition bias in IE: bias among information extraction datasets and bias between information extraction datasets and instruction tuning datasets. To systematic…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: 15 pages, 4 figures

  41. arXiv:2403.15010  [pdf, other]

    cs.CV cs.CR

    Clean-image Backdoor Attacks

    Authors: Dazhong Rong, Guoyao Yu, Shuheng Shen, Xinyi Fu, Peng Qian, Jianhai Chen, Qinming He, Xing Fu, Weiqiang Wang

    Abstract: To gather a significant quantity of annotated training data for high-performance image classification models, numerous companies opt to enlist third-party providers to label their unlabeled data. This practice is widely regarded as secure, even in cases where some annotated errors occur, as the impact of these minor inaccuracies on the final performance of the models is negligible and existing bac…

    Submitted 26 March, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  42. arXiv:2403.11101  [pdf, other]

    cs.CV

    Hierarchical Generative Network for Face Morphing Attacks

    Authors: Zuyuan He, Zongyong Deng, Qiaoyun He, Qijun Zhao

    Abstract: Face morphing attacks circumvent face recognition systems (FRSs) by creating a morphed image that contains multiple identities. However, existing face morphing attack methods either sacrifice image quality or compromise the identity preservation capability. Consequently, these attacks fail to bypass FRSs verification well while still managing to deceive human observers. These methods typically rel…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted by FG2024

  43. arXiv:2403.09479  [pdf, other]

    cs.LG cs.AI cs.CL

    Laying the Foundation First? Investigating the Generalization from Atomic Skills to Complex Reasoning Tasks

    Authors: Yuncheng Huang, Qianyu He, Yipei Xu, Jiaqing Liang, Yanghua Xiao

    Abstract: Current language models have demonstrated their capability to develop basic reasoning, but struggle in more complicated reasoning tasks that require a combination of atomic skills, such as math word problem requiring skills like arithmetic and unit conversion. Previous methods either do not improve the inherent atomic skills of models or not attempt to generalize the atomic skills to complex reaso…

    Submitted 14 March, 2024; originally announced March 2024.

  44. arXiv:2403.06951  [pdf, other]

    cs.CV

    DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

    Authors: Tianhao Qi, Shancheng Fang, Yanze Wu, Hongtao Xie, Jiawei Liu, Lang Chen, Qian He, Yongdong Zhang

    Abstract: The diffusion-based text-to-image model harbors immense potential in transferring reference style. However, current encoder-based approaches significantly impair the text controllability of text-to-image models while transferring styles. In this paper, we introduce DEADiff to address this issue using the following two strategies: 1) a mechanism to decouple the style and semantics of reference imag…

    Submitted 11 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  45. arXiv:2403.06606  [pdf, other]

    cs.CV cs.LG

    Distributionally Generative Augmentation for Fair Facial Attribute Classification

    Authors: Fengda Zhang, Qianpei He, Kun Kuang, Jiashuo Liu, Long Chen, Chao Wu, Jun Xiao, Hanwang Zhang

    Abstract: Facial Attribute Classification (FAC) holds substantial promise in widespread applications. However, FAC models trained by traditional methodologies can be unfair by exhibiting accuracy inconsistencies across varied data subpopulations. This unfairness is largely attributed to bias in data, where some spurious attributes (e.g., Male) statistically correlate with the target attribute (e.g., Smiling…

    Submitted 25 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  46. arXiv:2403.06456  [pdf, other]

    cs.DB cs.LG

    A Survey of Learned Indexes for the Multi-dimensional Space

    Authors: Abdullah Al-Mamun, Hao Wu, Qiyang He, Jianguo Wang, Walid G. Aref

    Abstract: A recent research trend involves treating database index structures as Machine Learning (ML) models. In this domain, single or multiple ML models are trained to learn the mapping from keys to positions inside a data set. This class of indexes is known as "Learned Indexes." Learned indexes have demonstrated improved search performance and reduced space requirements for one-dimensional data. The con…

    Submitted 11 March, 2024; originally announced March 2024.

  47. arXiv:2403.06403  [pdf, other]

    cs.CV

    PointSeg: A Training-Free Paradigm for 3D Scene Segmentation via Foundation Models

    Authors: Qingdong He, Jinlong Peng, Zhengkai Jiang, Xiaobin Hu, Jiangning Zhang, Qiang Nie, Yabiao Wang, Chengjie Wang

    Abstract: Recent success of vision foundation models have shown promising performance for the 2D perception tasks. However, it is difficult to train a 3D foundation network directly due to the limited dataset and it remains under explored whether existing foundation models can be lifted to 3D space seamlessly. In this paper, we present PointSeg, a novel training-free paradigm that leverages off-the-shelf vi…

    Submitted 17 July, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

  48. arXiv:2403.00483  [pdf, other]

    cs.CV

    RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

    Authors: Mengqi Huang, Zhendong Mao, Mingcong Liu, Qian He, Yongdong Zhang

    Abstract: Text-to-image customization, which aims to synthesize text-driven images for the given subjects, has recently revolutionized content creation. Existing works follow the pseudo-word paradigm, i.e., represent the given subjects as pseudo-words and then compose them with the given text. However, the inherent entangled influence scope of pseudo-words with the given text results in a dual-optimum parad…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  49. arXiv:2403.00336  [pdf, other]

    cs.RO cs.AI

    Never-Ending Behavior-Cloning Agent for Robotic Manipulation

    Authors: Wenqi Liang, Gan Sun, Qian He, Yu Ren, Jiahua Dong, Yang Cong

    Abstract: Relying on multi-modal observations, embodied robots could perform multiple robotic manipulation tasks in unstructured real-world environments. However, most language-conditioned behavior-cloning agents still face existing long-standing challenges, i.e., 3D scene representation and human-level task learning, when adapting into new sequential tasks in practical scenarios. We here investigate these…

    Submitted 7 June, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: 17 pages, 6 figures, 9 tables

  50. arXiv:2402.17322  [pdf, other]

    cs.CG cs.DS

    Enclosing Points with Geometric Objects

    Authors: Timothy M. Chan, Qizheng He, Jie Xue

    Abstract: Let $X$ be a set of points in $\mathbb{R}^2$ and $\mathcal{O}$ be a set of geometric objects in $\mathbb{R}^2$, where $|X| + |\mathcal{O}| = n$. We study the problem of computing a minimum subset $\mathcal{O}^* \subseteq \mathcal{O}$ that encloses all points in $X$. Here a point $x \in X$ is enclosed by $\mathcal{O}^*$ if it lies in a bounded connected component of…

    Submitted 1 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: In SoCG'24