Showing 1–50 of 892 results for author: Liang, Y

Searching in archive cs.
  1. arXiv:2407.13621

    cs.LG cs.AI cs.CR

    Differential Privacy Mechanisms in Neural Tangent Kernel Regression

    Authors: Jiuxiang Gu, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song

    Abstract: Training data privacy is a fundamental problem in modern Artificial Intelligence (AI) applications, such as face recognition, recommendation systems, language generation, and many others, as it may contain sensitive user information related to legal issues. To fundamentally understand how privacy mechanisms work in AI applications, we study differential privacy (DP) in the Neural Tangent Kernel (N…

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2407.13483

    cs.CV

    SCAPE: A Simple and Strong Category-Agnostic Pose Estimator

    Authors: Yujia Liang, Zixuan Ye, Wenze Liu, Hao Lu

    Abstract: Category-Agnostic Pose Estimation (CAPE) aims to localize keypoints on an object of any category given few exemplars in an in-context manner. Prior arts involve sophisticated designs, e.g., sundry modules for similarity calculation and a two-stage framework, or take in extra heatmap generation and supervision. We notice that CAPE is essentially a task about feature matching, which can be solved w…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024. Code is available at https://github.com/tiny-smart/SCAPE

  3. arXiv:2407.11486

    cs.CV

    An efficient framework based on large foundation model for cervical cytopathology whole slide image screening

    Authors: Jialong Huang, Gaojie Li, Shichao Kan, Jianfeng Liu, Yixiong Liang

    Abstract: Current cervical cytopathology whole slide image (WSI) screening primarily relies on detection-based approaches, which are limited in performance due to the expensive and time-consuming annotation process. Multiple Instance Learning (MIL), a weakly supervised approach that relies solely on bag-level labels, can effectively alleviate these challenges. Nonetheless, MIL commonly employs frozen pretrain…

    Submitted 16 July, 2024; originally announced July 2024.

  4. arXiv:2407.10973

    cs.AI

    Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion

    Authors: Yongyuan Liang, Tingqiang Xu, Kaizhe Hu, Guangqi Jiang, Furong Huang, Huazhe Xu

    Abstract: Can we generate a control policy for an agent using just one demonstration of desired behaviors as a prompt, as effortlessly as creating an image from a textual description? In this paper, we present Make-An-Agent, a novel policy parameter generator that leverages the power of conditional diffusion models for behavior-to-policy generation. Guided by behavior embeddings that encode trajectory infor…

    Submitted 15 July, 2024; originally announced July 2024.

  5. arXiv:2407.10953

    cs.CL

    MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models

    Authors: Chengguang Gan, Qingyu Yin, Xinyang He, Hanjun Wei, Yunhao Liang, Younghun Lim, Shijian Wang, Hexiang Huang, Qinghao Zhang, Shiwen Ni, Tatsunori Mori

    Abstract: The Mutual Reinforcement Effect (MRE) represents a promising avenue in information extraction and multitasking research. Nevertheless, its applicability has been constrained due to the exclusive availability of MRE mix datasets in Japanese, thereby limiting comprehensive exploration by the global research community. To address this limitation, we introduce a Multilingual MRE mix dataset (MMM) that…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Under Review. 11 pages, 5 figures

  6. arXiv:2407.05375

    cs.LG cs.AI

    Online Drift Detection with Maximum Concept Discrepancy

    Authors: Ke Wan, Yi Liang, Susik Yoon

    Abstract: Continuous learning from an immense volume of data streams becomes exceptionally critical in the internet era. However, data streams often do not conform to the same distribution over time, leading to a phenomenon called concept drift. Since a fixed static model is unreliable for inferring concept-drifted data streams, establishing an adaptive mechanism for detecting concept drift is crucial. Curr…

    Submitted 7 July, 2024; originally announced July 2024.

  7. arXiv:2407.03900

    cs.CV

    Oracle Bone Inscriptions Multi-modal Dataset

    Authors: Bang Li, Donghao Luo, Yujie Liang, Jing Yang, Zengmao Ding, Xu Peng, Boyuan Jiang, Shengwei Han, Dan Sui, Peichao Qin, Pian Wu, Chaoyang Wang, Yun Qi, Taisong Jin, Chengjie Wang, Xiaoming Huang, Zhan Shu, Rongrong Ji, Yongge Liu, Yunsheng Wu

    Abstract: Oracle bone inscriptions (OBI) is the earliest developed writing system in China, bearing invaluable written exemplifications of early Shang history and paleography. However, the task of deciphering OBI, in the current climate of the scholarship, can prove extremely challenging. Out of the 4,500 oracle bone characters excavated, only a third have been successfully identified. Therefore, leveraging…

    Submitted 4 July, 2024; originally announced July 2024.

  8. arXiv:2407.02038

    cs.CV

    Camera-LiDAR Cross-modality Gait Recognition

    Authors: Wenxuan Guo, Yingping Liang, Zhiyu Pan, Ziheng Xi, Jianjiang Feng, Jie Zhou

    Abstract: Gait recognition is a crucial biometric identification technique. Camera-based gait recognition has been widely applied in both research and industrial fields. LiDAR-based gait recognition has also begun to evolve most recently, due to the provision of 3D structural information. However, in certain applications, cameras fail to recognize persons, such as in low-light environments and long-distance…

    Submitted 4 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024

  9. arXiv:2407.01312

    cs.CV

    ToCoAD: Two-Stage Contrastive Learning for Industrial Anomaly Detection

    Authors: Yun Liang, Zhiguang Hu, Junjie Huang, Donglin Di, Anyang Su, Lei Fan

    Abstract: Current unsupervised anomaly detection approaches perform well on public datasets but struggle with specific anomaly types due to the domain gap between pre-trained feature extractors and target-specific domains. To tackle this issue, this paper presents a two-stage training strategy, called ToCoAD. In the first stage, a discriminative network is trained by using synthetic anomalies in a…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 11 pages, 7 figures

  10. arXiv:2407.01183

    cs.DB

    TCSR-SQL: Towards Table Content-aware Text-to-SQL with Self-retrieval

    Authors: Wenbo Xu, Liang Yan, Peiyi Han, Haifeng Zhu, Chuanyi Liu, Shaoming Duan, Cuiyun Gao, Yingwei Liang

    Abstract: Large Language Model-based (LLM-based) Text-to-SQL methods have achieved important progress in generating SQL queries for real-world applications. When confronted with table content-aware questions in real-world scenarios, ambiguous data content keywords and non-existent database schema column names within the question lead to the poor performance of existing methods. To solve this problem, we pr…

    Submitted 12 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  11. arXiv:2407.00114

    cs.LG cs.AI cs.CL

    OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

    Authors: Zihao Wang, Shaofei Cai, Zhancun Mu, Haowei Lin, Ceyao Zhang, Xuejie Liu, Qing Li, Anji Liu, Xiaojian Ma, Yitao Liang

    Abstract: We present OmniJARVIS, a novel Vision-Language-Action (VLA) model for open-world instruction-following agents in open-world Minecraft. Compared to prior works that either emit textual goals to separate controllers or produce the control command directly, OmniJARVIS seeks a different path to ensure both strong reasoning and efficient decision-making capabilities via unified tokenization of multimod…

    Submitted 27 June, 2024; originally announced July 2024.

  12. arXiv:2406.18549

    eess.IV cs.CV

    Advancements in Feature Extraction Recognition of Medical Imaging Systems Through Deep Learning Technique

    Authors: Qishi Zhan, Dan Sun, Erdi Gao, Yuhan Ma, Yaxin Liang, Haowei Yang

    Abstract: This study introduces a novel unsupervised medical image feature extraction method that employs spatial stratification techniques. An objective function based on weight is proposed to achieve the purpose of fast image recognition. The algorithm divides the pixels of the image into multiple subdomains and uses a quadtree to access the image. A technique for threshold optimization utilizing a simple…

    Submitted 23 May, 2024; originally announced June 2024.

    Comments: conference

  13. arXiv:2406.17442

    cs.CV

    Mamba24/8D: Enhancing Global Interaction in Point Clouds via State Space Model

    Authors: Zhuoyuan Li, Yubo Ai, Jiahao Lu, ChuXin Wang, Jiacheng Deng, Hanzhi Chang, Yanzhe Liang, Wenfei Yang, Shifeng Zhang, Tianzhu Zhang

    Abstract: Transformers have demonstrated impressive results for 3D point cloud semantic segmentation. However, the quadratic complexity of transformers makes the computational cost high, limiting the number of points that can be processed simultaneously and impeding the modeling of long-range dependencies. Drawing inspiration from the great potential of recent state space models (SSM) for long sequence modeling, w…

    Submitted 25 June, 2024; originally announced June 2024.

  14. arXiv:2406.17305

    cs.CL

    Retrieval Augmented Instruction Tuning for Open NER with Large Language Models

    Authors: Tingyu Xie, Jian Zhang, Yan Zhang, Yuanyuan Liang, Qi Li, Hongwei Wang

    Abstract: The strong capability of large language models (LLMs) has been applied to information extraction (IE) through either retrieval augmented prompting or instruction tuning (IT). However, the best way to incorporate information with LLMs for IE remains an open question. In this paper, we explore Retrieval Augmented Instruction Tuning (RA-IT) for IE, focusing on the task of open named entity recognitio…

    Submitted 25 June, 2024; originally announced June 2024.

  15. arXiv:2406.16807

    cs.LG cs.CL cs.CV

    Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation

    Authors: Katherine M. Collins, Najoung Kim, Yonatan Bitton, Verena Rieser, Shayegan Omidshafiei, Yushi Hu, Sherol Chen, Senjuti Dutta, Minsuk Chang, Kimin Lee, Youwei Liang, Georgina Evans, Sahil Singla, Gang Li, Adrian Weller, Junfeng He, Deepak Ramachandran, Krishnamurthy Dj Dvijotham

    Abstract: Human feedback plays a critical role in learning and refining reward models for text-to-image generation, but the optimal form the feedback should take for learning an accurate reward function has not been conclusively established. This paper investigates the effectiveness of fine-grained feedback which captures nuanced distinctions in image quality and prompt-alignment, compared to traditional co…

    Submitted 24 June, 2024; originally announced June 2024.

  16. arXiv:2406.16734

    cs.DS

    Scheduling with Obligatory Tests

    Authors: Konstantinos Dogeas, Thomas Erlebach, Ya-Chun Liang

    Abstract: Motivated by settings such as medical treatments or aircraft maintenance, we consider a scheduling problem with jobs that consist of two operations, a test and a processing part. The time required to execute the test is known in advance while the time required to execute the processing part becomes known only upon completion of the test. We use competitive analysis to study algorithms for minimizi…

    Submitted 24 June, 2024; originally announced June 2024.

    ACM Class: F.2.2

  17. arXiv:2406.16437

    cs.LG cs.AI

    Theory on Mixture-of-Experts in Continual Learning

    Authors: Hongbo Li, Sen Lin, Lingjie Duan, Yingbin Liang, Ness B. Shroff

    Abstract: Continual learning (CL) has garnered significant attention because of its ability to adapt to new tasks that arrive over time. Catastrophic forgetting (of old tasks) has been identified as a major issue in CL, as the model adapts to new tasks. The Mixture-of-Experts (MoE) model has recently been shown to effectively mitigate catastrophic forgetting in CL, by employing a gating network to sparsify…

    Submitted 24 June, 2024; originally announced June 2024.

  18. arXiv:2406.16416

    cs.CL

    Multilingual Knowledge Editing with Language-Agnostic Factual Neurons

    Authors: Xue zhang, Yunlong Liang, Fandong Meng, Songming Zhang, Yufeng Chen, Jinan Xu, Jie Zhou

    Abstract: Multilingual knowledge editing (MKE) aims to simultaneously revise factual knowledge across multiple languages within large language models (LLMs). However, most existing MKE methods just adapt existing monolingual editing methods to multilingual scenarios, overlooking the deep semantic connections of the same factual knowledge between different languages, thereby limiting edit performance. To…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 12 pages, 4 figures, 7 tables

  19. arXiv:2406.15306

    cs.LG cs.CL cs.CV

    Advanced Multimodal Deep Learning Architecture for Image-Text Matching

    Authors: Jinyin Wang, Haijing Zhang, Yihao Zhong, Yingbin Liang, Rongwei Ji, Yiru Cang

    Abstract: Image-text matching is a key multimodal task that aims to model the semantic association between images and text as a matching relationship. With the advent of the multimedia information age, image and text data show explosive growth, and how to realize efficient and accurate semantic correspondence between them has become the core issue of common concern in academia and industry.…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2405.17460 by other authors

  20. arXiv:2406.15215

    cs.CR cs.CY econ.GN

    Sound and Fury, Signifying Nothing? Impact of Data Breach Disclosure Laws

    Authors: Muhammad Zia Hydari, Yangfan Liang, Rahul Telang

    Abstract: Data breach disclosure (DBD) is presumed to improve firms' cybersecurity practices by inducing fear of subsequent revenue loss. This revenue loss, the theory goes, will occur if customers punish an offending firm by refusing to buy from them and is assumed to be the primary mechanism through which DBD laws will change firm behavior ex ante. However, our analysis of a large-scale data breach at a U…

    Submitted 21 June, 2024; originally announced June 2024.

    ACM Class: K.4; K.5; K.6

  21. arXiv:2406.14635

    cs.AI cs.LG

    Harvesting Efficient On-Demand Order Pooling from Skilled Couriers: Enhancing Graph Representation Learning for Refining Real-time Many-to-One Assignments

    Authors: Yile Liang, Jiuxia Zhao, Donghui Li, Jie Feng, Chen Zhang, Xuetao Ding, Jinghua Hao, Renqing He

    Abstract: The recent past has witnessed a notable surge in on-demand food delivery (OFD) services, offering delivery fulfillment within dozens of minutes after an order is placed. In OFD, pooling multiple orders for simultaneous delivery in real-time order assignment is a pivotal efficiency source, which may in turn extend delivery time. Constructing high-quality order pooling to harmonize platform efficien…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted in KDD 2024 ADS Track

  22. arXiv:2406.14043

    cs.IR cs.CL

    Taxonomy-Guided Zero-Shot Recommendations with LLMs

    Authors: Yueqing Liang, Liangwei Yang, Chen Wang, Xiongxiao Xu, Philip S. Yu, Kai Shu

    Abstract: With the emergence of large language models (LLMs) and their ability to perform a variety of tasks, their application in recommender systems (RecSys) has shown promise. However, we are facing significant challenges when deploying LLMs into RecSys, such as limited prompt length, unstructured item information, and unconstrained generation of recommendations, leading to sub-optimal performance. To a…

    Submitted 20 June, 2024; originally announced June 2024.

  23. arXiv:2406.14036

    cs.LG cs.AI cs.CL

    Toward Infinite-Long Prefix in Transformer

    Authors: Jiuxiang Gu, Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang

    Abstract: Prompting and contextual-based fine-tuning methods, which we call Prefix Learning, have been proposed to enhance the performance of language models on various downstream tasks that can match full parameter fine-tuning. There remains a limited theoretical understanding of how these methods work. In this paper, we aim to relieve this limitation by studying the learning ability of Prefix Learning fro…

    Submitted 20 June, 2024; originally announced June 2024.

  24. arXiv:2406.13672

    cs.CV

    Q-SNNs: Quantized Spiking Neural Networks

    Authors: Wenjie Wei, Yu Liang, Ammar Belatreche, Yichen Xiao, Honglin Cao, Zhenbang Ren, Guoqing Wang, Malu Zhang, Yang Yang

    Abstract: Brain-inspired Spiking Neural Networks (SNNs) leverage sparse spikes to represent information and process them in an asynchronous event-driven manner, offering an energy-efficient paradigm for the next generation of machine intelligence. However, the current focus within the SNN community prioritizes accuracy optimization through the development of large-scale models, limiting their viability in r…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures

  25. PetalView: Fine-grained Location and Orientation Extraction of Street-view Images via Cross-view Local Search with Supplementary Materials

    Authors: Wenmiao Hu, Yichen Zhang, Yuxuan Liang, Xianjing Han, Yifang Yin, Hannes Kruppa, See-Kiong Ng, Roger Zimmermann

    Abstract: Satellite-based street-view information extraction by cross-view matching refers to a task that extracts the location and orientation information of a given street-view image query by using one or multiple geo-referenced satellite images. Recent work has initiated a new research direction to find accurate information within a local area covered by one satellite image centered at a location prior (…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by ACM Multimedia 2023. This version contains additional supplementary materials

    Journal ref: Proceedings of the 31st ACM International Conference on Multimedia (2023) 56-66

  26. arXiv:2406.13358

    cs.CV eess.IV

    Multi-scale Restoration of Missing Data in Optical Time-series Images with Masked Spatial-Temporal Attention Network

    Authors: Zaiyan Zhang, Jining Yan, Yuanqi Liang, Jiaxin Feng, Haixu He, Wei Han

    Abstract: Due to factors such as thick cloud cover and sensor limitations, remote sensing images often suffer from significant missing data, resulting in incomplete time-series information. Existing methods for imputing missing values in remote sensing images do not fully exploit spatio-temporal auxiliary information, leading to limited accuracy in restoration. Therefore, this paper proposes a novel deep le…

    Submitted 19 June, 2024; originally announced June 2024.

  27. arXiv:2406.12747

    cs.LG cs.AI

    TSI-Bench: Benchmarking Time Series Imputation

    Authors: Wenjie Du, Jun Wang, Linglong Qian, Yiyuan Yang, Fanxing Liu, Zepu Wang, Zina Ibrahim, Haoxin Liu, Zhiyuan Zhao, Yingjie Zhou, Wenjia Wang, Kaize Ding, Yuxuan Liang, B. Aditya Prakash, Qingsong Wen

    Abstract: Effective imputation is a crucial preprocessing step for time series analysis. Despite the development of numerous deep learning algorithms for time series imputation, the community lacks standardized and comprehensive benchmark platforms to effectively evaluate imputation performance across different settings. Moreover, although many deep learning forecasting algorithms have demonstrated excellen…

    Submitted 18 June, 2024; originally announced June 2024.

  28. arXiv:2406.12539

    cs.LG cs.AI

    The Heterophilic Snowflake Hypothesis: Training and Empowering GNNs for Heterophilic Graphs

    Authors: Kun Wang, Guibin Zhang, Xinnan Zhang, Junfeng Fang, Xun Wu, Guohao Li, Shirui Pan, Wei Huang, Yuxuan Liang

    Abstract: Graph Neural Networks (GNNs) have become pivotal tools for a range of graph-based learning tasks. Notably, most current GNN architectures operate under the assumption of homophily, whether explicitly or implicitly. While this underlying assumption is frequently adopted, it is not universally applicable, which can result in potential shortcomings in learning effectiveness. In this paper, fo…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: KDD 2024

  29. arXiv:2406.12091

    cs.LG cs.CL cs.CR

    Is poisoning a real threat to LLM alignment? Maybe more so than you think

    Authors: Pankayaraj Pathmanathan, Souradip Chakraborty, Xiangyu Liu, Yongyuan Liang, Furong Huang

    Abstract: Recent advancements in Reinforcement Learning with Human Feedback (RLHF) have significantly impacted the alignment of Large Language Models (LLMs). The sensitivity of reinforcement learning algorithms such as Proximal Policy Optimization (PPO) has led to a new line of work on Direct Preference Optimization (DPO), which treats RLHF in a supervised learning framework. The increased practical use of these RLH…

    Submitted 19 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Journal ref: ICML 2024 Workshop MHFAIA

  30. arXiv:2406.08838

    cs.CL cs.AI cs.LG

    Research on Optimization of Natural Language Processing Model Based on Multimodal Deep Learning

    Authors: Dan Sun, Yaxin Liang, Yining Yang, Yuhan Ma, Qishi Zhan, Erdi Gao

    Abstract: This project intends to study the image representation based on attention mechanism and multimodal data. By adding multiple pattern layers to the attribute model, the semantic and hidden layers of image content are integrated. The word vector is quantified by the Word2Vec method and then evaluated by a word embedding convolutional neural network. The published experimental results of the two group…

    Submitted 13 June, 2024; originally announced June 2024.

  31. arXiv:2406.08837

    eess.IV cs.CV cs.LG

    Research on Deep Learning Model of Feature Extraction Based on Convolutional Neural Network

    Authors: Houze Liu, Iris Li, Yaxin Liang, Dan Sun, Yining Yang, Haowei Yang

    Abstract: Neural networks with relatively shallow layers and simple structures may have limited ability in accurately identifying pneumonia. In addition, deep neural networks also have a large demand for computing resources, which may cause convolutional neural networks to be unable to be implemented on terminals. Therefore, this paper will carry out the optimal classification of convolutional neural networ…

    Submitted 13 June, 2024; originally announced June 2024.

  32. arXiv:2406.07648

    cs.CV

    M-LRM: Multi-view Large Reconstruction Model

    Authors: Mengfei Li, Xiaoxiao Long, Yixun Liang, Weiyu Li, Yuan Liu, Peng Li, Xiaowei Chi, Xingqun Qi, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo

    Abstract: Despite recent advancements in the Large Reconstruction Model (LRM) demonstrating impressive results, when extending its input from single image to multiple images, it exhibits inefficiencies, subpar geometric and texture quality, as well as slower convergence speed than expected. This is attributed to the fact that LRM formulates 3D reconstruction as a naive images-to-3D translation problem, ignoring the…

    Submitted 11 June, 2024; originally announced June 2024.

  33. arXiv:2406.06542

    cs.AR cs.LG

    vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs

    Authors: Size Zheng, Renze Chen, Meng Li, Zihao Ye, Luis Ceze, Yun Liang

    Abstract: IoT devices based on microcontroller units (MCU) provide ultra-low power consumption and ubiquitous computation for near-sensor deep learning models (DNN). However, the memory of MCU is usually 2-3 orders of magnitude smaller than mobile devices, which makes it challenging to map DNNs onto MCUs. Previous work separates memory management and kernel implementation for MCU and relies on coarse-graine…

    Submitted 1 May, 2024; originally announced June 2024.

  34. arXiv:2406.04584

    cs.LG cs.AI cs.CV

    CLoG: Benchmarking Continual Learning of Image Generation Models

    Authors: Haotian Zhang, Junting Zhou, Haowei Lin, Hang Ye, Jianhua Zhu, Zihao Wang, Liangcai Gao, Yizhou Wang, Yitao Liang

    Abstract: Continual Learning (CL) poses a significant challenge in Artificial Intelligence, aiming to mirror the human ability to incrementally acquire knowledge and skills. While extensive research has focused on CL within the context of classification tasks, the advent of increasingly powerful generative models necessitates the exploration of Continual Learning of Generative models (CLoG). This paper advo…

    Submitted 6 June, 2024; originally announced June 2024.

  35. arXiv:2406.03710

    cs.LG cs.AI

    TwinS: Revisiting Non-Stationarity in Multivariate Time Series Forecasting

    Authors: Jiaxi Hu, Qingsong Wen, Sijie Ruan, Li Liu, Yuxuan Liang

    Abstract: Recently, multivariate time series forecasting tasks have garnered increasing attention due to their significant practical applications, leading to the emergence of various deep forecasting models. However, real-world time series exhibit pronounced non-stationary distribution characteristics. These characteristics are not solely limited to time-varying statistical properties highlighted by non-sta…

    Submitted 14 July, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  36. arXiv:2406.01934

    cs.CL

    Optimal Transport Guided Correlation Assignment for Multimodal Entity Linking

    Authors: Zefeng Zhang, Jiawei Sheng, Chuang Zhang, Yunzhi Liang, Wenyuan Zhang, Siqi Wang, Tingwen Liu

    Abstract: Multimodal Entity Linking (MEL) aims to link ambiguous mentions in multimodal contexts to entities in a multimodal knowledge graph. A pivotal challenge is to fully leverage multi-element correlations between mentions and entities to bridge the modality gap and enable fine-grained semantic matching. Existing methods attempt several local correlative mechanisms, relying heavily on the automatically lear…

    Submitted 5 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Findings of ACL 2024

  37. arXiv:2406.01467

    cs.GR cs.CV

    RaDe-GS: Rasterizing Depth in Gaussian Splatting

    Authors: Baowen Zhang, Chuan Fang, Rakesh Shrestha, Yixun Liang, Xiaoxiao Long, Ping Tan

    Abstract: Gaussian Splatting (GS) has proven to be highly effective in novel view synthesis, achieving high-quality and real-time rendering. However, its potential for reconstructing detailed 3D shapes has not been fully explored. Existing methods often suffer from limited shape accuracy due to the discrete and unstructured nature of Gaussian splats, which complicates the shape extraction. While recent tech…

    Submitted 24 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  38. arXiv:2406.01391

    astro-ph.IM cs.DL

    Knowledge Graph in Astronomical Research with Large Language Models: Quantifying Driving Forces in Interdisciplinary Scientific Discovery

    Authors: Zechang Sun, Yuan-Sen Ting, Yaobo Liang, Nan Duan, Song Huang, Zheng Cai

    Abstract: Identifying and predicting the factors that contribute to the success of interdisciplinary research is crucial for advancing scientific discovery. However, there is a lack of methods to quantify the integration of new ideas and technological advancements in astronomical research and how these new technologies drive further scientific breakthroughs. Large language models, with their ability to extr…

    Submitted 15 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: An interactive version of the knowledge graph is made publicly available at https://astrokg.github.io/. Accepted to IJCAI 2024 AI4Research Workshop. Comments are welcome

  39. arXiv:2405.19711

    cs.DS

    SimiSketch: Efficiently Estimating Similarity of streaming Multisets

    Authors: Fenghao Dong, Yang He, Yutong Liang, Zirui Liu, Yuhan Wu, Peiqing Chen, Tong Yang

    Abstract: The challenge of estimating similarity between sets has been a significant concern in data science, finding diverse applications across various domains. However, previous approaches, such as MinHash, have predominantly centered around hashing techniques, which are well-suited for sets but less naturally adaptable to multisets, a common occurrence in scenarios like network streams and text data. Mo…

    Submitted 30 May, 2024; originally announced May 2024.

  40. arXiv:2405.19592

    cs.LG cs.AI cs.CL

    Why Larger Language Models Do In-context Learning Differently?

    Authors: Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang

    Abstract: Large language models (LLM) have emerged as a powerful tool for AI, with the key ability of in-context learning (ICL), where they can perform well on unseen tasks based on a brief series of task examples without necessitating any adjustments to the model parameters. One interesting and mysterious recent observation is that models of different scales may have different ICL behaviors: larger models tend…

    Submitted 29 May, 2024; originally announced May 2024.

  41. arXiv:2405.19327

    cs.CL cs.AI cs.LG

    MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

    Authors: Ge Zhang, Scott Qu, Jiaheng Liu, Chenchen Zhang, Chenghua Lin, Chou Leuang Yu, Danny Pan, Esther Cheng, Jie Liu, Qunshu Lin, Raven Yuan, Tuney Zheng, Wei Pang, Xinrun Du, Yiming Liang, Yinghao Ma, Yizhi Li, Ziyang Ma, Bill Lin, Emmanouil Benetos, Huan Yang, Junting Zhou, Kaijing Ma, Minghao Liu, Morry Niu, et al. (20 additional authors not shown)

    Abstract: Large Language Models (LLMs) have made great strides in recent years to achieve unprecedented performance across different tasks. However, due to commercial interest, the most competitive models like GPT, Gemini, and Claude have been gated behind proprietary interfaces without disclosing the training details. Recently, many institutions have open-sourced several strong LLMs like LLaMA-3, comparabl…

    Submitted 10 July, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: https://map-neo.github.io/

  42. arXiv:2405.18910

    cs.AI

    Predicting Parking Availability in Singapore with Cross-Domain Data: A New Dataset and A Data-Driven Approach

    Authors: Huaiwu Zhang, Yutong Xia, Siru Zhong, Kun Wang, Zekun Tong, Qingsong Wen, Roger Zimmermann, Yuxuan Liang

    Abstract: The increasing number of vehicles highlights the need for efficient parking space management. Predicting real-time Parking Availability (PA) can help mitigate traffic congestion and the corresponding social problems, which is a pressing issue in densely populated cities like Singapore. In this study, we aim to collectively predict future PA across Singapore with complex factors from various domain…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024 (Multi-Year Track On AI And Social Good with ~20% acceptance rate)

  43. arXiv:2405.16418  [pdf, other]

    cs.LG cs.AI cs.CV

    Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

    Authors: Jiuxiang Gu, Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou

    Abstract: Diffusion models have made rapid progress in generating high-quality samples across various domains. However, a theoretical understanding of the Lipschitz continuity and second momentum properties of the diffusion process is still lacking. In this paper, we bridge this gap by providing a detailed examination of these smoothness properties for the case where the target data distribution is a mixtur…

    Submitted 25 May, 2024; originally announced May 2024.

  44. arXiv:2405.16411  [pdf, other]

    cs.LG cs.AI cs.CL

    Tensor Attention Training: Provably Efficient Learning of Higher-order Transformers

    Authors: Jiuxiang Gu, Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou

    Abstract: Tensor Attention, a multi-view attention that is able to capture high-order correlations among multiple modalities, can overcome the representational limitations of classical matrix attention. However, the $\Omega(n^3)$ time complexity of tensor attention poses a significant obstacle to its practical implementation in transformers, where $n$ is the input sequence length. In this work, we prove that the…

    Submitted 25 May, 2024; originally announced May 2024.

  45. arXiv:2405.16312  [pdf, other]

    cs.LG cs.AI

    Time-SSM: Simplifying and Unifying State Space Models for Time Series Forecasting

    Authors: Jiaxi Hu, Disen Lan, Ziyu Zhou, Qingsong Wen, Yuxuan Liang

    Abstract: State Space Models (SSMs) have emerged as a potent tool in sequence modeling tasks in recent years. These models approximate continuous systems using a set of basis functions and discretize them to handle input data, making them well-suited for modeling time series data collected at specific frequencies from continuous systems. Despite their potential, the application of SSMs in time series forecast…

    Submitted 14 July, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2402.11463

  46. arXiv:2405.15317  [pdf, other]

    cs.LG cs.AI

    NuwaTS: a Foundation Model Mending Every Incomplete Time Series

    Authors: Jinguo Cheng, Chunwei Yang, Wanlin Cai, Yuxuan Liang, Yuankai Wu

    Abstract: Time series imputation plays a crucial role in various real-world systems and has been extensively explored. Models for time series imputation often require specialization, necessitating distinct designs for different domains and missing patterns. In this study, we introduce NuwaTS, a framework to repurpose a Pre-trained Language Model (PLM) for general time series imputation. Once trained, this mod…

    Submitted 27 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 22 pages, 13 figures

  47. arXiv:2405.15125  [pdf, other]

    cs.CV

    HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting

    Authors: Yuanhao Cai, Zihao Xiao, Yixun Liang, Minghan Qin, Yulun Zhang, Xiaokang Yang, Yaoyao Liu, Alan Yuille

    Abstract: High dynamic range (HDR) novel view synthesis (NVS) aims to create photorealistic images from novel viewpoints using HDR imaging techniques. The rendered HDR images capture a wider range of brightness levels containing more details of the scene than normal low dynamic range (LDR) images. Existing HDR NVS methods are mainly based on NeRF. They suffer from long training time and slow inference speed…

    Submitted 27 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: The first 3D Gaussian Splatting-based method for HDR imaging

  48. arXiv:2405.14979  [pdf, other]

    cs.GR cs.CV

    CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner

    Authors: Weiyu Li, Jiarui Liu, Rui Chen, Yixun Liang, Xuelin Chen, Ping Tan, Xiaoxiao Long

    Abstract: We present a novel generative 3D modeling system, coined CraftsMan, which can generate high-fidelity 3D geometries with highly varied shapes, regular mesh topologies, and detailed surfaces, and, notably, allows for refining the geometry in an interactive manner. Despite the significant advancements in 3D generation, existing methods still struggle with lengthy optimization processes, irregular mes…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: HomePage: https://craftsman3d.github.io/, Code: https://github.com/wyysf-98/CraftsMan

  49. arXiv:2405.14252  [pdf, other]

    cs.LG

    Time-FFM: Towards LM-Empowered Federated Foundation Model for Time Series Forecasting

    Authors: Qingxiang Liu, Xu Liu, Chenghao Liu, Qingsong Wen, Yuxuan Liang

    Abstract: Unlike natural language processing and computer vision, the development of Foundation Models (FMs) for time series forecasting is blocked due to data scarcity. While recent efforts are focused on building such FMs by unlocking the potential of language models (LMs) for time series analysis, dedicated parameters for various downstream forecasting tasks need training, which hinders the common knowle…

    Submitted 25 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  50. arXiv:2405.14135  [pdf, other]

    cs.LG cs.AI

    Learning Geospatial Region Embedding with Heterogeneous Graph

    Authors: Xingchen Zou, Jiani Huang, Xixuan Hao, Yuhao Yang, Haomin Wen, Yibo Yan, Chao Huang, Yuxuan Liang

    Abstract: Learning effective geospatial embeddings is crucial for a series of geospatial applications such as city analytics and earth monitoring. However, learning comprehensive region representations presents two significant challenges: first, the deficiency of effective intra-region feature representation; and second, the difficulty of learning from intricate inter-region dependencies. In this paper, we…

    Submitted 22 May, 2024; originally announced May 2024.