
Showing 1–50 of 289 results for author: Cai, X

Searching in archive cs.
  1. arXiv:2407.09360  [pdf, other]

    cs.LG math.OC

    Novel clustered federated learning based on local loss

    Authors: Endong Gu, Yongxin Chen, Hao Wen, Xingju Cai, Deren Han

    Abstract: This paper proposes LCFL, a novel clustering metric for evaluating clients' data distributions in federated learning. LCFL aligns with federated learning requirements, accurately assessing client-to-client variations in data distribution. It offers advantages over existing clustered federated learning methods, addressing privacy concerns, improving applicability to non-convex models, and providing…

    Submitted 12 July, 2024; originally announced July 2024.

  2. arXiv:2407.06162  [pdf, other]

    cs.CV cs.AI cs.LG

    RNNs, CNNs and Transformers in Human Action Recognition: A Survey and A Hybrid Model

    Authors: Khaled Alomar, Halil Ibrahim Aysel, Xiaohao Cai

    Abstract: Human Action Recognition (HAR) encompasses the task of monitoring human activities across various domains, including but not limited to medical, educational, entertainment, visual surveillance, video retrieval, and the identification of anomalous activities. Over the past decade, the field of HAR has witnessed substantial progress by leveraging Convolutional Neural Networks (CNNs) to effectively e…

    Submitted 2 June, 2024; originally announced July 2024.

  3. arXiv:2407.06153  [pdf, other]

    cs.SE cs.CL

    What's Wrong with Your Code Generated by Large Language Models? An Extensive Study

    Authors: Shihan Dou, Haoxiang Jia, Shenxi Wu, Huiyuan Zheng, Weikang Zhou, Muling Wu, Mingxu Chai, Jessica Fan, Caishuang Huang, Yunbo Tao, Yan Liu, Enyu Zhou, Ming Zhang, Yuhao Zhou, Yueming Wu, Rui Zheng, Ming Wen, Rongxiang Weng, Jingang Wang, Xunliang Cai, Tao Gui, Xipeng Qiu, Qi Zhang, Xuanjing Huang

    Abstract: The increasing development of large language models (LLMs) in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality datasets and leveraging diverse training technologies. However, there is a notable lack of comprehensive studies examining the limitations and boundar…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 17 pages, 7 figures

  4. arXiv:2407.05765  [pdf, other]

    cs.CV

    Enlarging Feature Support Overlap for Domain Generalization

    Authors: Yaoyao Zhu, Xiuding Cai, Dong Miao, Yu Yao, Zhongliang Fu

    Abstract: Deep models often struggle with out-of-distribution (OOD) generalization, limiting their real-world applicability beyond controlled laboratory settings. Invariant risk minimization (IRM) addresses this issue by learning invariant features and minimizing the risk across different domains. Thus, it avoids the pitfalls of pseudo-invariant features and spurious causality associated with empirical risk…

    Submitted 8 July, 2024; originally announced July 2024.

  5. arXiv:2407.04844  [pdf, other]

    cs.CV cs.AI

    Neural varifolds: an aggregate representation for quantifying the geometry of point clouds

    Authors: Juheon Lee, Xiaohao Cai, Carola-Bibiane Schönlieb, Simon Masnou

    Abstract: Point clouds are popular 3D representations for real-life objects (such as in LiDAR and Kinect) due to their detailed and compact representation of surface-based geometry. Recent approaches characterise the geometry of point clouds by bringing deep learning based techniques together with geometric fidelity metrics such as optimal transportation costs (e.g., Chamfer and Wasserstein metrics). In thi…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: The first author, Juheon Lee, is an unaffiliated, independent researcher. This work is a personal endeavor, unrelated to his current job

  6. arXiv:2407.04061  [pdf, other]

    cs.CV

    Detect Closer Surfaces that can be Seen: New Modeling and Evaluation in Cross-domain 3D Object Detection

    Authors: Ruixiao Zhang, Yihong Wu, Juheon Lee, Adam Prugel-Bennett, Xiaohao Cai

    Abstract: The performance of domain adaptation technologies has not yet reached an ideal level in the current 3D object detection field for autonomous driving, which is mainly due to significant differences in the size of vehicles, as well as the environments they operate in when applied across domains. These factors together hinder the effective transfer and application of knowledge learned from specific d…

    Submitted 12 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted by the 27th European Conference on Artificial Intelligence (ECAI 2024)

  7. arXiv:2407.03663  [pdf, other]

    cs.CV

    Limited-View Photoacoustic Imaging Reconstruction Via High-quality Self-supervised Neural Representation

    Authors: Youshen Xiao, Yuting Shen, Bowei Yao, Xiran Cai, Yuyao Zhang, Fei Gao

    Abstract: In practical applications within the human body, it is often challenging to fully encompass the target tissue or organ, necessitating the use of limited-view arrays, which can lead to the loss of crucial information. Addressing the reconstruction of photoacoustic sensor signals in limited-view detection spaces has become a focal point of current research. In this study, we introduce a self-supervi…

    Submitted 4 July, 2024; originally announced July 2024.

  8. arXiv:2407.02867  [pdf, other]

    cs.MM cs.CL

    Contrast then Memorize: Semantic Neighbor Retrieval-Enhanced Inductive Multimodal Knowledge Graph Completion

    Authors: Yu Zhao, Ying Zhang, Baohang Zhou, Xinying Qian, Kehui Song, Xiangrui Cai

    Abstract: A large number of studies have emerged for Multimodal Knowledge Graph Completion (MKGC) to predict the missing links in MKGs. However, fewer studies have been proposed to study the inductive MKGC (IMKGC) involving emerging entities unseen during training. Existing inductive approaches focus on learning textual entity representations, which neglect rich semantic information in visual modality. More…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted by SIGIR 2024

  9. arXiv:2407.00072  [pdf, other]

    cs.IR cs.CL

    Pistis-RAG: A Scalable Cascading Framework Towards Trustworthy Retrieval-Augmented Generation

    Authors: Yu Bai, Yukai Miao, Li Chen, Dan Li, Yanyu Ren, Hongtao Xie, Ce Yang, Xuhui Cai

    Abstract: In Greek mythology, Pistis symbolized good faith, trust, and reliability. Drawing inspiration from these principles, Pistis-RAG is a scalable multi-stage framework designed to address the challenges of large-scale retrieval-augmented generation (RAG) systems. This framework consists of distinct stages: matching, pre-ranking, ranking, reasoning, and aggregating. Each stage contributes to narrowing…

    Submitted 11 July, 2024; v1 submitted 21 June, 2024; originally announced July 2024.

  10. arXiv:2406.18938  [pdf, other]

    cs.IR

    Towards Personalized Federated Multi-scenario Multi-task Recommendation

    Authors: Yue Ding, Yanbiao Ji, Xun Cai, Xin Xin, Xiaofeng Gao, Hongtao Lu

    Abstract: In modern recommender system applications, such as e-commerce, predicting multiple targets like click-through rate (CTR) and post-view click-through & conversion rate (CTCVR) is common. Multi-task recommender systems are gaining traction in research and practical use. Existing multi-task recommender systems tackle diverse business scenarios; merging and modeling these scenarios unlocks shared kno…

    Submitted 27 June, 2024; originally announced June 2024.

  11. arXiv:2406.18017  [pdf, other]

    cs.IT cs.ET

    Dependence Analysis and Structured Construction for Batched Sparse Code

    Authors: Jiaxin Qing, Xiaohong Cai, Yijun Fan, Mingyang Zhu, Raymond W. Yeung

    Abstract: In coding theory, codes are usually designed with a certain level of randomness to facilitate analysis and accommodate different channel conditions. However, the resulting random code constructed can be suboptimal in practical implementations. Represented by a bipartite graph, the Batched Sparse Code (BATS Code) is a randomly constructed erasure code that utilizes network coding to achieve near-op…

    Submitted 25 June, 2024; originally announced June 2024.

  12. arXiv:2406.16872  [pdf, other]

    eess.SP cs.AI

    Multi-channel Time Series Decomposition Network For Generalizable Sensor-Based Activity Recognition

    Authors: Jianguo Pan, Zhengxin Hu, Lingdun Zhang, Xia Cai

    Abstract: Sensor-based human activity recognition is important in daily scenarios such as smart healthcare and homes due to its non-intrusive privacy and low cost advantages, but the problem of out-of-domain generalization caused by differences in focusing individuals and operating environments can lead to significant accuracy degradation on cross-person behavior recognition due to the inconsistent distribu…

    Submitted 28 March, 2024; originally announced June 2024.

  13. arXiv:2406.12195  [pdf, other]

    quant-ph cs.LG

    Quantum Compiling with Reinforcement Learning on a Superconducting Processor

    Authors: Z. T. Wang, Qiuhao Chen, Yuxuan Du, Z. H. Yang, Xiaoxia Cai, Kaixuan Huang, Jingning Zhang, Kai Xu, Jun Du, Yinan Li, Yuling Jiao, Xingyao Wu, Wu Liu, Xiliang Lu, Huikai Xu, Yirong Jin, Ruixia Wang, Haifeng Yu, S. P. Zhao

    Abstract: Effectively implementing quantum algorithms on noisy intermediate-scale quantum (NISQ) processors is a central task in modern quantum technology. NISQ processors feature tens to a few hundreds of noisy qubits with limited coherence times and gate operations with errors, so NISQ algorithms naturally require employing circuits of short lengths via quantum compilation. Here, we develop a reinforcemen…

    Submitted 17 June, 2024; originally announced June 2024.

  14. arXiv:2406.10664  [pdf, other]

    cs.NI eess.SP

    A Novel Joint DRL-Based Utility Optimization for UAV Data Services

    Authors: Xuli Cai, Poonam Lohan, Burak Kantarci

    Abstract: In this paper, we propose a novel joint deep reinforcement learning (DRL)-based solution to optimize the utility of an uncrewed aerial vehicle (UAV)-assisted communication network. To maximize the number of users served within the constraints of the UAV's limited bandwidth and power resources, we employ deep Q-Networks (DQN) and deep deterministic policy gradient (DDPG) algorithms for optimal reso…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 6 pages, 9 figures

  15. arXiv:2406.04129  [pdf, other]

    cs.CV

    LenslessFace: An End-to-End Optimized Lensless System for Privacy-Preserving Face Verification

    Authors: Xin Cai, Hailong Zhang, Chenchen Wang, Wentao Liu, Jinwei Gu, Tianfan Xue

    Abstract: Lensless cameras, innovatively replacing traditional lenses for ultra-thin, flat optics, encode light directly onto sensors, producing images that are not immediately recognizable. This compact, lightweight, and cost-effective imaging solution offers inherent privacy advantages, making it attractive for privacy-sensitive applications like face verification. Typical lensless face verification adopt…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: under review

  16. arXiv:2406.03853  [pdf, other]

    cs.CL

    Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism

    Authors: Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai

    Abstract: The recent advancements in large language models (LLMs) have been extraordinary, yet the escalating inference costs associated with them present challenges in real-world applications. To address these challenges, we propose a novel approach called Early-exiting Speculative Decoding (EESD) with lossless acceleration. Specifically, EESD utilizes a segment of the LLM to generate draft tokens, incorpo…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024 (Findings)

  17. arXiv:2406.00403  [pdf, other]

    cs.LG cs.AI

    Dual-perspective Cross Contrastive Learning in Graph Transformers

    Authors: Zelin Yao, Chuang Liu, Xueqi Ma, Mukun Chen, Jia Wu, Xiantao Cai, Bo Du, Wenbin Hu

    Abstract: Graph contrastive learning (GCL) is a popular method for learning graph representations by maximizing the consistency of features across augmented views. Traditional GCL methods utilize single-perspective (i.e., data or model-perspective) augmentation to generate positive samples, restraining the diversity of positive samples. In addition, these positive samples may be unreliable due to uncontrollabl…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 12 pages, 5 figures, submitted to IEEE TKDE

  18. arXiv:2406.00247  [pdf, other]

    cs.IR cs.AI

    Large Language Models for Relevance Judgment in Product Search

    Authors: Navid Mehrdad, Hrushikesh Mohapatra, Mossaab Bagdouri, Prijith Chandran, Alessandro Magnani, Xunfan Cai, Ajit Puthenputhussery, Sachin Yadav, Tony Lee, ChengXiang Zhai, Ciya Liao

    Abstract: High relevance of retrieved and re-ranked items to the search query is the cornerstone of successful product search, yet measuring relevance of items to queries is one of the most challenging tasks in product information retrieval, and quality of product search is highly influenced by the precision and scale of available relevance-labelled data. In this paper, we present an array of techniques for…

    Submitted 16 July, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

    Comments: 10 pages, 1 figure, 11 tables - SIGIR 2024, LLM4Eval

    ACM Class: H.3.3; I.2.7

  19. arXiv:2405.18842  [pdf, other]

    cs.CV

    Descriptive Image Quality Assessment in the Wild

    Authors: Zhiyuan You, Jinjin Gu, Zheyuan Li, Xin Cai, Kaiwen Zhu, Chao Dong, Tianfan Xue

    Abstract: With the rapid advancement of Vision Language Models (VLMs), VLM-based Image Quality Assessment (IQA) seeks to describe image quality linguistically to align with human expression and capture the multifaceted nature of IQA tasks. However, current methods are still far from practical usage. First, prior works focus narrowly on specific sub-tasks or settings, which do not align with diverse real-wor…

    Submitted 12 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  20. arXiv:2405.16440  [pdf, other]

    cs.LG cs.AI

    MambaTS: Improved Selective State Space Models for Long-term Time Series Forecasting

    Authors: Xiuding Cai, Yaoyao Zhu, Xueyao Wang, Yu Yao

    Abstract: In recent years, Transformers have become the de-facto architecture for long-term sequence forecasting (LTSF), but face challenges such as quadratic complexity and permutation invariant bias. A recent model, Mamba, based on selective state space models (SSMs), has emerged as a competitive alternative to Transformer, offering comparable performance with higher throughput and linear complexity rela…

    Submitted 26 May, 2024; originally announced May 2024.

  21. arXiv:2405.15324  [pdf, other]

    cs.RO cs.AI cs.CV

    Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving

    Authors: Jianbiao Mei, Yukai Ma, Xuemeng Yang, Licheng Wen, Xinyu Cai, Xin Li, Daocheng Fu, Bo Zhang, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yong Liu, Yu Qiao

    Abstract: Autonomous driving has advanced significantly due to sensors, machine learning, and artificial intelligence improvements. However, prevailing methods struggle with intricate scenarios and causal relationships, hindering adaptability and interpretability in varied environments. To address the above problems, we introduce LeapAD, a novel paradigm for autonomous driving inspired by the human cognitiv…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 23 pages, 16 figures

  22. arXiv:2405.14878  [pdf, other]

    eess.IV cs.CV cs.LG stat.AP

    Improving and Evaluating Machine Learning Methods for Forensic Shoeprint Matching

    Authors: Divij Jain, Saatvik Kher, Lena Liang, Yufeng Wu, Ashley Zheng, Xizhen Cai, Anna Plantinga, Elizabeth Upton

    Abstract: We propose a machine learning pipeline for forensic shoeprint pattern matching that improves on the accuracy and generalisability of existing methods. We extract 2D coordinates from shoeprint scans using edge detection and align the two shoeprints with iterative closest point (ICP). We then extract similarity metrics to quantify how well the two prints match and use these metrics to train a random…

    Submitted 2 April, 2024; originally announced May 2024.

  23. arXiv:2405.12821  [pdf, other]

    cs.RO cs.CV

    Talk2Radar: Bridging Natural Language with 4D mmWave Radar for 3D Referring Expression Comprehension

    Authors: Runwei Guan, Ruixiao Zhang, Ningwei Ouyang, Jianan Liu, Ka Lok Man, Xiaohao Cai, Ming Xu, Jeremy Smith, Eng Gee Lim, Yutao Yue, Hui Xiong

    Abstract: Embodied perception is essential for intelligent vehicles and robots, enabling more natural interaction and task execution. However, these advancements currently embrace vision level, rarely focusing on using 3D modeling sensors, which limits the full understanding of surrounding objects with multi-granular characteristics. Recently, as a promising automotive sensor with affordable cost, 4D Millim…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 8 pages, 5 figures

  24. arXiv:2405.12806  [pdf, other]

    cs.CV

    MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video

    Authors: Hongsheng Wang, Xiang Cai, Xi Sun, Jinhong Yue, Zhanyun Tang, Shengyu Zhang, Feng Lin, Fei Wu

    Abstract: Single-view clothed human reconstruction holds a central position in virtual reality applications, especially in contexts involving intricate human motions. It presents notable challenges in achieving realistic clothing deformation. Current methodologies often overlook the influence of motion on surface deformation, resulting in surfaces lacking the constraints imposed by global motion. To overcom…

    Submitted 21 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:1710.03746 by other authors

  25. arXiv:2405.11742  [pdf, other]

    cs.MM

    Universal Organizer of SAM for Unsupervised Semantic Segmentation

    Authors: Tingting Li, Gensheng Pei, Xinhao Cai, Huafeng Liu, Qiong Wang, Yazhou Yao

    Abstract: Unsupervised semantic segmentation (USS) aims to achieve high-quality segmentation without manual pixel-level annotations. Existing USS models provide coarse category classification for regions, but the results often have blurry and imprecise edges. Recently, a robust framework called the segment anything model (SAM) has been proven to deliver precise boundary object masks. Therefore, this paper p…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: accepted by IEEE International Conference on Multimedia & Expo

  26. arXiv:2405.10691  [pdf, other]

    eess.IV cs.CV

    LoCI-DiffCom: Longitudinal Consistency-Informed Diffusion Model for 3D Infant Brain Image Completion

    Authors: Zihao Zhu, Tianli Tao, Yitian Tao, Haowen Deng, Xinyi Cai, Gaofeng Wu, Kaidong Wang, Haifeng Tang, Lixuan Zhu, Zhuoyang Gu, Jiawei Huang, Dinggang Shen, Han Zhang

    Abstract: The infant brain undergoes rapid development in the first few years after birth. Compared to cross-sectional studies, longitudinal studies can depict the trajectories of infants' brain development with higher accuracy, statistical power and flexibility. However, the collection of infant longitudinal magnetic resonance (MR) data suffers from a notorious dropout problem, resulting in incomplete datasets wit…

    Submitted 17 May, 2024; originally announced May 2024.

  27. arXiv:2405.04974  [pdf, other]

    cs.CV cs.AI

    Discrepancy-based Diffusion Models for Lesion Detection in Brain MRI

    Authors: Keqiang Fan, Xiaohao Cai, Mahesan Niranjan

    Abstract: Diffusion probabilistic models (DPMs) have exhibited significant effectiveness in computer vision tasks, particularly in image generation. However, their notable performance heavily relies on labelled datasets, which limits their application in medical images due to the associated high-cost annotations. Current DPM-related methods for lesion detection in medical imaging, which can be categorized i…

    Submitted 8 May, 2024; originally announced May 2024.

  28. arXiv:2405.01758  [pdf, other]

    cs.RO cs.LG eess.SY

    CGD: Constraint-Guided Diffusion Policies for UAV Trajectory Planning

    Authors: Kota Kondo, Andrea Tagliabue, Xiaoyi Cai, Claudius Tewari, Olivia Garcia, Marcos Espitia-Alvarez, Jonathan P. How

    Abstract: Traditional optimization-based planners, while effective, suffer from high computational costs, resulting in slow trajectory generation. A successful strategy to reduce computation time involves using Imitation Learning (IL) to develop fast neural network (NN) policies from those planners, which are treated as expert demonstrators. Although the resulting NN policies are effective at quickly genera…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 8 pages, 3 figures

  29. arXiv:2404.14671  [pdf, other]

    cs.CV

    LaneCorrect: Self-supervised Lane Detection

    Authors: Ming Nie, Xinyue Cai, Hang Xu, Li Zhang

    Abstract: Lane detection has evolved highly functional autonomous driving system to understand driving scenes even under complex environments. In this paper, we work towards developing a generalized computer vision system able to detect lanes without using any annotation. We make the following contributions: (i) We illustrate how to perform unsupervised 3D lane segmentation by leveraging the distinctive int…

    Submitted 22 April, 2024; originally announced April 2024.

  30. arXiv:2404.14043  [pdf, other]

    cs.CL

    LLMs Know What They Need: Leveraging a Missing Information Guided Framework to Empower Retrieval-Augmented Generation

    Authors: Keheng Wang, Feiyu Duan, Peiguang Li, Sirui Wang, Xunliang Cai

    Abstract: Retrieval-Augmented Generation (RAG) demonstrates great value in alleviating outdated knowledge or hallucination by supplying LLMs with updated and relevant knowledge. However, there are still several difficulties for RAG in understanding complex multi-hop queries and retrieving relevant documents, which require LLMs to perform reasoning and retrieve step by step. Inspired by humans' reasoning proce…

    Submitted 22 April, 2024; originally announced April 2024.

  31. arXiv:2404.12022  [pdf, other]

    cs.CL

    Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration

    Authors: Pengfei Wu, Jiahao Liu, Zhuocheng Gong, Qifan Wang, Jinpeng Li, Jingang Wang, Xunliang Cai, Dongyan Zhao

    Abstract: Large language models (LLMs) have recently shown remarkable performance across a wide range of tasks. However, the substantial number of parameters in LLMs contributes to significant latency during model inference. This is particularly evident when utilizing autoregressive decoding methods, which generate one token in a single forward process, thereby not fully capitalizing on the parallel computi…

    Submitted 18 April, 2024; originally announced April 2024.

  32. arXiv:2404.07465  [pdf, other]

    cs.LG

    Leveraging Domain-Unlabeled Data in Offline Reinforcement Learning across Two Domains

    Authors: Soichiro Nishimori, Xin-Qiang Cai, Johannes Ackermann, Masashi Sugiyama

    Abstract: In this paper, we investigate an offline reinforcement learning (RL) problem where datasets are collected from two domains. In this scenario, having datasets with domain labels facilitates efficient policy training. However, in practice, the task of assigning domain labels can be resource-intensive or infeasible at a large scale, leading to a prevalence of domain-unlabeled data. To formalize this…

    Submitted 11 April, 2024; originally announced April 2024.

  33. arXiv:2404.06809  [pdf, other]

    cs.CL

    Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation

    Authors: Ruotong Pan, Boxi Cao, Hongyu Lin, Xianpei Han, Jia Zheng, Sirui Wang, Xunliang Cai, Le Sun

    Abstract: The rapid development of large language models has led to the widespread adoption of Retrieval-Augmented Generation (RAG), which integrates external knowledge to alleviate knowledge bottlenecks and mitigate hallucinations. However, the existing RAG paradigm inevitably suffers from the impact of flawed information introduced during the retrieval phase, thereby diminishing the reliability and corre…

    Submitted 8 May, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: Our code, benchmark, and models are available at https://github.com/panruotong/CAG

  34. arXiv:2404.06741  [pdf, other]

    cs.CV

    An Animation-based Augmentation Approach for Action Recognition from Discontinuous Video

    Authors: Xingyu Song, Zhan Li, Shi Chen, Xin-Qiang Cai, Kazuyuki Demachi

    Abstract: Action recognition, an essential component of computer vision, plays a pivotal role in multiple applications. Despite significant improvements brought by Convolutional Neural Networks (CNNs), these models suffer performance declines when trained with discontinuous video frames, which is a frequent scenario in real-world settings. This decline primarily results from the loss of temporal continuity,…

    Submitted 30 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2401.13414

  35. arXiv:2404.02656  [pdf, other]

    cs.CV cs.AI

    Non-negative Subspace Feature Representation for Few-shot Learning in Medical Imaging

    Authors: Keqiang Fan, Xiaohao Cai, Mahesan Niranjan

    Abstract: Unlike typical visual scene recognition domains, in which massive datasets are accessible to deep neural networks, medical image interpretations are often obstructed by the paucity of data. In this paper, we investigate the effectiveness of data-based few-shot learning in medical imaging by exploring different data attribute representations in a low-dimensional space. We introduce different types…

    Submitted 4 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  36. LoS Sensing-based Channel Estimation in UAV-Assisted OFDM Systems

    Authors: Chaojin Qing, Zhiying Liu, Wenquan Hu, Yinjie Zhang, Xi Cai, Pengfei Du

    Abstract: In unmanned aerial vehicle (UAV)-assisted orthogonal frequency division multiplexing (OFDM) systems, the potential advantage of the line-of-sight (LoS) path, characterized by its high probability of existence, has not been fully harnessed, thereby impeding the improvement of channel estimation (CE) accuracy. Inspired by the ideas of integrated sensing and communication (ISAC), this letter develops…

    Submitted 22 February, 2024; originally announced April 2024.

  37. arXiv:2403.18840  [pdf, other]

    hep-th cond-mat.str-el cs.LG hep-ph physics.comp-ph

    Feynman Diagrams as Computational Graphs

    Authors: Pengcheng Hou, Tao Wang, Daniel Cerkoney, Xiansheng Cai, Zhiyi Li, Youjin Deng, Lei Wang, Kun Chen

    Abstract: We propose a computational graph representation of high-order Feynman diagrams in Quantum Field Theory (QFT), applicable to any combination of spatial, temporal, momentum, and frequency domains. Utilizing the Dyson-Schwinger and parquet equations, our approach effectively organizes these diagrams into a fractal structure of tensor operations, significantly reducing computational redundancy. This a…

    Submitted 27 February, 2024; originally announced March 2024.

  38. arXiv:2403.16656  [pdf, other]

    cs.LG cs.IR

    Graph Augmentation for Recommendation

    Authors: Qianru Zhang, Lianghao Xia, Xuheng Cai, Siuming Yiu, Chao Huang, Christian S. Jensen

    Abstract: Graph augmentation with contrastive learning has gained significant attention in the field of recommendation systems due to its ability to learn expressive user representations, even when labeled data is limited. However, directly applying existing GCL models to real-world recommendation environments poses challenges. There are two primary issues to address. Firstly, the lack of consideration for…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 13 pages and accepted by ICDE 2024

    Journal ref: ICDE 2024

  39. arXiv:2403.13244  [pdf]

    cs.CL cs.AI

    Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model

    Authors: Peng Zhou, Jianmin Wang, Chunyan Li, Zixu Wang, Yiping Liu, Siqi Sun, Jianxin Lin, Leyi Wei, Xibao Cai, Houtim Lai, Wei Liu, Longyue Wang, Xiangxiang Zeng

    Abstract: While various models and computational tools have been proposed for structure and property analysis of molecules, generating molecules that conform to all desired structures and properties remains a challenge. Here, we introduce a multi-constraint molecular generation large language model, TSMMG, which, akin to a student, incorporates knowledge from various small models and tools, namely, the 'tea…

    Submitted 10 July, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 37 pages, 10 figures

  40. arXiv:2403.12100  [pdf, other]

    cs.IR cs.AI cs.LG

    Learning Time Slot Preferences via Mobility Tree for Next POI Recommendation

    Authors: Tianhao Huang, Xuan Pan, Xiangrui Cai, Ying Zhang, Xiaojie Yuan

    Abstract: Next Point-of-Interest (POI) recommendation aims to provide a dynamic ranking of POIs based on users' current check-in trajectories. The recommendation performance of this task is contingent upon a comprehensive understanding of users' personalized behavioral patterns through Location-based Social Networks (LBSNs) data. While prior studies have adeptly captured sequential patterns and trans…

    Submitted 17 March, 2024; originally announced March 2024.

  41. arXiv:2403.10301  [pdf, other]

    cs.CL cs.CV

    Uni-SMART: Universal Science Multimodal Analysis and Research Transformer

    Authors: Hengxing Cai, Xiaochen Cai, Shuwen Yang, Jiankun Wang, Lin Yao, Zhifeng Gao, Junhan Chang, Sihang Li, Mingjun Xu, Changxin Wang, Hongshuai Wang, Yongge Li, Mujie Lin, Yaqi Li, Yuqi Yin, Linfeng Zhang, Guolin Ke

    Abstract: In scientific research and its application, scientific literature analysis is crucial as it allows researchers to build on the work of others. However, the fast growth of scientific knowledge has led to a massive increase in scholarly articles, making in-depth literature analysis increasingly challenging and time-consuming. The emergence of Large Language Models (LLMs) has offered a new way to add…

    Submitted 15 June, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  42. arXiv:2403.09209  [pdf, other]

    cs.CR cs.AI cs.LG

    LAN: Learning Adaptive Neighbors for Real-Time Insider Threat Detection

    Authors: Xiangrui Cai, Yang Wang, Sihan Xu, Hao Li, Ying Zhang, Zheli Liu, Xiaojie Yuan

    Abstract: Enterprises and organizations are faced with potential threats from insider employees that may lead to serious consequences. Previous studies on insider threat detection (ITD) mainly focus on detecting abnormal users or abnormal time periods (e.g., a week or a day). However, a user may have hundreds of thousands of activities in the log, and even within a day there may exist thousands of activitie…

    Submitted 17 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: 13 pages

  43. arXiv:2403.08479  [pdf, other]

    eess.IV cs.CV physics.med-ph

    MD-Dose: A Diffusion Model based on the Mamba for Radiotherapy Dose Prediction

    Authors: Linjie Fu, Xia Li, Xiuding Cai, Yingkai Wang, Xueyao Wang, Yali Shen, Yu Yao

    Abstract: Radiation therapy is crucial in cancer treatment. Experienced experts typically iteratively generate high-quality dose distribution maps, forming the basis for excellent radiation therapy plans. Therefore, automated prediction of dose distribution maps is significant in expediting the treatment process and providing a better starting point for developing radiation therapy plans. With the remarkabl…

    Submitted 13 March, 2024; originally announced March 2024.

  44. arXiv:2403.06873  [pdf, other]

    math.OC cs.LG

    Last Iterate Convergence of Incremental Methods and Applications in Continual Learning

    Authors: Xufeng Cai, Jelena Diakonikolas

    Abstract: Incremental gradient and incremental proximal methods are a fundamental class of optimization algorithms used for solving finite sum problems, broadly studied in the literature. Yet, without strong convexity, their convergence guarantees have primarily been established for the ergodic (average) iterate. Motivated by applications in continual learning, we obtain the first convergence guarantees for…

    Submitted 27 June, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  45. arXiv:2403.06563  [pdf, other]

    cs.LG cs.CL

    Unraveling the Mystery of Scaling Laws: Part I

    Authors: Hui Su, Zhi Tian, Xiaoyu Shen, Xunliang Cai

    Abstract: Scaling law principles indicate a power-law correlation between loss and variables such as model size, dataset size, and computational resources utilized during training. These principles play a vital role in optimizing various aspects of model pre-training, ultimately contributing to the success of large language models such as GPT-4, Llama and Gemini. However, the original scaling law paper by O…

    Submitted 5 April, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  46. arXiv:2403.06408  [pdf, other]

    cs.LG cs.AI

    What Makes Quantization for Large Language Models Hard? An Empirical Study from the Lens of Perturbation

    Authors: Zhuocheng Gong, Jiahao Liu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

    Abstract: Quantization has emerged as a promising technique for improving the memory and computational efficiency of large language models (LLMs). Though the trade-off between performance and efficiency is well-known, there is still much to be learned about the relationship between quantization and LLM performance. To shed light on this relationship, we propose a new perspective on quantization, viewing it…

    Submitted 10 March, 2024; originally announced March 2024.

  47. arXiv:2403.06258  [pdf, other]

    cs.CV

    Poly Kernel Inception Network for Remote Sensing Detection

    Authors: Xinhao Cai, Qiuxia Lai, Yuwei Wang, Wenguan Wang, Zeren Sun, Yazhou Yao

    Abstract: Object detection in remote sensing images (RSIs) often suffers from several increasing challenges, including the large variation in object scales and the diverse-ranging context. Prior methods tried to address these challenges by expanding the spatial receptive field of the backbone, either through large-kernel convolution or dilated convolution. However, the former typically introduces considerab…

    Submitted 20 March, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

    Comments: accepted by IEEE Conference on Computer Vision and Pattern Recognition, 2024

  48. arXiv:2403.06138  [pdf, other]

    cs.CV

    BSDA: Bayesian Random Semantic Data Augmentation for Medical Image Classification

    Authors: Yaoyao Zhu, Xiuding Cai, Xueyao Wang, Xiaoqing Chen, Yu Yao, Zhongliang Fu

    Abstract: Data augmentation is a crucial regularization technique for deep neural networks, particularly in medical image classification. Mainstream data augmentation (DA) methods are usually applied at the image level. Due to the specificity and diversity of medical imaging, expertise is often required to design effective DA strategies, and improper augmentation operations can degrade model performance. Al…

    Submitted 27 June, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

  49. arXiv:2403.03689  [pdf, other]

    cs.CL cs.AI

    General2Specialized LLMs Translation for E-commerce

    Authors: Kaidi Chen, Ben Chen, Dehong Gao, Huangyu Dai, Wen Jiang, Wei Ning, Shanqing Yu, Libin Yang, Xiaoyan Cai

    Abstract: Existing Neural Machine Translation (NMT) models mainly handle translation in the general domain, while overlooking domains with special writing formulas, such as e-commerce and legal documents. Taking e-commerce as an example, the texts usually include large amounts of domain-related words and have more grammatical problems, which leads to inferior performance of current NMT methods. To address these prob…

    Submitted 6 April, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

    Comments: 4 pages, 1 figure, WWW2024 accepted

  50. arXiv:2403.01976  [pdf, other]

    cs.CL

    SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis

    Authors: Hengxing Cai, Xiaochen Cai, Junhan Chang, Sihang Li, Lin Yao, Changxin Wang, Zhifeng Gao, Hongshuai Wang, Yongge Li, Mujie Lin, Shuwen Yang, Jiankun Wang, Mingjun Xu, Jin Huang, Fang Xi, Jiaxi Zhuang, Yuqi Yin, Yaqi Li, Changhong Chen, Zheng Cheng, Zifeng Zhao, Linfeng Zhang, Guolin Ke

    Abstract: Recent breakthroughs in Large Language Models (LLMs) have revolutionized natural language understanding and generation, sparking significant interest in applying them to scientific literature analysis. However, existing benchmarks fail to adequately evaluate the proficiency of LLMs in this domain, particularly in scenarios requiring higher-level abilities beyond mere memorization and the handling…

    Submitted 18 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.