Showing 1–50 of 156 results for author: Lei, J

Searching in archive cs.
  1. arXiv:2406.17249  [pdf, other]

    cs.RO

    SlideSLAM: Sparse, Lightweight, Decentralized Metric-Semantic SLAM for Multi-Robot Navigation

    Authors: Xu Liu, Jiuzhou Lei, Ankit Prabhu, Yuezhan Tao, Igor Spasojevic, Pratik Chaudhari, Nikolay Atanasov, Vijay Kumar

    Abstract: This paper develops a real-time decentralized metric-semantic Simultaneous Localization and Mapping (SLAM) approach that leverages a sparse and lightweight object-based representation to enable a heterogeneous robot team to autonomously explore 3D environments featuring indoor, urban, and forested areas without relying on GPS. We use a hierarchical metric-semantic representation of the environment…

    Submitted 2 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: Preliminary release

  2. arXiv:2406.17090  [pdf, other]

    q-bio.QM cs.AI cs.CE cs.LG

    Exploring Biomarker Relationships in Both Type 1 and Type 2 Diabetes Mellitus Through a Bayesian Network Analysis Approach

    Authors: Yuyang Sun, Jingyu Lei, Panagiotis Kosmas

    Abstract: Understanding the complex relationships of biomarkers in diabetes is pivotal for advancing treatment strategies, a pressing need in diabetes research. This study applies Bayesian network structure learning to analyze the Shanghai Type 1 and Type 2 diabetes mellitus datasets, revealing complex relationships among key diabetes-related biomarkers. The constructed Bayesian network presented notable pr…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Paper is accepted by EMBC 2024

  3. arXiv:2406.03712  [pdf, other]

    cs.CL cs.LG

    A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions

    Authors: Lei Liu, Xiaoyan Yang, Junchi Lei, Xiaoyang Liu, Yue Shen, Zhiqiang Zhang, Peng Wei, Jinjie Gu, Zhixuan Chu, Zhan Qin, Kui Ren

    Abstract: Large language models (LLMs), such as GPT series models, have received substantial attention due to their impressive capabilities for generating and understanding human-level language. More recently, LLMs have emerged as an innovative and powerful adjunct in the medical field, transforming traditional practices and heralding a new era of enhanced healthcare services. This survey provides a compreh…

    Submitted 5 June, 2024; originally announced June 2024.

  4. arXiv:2406.01489  [pdf, other]

    cs.CV

    DA-HFNet: Progressive Fine-Grained Forgery Image Detection and Localization Based on Dual Attention

    Authors: Yang Liu, Xiaofei Li, Jun Zhang, Shengze Hu, Jun Lei

    Abstract: The increasing difficulty in accurately detecting forged images generated by AIGC (Artificial Intelligence Generative Content) poses many risks, necessitating the development of effective methods to identify and further locate forged areas. In this paper, to facilitate research efforts, we construct a DA-HFNet forged image dataset guided by text or image-assisted GAN and Diffusion model. Our goal i…

    Submitted 4 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  5. arXiv:2405.17421  [pdf, other]

    cs.CV cs.GR

    MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds

    Authors: Jiahui Lei, Yijia Weng, Adam Harley, Leonidas Guibas, Kostas Daniilidis

    Abstract: We introduce 4D Motion Scaffolds (MoSca), a neural information processing system designed to reconstruct and synthesize novel views of dynamic scenes from monocular videos captured casually in the wild. To address such a challenging and ill-posed inverse problem, we leverage prior knowledge from foundational vision models, lift the video data to a novel Motion Scaffold (MoSca) representation, whic…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: project page: https://www.cis.upenn.edu/~leijh/projects/mosca

  6. arXiv:2405.11441  [pdf, other]

    cs.IR cs.CL

    EmbSum: Leveraging the Summarization Capabilities of Large Language Models for Content-Based Recommendations

    Authors: Chiyu Zhang, Yifei Sun, Minghao Wu, Jun Chen, Jie Lei, Muhammad Abdul-Mageed, Rong Jin, Angli Liu, Ji Zhu, Sem Park, Ning Yao, Bo Long

    Abstract: Content-based recommendation systems play a crucial role in delivering personalized content to users in the digital world. In this work, we introduce EmbSum, a novel framework that enables offline pre-computations of users and candidate items while capturing the interactions within the user engagement history. By utilizing the pretrained encoder-decoder model and poly-attention layers, EmbSum deri…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: Under review

  7. arXiv:2405.04095  [pdf, other]

    cs.CR cs.AI

    Going Proactive and Explanatory Against Malware Concept Drift

    Authors: Yiling He, Junchi Lei, Zhan Qin, Kui Ren

    Abstract: Deep learning-based malware classifiers face significant challenges due to concept drift. The rapid evolution of malware, especially with new families, can depress classification accuracy to near-random levels. Previous research has primarily focused on detecting drift samples, relying on expert-led analysis and labeling for model retraining. However, these methods often lack a comprehensive under…

    Submitted 7 May, 2024; originally announced May 2024.

  8. arXiv:2404.16863  [pdf]

    physics.soc-ph cs.CY

    Efficient Strategies on Supply Chain Network Optimization for Industrial Carbon Emission Reduction

    Authors: Jihu Lei

    Abstract: This study investigates the efficient strategies for supply chain network optimization, specifically aimed at reducing industrial carbon emissions. Amidst escalating concerns about global climate change, industry sectors are motivated to counteract the negative environmental implications of their supply chain networks. This paper introduces a novel framework for optimizing these networks via strat…

    Submitted 17 April, 2024; originally announced April 2024.

    Journal ref: Journal of Computational Methods in Engineering Applications (2022): 1-11

  9. arXiv:2404.16754  [pdf, other]

    cs.CV

    RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis

    Authors: Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Jiayu Lei, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: Developing generalist foundation models has recently attracted tremendous attention among researchers in the field of AI for Medicine (AI4Medicine). A pivotal insight in developing these models is their reliance on dataset scaling, which emphasizes the requirements on developing open-source medical image datasets that incorporate diverse supervision signals across various imaging modalities. In thi…

    Submitted 25 April, 2024; originally announced April 2024.

  10. arXiv:2404.16006  [pdf, other]

    cs.CV

    MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

    Authors: Kaining Ying, Fanqing Meng, Jin Wang, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Runjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao

    Abstract: Large Vision-Language Models (LVLMs) show significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation. However, existing multimodal evaluation benchmarks cover a limited number of multimodal tasks testing rudimentary capabilities, falling short in tracking LVLM development. In this study, we present MMT-Bench, a comprehensive benchmark designed to…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 77 pages, 41 figures

  11. arXiv:2404.15785  [pdf, other]

    cs.CV

    Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer

    Authors: Jiaming Lei, Lin Li, Chunping Wang, Jun Xiao, Long Chen

    Abstract: Benefiting from strong generalization ability, pre-trained vision language models (VLMs), e.g., CLIP, have been widely utilized in zero-shot scene understanding. Unlike simple recognition tasks, grounded situation recognition (GSR) requires the model not only to classify salient activity (verb) in the image, but also to detect all semantic roles that participate in the action. This complex task us…

    Submitted 24 April, 2024; originally announced April 2024.

  12. arXiv:2404.15043  [pdf, other]

    cs.DC

    Mapping Parallel Matrix Multiplication in GotoBLAS2 to the AMD Versal ACAP for Deep Learning

    Authors: Jie Lei, Enrique S. Quintana-Ortí

    Abstract: This paper investigates the design of parallel general matrix multiplication (GEMM) for a Versal Adaptive Compute Accelerated Platform (ACAP) equipped with a VC1902 system-on-chip and multiple Artificial Intelligence Engines (AIEs). Our efforts aim to port standard optimization techniques applied in the high-performance realization of GEMM on CPUs to the Versal ACAP. In particular, 1) we address t…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 12 pages

  13. arXiv:2404.13669  [pdf, other]

    math.OC cs.DC cs.LG cs.MA

    Rate Analysis of Coupled Distributed Stochastic Approximation for Misspecified Optimization

    Authors: Yaqun Yang, Jinlong Lei

    Abstract: We consider an $n$ agents distributed optimization problem with imperfect information characterized in a parametric sense, where the unknown parameter can be solved by a distinct distributed parameter learning problem. Though each agent only has access to its local parameter learning and computational problem, they mean to collaboratively minimize the average of their local cost functions. To addr…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 27 pages, 6 figures

  14. arXiv:2404.13300  [pdf, other]

    cs.LG

    Capturing Momentum: Tennis Match Analysis Using Machine Learning and Time Series Theory

    Authors: Jingdi Lei, Tianqi Kang, Yuluan Cao, Shiwei Ren

    Abstract: This paper presents an analysis of momentum in tennis matches. Owing to its generalization performance, it can be helpful in constructing a system to predict the results of sports games and to analyze player performance based on technical statistics. We first use hidden Markov models to predict the momentum, which is defined as the performance of players. Then we use XGBoost to prove…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: 16 pages, 18 figures

  15. arXiv:2404.11354  [pdf, other]

    math.OC cs.DC cs.LG cs.MA

    Distributed Fractional Bayesian Learning for Adaptive Optimization

    Authors: Yaqun Yang, Jinlong Lei, Guanghui Wen, Yiguang Hong

    Abstract: This paper considers a distributed adaptive optimization problem, where all agents only have access to their local cost functions with a common unknown parameter, whereas they mean to collaboratively estimate the true parameter and find the optimal solution over a connected network. A general mathematical framework for such a problem has not been studied yet. We aim to provide valuable insights fo…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 16 pages, 6 figures

  16. arXiv:2404.07687  [pdf, other]

    cs.CV

    Chaos in Motion: Unveiling Robustness in Remote Heart Rate Measurement through Brain-Inspired Skin Tracking

    Authors: Jie Wang, Jing Lian, Minjie Ma, Junqiang Lei, Chunbiao Li, Bin Li, Jizhao Liu

    Abstract: Heart rate is an important physiological indicator of human health status. Existing remote heart rate measurement methods typically involve facial detection followed by signal extraction from the region of interest (ROI). These SOTA methods have three serious problems: (a) inaccuracies or even failures in detection caused by environmental influences or subject movement; (b) failures for special patie…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 8 pages, 10 figures

  17. arXiv:2404.06153  [pdf, other]

    cs.LG q-bio.GN

    scRDiT: Generating single-cell RNA-seq data by diffusion transformers and accelerating sampling

    Authors: Shengze Dong, Zhuorui Cui, Ding Liu, Jinzhi Lei

    Abstract: Motivation: Single-cell RNA sequencing (scRNA-seq) is a groundbreaking technology extensively utilized in biological research, facilitating the examination of gene expression at the individual cell level within a given tissue sample. While numerous tools have been developed for scRNA-seq data analysis, the challenge persists in capturing the distinct features of such data and replicating virtual d…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 11 pages, 4 figures

  18. arXiv:2403.17931  [pdf, other]

    cs.CV

    Track Everything Everywhere Fast and Robustly

    Authors: Yunzhou Song, Jiahui Lei, Ziyun Wang, Lingjie Liu, Kostas Daniilidis

    Abstract: We propose a novel test-time optimization approach for efficiently and robustly tracking any pixel at any time in a video. The latest state-of-the-art optimization-based tracking technique, OmniMotion, requires a prohibitively long optimization time, rendering it impractical for downstream applications. OmniMotion is sensitive to the choice of random seeds, leading to unstable convergence. To impr…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: project page: https://timsong412.github.io/FastOmniTrack/

  19. arXiv:2403.09323  [pdf, other]

    cs.CV

    E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection

    Authors: Jiaqing Zhang, Mingxiang Cao, Xue Yang, Weiying Xie, Jie Lei, Daixun Li, Wenbo Huang, Yunsong Li

    Abstract: Multimodal image fusion and object detection are crucial for autonomous driving. While current methods have advanced the fusion of texture details and semantic information, their complex training processes hinder broader applications. Addressing this challenge, we introduce E2E-MFD, a novel end-to-end algorithm for multimodal fusion detection. E2E-MFD streamlines the process, achieving high perfor…

    Submitted 23 May, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  20. arXiv:2402.19020  [pdf, other]

    eess.IV cs.CV

    Unsupervised Learning of High-resolution Light Field Imaging via Beam Splitter-based Hybrid Lenses

    Authors: Jianxin Lei, Chengcai Xu, Langqing Shi, Junhui Hou, Ping Zhou

    Abstract: In this paper, we design a beam splitter-based hybrid light field imaging prototype to record a 4D light field image and a high-resolution 2D image simultaneously, and build a hybrid light field dataset. The 2D image could be considered as the high-resolution ground truth corresponding to the low-resolution central sub-aperture image of the 4D light field image. Subsequently, we propose an unsupervised lea…

    Submitted 29 February, 2024; originally announced February 2024.

  21. arXiv:2402.12851  [pdf, other]

    cs.CL

    MoELoRA: Contrastive Learning Guided Mixture of Experts on Parameter-Efficient Fine-Tuning for Large Language Models

    Authors: Tongxu Luo, Jiahe Lei, Fangyu Lei, Weihao Liu, Shizhu He, Jun Zhao, Kang Liu

    Abstract: Fine-tuning is often necessary to enhance the adaptability of Large Language Models (LLM) to downstream tasks. Nonetheless, the process of updating billions of parameters demands significant computational resources and training time, which poses a substantial obstacle to the widespread application of large-scale models in various scenarios. To address this issue, Parameter-Efficient Fine-Tuning (P…

    Submitted 20 February, 2024; originally announced February 2024.

  22. arXiv:2402.10555  [pdf, other]

    cs.IR cs.CL

    SPAR: Personalized Content-Based Recommendation via Long Engagement Attention

    Authors: Chiyu Zhang, Yifei Sun, Jun Chen, Jie Lei, Muhammad Abdul-Mageed, Sinong Wang, Rong Jin, Sem Park, Ning Yao, Bo Long

    Abstract: Leveraging users' long engagement histories is essential for personalized content recommendations. The success of pretrained language models (PLMs) in NLP has led to their use in encoding user histories and candidate items, framing content recommendations as textual semantic matching tasks. However, existing works still struggle with processing very long user historical text and insufficient user-…

    Submitted 21 May, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: Under review

  23. arXiv:2402.02405  [pdf, other]

    cs.RO cs.CV

    Angle Robustness Unmanned Aerial Vehicle Navigation in GNSS-Denied Scenarios

    Authors: Yuxin Wang, Zunlei Feng, Haofei Zhang, Yang Gao, Jie Lei, Li Sun, Mingli Song

    Abstract: Due to the inability to receive signals from the Global Navigation Satellite System (GNSS) in extreme conditions, achieving accurate and robust navigation for Unmanned Aerial Vehicles (UAVs) is a challenging task. Recently emerged, vision-based navigation has been a promising and feasible alternative to GNSS-based navigation. However, existing vision-based techniques are inadequate in addressing f…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 9 pages, 4 figures

  24. arXiv:2401.06969  [pdf, other]

    cs.CV

    Domain Adaptation for Large-Vocabulary Object Detectors

    Authors: Kai Jiang, Jiaxing Huang, Weiying Xie, Jie Lei, Yunsong Li, Ling Shao, Shijian Lu

    Abstract: Large-vocabulary object detectors (LVDs) aim to detect objects of many categories, which learn super objectness features and can locate objects accurately while applied to various downstream data. However, LVDs often struggle in recognizing the located objects due to domain discrepancy in data distribution and object vocabulary. At the other end, recent vision-language foundation models such as CL…

    Submitted 10 May, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

  25. arXiv:2401.05093  [pdf, other]

    cs.CV

    SwiMDiff: Scene-wide Matching Contrastive Learning with Diffusion Constraint for Remote Sensing Image

    Authors: Jiayuan Tian, Jie Lei, Jiaqing Zhang, Weiying Xie, Yunsong Li

    Abstract: With recent advancements in aerospace technology, the volume of unlabeled remote sensing image (RSI) data has increased dramatically. Effectively leveraging this data through self-supervised learning (SSL) is vital in the field of remote sensing. However, current methodologies, particularly contrastive learning (CL), a leading SSL method, encounter specific challenges in this domain. Firstly, CL o…

    Submitted 10 January, 2024; originally announced January 2024.

  26. arXiv:2401.03182  [pdf, other]

    cs.CV

    Distribution-aware Interactive Attention Network and Large-scale Cloud Recognition Benchmark on FY-4A Satellite Image

    Authors: Jiaqing Zhang, Jie Lei, Weiying Xie, Kai Jiang, Mingxiang Cao, Yunsong Li

    Abstract: Accurate cloud recognition and warning are crucial for various applications, including in-flight support, weather forecasting, and climate research. However, recent deep learning algorithms have predominantly focused on detecting cloud regions in satellite imagery, with insufficient attention to the specificity required for accurate cloud recognition. This limitation inspired us to develop the nov…

    Submitted 6 January, 2024; originally announced January 2024.

  27. arXiv:2401.03179  [pdf, other]

    cs.CV

    Multimodal Informative ViT: Information Aggregation and Distribution for Hyperspectral and LiDAR Classification

    Authors: Jiaqing Zhang, Jie Lei, Weiying Xie, Geng Yang, Daixun Li, Yunsong Li

    Abstract: In multimodal land cover classification (MLCC), a common challenge is the redundancy in data distribution, where irrelevant information from multiple modalities can hinder the effective integration of their unique features. To tackle this, we introduce the Multimodal Informative Vit (MIVit), a system with an innovative information aggregate-distributing mechanism. This approach redefines redundanc…

    Submitted 23 January, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

  28. arXiv:2312.16943  [pdf, other]

    cs.CV

    Multi-scale direction-aware SAR object detection network via global information fusion

    Authors: Mingxiang Cao, Weiying Xie, Jie Lei, Jiaqing Zhang, Daixun Li, Yunsong Li

    Abstract: Deep learning has driven significant progress in object detection using Synthetic Aperture Radar (SAR) imagery. Existing methods, while achieving promising results, often struggle to effectively integrate local and global information, particularly direction-aware features. This paper proposes SAR-Net, a novel framework specifically designed for global fusion of direction-aware information in SAR o…

    Submitted 22 May, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  29. arXiv:2312.06683  [pdf, other]

    cs.IR

    AT4CTR: Auxiliary Match Tasks for Enhancing Click-Through Rate Prediction

    Authors: Qi Liu, Xuyang Hou, Defu Lian, Zhe Wang, Haoran Jin, Jia Cheng, Jun Lei

    Abstract: Click-through rate (CTR) prediction is a vital task in industrial recommendation systems. Most existing methods focus on the network architecture design of the CTR model for better accuracy and suffer from the data sparsity problem. Especially in industrial recommendation systems, the widely applied negative sample down-sampling technique due to resource limitation worsens the problem, resulting i…

    Submitted 18 December, 2023; v1 submitted 9 December, 2023; originally announced December 2023.

  30. arXiv:2312.00851  [pdf, other]

    cs.LG cs.CV

    Physics Inspired Criterion for Pruning-Quantization Joint Learning

    Authors: Weiying Xie, Xiaoyi Fan, Xin Zhang, Yunsong Li, Jie Lei, Leyuan Fang

    Abstract: Pruning-quantization joint learning always facilitates the deployment of deep neural networks (DNNs) on resource-constrained edge devices. However, most existing methods do not jointly learn a global criterion for pruning and quantization in an interpretable way. In this paper, we propose a novel physics inspired criterion for pruning-quantization joint learning (PIC-PQ), which is explored from an…

    Submitted 4 June, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

  31. arXiv:2312.00112  [pdf, other]

    cs.CV cs.GR

    DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting

    Authors: Agelos Kratimenos, Jiahui Lei, Kostas Daniilidis

    Abstract: Accurately and efficiently modeling dynamic scenes and motions is considered a challenging task due to temporal dynamics and motion complexity. To address these challenges, we propose DynMF, a compact and efficient representation that decomposes a dynamic scene into a few neural trajectories. We argue that the per-point motions of a dynamic scene can be decomposed into a small set of explicit o…

    Submitted 30 November, 2023; originally announced December 2023.

    Comments: Project page: https://agelosk.github.io/dynmf/

  32. arXiv:2311.16099  [pdf, other]

    cs.CV cs.GR

    GART: Gaussian Articulated Template Models

    Authors: Jiahui Lei, Yufu Wang, Georgios Pavlakos, Lingjie Liu, Kostas Daniilidis

    Abstract: We introduce Gaussian Articulated Template Model GART, an explicit, efficient, and expressive representation for non-rigid articulated subject capturing and rendering from monocular videos. GART utilizes a mixture of moving 3D Gaussians to explicitly approximate a deformable subject's geometry and appearance. It takes advantage of a categorical template model prior (SMPL, SMAL, etc.) with learnabl…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: 13 pages, code available at https://www.cis.upenn.edu/~leijh/projects/gart/

  33. arXiv:2311.10764  [pdf, other]

    cs.IR cs.AI

    Deep Group Interest Modeling of Full Lifelong User Behaviors for CTR Prediction

    Authors: Qi Liu, Xuyang Hou, Haoran Jin, Jin Chen, Zhe Wang, Defu Lian, Tan Qu, Jia Cheng, Jun Lei

    Abstract: Extracting users' interests from their lifelong behavior sequence is crucial for predicting Click-Through Rate (CTR). Most current methods employ a two-stage process for efficiency: they first select historical behaviors related to the candidate item and then deduce the user's interest from this narrowed-down behavior sub-sequence. This two-stage paradigm, though effective, leads to information lo…

    Submitted 15 November, 2023; originally announced November 2023.

  34. arXiv:2310.15075  [pdf, other]

    cs.CL

    TableQAKit: A Comprehensive and Practical Toolkit for Table-based Question Answering

    Authors: Fangyu Lei, Tongxu Luo, Pengqi Yang, Weihao Liu, Hanwen Liu, Jiahe Lei, Yiming Huang, Yifan Wei, Shizhu He, Jun Zhao, Kang Liu

    Abstract: Table-based question answering (TableQA) is an important task in natural language processing, which requires comprehending tables and employing various reasoning ways to answer the questions. This paper introduces TableQAKit, the first comprehensive toolkit designed specifically for TableQA. The toolkit designs a unified platform that includes plentiful TableQA datasets and integrates popular meth…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Work in progress

  35. arXiv:2310.09909  [pdf, other]

    cs.CV cs.CL

    Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis

    Authors: Chaoyi Wu, Jiayu Lei, Qiaoyu Zheng, Weike Zhao, Weixiong Lin, Xiaoman Zhang, Xiao Zhou, Ziheng Zhao, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: Driven by the large foundation models, the development of artificial intelligence has witnessed tremendous progress lately, leading to a surge of general interest from the public. In this study, we aim to assess the performance of OpenAI's newest model, GPT-4V(ision), specifically in the realm of multimodal medical diagnosis. Our evaluation encompasses 17 human body systems, including Central Nerv…

    Submitted 4 December, 2023; v1 submitted 15 October, 2023; originally announced October 2023.

  36. arXiv:2310.09511  [pdf, other]

    cs.GT cs.LG eess.SY

    Online Parameter Identification of Generalized Non-cooperative Game

    Authors: Jianguo Chen, Jinlong Lei, Hongsheng Qi, Yiguang Hong

    Abstract: This work studies the parameter identification problem of a generalized non-cooperative game, where each player's cost function is influenced by an observable signal and some unknown parameters. We consider the scenario where equilibrium of the game at some observable signals can be observed with noises, whereas our goal is to identify the unknown parameters with the observed data. Assuming that t…

    Submitted 14 October, 2023; originally announced October 2023.

    Comments: 10 pages, 5 figures

  37. arXiv:2310.04992  [pdf, other]

    eess.IV cs.CV

    VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence

    Authors: Jianing Qiu, Jian Wu, Hao Wei, Peilun Shi, Minqing Zhang, Yunyun Sun, Lin Li, Hanruo Liu, Hongyi Liu, Simeng Hou, Yuyang Zhao, Xuehui Shi, Junfang Xian, Xiaoxia Qu, Sirui Zhu, Lijie Pan, Xiaoniao Chen, Xiaojia Zhang, Shuai Jiang, Kebing Wang, Chenlong Yang, Mingqiang Chen, Sujie Fan, Jianhua Hu, Aiguo Lv, et al. (17 additional authors not shown)

    Abstract: We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications, such as disease screening and diagnosis, disease prognosis, subclassifi…

    Submitted 7 October, 2023; originally announced October 2023.

  38. arXiv:2309.14122  [pdf, other]

    cs.CV cs.CR

    SurrogatePrompt: Bypassing the Safety Filter of Text-To-Image Models via Substitution

    Authors: Zhongjie Ba, Jieming Zhong, Jiachen Lei, Peng Cheng, Qinglong Wang, Zhan Qin, Zhibo Wang, Kui Ren

    Abstract: Advanced text-to-image models such as DALL-E 2 and Midjourney possess the capacity to generate highly realistic images, raising significant concerns regarding the potential proliferation of unsafe content. This includes adult, violent, or deceptive imagery of political figures. Despite claims of rigorous safety mechanisms implemented in these models to restrict the generation of not-safe-for-work…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: 14 pages, 11 figures

  39. arXiv:2309.13248  [pdf, other]

    cs.CV

    Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation

    Authors: Ke Fan, Jingshi Lei, Xuelin Qian, Miaopeng Yu, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu

    Abstract: Video amodal segmentation is a particularly challenging task in computer vision, which requires deducing the full shape of an object from the visible parts of it. Recently, some studies have achieved promising performance by using motion flow to integrate information across frames under a self-supervised setting. However, motion flow has a clear limitation by the two factors of moving cameras and…

    Submitted 23 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV 2023

  40. arXiv:2309.12669  [pdf, other]

    cs.CL

    HRoT: Hybrid prompt strategy and Retrieval of Thought for Table-Text Hybrid Question Answering

    Authors: Tongxu Luo, Fangyu Lei, Jiahe Lei, Weihao Liu, Shihu He, Jun Zhao, Kang Liu

    Abstract: Answering numerical questions over hybrid contents from the given tables and text (TextTableQA) is a challenging task. Recently, Large Language Models (LLMs) have gained significant attention in the NLP community. With the emergence of large language models, In-Context Learning and Chain-of-Thought prompting have become two particularly popular research topics in this field. In this paper, we intro…

    Submitted 22 September, 2023; originally announced September 2023.

  41. arXiv:2309.06828  [pdf, other]

    cs.CV cs.LG

    UniBrain: Universal Brain MRI Diagnosis with Hierarchical Knowledge-enhanced Pre-training

    Authors: Jiayu Lei, Lisong Dai, Haoyun Jiang, Chaoyi Wu, Xiaoman Zhang, Yao Zhang, Jiangchao Yao, Weidi Xie, Yanyong Zhang, Yuehua Li, Ya Zhang, Yanfeng Wang

    Abstract: Magnetic resonance imaging (MRI) has played a crucial role in brain disease diagnosis, with which a range of computer-aided artificial intelligence methods have been proposed. However, the early explorations usually focus on the limited types of brain diseases in one study and train the model on the data in a small scale, yielding the bottleneck of generalization. Towards a more effective and sca…

    Submitted 13 September, 2023; originally announced September 2023.

  42. arXiv:2309.04790  [pdf, other]

    cs.CL

    MMHQA-ICL: Multimodal In-context Learning for Hybrid Question Answering over Text, Tables and Images

    Authors: Weihao Liu, Fangyu Lei, Tongxu Luo, Jiahe Lei, Shizhu He, Jun Zhao, Kang Liu

    Abstract: In the real world, knowledge often exists in a multimodal and heterogeneous form. Addressing the task of question answering with hybrid data types, including text, tables, and images, is a challenging task (MMHQA). Recently, with the rise of large language models (LLM), in-context learning (ICL) has become the most popular way to solve QA problems. We propose MMHQA-ICL framework for addressing thi…

    Submitted 9 September, 2023; originally announced September 2023.

  43. STGIN: Spatial-Temporal Graph Interaction Network for Large-scale POI Recommendation

    Authors: Shaohua Liu, Yu Qi, Gen Li, Mingjian Chen, Teng Zhang, Jia Cheng, Jun Lei

    Abstract: In Location-Based Services, Point-of-Interest (POI) recommendation plays a crucial role in both user experience and business opportunities. Graph neural networks have been proven effective in providing personalized POI recommendation services. However, there are still two critical challenges. First, existing graph models attempt to capture users' diversified interests through a unified graph, which…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: accepted by CIKM 2023

  44. arXiv:2309.01940  [pdf, other

    cs.CL cs.AI

    CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models

    Authors: Lingyue Fu, Huacan Chai, Shuang Luo, Kounianhua Du, Weiming Zhang, Longteng Fan, Jiayi Lei, Renting Rui, Jianghao Lin, Yuchen Fang, Yifan Liu, Jingkuan Wang, Siyuan Qi, Kangning Zhang, Weinan Zhang, Yong Yu

    Abstract: With the emergence of Large Language Models (LLMs), there has been a significant improvement in the programming capabilities of models, attracting growing attention from researchers. Evaluating the programming capabilities of LLMs is crucial as it reflects the multifaceted abilities of LLMs, and it has numerous downstream applications. In this paper, we propose CodeApex, a bilingual benchmark data…

    Submitted 11 March, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: 33 pages

  45. arXiv:2308.06037  [pdf, other

    cs.IR cs.AI

    Deep Context Interest Network for Click-Through Rate Prediction

    Authors: Xuyang Hou, Zhe Wang, Qi Liu, Tan Qu, Jia Cheng, Jun Lei

    Abstract: Click-Through Rate (CTR) prediction, estimating the probability of a user clicking on an item, is essential in industrial applications, such as online advertising. Many works focus on user behavior modeling to improve CTR prediction performance. However, most of those methods only model users' positive interests from users' clicked items while ignoring the context information, which is the display i…

    Submitted 11 August, 2023; originally announced August 2023.

    Comments: accepted by CIKM 2023

  46. arXiv:2307.14345  [pdf, other

    cs.IT cs.ET eess.SP

    NOMA for STAR-RIS Assisted UAV Networks

    Authors: Jiayi Lei, Tiankui Zhang, Xidong Mu, Yuanwei Liu

    Abstract: This paper proposes a novel simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) assisted unmanned aerial vehicle (UAV) non-orthogonal multiple access (NOMA) emergency communication network. Multiple STAR-RISs are deployed to provide additional and intelligent transmission links between trapped users and the UAV-mounted base station (BS). Each user selects the neare…

    Submitted 5 July, 2023; originally announced July 2023.

  47. arXiv:2307.06848  [pdf, other

    cs.NI cs.LG math.NA

    Tensor Completion via Leverage Sampling and Tensor QR Decomposition for Network Latency Estimation

    Authors: Jun Lei, Ji-Qian Zhao, Jing-Qi Wang, An-Bao Xu

    Abstract: In this paper, we consider network latency estimation, where latency is an important metric for network performance. However, large-scale network latency estimation requires a lot of computing time. Therefore, we propose a new method that is much faster and maintains high accuracy. The data structure of network nodes can form a matrix, and the tensor model can be formed by introducing the t…

    Submitted 27 June, 2023; originally announced July 2023.

    Comments: 20 pages, 7 figures

  48. arXiv:2307.02257  [pdf, other

    cs.IT cs.ET

    Hybrid NOMA for STAR-RIS Enhanced Communication

    Authors: Jiayi Lei, Tiankui Zhang, Yuanwei Liu

    Abstract: In this paper, a hybrid non-orthogonal multiple access (NOMA) framework for the simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) enhanced cell-edge communication is investigated. Specifically, one transmitted user and one reflected user are paired as one NOMA-pair, while multiple NOMA-pairs are served via time division multiple access (TDMA). The objective is…

    Submitted 5 July, 2023; originally announced July 2023.

  49. arXiv:2306.15988  [pdf, other

    cs.CV

    AFPN: Asymptotic Feature Pyramid Network for Object Detection

    Authors: Guoyu Yang, Jie Lei, Zhikuan Zhu, Siyu Cheng, Zunlei Feng, Ronghua Liang

    Abstract: Multi-scale features are of great importance in encoding objects with scale variance in object detection tasks. A common strategy for multi-scale feature extraction is adopting the classic top-down and bottom-up feature pyramid networks. However, these approaches suffer from the loss or degradation of feature information, impairing the fusion effect of non-adjacent levels. This paper proposes an a…

    Submitted 24 September, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

  50. arXiv:2306.11363  [pdf, other

    cs.CV cs.AI cs.LG

    Masked Diffusion Models Are Fast Distribution Learners

    Authors: Jiachen Lei, Qinglong Wang, Peng Cheng, Zhongjie Ba, Zhan Qin, Zhibo Wang, Zhenguang Liu, Kui Ren

    Abstract: The diffusion model has emerged as the de facto model for image generation, yet the heavy training overhead hinders its broader adoption in the research community. We observe that diffusion models are commonly trained to learn all fine-grained visual information from scratch. This paradigm may cause unnecessary training costs, hence requiring in-depth investigation. In this work, we show that it…

    Submitted 27 November, 2023; v1 submitted 20 June, 2023; originally announced June 2023.