Showing 51–100 of 370 results for author: Bai, L

  1. arXiv:2403.16162  [pdf, other]

    cs.AI

    Multi-Task Learning with Multi-Task Optimization

    Authors: Lu Bai, Abhishek Gupta, Yew-Soon Ong

    Abstract: Multi-task learning solves multiple correlated tasks. However, conflicts may exist between them. In such circumstances, a single solution can rarely optimize all the tasks, leading to performance trade-offs. To arrive at a set of optimized yet well-distributed models that collectively embody different trade-offs in one algorithmic pass, this paper proposes to view Pareto multi-task learning throug…

    Submitted 24 March, 2024; originally announced March 2024.

  2. arXiv:2403.16133  [pdf, other]

    cs.AI cs.LG

    SSHPool: The Separated Subgraph-based Hierarchical Pooling

    Authors: Zhuo Xu, Lixin Cui, Ming Li, Yue Wang, Ziyu Lyu, Hangyuan Du, Lu Bai, Philip S. Yu, Edwin R. Hancock

    Abstract: In this paper, we develop a novel local graph pooling method, namely the Separated Subgraph-based Hierarchical Pooling (SSHPool), for graph classification. We commence by assigning the nodes of a sample graph into different clusters, resulting in a family of separated subgraphs. We individually employ the local graph convolution units as the local structure to further compress each subgraph into a…

    Submitted 13 August, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  3. arXiv:2403.16130  [pdf, other]

    cs.LG cs.AI

    AKBR: Learning Adaptive Kernel-based Representations for Graph Classification

    Authors: Feifei Qian, Lixin Cui, Ming Li, Yue Wang, Hangyuan Du, Lixiang Xu, Lu Bai, Philip S. Yu, Edwin R. Hancock

    Abstract: In this paper, we propose a new model to learn Adaptive Kernel-based Representations (AKBR) for graph classification. Unlike state-of-the-art R-convolution graph kernels that are defined by merely counting any pair of isomorphic substructures between graphs and cannot provide an end-to-end learning mechanism for the classifier, the proposed AKBR approach aims to define an end-to-end representation…

    Submitted 13 August, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  4. arXiv:2403.14185  [pdf, other]

    eess.SP

    A LiDAR-Aided Channel Model for Vehicular Intelligent Sensing-Communication Integration

    Authors: Ziwei Huang, Lu Bai, Mingran Sun, Xiang Cheng

    Abstract: In this paper, a novel channel modeling approach, named light detection and ranging (LiDAR)-aided geometry-based stochastic modeling (LA-GBSM), is developed. Based on the developed LA-GBSM approach, a new millimeter wave (mmWave) channel model for sixth-generation (6G) vehicular intelligent sensing-communication integration is proposed, which can support the design of intelligent transportation sy…

    Submitted 21 March, 2024; originally announced March 2024.

  5. arXiv:2403.11817  [pdf, other]

    cs.CV

    HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation

    Authors: Sha Zhang, Jiajun Deng, Lei Bai, Houqiang Li, Wanli Ouyang, Yanyong Zhang

    Abstract: We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network with a pre-trained image network in an unsupervised manner. By exploiting the geometric relationship between RGB cameras and LiDAR sensors, the correspondence between the two modalities based on both image-plane view and bird-eye view can be established,…

    Submitted 18 March, 2024; originally announced March 2024.

  6. arXiv:2403.11035  [pdf]

    physics.optics cs.CV cs.NE physics.app-ph

    Multiplane Quantitative Phase Imaging Using a Wavelength-Multiplexed Diffractive Optical Processor

    Authors: Che-Yung Shen, Jingxi Li, Tianyi Gan, Yuhang Li, Langxing Bai, Mona Jarrahi, Aydogan Ozcan

    Abstract: Quantitative phase imaging (QPI) is a label-free technique that provides optical path length information for transparent specimens, finding utility in biology, materials science, and engineering. Here, we present quantitative phase imaging of a 3D stack of phase-only objects using a wavelength-multiplexed diffractive optical processor. Utilizing multiple spatially engineered diffractive layers tra…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: 27 Pages, 9 Figures

    Journal ref: Advanced Photonics (2024)

  7. arXiv:2403.08447  [pdf, other]

    physics.med-ph

    Generating Synthetic Computed Tomography for Radiotherapy: SynthRAD2023 Challenge Report

    Authors: Evi M. C. Huijben, Maarten L. Terpstra, Arthur Jr. Galapon, Suraj Pai, Adrian Thummerer, Peter Koopmans, Manya Afonso, Maureen van Eijnatten, Oliver Gurney-Champion, Zeli Chen, Yiwen Zhang, Kaiyi Zheng, Chuanpu Li, Haowen Pang, Chuyang Ye, Runqi Wang, Tao Song, Fuxin Fan, Jingna Qiu, Yixing Huang, Juhyung Ha, Jong Sung Park, Alexandra Alain-Beaudoin, Silvain Bériault, Pengxin Yu, et al. (34 additional authors not shown)

    Abstract: Radiation therapy plays a crucial role in cancer treatment, necessitating precise delivery of radiation to tumors while sparing healthy tissues over multiple days. Computed tomography (CT) is integral for treatment planning, offering electron density data crucial for accurate dose calculations. However, accurately representing patient anatomy is challenging, especially in adaptive radiotherapy, wh…

    Submitted 11 June, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Preprint submitted to Medical Image Analysis

  8. arXiv:2403.07969  [pdf, other]

    cs.LG cs.AI

    KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction

    Authors: Zixuan Li, Yutao Zeng, Yuxin Zuo, Weicheng Ren, Wenxuan Liu, Miao Su, Yucan Guo, Yantao Liu, Xiang Li, Zhilei Hu, Long Bai, Wei Li, Yidan Liu, Pan Yang, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

    Abstract: In this paper, we propose KnowCoder, a Large Language Model (LLM) to conduct Universal Information Extraction (UIE) via code generation. KnowCoder aims to develop a kind of unified schema representation that LLMs can easily understand and an effective learning framework that encourages LLMs to follow schemas and extract structured knowledge accurately. To achieve these, KnowCoder introduces a code…

    Submitted 13 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  9. arXiv:2403.07687  [pdf, other]

    cs.CV cs.AI cs.CL

    Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost

    Authors: Oana Ignat, Longju Bai, Joan Nwatu, Rada Mihalcea

    Abstract: Current foundation models have shown impressive performance across various tasks. However, several studies have revealed that these models are not effective for everyone due to the imbalanced geographical and economic representation of the data used in the training process. Most of this data comes from Western countries, leading to poor results for underrepresented countries. To address this issue…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: accepted at COLING 2024

  10. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  11. arXiv:2402.13270  [pdf, other]

    physics.ao-ph cs.AI cs.LG physics.data-an

    Global Tropical Cyclone Intensity Forecasting with Multi-modal Multi-scale Causal Autoregressive Model

    Authors: Xinyu Wang, Kang Chen, Lei Liu, Tao Han, Bin Li, Lei Bai

    Abstract: Accurate forecasting of Tropical cyclone (TC) intensity is crucial for formulating disaster risk reduction strategies. Current methods predominantly rely on limited spatiotemporal information from ERA5 data and neglect the causal relationships between these physical variables, failing to fully capture the spatial and temporal patterns required for intensity forecasting. To address this issue, we p…

    Submitted 16 February, 2024; originally announced February 2024.

  12. arXiv:2402.12376  [pdf, other]

    cs.CV

    FiT: Flexible Vision Transformer for Diffusion Model

    Authors: Zeyu Lu, Zidong Wang, Di Huang, Chengyue Wu, Xihui Liu, Wanli Ouyang, Lei Bai

    Abstract: Nature is infinitely resolution-free. In the context of this reality, existing diffusion models, such as Diffusion Transformers, often face challenges when processing image resolutions outside of their trained domain. To overcome this limitation, we present the Flexible Vision Transformer (FiT), a transformer architecture specifically designed for generating images with unrestricted resolutions an…

    Submitted 19 February, 2024; originally announced February 2024.

  13. arXiv:2402.11476  [pdf, other]

    cs.CV

    EndoOOD: Uncertainty-aware Out-of-distribution Detection in Capsule Endoscopy Diagnosis

    Authors: Qiaozhi Tan, Long Bai, Guankun Wang, Mobarakol Islam, Hongliang Ren

    Abstract: Wireless capsule endoscopy (WCE) is a non-invasive diagnostic procedure that enables visualization of the gastrointestinal (GI) tract. Deep learning-based methods have shown effectiveness in disease screening using WCE data, alleviating the burden on healthcare professionals. However, existing capsule endoscopy classification methods mostly rely on pre-defined categories, making it challenging to…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: To appear in IEEE ISBI 2024

  14. arXiv:2402.06985  [pdf, other]

    cs.CV cs.AI cs.RO

    OSSAR: Towards Open-Set Surgical Activity Recognition in Robot-assisted Surgery

    Authors: Long Bai, Guankun Wang, Jie Wang, Xiaoxiao Yang, Huxin Gao, Xin Liang, An Wang, Mobarakol Islam, Hongliang Ren

    Abstract: In the realm of automated robotic surgery and computer-assisted interventions, understanding robotic surgical activities stands paramount. Existing algorithms dedicated to surgical activity recognition predominantly cater to pre-defined closed-set paradigms, ignoring the challenges of real-world open-set scenarios. Such algorithms often falter in the presence of test samples originating from class…

    Submitted 10 February, 2024; originally announced February 2024.

    Comments: To appear in IEEE ICRA 2024

  15. arXiv:2402.06646  [pdf]

    physics.ao-ph cs.LG physics.geo-ph

    Diffusion Model-based Probabilistic Downscaling for 180-year East Asian Climate Reconstruction

    Authors: Fenghua Ling, Zeyu Lu, Jing-Jia Luo, Lei Bai, Swadhin K. Behera, Dachao Jin, Baoxiang Pan, Huidong Jiang, Toshio Yamagata

    Abstract: As our planet is entering into the "global boiling" era, understanding regional climate change becomes imperative. Effective downscaling methods that provide localized insights are crucial for this target. Traditional approaches, including computationally-demanding regional dynamical models or statistical downscaling frameworks, are often susceptible to the influence of downscaling uncertainty. He…

    Submitted 5 April, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  16. arXiv:2402.05860  [pdf, other]

    cs.CV

    Privacy-Preserving Synthetic Continual Semantic Segmentation for Robotic Surgery

    Authors: Mengya Xu, Mobarakol Islam, Long Bai, Hongliang Ren

    Abstract: Deep Neural Networks (DNNs) based semantic segmentation of the robotic instruments and tissues can enhance the precision of surgical activities in robot-assisted surgery. However, in biological learning, DNNs cannot learn incremental tasks over time and exhibit catastrophic forgetting, which refers to the sharp decline in performance on previously learned tasks after learning a new one. Specifical…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 12 pages, 8 figures, IEEE Transactions on Medical Image (accepted)

  17. arXiv:2402.04290  [pdf, other]

    cs.LG cs.AI

    CasCast: Skillful High-resolution Precipitation Nowcasting via Cascaded Modelling

    Authors: Junchao Gong, Lei Bai, Peng Ye, Wanghan Xu, Na Liu, Jianhua Dai, Xiaokang Yang, Wanli Ouyang

    Abstract: Precipitation nowcasting based on radar data plays a crucial role in extreme weather prediction and has broad implications for disaster management. Despite progresses have been made based on deep learning, two key challenges of precipitation nowcasting are not well-solved: (i) the modeling of complex precipitation system evolutions with different scales, and (ii) accurate forecasts for extreme pre…

    Submitted 6 February, 2024; originally announced February 2024.

  18. arXiv:2402.01295  [pdf, other]

    cs.LG cs.AI

    ExtremeCast: Boosting Extreme Value Prediction for Global Weather Forecast

    Authors: Wanghan Xu, Kang Chen, Tao Han, Hao Chen, Wanli Ouyang, Lei Bai

    Abstract: Data-driven weather forecast based on machine learning (ML) has experienced rapid development and demonstrated superior performance in the global medium-range forecast compared to traditional physics-based dynamical models. However, most of these ML models struggle with accurately predicting extreme weather, which is related to training loss and the uncertainty of weather systems. Through mathemat…

    Submitted 16 August, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  19. arXiv:2402.00059  [pdf, other]

    cs.LG cs.AI physics.ao-ph

    FengWu-GHR: Learning the Kilometer-scale Medium-range Global Weather Forecasting

    Authors: Tao Han, Song Guo, Fenghua Ling, Kang Chen, Junchao Gong, Jingjia Luo, Junxia Gu, Kan Dai, Wanli Ouyang, Lei Bai

    Abstract: Kilometer-scale modeling of global atmosphere dynamics enables fine-grained weather forecasting and decreases the risk of disastrous weather and climate activity. Therefore, building a kilometer-scale global forecast model is a persistent pursuit in the meteorology domain. Active international efforts have been made in past decades to improve the spatial resolution of numerical weather models. Non…

    Submitted 28 January, 2024; originally announced February 2024.

    Comments: 19 pages

  20. arXiv:2401.16669  [pdf]

    cs.LG cs.AI physics.ao-ph physics.geo-ph

    Improving Global Weather and Ocean Wave Forecast with Large Artificial Intelligence Models

    Authors: Fenghua Ling, Lin Ouyang, Boufeniza Redouane Larbi, Jing-Jia Luo, Tao Han, Xiaohui Zhong, Lei Bai

    Abstract: The rapid advancement of artificial intelligence technologies, particularly in recent years, has led to the emergence of several large parameter artificial intelligence weather forecast models. These models represent a significant breakthrough, overcoming the limitations of traditional numerical weather prediction models and indicating the emergence of profound potential tools for atmosphere-ocean…

    Submitted 18 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

  21. arXiv:2401.16416  [pdf, other]

    cs.CV

    Endo-4DGS: Endoscopic Monocular Scene Reconstruction with 4D Gaussian Splatting

    Authors: Yiming Huang, Beilei Cui, Long Bai, Ziqi Guo, Mengya Xu, Mobarakol Islam, Hongliang Ren

    Abstract: In the realm of robot-assisted minimally invasive surgery, dynamic scene reconstruction can significantly enhance downstream tasks and improve surgical outcomes. Neural Radiance Fields (NeRF)-based methods have recently risen to prominence for their exceptional ability to reconstruct scenes but are hampered by slow inference speed, prolonged training, and inconsistent depth estimation. Some previo…

    Submitted 2 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

  22. arXiv:2401.12681  [pdf, other]

    cs.LG cs.AI

    Non-Neighbors Also Matter to Kriging: A New Contrastive-Prototypical Learning

    Authors: Zhishuai Li, Yunhao Nie, Ziyue Li, Lei Bai, Yisheng Lv, Rui Zhao

    Abstract: Kriging aims at estimating the attributes of unsampled geo-locations from observations in the spatial vicinity or physical connections, which helps mitigate skewed monitoring caused by under-deployed sensors. Existing works assume that neighbors' information offers the basis for estimating the attributes of the unobserved target while ignoring non-neighbors. However, non-neighbors could also offer…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: Accepted in AISTATS 2024

  23. arXiv:2401.12505  [pdf, other]

    cond-mat.str-el cond-mat.mtrl-sci cond-mat.other

    Topological magnons in a non-coplanar magnetic order on the triangular lattice

    Authors: Linli Bai, Ken Chen

    Abstract: The bond-dependent Kitaev interaction $K$ is familiar in the effective spin model of transition metal compounds with octahedral ligands. In this work, we find a peculiar non-coplanar magnetic order can be formed with the help of $K$ and next-nearest neighbor Heisenberg coupling $J_2$ on the triangular lattice. It can be seen as a miniature version of skyrmion crystal, since it has nine spins and a…

    Submitted 27 April, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

  24. arXiv:2401.11960  [pdf, other]

    cs.CV eess.IV

    Observation-Guided Meteorological Field Downscaling at Station Scale: A Benchmark and a New Method

    Authors: Zili Liu, Hao Chen, Lei Bai, Wenyuan Li, Keyan Chen, Zhengyi Wang, Wanli Ouyang, Zhengxia Zou, Zhenwei Shi

    Abstract: Downscaling (DS) of meteorological variables involves obtaining high-resolution states from low-resolution meteorological fields and is an important task in weather forecasting. Previous methods based on deep learning treat downscaling as a super-resolution task in computer vision and utilize high-resolution gridded meteorological fields as supervision to improve resolution at specific grid scales…

    Submitted 22 January, 2024; originally announced January 2024.

  25. arXiv:2401.09274  [pdf, other]

    math.OC cs.LG

    Avoiding strict saddle points of nonconvex regularized problems

    Authors: Luwei Bai, Yaohua Hu, Hao Wang, Xiaoqi Yang

    Abstract: In this paper, we consider a class of non-convex and non-smooth sparse optimization problems, which encompass most existing nonconvex sparsity-inducing terms. We show the second-order optimality conditions only depend on the nonzeros of the stationary points. We propose two damped iterative reweighted algorithms including the iteratively reweighted $\ell_1$ algorithm (DIRL$_1$) and the iteratively…

    Submitted 6 August, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: 34 pages, 4 figures

  26. arXiv:2401.06013  [pdf, other]

    cs.CV cs.AI

    Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery

    Authors: Beilei Cui, Mobarakol Islam, Long Bai, Hongliang Ren

    Abstract: Purpose: Depth estimation in robotic surgery is vital in 3D reconstruction, surgical navigation and augmented reality visualization. Although the foundation model exhibits outstanding performance in many vision tasks, including depth estimation (e.g., DINOv2), recent works observed its limitations in medical and surgical domain-specific applications. This work presents a low-ranked adaptation (LoR…

    Submitted 12 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: Accepted by IPCAI 2024 (IJCAR Special Issue)

  27. arXiv:2401.04148  [pdf, other]

    cs.LG cs.AI eess.SP

    Online Test-Time Adaptation of Spatial-Temporal Traffic Flow Forecasting

    Authors: Pengxin Guo, Pengrong Jin, Ziyue Li, Lei Bai, Yu Zhang

    Abstract: Accurate spatial-temporal traffic flow forecasting is crucial in aiding traffic managers in implementing control measures and assisting drivers in selecting optimal travel routes. Traditional deep-learning based methods for traffic flow forecasting typically rely on historical data to train their models, which are then used to make predictions on future data. However, the performance of the traine…

    Submitted 8 January, 2024; originally announced January 2024.

  28. arXiv:2401.01759  [pdf, other]

    cs.SI cs.CL cs.CV cs.MM

    VGA: Vision and Graph Fused Attention Network for Rumor Detection

    Authors: Lin Bai, Caiyan Jia, Ziying Song, Chaoqun Cui

    Abstract: With the development of social media, rumors have been spread broadly on social media platforms, causing great harm to society. Beside textual information, many rumors also use manipulated images or conceal textual information within images to deceive people and avoid being detected, making multimodal rumor detection be a critical problem. The majority of multimodal rumor detection methods mainly…

    Submitted 3 January, 2024; originally announced January 2024.

  29. arXiv:2401.01117  [pdf, other]

    cs.CV eess.IV

    Q-Refine: A Perceptual Quality Refiner for AI-Generated Image

    Authors: Chunyi Li, Haoning Wu, Zicheng Zhang, Hongkun Hao, Kaiwei Zhang, Lei Bai, Xiaohong Liu, Xiongkuo Min, Weisi Lin, Guangtao Zhai

    Abstract: With the rapid evolution of the Text-to-Image (T2I) model in recent years, their unsatisfactory generation result has become a challenge. However, uniformly refining AI-Generated Images (AIGIs) of different qualities not only limited optimization capabilities for low-quality AIGIs but also brought negative optimization to high-quality AIGIs. To address this issue, a quality-award refiner named Q-R…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: 6 pages, 5 figures

  30. arXiv:2401.00496  [pdf, other]

    cs.CV cs.AI cs.LG

    SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge

    Authors: Dimitrios Psychogyios, Emanuele Colleoni, Beatrice Van Amsterdam, Chih-Yang Li, Shu-Yu Huang, Yuchong Li, Fucang Jia, Baosheng Zou, Guotai Wang, Yang Liu, Maxence Boels, Jiayu Huo, Rachel Sparks, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin, Mengya Xu, An Wang, Yanan Wu, Long Bai, Hongliang Ren, Atsushi Yamada, Yuriko Harai, Yuto Ishikawa, Kazuyuki Hayashi, et al. (25 additional authors not shown)

    Abstract: Surgical tool segmentation and action recognition are fundamental building blocks in many computer-assisted intervention applications, ranging from surgical skills assessment to decision support systems. Nowadays, learning-based action recognition and segmentation approaches outperform classical methods, relying, however, on large, annotated datasets. Furthermore, action recognition and tool segme…

    Submitted 23 January, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

  31. arXiv:2312.12462  [pdf, other]

    physics.ao-ph cs.AI cs.LG

    Towards an end-to-end artificial intelligence driven global weather forecasting system

    Authors: Kun Chen, Lei Bai, Fenghua Ling, Peng Ye, Tao Chen, Jing-Jia Luo, Hao Chen, Yi Xiao, Kang Chen, Tao Han, Wanli Ouyang

    Abstract: The weather forecasting system is important for science and society, and significant achievements have been made in applying artificial intelligence (AI) to medium-range weather forecasting. However, existing AI-based weather forecasting models rely on analysis or reanalysis products from traditional numerical weather prediction (NWP) systems as initial conditions for making predictions. Initial s…

    Submitted 8 April, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  32. arXiv:2312.12455  [pdf, other]

    physics.ao-ph cs.AI cs.LG

    FengWu-4DVar: Coupling the Data-driven Weather Forecasting Model with 4D Variational Assimilation

    Authors: Yi Xiao, Lei Bai, Wei Xue, Kang Chen, Tao Han, Wanli Ouyang

    Abstract: Weather forecasting is a crucial yet highly challenging task. With the maturity of Artificial Intelligence (AI), the emergence of data-driven weather forecasting models has opened up a new paradigm for the development of weather forecasting systems. Despite the significant successes that have been achieved (e.g., surpassing advanced traditional physical models for global medium-range forecasting),…

    Submitted 19 May, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: 15 pages, 8 figures

  33. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  34. arXiv:2312.10429  [pdf, other]

    physics.geo-ph cs.AI

    ResoNet: Robust and Explainable ENSO Forecasts with Hybrid Convolution and Transformer Networks

    Authors: Pumeng Lyu, Tao Tang, Fenghua Ling, Jing-Jia Luo, Niklas Boers, Wanli Ouyang, Lei Bai

    Abstract: Recent studies have shown that deep learning (DL) models can skillfully predict the El Niño-Southern Oscillation (ENSO) forecasts over 1.5 years ahead. However, concerns regarding the reliability of predictions made by DL methods persist, including potential overfitting issues and lack of interpretability. Here, we propose ResoNet, a DL model that combines convolutional neural network (CNN) and Tr…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: 32 pages, 5 main figures and 12 supplementary figures

  35. arXiv:2312.09576  [pdf, other]

    eess.IV cs.CV

    SegRap2023: A Benchmark of Organs-at-Risk and Gross Tumor Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma

    Authors: Xiangde Luo, Jia Fu, Yunxin Zhong, Shuolin Liu, Bing Han, Mehdi Astaraki, Simone Bendazzoli, Iuliana Toma-Dasu, Yiwen Ye, Ziyang Chen, Yong Xia, Yanzhou Su, Jin Ye, Junjun He, Zhaohu Xing, Hongqiu Wang, Lei Zhu, Kaixiang Yang, Xin Fang, Zhiwei Wang, Chan Woong Lee, Sang Joon Park, Jaehee Chun, Constantin Ulrich, Klaus H. Maier-Hein, et al. (17 additional authors not shown)

    Abstract: Radiation therapy is a primary and effective NasoPharyngeal Carcinoma (NPC) treatment strategy. The precise delineation of Gross Tumor Volumes (GTVs) and Organs-At-Risk (OARs) is crucial in radiation treatment, directly impacting patient prognosis. Previously, the delineation of GTVs and OARs was performed by experienced radiation oncologists. Recently, deep learning has achieved promising results…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: A challenge report of SegRap2023 (organized in conjunction with MICCAI2023)

  36. arXiv:2312.06428  [pdf, other]

    cs.CV cs.AI cs.IR cs.LG

    VisionTraj: A Noise-Robust Trajectory Recovery Framework based on Large-scale Camera Network

    Authors: Zhishuai Li, Ziyue Li, Xiaoru Hu, Guoqing Du, Yunhao Nie, Feng Zhu, Lei Bai, Rui Zhao

    Abstract: Trajectory recovery based on the snapshots from the city-wide multi-camera network facilitates urban mobility sensing and driveway optimization. The state-of-the-art solutions devoted to such a vision-based scheme typically incorporate predefined rules or unsupervised iterative feedback, struggling with multi-fold challenges such as lack of open-source datasets for training the whole pipeline, and…

    Submitted 11 December, 2023; originally announced December 2023.

  37. arXiv:2312.01697  [pdf, other]

    cs.CV cs.AI

    Hulk: A Universal Knowledge Translator for Human-Centric Tasks

    Authors: Yizhou Wang, Yixuan Wu, Shixiang Tang, Weizhen He, Xun Guo, Feng Zhu, Lei Bai, Rui Zhao, Jian Wu, Tong He, Wanli Ouyang

    Abstract: Human-centric perception tasks, e.g., pedestrian detection, skeleton-based action recognition, and pose estimation, have wide industrial applications, such as metaverse and sports analysis. There is a recent surge to develop human-centric foundation models that can benefit a broad range of human-centric perception tasks. While many human-centric foundation models have achieved success, they did no…

    Submitted 21 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: 24 pages, 5 figures

  38. arXiv:2311.02962  [pdf, other]

    cs.AI cs.CL cs.IR

    Retrieval-Augmented Code Generation for Universal Information Extraction

    Authors: Yucan Guo, Zixuan Li, Xiaolong Jin, Yantao Liu, Yutao Zeng, Wenxuan Liu, Xiang Li, Pan Yang, Long Bai, Jiafeng Guo, Xueqi Cheng

    Abstract: Information Extraction (IE) aims to extract structural knowledge (e.g., entities, relations, events) from natural language texts, which brings challenges to existing methods due to task-specific schemas and complex text expressions. Code, as a typical kind of formalized language, is capable of describing structural knowledge under various schemas in a universal way. On the other hand, Large Langua…

    Submitted 6 November, 2023; originally announced November 2023.

  39. arXiv:2311.02631  [pdf, other]

    cs.LG cs.AI

    A Critical Perceptual Pre-trained Model for Complex Trajectory Recovery

    Authors: Dedong Li, Ziyue Li, Zhishuai Li, Lei Bai, Qingyuan Gong, Lijun Sun, Wolfgang Ketter, Rui Zhao

    Abstract: The trajectory on the road traffic is commonly collected at a low sampling rate, and trajectory recovery aims to recover a complete and continuous trajectory from the sparse and discrete inputs. Recently, sequential language models have been innovatively adopted for trajectory recovery in a pre-trained manner: it learns road segment representation vectors, which will be used in the downstream task…

    Submitted 5 November, 2023; originally announced November 2023.

    Comments: Accepted in ACM SIGSPATIAL 2023

  40. arXiv:2311.00291  [pdf, other]

    cs.CV

    Graph Representation Learning for Infrared and Visible Image Fusion

    Authors: Jing Li, Lu Bai, Bin Yang, Chang Li, Lingfei Ma, Edwin R. Hancock

    Abstract: Infrared and visible image fusion aims to extract complementary features to synthesize a single fused image. Many methods employ convolutional neural networks (CNNs) to extract local features due to its translation invariance and locality. However, CNNs fail to consider the image's non-local self-similarity (NLss), though it can expand the receptive field by pooling operations, it still inevitably…

    Submitted 1 November, 2023; originally announced November 2023.

  41. arXiv:2310.14174  [pdf, other]

    cs.CL

    An In-Context Schema Understanding Method for Knowledge Base Question Answering

    Authors: Yantao Liu, Zixuan Li, Xiaolong Jin, Yucan Guo, Long Bai, Saiping Guan, Jiafeng Guo, Xueqi Cheng

    Abstract: The Knowledge Base Question Answering (KBQA) task aims to answer natural language questions based on a given knowledge base. Recently, Large Language Models (LLMs) have shown strong capabilities in language understanding and can be used to solve this task. In doing so, a major challenge for LLMs is to overcome the immensity and heterogeneity of knowledge base schemas. Existing methods bypass this c…

    Submitted 10 February, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

  42. arXiv:2310.13447  [pdf, other]

    cs.CV cs.AI cs.CL

    Superpixel Semantics Representation and Pre-training for Vision-Language Task

    Authors: Siyu Zhang, Yeming Chen, Yaoru Sun, Fang Wang, Jun Yang, Lizhi Bai, Shangce Gao

    Abstract: The key to integrating visual language tasks is to establish a good alignment strategy. Recently, visual semantic representation has achieved fine-grained visual understanding by dividing grids or image patches. However, the coarse-grained semantic interactions in image space should not be ignored, which hinders the extraction of complex contextual semantic relations at the scene boundaries. This…

    Submitted 21 July, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

  43. arXiv:2310.09937  [pdf, other]

    eess.IV eess.SP

    Joint Sparse Representations and Coupled Dictionary Learning in Multi-Source Heterogeneous Image Pseudo-color Fusion

    Authors: Long Bai, Shilong Yao, Kun Gao, Yanjun Huang, Ruijie Tang, Hong Yan, Max Q. -H. Meng, Hongliang Ren

    Abstract: Considering that Coupled Dictionary Learning (CDL) method can obtain a reasonable linear mathematical relationship between resource images, we propose a novel CDL-based Synthetic Aperture Radar (SAR) and multispectral pseudo-color fusion method. Firstly, the traditional Brovey transform is employed as a pre-processing method on the paired SAR and multispectral images. Then, CDL is used to capture…

    Submitted 15 October, 2023; originally announced October 2023.

    Comments: To appear in IEEE Sensors Journal

  44. arXiv:2310.08261  [pdf, other]

    cs.CV

    GraphAlign: Enhancing Accurate Feature Alignment by Graph matching for Multi-Modal 3D Object Detection

    Authors: Ziying Song, Haiyue Wei, Lin Bai, Lei Yang, Caiyan Jia

    Abstract: LiDAR and cameras are complementary sensors for 3D object detection in autonomous driving. However, it is challenging to explore the unnatural interaction between point clouds and images, and the critical factor is how to conduct feature alignment of heterogeneous modalities. Currently, many methods achieve feature alignment by projection calibration only, without considering the problem of coordi…

    Submitted 12 October, 2023; originally announced October 2023.

  45. arXiv:2310.01994  [pdf, other]

    cs.CV

    Understanding Masked Autoencoders From a Local Contrastive Perspective

    Authors: Xiaoyu Yue, Lei Bai, Meng Wei, Jiangmiao Pang, Xihui Liu, Luping Zhou, Wanli Ouyang

    Abstract: Masked AutoEncoder (MAE) has revolutionized the field of self-supervised learning with its simple yet effective masking and reconstruction strategies. However, despite achieving state-of-the-art performance across various downstream vision tasks, the underlying mechanisms that drive MAE's efficacy are less well-explored compared to the canonical contrastive learning paradigm. In this paper, we fir…

    Submitted 8 December, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

  46. arXiv:2309.15718  [pdf, other]

    physics.chem-ph physics.comp-ph

    Geometry-enhanced Pre-training on Interatomic Potentials

    Authors: Taoyong Cui, Chenyu Tang, Mao Su, Shufei Zhang, Yuqiang Li, Lei Bai, Yuhan Dong, Xingao Gong, Wanli Ouyang

    Abstract: Machine learning interatomic potentials (MLIPs) enables molecular dynamics (MD) simulations with ab initio accuracy and has been applied to various fields of physical science. However, the performance and transferability of MLIPs are limited by insufficient labeled training data, which require expensive ab initio calculations to obtain the labels, especially for complex molecular systems. To addre…

    Submitted 12 April, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Journal ref: Published in Nature Machine Intelligence 2024

  47. arXiv:2309.12960  [pdf, other]

    cs.CL

    Nested Event Extraction upon Pivot Element Recogniton

    Authors: Weicheng Ren, Zixuan Li, Xiaolong Jin, Long Bai, Miao Su, Yantao Liu, Saiping Guan, Jiafeng Guo, Xueqi Cheng

    Abstract: Nested Event Extraction (NEE) aims to extract complex event structures where an event contains other events as its arguments recursively. Nested events involve a kind of Pivot Elements (PEs) that simultaneously act as arguments of outer-nest events and as triggers of inner-nest events, and thus connect them into nested structures. This special characteristic of PEs brings challenges to existing NE…

    Submitted 7 April, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

    Comments: Accepted at LREC-COLING 2024

  48. arXiv:2309.12892  [pdf, other]

    cs.CL cs.AI

    ProtoEM: A Prototype-Enhanced Matching Framework for Event Relation Extraction

    Authors: Zhilei Hu, Zixuan Li, Daozhu Xu, Long Bai, Cheng Jin, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

    Abstract: Event Relation Extraction (ERE) aims to extract multiple kinds of relations among events in texts. However, existing methods singly categorize event relations as different classes, which are inadequately capturing the intrinsic semantics of these relations. To comprehensively understand their intrinsic semantics, in this paper, we obtain prototype representations for each type of event relation an…

    Submitted 22 September, 2023; originally announced September 2023.

    Comments: Work in progress

  49. arXiv:2309.10431  [pdf, other]

    cs.CV

    Sample-adaptive Augmentation for Point Cloud Recognition Against Real-world Corruptions

    Authors: Jie Wang, Lihe Ding, Tingfa Xu, Shaocong Dong, Xinli Xu, Long Bai, Jianan Li

    Abstract: Robust 3D perception under corruption has become an essential task for the realm of 3D vision. While current data augmentation techniques usually perform random transformations on all point cloud objects in an offline way and ignore the structure of the samples, resulting in over-or-under enhancement. In this work, we propose an alternative to make sample-adaptive transformations based on the stru…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV2023; code: https://github.com/Roywangj/AdaptPoint

  50. arXiv:2309.10242  [pdf, other]

    math.OC

    Reinforcement Learning for optimal dividend problem under diffusion model

    Authors: Lihua Bai, Thejani Gamage, Jin Ma, Pengxu Xie

    Abstract: In this paper, we study the optimal dividend problem under the continuous time diffusion model with the dividend rate being restricted in a given finite interval. Unlike the standard literature, we shall particularly be interested in the case when the parameters (e.g. drift and diffusion coefficients) of the model are not specified so that the optimal control cannot be explicitly determined. We th…

    Submitted 18 September, 2023; originally announced September 2023.