Showing 1–50 of 139 results for author: Zhuang, X

  1. arXiv:2407.07372  [pdf, other]

    eess.IV cs.CV

    Trustworthy Contrast-enhanced Brain MRI Synthesis

    Authors: Jiyao Liu, Yuxin Li, Shangqi Gao, Yuncheng Zhou, Xin Gao, Ningsheng Xu, Xiao-Yong Zhang, Xiahai Zhuang

    Abstract: Contrast-enhanced brain MRI (CE-MRI) is a valuable diagnostic technique but may pose health risks and incur high costs. To create safer alternatives, multi-modality medical image translation aims to synthesize CE-MRI images from other available modalities. Although existing methods can generate promising predictions, they still face two challenges, i.e., exhibiting over-confidence and lacking inte…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures

  2. arXiv:2407.04416  [pdf, other]

    cs.SD cs.MM eess.AS

    Improving Audio Generation with Visual Enhanced Caption

    Authors: Yi Yuan, Dongya Jia, Xiaobin Zhuang, Yuanzhe Chen, Zhengxi Liu, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xubo Liu, Mark D. Plumbley, Wenwu Wang

    Abstract: Generative models have shown significant achievements in audio generation tasks. However, existing models struggle with complex and detailed prompts, leading to potential performance degradation. We hypothesize that this problem stems from the low quality and relatively small quantity of training data. In this work, we aim to create a large-scale audio dataset with rich captions for improving audi…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 5 pages with 1 appendix

  3. arXiv:2406.19130  [pdf, other]

    cs.CV

    Evidential Concept Embedding Models: Towards Reliable Concept Explanations for Skin Disease Diagnosis

    Authors: Yibo Gao, Zheyao Gao, Xin Gao, Yuanye Liu, Bomin Wang, Xiahai Zhuang

    Abstract: Due to the high stakes in medical decision-making, there is a compelling demand for interpretable deep learning methods in medical image analysis. Concept Bottleneck Models (CBM) have emerged as an active interpretable framework incorporating human-interpretable concepts into decision-making. However, their concept predictions may lack reliability when applied to clinical diagnosis, impeding conce…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: accepted by MICCAI 2024

  4. arXiv:2406.19043  [pdf]

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Yajing Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, Jing Qin, Xiahai Zhuang, Claudia Prieto, et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 2 tables

  5. arXiv:2406.17575  [pdf, other]

    cs.CV

    Toward Universal Medical Image Registration via Sharpness-Aware Meta-Continual Learning

    Authors: Bomin Wang, Xinzhe Luo, Xiahai Zhuang

    Abstract: Current deep learning approaches in medical image registration usually face the challenges of distribution shift and data collection, hindering real-world deployment. In contrast, universal medical image registration aims to perform registration on a wide range of clinically relevant tasks simultaneously, thus having tremendous potential for clinical applications. In this paper, we present the fir…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by MICCAI 2024

  6. arXiv:2406.11045  [pdf, other]

    cs.LG math.NA

    Kolmogorov Arnold Informed neural network: A physics-informed deep learning framework for solving PDEs based on Kolmogorov Arnold Networks

    Authors: Yizheng Wang, Jia Sun, Jinshuai Bai, Cosmin Anitescu, Mohammad Sadegh Eshaghi, Xiaoying Zhuang, Timon Rabczuk, Yinghua Liu

    Abstract: AI for partial differential equations (PDEs) has garnered significant attention, particularly with the emergence of Physics-informed neural networks (PINNs). The recent advent of Kolmogorov-Arnold Network (KAN) indicates that there is potential to revisit and enhance the previously MLP-based PINNs. Compared to MLPs, KANs offer interpretability and require fewer parameters. PDEs can be described in…

    Submitted 16 June, 2024; originally announced June 2024.

  7. arXiv:2406.09676  [pdf, other]

    eess.AS cs.CL

    Optimizing Byte-level Representation for End-to-end ASR

    Authors: Roger Hsiao, Liuhui Deng, Erik McDermott, Ruchir Travadi, Xiaodan Zhuang

    Abstract: We propose a novel approach to optimizing a byte-level representation for end-to-end automatic speech recognition (ASR). Byte-level representation is often used by large scale multilingual ASR systems when the character set of the supported languages is large. The compactness and universality of byte-level representation allow the ASR models to use smaller output vocabularies and therefore, provid…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 5 pages, 1 figure

  8. arXiv:2406.09098  [pdf, other]

    cs.CL

    SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models

    Authors: Kehua Feng, Keyan Ding, Weijie Wang, Xiang Zhuang, Zeyuan Wang, Ming Qin, Yu Zhao, Jianhua Yao, Qiang Zhang, Huajun Chen

    Abstract: The burgeoning utilization of Large Language Models (LLMs) in scientific research necessitates advanced benchmarks capable of evaluating their understanding and application of scientific knowledge comprehensively. To address this need, we introduce the SciKnowEval benchmark, a novel framework that systematically evaluates LLMs across five progressive levels of scientific knowledge: studying extens…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 48 pages, 2 figures

  9. arXiv:2406.02430  [pdf, other]

    eess.AS cs.SD

    Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu, et al. (21 additional authors not shown)

    Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub…

    Submitted 4 June, 2024; originally announced June 2024.

  10. arXiv:2405.19689  [pdf, other]

    cs.CV cs.IR

    Uncertainty-aware sign language video retrieval with probability distribution modeling

    Authors: Xuan Wu, Hongxiang Li, Yuanjiang Luo, Xuxin Cheng, Xianwei Zhuang, Meng Cao, Keren Fu

    Abstract: Sign language video retrieval plays a key role in facilitating information access for the deaf community. Despite significant advances in video-text retrieval, the complexity and inherent uncertainty of sign language preclude the direct application of these techniques. Previous methods achieve the mapping between sign language video and text through fine-grained modal alignment. However, due to th…

    Submitted 30 May, 2024; originally announced May 2024.

  11. arXiv:2405.04828  [pdf, other]

    cs.CL

    ChuXin: 1.6B Technical Report

    Authors: Xiaomin Zhuang, Yufan Jiang, Qiaozhi He, Zhihua Wu

    Abstract: In this report, we present ChuXin, an entirely open-source language model with a size of 1.6 billion parameters. Unlike the majority of works that only open-sourced the model weights and architecture, we have made everything needed to train a model available, including the training data, the training process, and the evaluation code. Our goal is to empower and strengthen the open research communit…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Technical Report

  12. arXiv:2405.02918  [pdf, other]

    cs.CV

    MERIT: Multi-view Evidential learning for Reliable and Interpretable liver fibrosis sTaging

    Authors: Yuanye Liu, Zheyao Gao, Nannan Shi, Fuping Wu, Yuxin Shi, Qingchao Chen, Xiahai Zhuang

    Abstract: Accurate staging of liver fibrosis from magnetic resonance imaging (MRI) is crucial in clinical practice. While conventional methods often focus on a specific sub-region, multi-view learning captures more information by analyzing multiple patches simultaneously. However, previous multi-view approaches could not typically calculate uncertainty by nature, and they generally integrate features from d…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Submitted to Medical Image Analysis

    MSC Class: 68U10 ACM Class: I.4.6

  13. arXiv:2404.08979  [pdf, other]

    cs.CV cs.LG

    BG-YOLO: A Bidirectional-Guided Method for Underwater Object Detection

    Authors: Jian Zhang, Ruiteng Zhang, Xinyue Yan, Xiting Zhuang, Ruicheng Cao

    Abstract: Degraded underwater images decrease the accuracy of underwater object detection. However, existing methods for underwater image enhancement mainly focus on improving the indicators in visual aspects, which may not benefit the tasks of underwater image detection, and may lead to serious degradation in performance. To alleviate this problem, we proposed a bidirectional-guided method for underwater o…

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: 15 pages, 8 figures, 4 tables

    MSC Class: 68T07; 68T45 ACM Class: I.4.3; I.4.8; I.4.9; I.4.10; I.2.10

  14. arXiv:2404.07435  [pdf]

    cs.CV

    Encoding Urban Ecologies: Automated Building Archetype Generation through Self-Supervised Learning for Energy Modeling

    Authors: Xinwei Zhuang, Zixun Huang, Wentao Zeng, Luisa Caldas

    Abstract: As the global population and urbanization expand, the building sector has emerged as the predominant energy consumer and carbon emission contributor. The need for innovative Urban Building Energy Modeling grows, yet existing building archetypes often fail to capture the unique attributes of local buildings and the nuanced distinctions between different cities, jeopardizing the precision of energy…

    Submitted 10 April, 2024; originally announced April 2024.

  15. arXiv:2403.19121  [pdf, other]

    cs.CL

    Code Comparison Tuning for Code Large Language Models

    Authors: Yufan Jiang, Qiaozhi He, Xiaomin Zhuang, Zhihua Wu

    Abstract: We present Code Comparison Tuning (CCT), a simple and effective tuning method for code large language models (Code LLMs) to better handle subtle code errors. Specifically, we integrate the concept of comparison into instruction tuning, both at the token and sequence levels, enabling the model to discern even the slightest deviations in code. To compare the original code with an erroneous version c…

    Submitted 5 June, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: Preprint

  16. arXiv:2403.16406  [pdf]

    cs.HC

    Development of a Chinese Human-Automation Trust Scale

    Authors: Zixin Cui, Xiangling Zhuang, Seul Chan Lee, Jieun Lee, Xintong Li, Makoto Itoh

    Abstract: The development of a reliable and valid assessment tool of human-automation trust is an important topic. This study aimed to develop a Chinese version of human-automation trust scale (C-HATS) with reasonable reliability and validity based on Lee and See (2004)'s trust model. After three phases of assessments including exploratory factor analysis, item analysis, and confirmatory factor analysis, di…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: 26 pages with 3 figures

  17. arXiv:2403.01582  [pdf, other]

    cs.LG

    Selection, Ensemble, and Adaptation: Advancing Multi-Source-Free Domain Adaptation via Architecture Zoo

    Authors: Jiangbo Pei, Ruizhe Li, Aidong Men, Yang Liu, Xiahai Zhuang, Qingchao Chen

    Abstract: Conventional Multi-Source Free Domain Adaptation (MSFDA) assumes that each source domain provides a single source model, and all source models adopt a uniform architecture. This paper introduces Zoo-MSFDA, a more general setting that allows each source domain to offer a zoo of multiple source models with different architectures. While it enriches the source knowledge, Zoo-MSFDA risks being dominat…

    Submitted 23 May, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

  18. arXiv:2402.04779  [pdf, other]

    cs.CL cs.AI

    StableMask: Refining Causal Masking in Decoder-only Transformer

    Authors: Qingyu Yin, Xuzheng He, Xiang Zhuang, Yu Zhao, Jianhua Yao, Xiaoyu Shen, Qiang Zhang

    Abstract: The decoder-only Transformer architecture with causal masking and relative position encoding (RPE) has become the de facto choice in language modeling. Despite its exceptional performance across various tasks, we have identified two limitations: First, it requires all attention scores to be non-zero and sum up to 1, even if the current embedding has sufficient self-contained information. This comp…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: Preprint

  19. arXiv:2401.14656  [pdf, other]

    cs.CL

    Scientific Large Language Models: A Survey on Biological & Chemical Domains

    Authors: Qiang Zhang, Keyang Ding, Tianwen Lyv, Xinda Wang, Qingyu Yin, Yiwen Zhang, Jing Yu, Yuhao Wang, Xiaotong Li, Zhuoyi Xiang, Xiang Zhuang, Zeyuan Wang, Ming Qin, Mengyao Zhang, Jinlu Zhang, Jiyu Cui, Renjun Xu, Hongyang Chen, Xiaohui Fan, Huabin Xing, Huajun Chen

    Abstract: Large Language Models (LLMs) have emerged as a transformative power in enhancing natural language comprehension, representing a significant stride toward artificial general intelligence. The application of LLMs extends beyond conventional linguistic boundaries, encompassing specialized linguistic systems developed within various scientific disciplines. This growing interest has led to the advent o…

    Submitted 26 January, 2024; originally announced January 2024.

  20. arXiv:2401.02982  [pdf, other]

    cs.CL cs.AI

    FinDABench: Benchmarking Financial Data Analysis Ability of Large Language Models

    Authors: Shu Liu, Shangqing Zhao, Chenghao Jia, Xinlin Zhuang, Zhaoguang Long, Jie Zhou, Aimin Zhou, Man Lan, Qingquan Wu, Chong Yang

    Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of tasks. However, their proficiency and reliability in the specialized domain of financial data analysis, particularly focusing on data-driven thinking, remain uncertain. To bridge this gap, we introduce FinDABench, a comprehensive benchmark designed to evaluate the financial data analysis capabili…

    Submitted 14 June, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

  21. arXiv:2401.02141  [pdf, other]

    cs.CV

    Bayesian Intrinsic Groupwise Image Registration: Unsupervised Disentanglement of Anatomy and Geometry

    Authors: Xinzhe Luo, Xin Wang, Linda Shapiro, Chun Yuan, Jianfeng Feng, Xiahai Zhuang

    Abstract: This article presents a general Bayesian learning framework for multi-modal groupwise registration on medical images. The method builds on probabilistic modelling of the image generative process, where the underlying common anatomy and geometric variations of the observed images are explicitly disentangled as latent variables. Thus, groupwise registration is achieved through the solution to Bayesi…

    Submitted 4 January, 2024; originally announced January 2024.

  22. arXiv:2311.05323  [pdf, other]

    cs.CV cs.LG

    Spatial Attention-based Distribution Integration Network for Human Pose Estimation

    Authors: Sihan Gao, Jing Zhu, Xiaoxuan Zhuang, Zhaoyue Wang, Qijin Li

    Abstract: In recent years, human pose estimation has made significant progress through the implementation of deep learning techniques. However, these techniques still face limitations when confronted with challenging scenarios, including occlusion, diverse appearances, variations in illumination, and overlap. To cope with such drawbacks, we present the Spatial Attention-based Distribution Integration Networ…

    Submitted 9 November, 2023; originally announced November 2023.

  23. arXiv:2310.14170  [pdf, other]

    cs.LG

    Learning Invariant Molecular Representation in Latent Discrete Space

    Authors: Xiang Zhuang, Qiang Zhang, Keyan Ding, Yatao Bian, Xiao Wang, Jingsong Lv, Hongyang Chen, Huajun Chen

    Abstract: Molecular representation learning lays the foundation for drug discovery. However, existing methods suffer from poor out-of-distribution (OOD) generalization, particularly when data for training and testing originate from different environments. To address this issue, we propose a new framework for learning molecular representations that exhibit invariance and robustness against distribution shift…

    Submitted 22 October, 2023; originally announced October 2023.

  24. arXiv:2310.03269  [pdf, other]

    q-bio.BM cs.CL

    InstructProtein: Aligning Human and Protein Language via Knowledge Instruction

    Authors: Zeyuan Wang, Qiang Zhang, Keyan Ding, Ming Qin, Xiang Zhuang, Xiaotong Li, Huajun Chen

    Abstract: Large Language Models (LLMs) have revolutionized the field of natural language processing, but they fall short in comprehending biological sequences such as proteins. To address this challenge, we propose InstructProtein, an innovative LLM that possesses bidirectional generation capabilities in both human and protein languages: (i) taking a protein sequence as input to predict its textual function…

    Submitted 4 October, 2023; originally announced October 2023.

  25. arXiv:2310.00180  [pdf, other]

    cs.LG cs.CV cs.HC

    MARL: Multi-scale Archetype Representation Learning for Urban Building Energy Modeling

    Authors: Xinwei Zhuang, Zixun Huang, Wentao Zeng, Luisa Caldas

    Abstract: Building archetypes, representative models of building stock, are crucial for precise energy simulations in Urban Building Energy Modeling. The current widely adopted building archetypes are developed on a nationwide scale, potentially neglecting the impact of local buildings' geometric specificities. We present Multi-scale Archetype Representation Learning (MARL), an approach that leverages repre…

    Submitted 29 September, 2023; originally announced October 2023.

    Comments: *Equal Contribution

  26. arXiv:2309.10836  [pdf, other]

    cs.CV

    CMRxRecon: An open cardiac MRI dataset for the competition of accelerated image reconstruction

    Authors: Chengyan Wang, Jun Lyu, Shuo Wang, Chen Qin, Kunyuan Guo, Xinyu Zhang, Xiaotong Yu, Yan Li, Fanwen Wang, Jianhua Jin, Zhang Shi, Ziqiang Xu, Yapeng Tian, Sha Hua, Zhensen Chen, Meng Liu, Mengting Sun, Xutong Kuang, Kang Wang, Haoran Wang, Hao Li, Yinghua Chu, Guang Yang, Wenjia Bai, Xiahai Zhuang, et al. (3 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (CMR) has emerged as a valuable diagnostic tool for cardiac diseases. However, a limitation of CMR is its slow imaging speed, which causes patient discomfort and introduces artifacts in the images. There has been growing interest in deep learning-based CMR imaging algorithms that can reconstruct high-quality images from highly under-sampled k-space data. However,…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: 14 pages, 8 figures

  27. Phase field method for quasi-static hydro-fracture in porous media under stress boundary condition considering the effect of initial stress field

    Authors: Shuwei Zhou, Xiaoying Zhuang, Timon Rabczuk

    Abstract: Phase field model (PFM) is an efficient fracture modeling method and has high potential for hydraulic fracturing (HF). However, the current PFMs in HF do not consider well the effect of in-situ stress field and the numerical examples of porous media with stress boundary conditions were rarely presented. The main reason is that if the remote stress is applied on the boundaries of the calculation do…

    Submitted 11 July, 2023; originally announced September 2023.

    Journal ref: Theoretical and Applied Fracture Mechanics, 2020, 107: 102523

  28. arXiv:2309.08579  [pdf, ps, other]

    cs.CE

    Polytopal composite finite elements for modeling concrete fracture based on nonlocal damage models

    Authors: Hai D. Huynh, S. Natarajan, H. Nguyen-Xuan, Xiaoying Zhuang

    Abstract: The paper presents an assumed strain formulation over polygonal meshes to accurately evaluate the strain fields in nonlocal damage models. An assumed strain technique based on the Hu-Washizu variational principle is employed to generate a new strain approximation instead of direct derivation from the basis functions and the displacement fields. The underlying idea embedded in arbitrary finite pol…

    Submitted 11 July, 2023; originally announced September 2023.

  29. arXiv:2309.03537  [pdf, other]

    eess.SP cs.LG math.FA

    Data-Adaptive Graph Framelets with Generalized Vanishing Moments for Graph Signal Processing

    Authors: Ruigang Zheng, Xiaosheng Zhuang

    Abstract: In this paper, we propose a novel and general framework to construct tight framelet systems on graphs with localized supports based on hierarchical partitions. Our construction provides parametrized graph framelet systems with great generality based on partition trees, by which we are able to find the size of a low-dimensional subspace that best fits the low-rank structure of a family of signals.…

    Submitted 30 December, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

    MSC Class: 43A99; 41A45; 94A11; 94A16

  30. Phase-field modeling of fluid-driven dynamic cracking in porous media

    Authors: Shuwei Zhou, Xiaoying Zhuang, Timon Rabczuk

    Abstract: A phase field model for fluid-driven dynamic crack propagation in poroelastic media is proposed. Therefore, classical Biot poroelasticity theory is applied in the porous medium while arbitrary crack growth is naturally captured by the phase field model. We also account for the transition of the fluid property from the intact medium to the fully broken one by employing indicator functions. We emplo…

    Submitted 11 July, 2023; originally announced September 2023.

    Journal ref: Computer Methods in Applied Mechanics and Engineering, 2019, 350: 169-198

  31. Phase field modeling of brittle compressive-shear fractures in rock-like materials: A new driving force and a hybrid formulation

    Authors: Shuwei Zhou, Xiaoying Zhuang, Timon Rabczuk

    Abstract: Compressive-shear fracture is commonly observed in rock-like materials. However, this fracture type cannot be captured by current phase field models (PFMs), which have been proven an effective tool for modeling fracture initiation, propagation, coalescence, and branching in solids. The existing PFMs also cannot describe the influence of cohesion and internal friction angle on load-displacement cur…

    Submitted 11 July, 2023; originally announced August 2023.

    Journal ref: Computer Methods in Applied Mechanics and Engineering, 2019, 355: 729-752

  32. arXiv:2308.03421  [pdf, other]

    cs.CL cs.AI

    RecycleGPT: An Autoregressive Language Model with Recyclable Module

    Authors: Yufan Jiang, Qiaozhi He, Xiaomin Zhuang, Zhihua Wu, Kunpeng Wang, Wenlai Zhao, Guangwen Yang

    Abstract: Existing large language models have to run K times to generate a sequence of K tokens. In this paper, we present RecycleGPT, a generative language model with fast decoding speed by recycling pre-generated model states without running the whole model in multiple steps. Our approach relies on the observation that adjacent tokens in a sequence usually have strong correlations and the next token in a…

    Submitted 23 May, 2024; v1 submitted 7 August, 2023; originally announced August 2023.

    Comments: Technical Report

  33. arXiv:2306.16780  [pdf, other]

    cs.LG q-bio.BM

    Graph Sampling-based Meta-Learning for Molecular Property Prediction

    Authors: Xiang Zhuang, Qiang Zhang, Bin Wu, Keyan Ding, Yin Fang, Huajun Chen

    Abstract: Molecular property is usually observed with a limited number of samples, and researchers have considered property prediction as a few-shot problem. One important fact that has been ignored by prior works is that each molecule can be recorded with several different properties simultaneously. To effectively utilize many-to-many correlations of molecules and properties, we propose a Graph Sampling-ba…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: Accepted by IJCAI 2023

  34. arXiv:2306.12054  [pdf, other]

    cs.CV

    A Reliable and Interpretable Framework of Multi-view Learning for Liver Fibrosis Staging

    Authors: Zheyao Gao, Yuanye Liu, Fuping Wu, NanNan Shi, Yuxin Shi, Xiahai Zhuang

    Abstract: Staging of liver fibrosis is important in the diagnosis and treatment planning of patients suffering from liver diseases. Current deep learning-based methods using abdominal magnetic resonance imaging (MRI) usually take a sub-region of the liver as an input, which nevertheless could miss critical information. To explore richer representations, we formulate this task as a multi-view learning proble…

    Submitted 21 June, 2023; originally announced June 2023.

    Comments: Early accepted by MICCAI 2023

  35. arXiv:2306.04265  [pdf, other]

    cs.LG cs.AI cs.SI math.FA

    Permutation Equivariant Graph Framelets for Heterophilous Graph Learning

    Authors: Jianfei Li, Ruigang Zheng, Han Feng, Ming Li, Xiaosheng Zhuang

    Abstract: The nature of heterophilous graphs is significantly different from that of homophilous graphs, which causes difficulties in early graph neural network models and suggests aggregations beyond the 1-hop neighborhood. In this paper, we develop a new way to implement multi-scale extraction via constructing Haar-type graph framelets with desired properties of permutation equivariance, efficiency, and s…

    Submitted 17 October, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

  36. arXiv:2304.08862  [pdf, other]

    cs.CL eess.AS

    Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition

    Authors: Maurits Bleeker, Pawel Swietojanski, Stefan Braun, Xiaodan Zhuang

    Abstract: This paper presents an extension to train end-to-end Context-Aware Transformer Transducer (CATT) models by using a simple, yet efficient method of mining hard negative phrases from the latent space of the context encoder. During training, given a reference query, we mine a number of similar phrases using approximate nearest neighbour search. These sampled phrases are then used as negative exampl…

    Submitted 16 August, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: Accepted to Interspeech 2023. 5 pages, 2 figures, 2 tables

  37. arXiv:2303.01710  [pdf, other]

    cs.CV

    BayeSeg: Bayesian Modeling for Medical Image Segmentation with Interpretable Generalizability

    Authors: Shangqi Gao, Hangqi Zhou, Yibo Gao, Xiahai Zhuang

    Abstract: Due to the cross-domain distribution shift aroused from diverse medical imaging systems, many deep learning segmentation methods fail to perform well on unseen data, which limits their real-world applicability. Recent works have shown the benefits of extracting domain-invariant representations on domain generalization. However, the interpretability of domain-invariant features remains a great chal…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: Submitted to Medical Image Analysis

    MSC Class: 68U10 ACM Class: I.4.6

  38. arXiv:2302.03537  [pdf, other]

    eess.IV cs.CV

    Aligning Multi-Sequence CMR Towards Fully Automated Myocardial Pathology Segmentation

    Authors: Wangbin Ding, Lei Li, Junyi Qiu, Sihan Wang, Liqin Huang, Yinyin Chen, Shan Yang, Xiahai Zhuang

    Abstract: Myocardial pathology segmentation (MyoPS) is critical for the risk stratification and treatment planning of myocardial infarction (MI). Multi-sequence cardiac magnetic resonance (MS-CMR) images can provide valuable information. For instance, balanced steady-state free precession cine sequences present clear anatomical boundaries, while late gadolinium enhancement and T2-weighted CMR sequences visu…

    Submitted 7 February, 2023; originally announced February 2023.

  39. arXiv:2301.12459  [pdf, other]

    cs.CV

    The Influences of Color and Shape Features in Visual Contrastive Learning

    Authors: Xiaoqi Zhuang

    Abstract: In the field of visual representation learning, performance of contrastive learning has been catching up with the supervised method which is commonly a classification convolutional neural network. However, most of the research work focuses on improving the accuracy of downstream tasks such as image classification and object detection. For visual contrastive learning, the influences of individual i…

    Submitted 29 January, 2023; originally announced January 2023.

  40. arXiv:2301.06043  [pdf, other]

    eess.IV cs.CV

    Unsupervised Cardiac Segmentation Utilizing Synthesized Images from Anatomical Labels

    Authors: Sihan Wang, Fuping Wu, Lei Li, Zheyao Gao, Byung-Woo Hong, Xiahai Zhuang

    Abstract: Cardiac segmentation is in great demand for clinical practice. Due to the enormous labor of manual delineation, unsupervised segmentation is desired. The ill-posed optimization problem of this task is inherently challenging, requiring well-designed constraints. In this work, we propose an unsupervised framework for multi-class segmentation with both intensity and shape constraints. Firstly, we ext…

    Submitted 15 January, 2023; originally announced January 2023.

  41. arXiv:2301.05392  [pdf, other]

    cs.CV cs.LG

    Multi-Target Landmark Detection with Incomplete Images via Reinforcement Learning and Shape Prior

    Authors: Kaiwen Wan, Lei Li, Dengqiang Jia, Shangqi Gao, Wei Qian, Yingzhi Wu, Huandong Lin, Xiongzheng Mu, Xin Gao, Sijia Wang, Fuping Wu, Xiahai Zhuang

    Abstract: Medical images are generally acquired with limited field-of-view (FOV), which could lead to incomplete regions of interest (ROI), and thus impose a great challenge on medical image analysis. This is particularly evident for the learning-based multi-target landmark detection, where algorithms could be misleading to learn primarily the variation of background due to the varying FOV, failing the dete…

    Submitted 13 January, 2023; originally announced January 2023.

    Comments: 29 pages, 13 figures

  42. arXiv:2301.04882  [pdf, other]

    cs.CV

    ZScribbleSeg: Zen and the Art of Scribble Supervised Medical Image Segmentation

    Authors: Ke Zhang, Xiahai Zhuang

    Abstract: Curating a large scale fully-annotated dataset can be both labour-intensive and expertise-demanding, especially for medical images. To alleviate this problem, we propose to utilize solely scribble annotations for weakly supervised segmentation. Existing solutions mainly leverage selective losses computed solely on annotated areas and generate pseudo gold standard segmentation by propagating labels…

    Submitted 12 January, 2023; originally announced January 2023.

    Comments: 31 pages, 10 figures

  43. MyoPS-Net: Myocardial Pathology Segmentation with Flexible Combination of Multi-Sequence CMR Images

    Authors: Junyi Qiu, Lei Li, Sihan Wang, Ke Zhang, Yinyin Chen, Shan Yang, Xiahai Zhuang

    Abstract: Myocardial pathology segmentation (MyoPS) can be a prerequisite for the accurate diagnosis and treatment planning of myocardial infarction. However, achieving this segmentation is challenging, mainly due to the inadequate and indistinct information from an image. In this work, we develop an end-to-end deep neural network, referred to as MyoPS-Net, to flexibly combine five-sequence cardiac magnetic…

    Submitted 6 November, 2022; originally announced November 2022.

  44. $\mathcal{X}$-Metric: An N-Dimensional Information-Theoretic Framework for Groupwise Registration and Deep Combined Computing

    Authors: Xinzhe Luo, Xiahai Zhuang

    Abstract: This paper presents a generic probabilistic framework for estimating the statistical dependency and finding the anatomical correspondences among an arbitrary number of medical images. The method builds on a novel formulation of the $N$-dimensional joint intensity distribution by representing the common anatomy as latent variables and estimating the appearance model with nonparametric estimators. T…

    Submitted 3 November, 2022; originally announced November 2022.

  45. arXiv:2211.01438  [pdf, other]

    eess.AS cs.CL cs.SD

    Variable Attention Masking for Configurable Transformer Transducer Speech Recognition

    Authors: Pawel Swietojanski, Stefan Braun, Dogan Can, Thiago Fraga da Silva, Arnab Ghoshal, Takaaki Hori, Roger Hsiao, Henry Mason, Erik McDermott, Honza Silovsky, Ruchir Travadi, Xiaodan Zhuang

    Abstract: This work studies the use of attention masking in transformer transducer based speech recognition for building a single configurable model for different deployment scenarios. We present a comprehensive set of experiments comparing fixed masking, where the same attention mask is applied at every frame, with chunked masking, where the attention mask for each frame is determined by chunk boundaries,…

    Submitted 18 April, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: To appear in ICASSP 2023

    Journal ref: International Conference on Acoustics, Speech, and Signal Processing, 2023

  46. arXiv:2210.15933  [pdf, other]

    cs.CV

    PSFormer: Point Transformer for 3D Salient Object Detection

    Authors: Baian Chen, Lipeng Gu, Xin Zhuang, Yiyang Shen, Weiming Wang, Mingqiang Wei

    Abstract: We propose PSFormer, an effective point transformer model for 3D salient object detection. PSFormer is an encoder-decoder network that takes full advantage of transformers to model the contextual information in both multi-scale point- and scene-wise manners. In the encoder, we develop a Point Context Transformer (PCT) module to capture region contextual features at the point level; PCT contains tw…

    Submitted 28 October, 2022; originally announced October 2022.

  47. arXiv:2208.12881  [pdf, other]

    eess.IV cs.CV

    Multi-Modality Cardiac Image Computing: A Survey

    Authors: Lei Li, Wangbin Ding, Liqun Huang, Xiahai Zhuang, Vicente Grau

    Abstract: Multi-modality cardiac imaging plays a key role in the management of patients with cardiovascular diseases. It allows a combination of complementary anatomical, morphological and functional information, increases diagnosis accuracy, and improves the efficacy of cardiovascular interventions and clinical outcomes. Fully-automated processing and quantitative analysis of multi-modality cardiac images…

    Submitted 26 August, 2022; originally announced August 2022.

    Comments: 30 pages

  48. arXiv:2207.02307  [pdf, ps, other]

    cs.CE physics.gen-ph

    Variational energy based XPINNs for phase field analysis in brittle fracture

    Authors: Ayan Chakraborty, Cosmin Anitescu, Somdatta Goswami, Xiaoying Zhuang, Timon Rabczuk

    Abstract: Modeling fracture is computationally expensive even in computational simulations of two-dimensional problems. Hence, scaling up the available approaches to be directly applied to large components or systems crucial for real applications becomes challenging. In this work, we propose a domain decomposition framework for the variational physics-informed neural networks to accurately approximate the crac…

    Submitted 3 July, 2022; originally announced July 2022.

    Comments: 9 Pages, 8 Figures

  49. arXiv:2206.09148  [pdf, other]

    cs.CV

    Deep Compatible Learning for Partially-Supervised Medical Image Segmentation

    Authors: Ke Zhang, Xiahai Zhuang

    Abstract: Partially-supervised learning can be challenging for segmentation due to the lack of supervision for unlabeled structures, and the methods directly applying fully-supervised learning could lead to incompatibility, meaning ground truth is not in the solution set of the optimization problem given the loss function. To address the challenge, we propose a deep compatible learning (DCL) framework, whic…

    Submitted 18 June, 2022; originally announced June 2022.

    Comments: 16 pages, 13 figures

  50. arXiv:2206.05284  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Decoupling Predictions in Distributed Learning for Multi-Center Left Atrial MRI Segmentation

    Authors: Zheyao Gao, Lei Li, Fuping Wu, Sihan Wang, Xiahai Zhuang

    Abstract: Distributed learning has shown great potential in medical image analysis. It allows the use of multi-center training data with privacy protection. However, data distributions in local centers can vary from each other due to different imaging vendors and annotation protocols. Such variation degrades the performance of learning-based methods. To mitigate the influence, two groups of methods have been p…

    Submitted 10 June, 2022; originally announced June 2022.

    Comments: Accepted by MICCAI 2022