
Showing 1–10 of 10 results for author: Phan, M H

  1. arXiv:2408.14227  [pdf, other]

    cs.CV

    TC-PDM: Temporally Consistent Patch Diffusion Models for Infrared-to-Visible Video Translation

    Authors: Anh-Dzung Doan, Vu Minh Hieu Phan, Surabhi Gupta, Markus Wagner, Tat-Jun Chin, Ian Reid

    Abstract: Infrared imaging offers resilience against changing lighting conditions by capturing object temperatures. Yet, in few scenarios, its lack of visual details compared to daytime visible images poses a significant challenge for human and machine interpretation. This paper proposes a novel diffusion method, dubbed Temporally Consistent Patch Diffusion Models (TC-PDM), for infrared-to-visible video tr…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Technical report

  2. arXiv:2408.13491  [pdf, other]

    cs.CV

    ESA: Annotation-Efficient Active Learning for Semantic Segmentation

    Authors: Jinchao Ge, Zeyu Zhang, Minh Hieu Phan, Bowen Zhang, Akide Liu, Yang Zhao

    Abstract: Active learning enhances annotation efficiency by selecting the most revealing samples for labeling, thereby reducing reliance on extensive human input. Previous methods in semantic segmentation have centered on individual pixels or small areas, neglecting the rich patterns in natural images and the power of advanced pre-trained models. To address these challenges, we propose three key contributio…

    Submitted 24 August, 2024; originally announced August 2024.

  3. arXiv:2408.02001  [pdf, other]

    cs.CV

    AdaCBM: An Adaptive Concept Bottleneck Model for Explainable and Accurate Diagnosis

    Authors: Townim F. Chowdhury, Vu Minh Hieu Phan, Kewen Liao, Minh-Son To, Yutong Xie, Anton van den Hengel, Johan W. Verjans, Zhibin Liao

    Abstract: The integration of vision-language models such as CLIP and Concept Bottleneck Models (CBMs) offers a promising approach to explaining deep neural network (DNN) decisions using concepts understandable by humans, addressing the black-box concern of DNNs. While CLIP provides both explainability and zero-shot classification capability, its pre-training on generic image and text data may limit its clas…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted at MICCAI 2024, the 27th International Conference on Medical Image Computing and Computer Assisted Intervention

  4. arXiv:2407.19546  [pdf, other]

    cs.CV

    XLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training

    Authors: Biao Wu, Yutong Xie, Zeyu Zhang, Minh Hieu Phan, Qi Chen, Ling Chen, Qi Wu

    Abstract: Vision-and-language pretraining (VLP) in the medical field utilizes contrastive learning on image-text pairs to achieve effective transfer across tasks. Yet, current VLP approaches with the masked modelling strategy face two challenges when applied to the medical domain. First, current models struggle to accurately reconstruct key pathological features due to the scarcity of medical data. Second,…

    Submitted 2 August, 2024; v1 submitted 28 July, 2024; originally announced July 2024.

  5. arXiv:2406.18967  [pdf, other]

    cs.CV

    Structural Attention: Rethinking Transformer for Unpaired Medical Image Synthesis

    Authors: Vu Minh Hieu Phan, Yutong Xie, Bowen Zhang, Yuankai Qi, Zhibin Liao, Antonios Perperidis, Son Lam Phung, Johan W. Verjans, Minh-Son To

    Abstract: Unpaired medical image synthesis aims to provide complementary information for accurate clinical diagnostics, and to address challenges in obtaining aligned multi-modal medical scans. Transformer-based models excel in imaging translation tasks thanks to their ability to capture long-range dependencies. Although effective in supervised training settings, their performance falters in unpaired image…

    Submitted 28 August, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: MICCAI version before camera ready

  6. arXiv:2404.02388  [pdf, other]

    cs.CV

    CAPE: CAM as a Probabilistic Ensemble for Enhanced DNN Interpretation

    Authors: Townim Faisal Chowdhury, Kewen Liao, Vu Minh Hieu Phan, Minh-Son To, Yutong Xie, Kevin Hung, David Ross, Anton van den Hengel, Johan W. Verjans, Zhibin Liao

    Abstract: Deep Neural Networks (DNNs) are widely used for visual classification tasks, but their complex computation process and black-box nature hinder decision transparency and interpretability. Class activation maps (CAMs) and recent variants provide ways to visually explain the DNN decision-making process by displaying 'attention' heatmaps of the DNNs. Nevertheless, the CAM explanation only offers relat…

    Submitted 4 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  7. arXiv:2403.07636  [pdf, other]

    cs.CV

    Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework

    Authors: Vu Minh Hieu Phan, Yutong Xie, Yuankai Qi, Lingqiao Liu, Liyang Liu, Bowen Zhang, Zhibin Liao, Qi Wu, Minh-Son To, Johan W. Verjans

    Abstract: Medical vision language pre-training (VLP) has emerged as a frontier of research, enabling zero-shot pathological recognition by comparing the query image with the textual descriptions for each disease. Due to the complex semantics of biomedical texts, current methods struggle to align medical images with key pathological findings in unstructured reports. This leads to the misalignment with the ta…

    Submitted 31 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Accepted at CVPR2024. Pre-print before final camera-ready version

    Journal ref: CVPR2024

  8. arXiv:2307.16143  [pdf, other]

    eess.IV cs.CV

    Structure-Preserving Synthesis: MaskGAN for Unpaired MR-CT Translation

    Authors: Minh Hieu Phan, Zhibin Liao, Johan W. Verjans, Minh-Son To

    Abstract: Medical image synthesis is a challenging task due to the scarcity of paired data. Several methods have applied CycleGAN to leverage unpaired data, but they often generate inaccurate mappings that shift the anatomy. This problem is further exacerbated when the images from the source and target modalities are heavily misaligned. Recent methods have aimed to address this issue by incorpora…

    Submitted 31 July, 2023; v1 submitted 30 July, 2023; originally announced July 2023.

    Comments: Accepted to MICCAI 2023

    Journal ref: MICCAI 2023

  9. arXiv:2306.08075  [pdf, other]

    cs.CV

    BPKD: Boundary Privileged Knowledge Distillation For Semantic Segmentation

    Authors: Liyang Liu, Zihan Wang, Minh Hieu Phan, Bowen Zhang, Jinchao Ge, Yifan Liu

    Abstract: Current knowledge distillation approaches in semantic segmentation tend to adopt a holistic approach that treats all spatial locations equally. However, for dense prediction, students' predictions on edge regions are highly uncertain due to contextual information leakage, requiring higher spatial sensitivity knowledge than the body regions. To address this challenge, this paper proposes a novel ap…

    Submitted 31 August, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: 17 pages, 9 figures, 9 tables

  10. arXiv:2306.06289  [pdf, other]

    cs.CV

    SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers

    Authors: Bowen Zhang, Liyang Liu, Minh Hieu Phan, Zhi Tian, Chunhua Shen, Yifan Liu

    Abstract: This paper investigates the capability of plain Vision Transformers (ViTs) for semantic segmentation using the encoder-decoder framework and introduces SegViTv2. In this study, we introduce a novel Attention-to-Mask (ATM) module to design a lightweight decoder effective for plain ViT. The proposed ATM converts the global attention map into semantic masks for high-quality segmentation res…

    Submitted 30 August, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: IJCV 2023 accepted, 21 pages, 8 figures, 12 tables