Showing 1–50 of 60 results for author: Zhen, X

  1. arXiv:2408.09736

    eess.IV cs.CV

    Coarse-Fine View Attention Alignment-Based GAN for CT Reconstruction from Biplanar X-Rays

    Authors: Zhi Qiao, Hanqiang Ouyang, Dongheng Chu, Huishu Yuan, Xiantong Zhen, Pei Dong, Zhen Qian

    Abstract: For surgical planning and intra-operation imaging, CT reconstruction using X-ray images can potentially be an important alternative when CT imaging is not available or not feasible. In this paper, we aim to use biplanar X-rays to reconstruct a 3D CT image, because biplanar X-rays convey richer information than single-view X-rays and are more commonly used by surgeons. Different from previous studi…

    Submitted 19 August, 2024; originally announced August 2024.

  2. arXiv:2408.09731

    eess.IV cs.CV

    Reconstruct Spine CT from Biplanar X-Rays via Diffusion Learning

    Authors: Zhi Qiao, Xuhui Liu, Xiaopeng Wang, Runkun Liu, Xiantong Zhen, Pei Dong, Zhen Qian

    Abstract: Intraoperative CT imaging serves as a crucial resource for surgical guidance; however, it may not always be readily accessible or practical to implement. In scenarios where CT imaging is not an option, reconstructing CT scans from X-rays can offer a viable alternative. In this paper, we introduce an innovative method for 3D CT reconstruction utilizing biplanar X-rays. Distinct from previous resear…

    Submitted 20 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  3. arXiv:2408.09715

    cs.AI cs.CV cs.LG eess.IV

    HYDEN: Hyperbolic Density Representations for Medical Images and Reports

    Authors: Zhi Qiao, Linbin Han, Xiantong Zhen, Jia-Hong Gao, Zhen Qian

    Abstract: In light of the inherent entailment relations between images and text, hyperbolic point vector embeddings, leveraging the hierarchical modeling advantages of hyperbolic space, have been utilized for visual semantic representation learning. However, point vector embedding approaches fail to address the issue of semantic uncertainty, where an image may have multiple interpretations, and text may ref…

    Submitted 19 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  4. arXiv:2407.13545

    eess.IV cs.CV

    DiffuX2CT: Diffusion Learning to Reconstruct CT Images from Biplanar X-Rays

    Authors: Xuhui Liu, Zhi Qiao, Runkun Liu, Hong Li, Juan Zhang, Xiantong Zhen, Zhen Qian, Baochang Zhang

    Abstract: Computed tomography (CT) is widely utilized in clinical settings because it delivers detailed 3D images of the human body. However, performing CT scans is not always feasible due to radiation exposure and limitations in certain surgical environments. As an alternative, reconstructing CT images from ultra-sparse X-rays offers a valuable solution and has gained significant interest in scientific res…

    Submitted 18 July, 2024; originally announced July 2024.

  5. arXiv:2406.16282

    cs.LG cs.AI

    Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation

    Authors: Yuchen Yang, Yingdong Shi, Cheems Wang, Xiantong Zhen, Yuxuan Shi, Jun Xu

    Abstract: Fine-tuning pretrained large models on downstream tasks is an important problem, but it suffers from huge memory overhead due to the large number of parameters. This work strives to reduce memory overhead in fine-tuning from the perspectives of activation functions and layer normalization. To this end, we propose the Approximate Backpropagation (Approx-BP) theory, which provides the theoretical feasibil…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 25 pages, ICML 2024 Accepted

  6. arXiv:2403.11418

    cs.LG cs.AI

    Variational Sampling of Temporal Trajectories

    Authors: Jurijs Nazarovs, Zhichun Huang, Xingjian Zhen, Sourav Pal, Rudrasis Chakraborty, Vikas Singh

    Abstract: A deterministic temporal process can be determined by its trajectory, an element in the product space of (a) initial condition $z_0 \in \mathcal{Z}$ and (b) transition function $f: (\mathcal{Z}, \mathcal{T}) \to \mathcal{Z}$ often influenced by the control of the underlying dynamical system. Existing methods often model the transition function as a differential equation or as a recurrent neural ne…

    Submitted 17 March, 2024; originally announced March 2024.

  7. arXiv:2310.18713

    cs.LG

    Episodic Multi-Task Learning with Heterogeneous Neural Processes

    Authors: Jiayi Shen, Xiantong Zhen, Qi Wang, Marcel Worring

    Abstract: This paper focuses on the data-insufficiency problem in multi-task learning within an episodic training setup. Specifically, we explore the potential of heterogeneous information across tasks and meta-knowledge among episodes to effectively tackle each task with limited data. Existing meta-learning methods often fail to take advantage of crucial heterogeneous information in a single episode, while…

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: 28 pages, spotlight of NeurIPS 2023

  8. arXiv:2309.13258

    cs.CV cs.LG

    Order-preserving Consistency Regularization for Domain Adaptation and Generalization

    Authors: Mengmeng Jing, Xiantong Zhen, Jingjing Li, Cees Snoek

    Abstract: Deep learning models fail on cross-domain challenges if the model is oversensitive to domain-specific attributes, e.g., lighting, background, camera angle, etc. To alleviate this problem, data augmentation coupled with consistency regularization is commonly adopted to make the model less sensitive to domain-specific attributes. Consistency regularization enforces the model to output the same rep…

    Submitted 23 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV 2023

  9. arXiv:2309.02041

    cs.CV

    Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples

    Authors: Guanghui Li, Mingqi Gao, Heng Liu, Xiantong Zhen, Feng Zheng

    Abstract: Referring video object segmentation (RVOS), as a supervised learning task, relies on sufficient annotated data for a given scene. However, in more realistic scenarios, only minimal annotations are available for a new scene, which poses significant challenges to existing RVOS methods. With this in mind, we propose a simple yet effective model with a newly designed cross-modal affinity (CMA) module…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV2023

  10. arXiv:2309.00917

    cs.CL cs.AI

    Knowledge Graph Embeddings for Multi-Lingual Structured Representations of Radiology Reports

    Authors: Tom van Sonsbeek, Xiantong Zhen, Marcel Worring

    Abstract: The way we analyse clinical texts has undergone major changes over the last years. The introduction of language models such as BERT led to adaptations for the (bio)medical domain like PubMedBERT and ClinicalBERT. These models rely on large databases of archived medical documents. While performing well in terms of accuracy, both the lack of interpretability and limitations to transfer across langua…

    Submitted 14 September, 2023; v1 submitted 2 September, 2023; originally announced September 2023.

    MSC Class: 68T07

  11. arXiv:2308.11186

    cs.CV

    Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models

    Authors: Baoshuo Kan, Teng Wang, Wenpeng Lu, Xiantong Zhen, Weili Guan, Feng Zheng

    Abstract: Pre-trained vision-language models, e.g., CLIP, working with manually designed prompts have demonstrated great capacity for transfer learning. Recently, learnable prompts have achieved state-of-the-art performance, but they are prone to overfitting to seen classes and fail to generalize to unseen classes. In this paper, we propose a Knowledge-Aware Prompt Tuning (KAPT) framework for vision-language mode…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV 2023

  12. arXiv:2307.04033

    cs.LG cs.AI

    Probabilistic Test-Time Generalization by Variational Neighbor-Labeling

    Authors: Sameer Ambekar, Zehao Xiao, Jiayi Shen, Xiantong Zhen, Cees G. M. Snoek

    Abstract: This paper strives for domain generalization, where models are trained exclusively on source domains before being deployed on unseen target domains. We follow the strict separation of source training and target testing, but exploit the value of the unlabeled target data itself during inference. We make three contributions. First, we propose probabilistic pseudo-labeling of target samples to genera…

    Submitted 1 July, 2024; v1 submitted 8 July, 2023; originally announced July 2023.

    Comments: Accepted by CoLLAs 2024

  13. arXiv:2307.03998

    cs.CV eess.IV

    Lightweight Improved Residual Network for Efficient Inverse Tone Mapping

    Authors: Liqi Xue, Tianyi Xu, Yongbao Song, Yan Liu, Lei Zhang, Xiantong Zhen, Jun Xu

    Abstract: Display devices such as HDR10 televisions are increasingly prevalent in daily life for visualizing high dynamic range (HDR) images. However, the majority of media images on the internet remain in 8-bit standard dynamic range (SDR) format. Therefore, converting SDR images to HDR ones by inverse tone mapping (ITM) is crucial to unlock the full potential of abundant media images. However, existing IT…

    Submitted 16 December, 2023; v1 submitted 8 July, 2023; originally announced July 2023.

  14. arXiv:2306.05189

    cs.LG

    EMO: Episodic Memory Optimization for Few-Shot Meta-Learning

    Authors: Yingjun Du, Jiayi Shen, Xiantong Zhen, Cees G. M. Snoek

    Abstract: Few-shot meta-learning presents a challenge for gradient descent optimization due to the limited number of training samples per task. To address this issue, we propose an episodic memory optimization for meta-learning, which we call EMO, inspired by the human ability to recall past learning experiences from the brain's memory. EMO retains the gradient history of past experienced tasks in extern…

    Submitted 26 June, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: Accepted by CoLLAs 2023

  15. arXiv:2305.10309

    cs.LG

    MetaModulation: Learning Variational Feature Hierarchies for Few-Shot Learning with Fewer Tasks

    Authors: Wenfang Sun, Yingjun Du, Xiantong Zhen, Fan Wang, Ling Wang, Cees G. M. Snoek

    Abstract: Meta-learning algorithms are able to learn a new task using previously learned knowledge, but they often require a large number of meta-training tasks which may not be readily available. To address this issue, we propose a method for few-shot learning with fewer tasks, which we call MetaModulation. The key idea is to use a neural network to increase the density of the meta-training tasks by modula…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted by ICML 2023

  16. arXiv:2305.09924

    cs.CV

    CageViT: Convolutional Activation Guided Efficient Vision Transformer

    Authors: Hao Zheng, Jinbao Wang, Xiantong Zhen, Hong Chen, Jingkuan Song, Feng Zheng

    Abstract: Recently, Transformers have emerged as the go-to architecture for both vision and language modeling tasks, but their computational efficiency is limited by the length of the input sequence. To address this, several efficient variants of Transformers have been proposed to accelerate computation or reduce memory consumption while preserving performance. This paper presents an efficient vision Transf…

    Submitted 16 May, 2023; originally announced May 2023.

    Comments: 9 pages, 3 figures, NeurIPS conference

    MSC Class: 68T45 ACM Class: I.4.10

  17. arXiv:2304.00101

    cs.CV

    SuperDisco: Super-Class Discovery Improves Visual Recognition for the Long-Tail

    Authors: Yingjun Du, Jiayi Shen, Xiantong Zhen, Cees G. M. Snoek

    Abstract: Modern image classifiers perform well on populated classes, while degrading considerably on tail classes with only a few instances. Humans, by contrast, effortlessly handle the long-tailed recognition challenge, since they can learn the tail representation based on different levels of semantic abstraction, making the learned tail features more discriminative. This phenomenon motivated us to propos…

    Submitted 31 March, 2023; originally announced April 2023.

    Comments: Accepted by CVPR 2023

  18. arXiv:2303.16491

    cs.CV

    Implicit Diffusion Models for Continuous Super-Resolution

    Authors: Sicheng Gao, Xuhui Liu, Bohan Zeng, Sheng Xu, Yanjing Li, Xiaoyan Luo, Jianzhuang Liu, Xiantong Zhen, Baochang Zhang

    Abstract: Image super-resolution (SR) has attracted increasing attention due to its wide applications. However, current SR methods generally suffer from over-smoothing and artifacts, and most work only with fixed magnifications. This paper introduces an Implicit Diffusion Model (IDM) for high-fidelity continuous image super-resolution. IDM integrates an implicit neural representation and a denoising diffusi…

    Submitted 3 September, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: 8 pages, 9 figures, published to CVPR2023

  19. arXiv:2303.11055

    eess.IV cs.CV

    Parameter-Free Channel Attention for Image Classification and Super-Resolution

    Authors: Yuxuan Shi, Lingxiao Yang, Wangpeng An, Xiantong Zhen, Liuqing Wang

    Abstract: The channel attention mechanism is a useful technique widely employed in deep convolutional neural networks to boost the performance for image processing tasks, e.g., image classification and image super-resolution. It is usually designed as a parameterized sub-network and embedded into the convolutional layers of the network to learn more powerful feature representations. However, current channel a…

    Submitted 20 March, 2023; originally announced March 2023.

  20. arXiv:2302.14794

    cs.CV

    Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning

    Authors: Ivona Najdenkoska, Xiantong Zhen, Marcel Worring

    Abstract: Multimodal few-shot learning is challenging due to the large domain gap between vision and language modalities. Existing methods are trying to communicate visual concepts as prompts to frozen language models, but rely on hand-engineered task induction to reduce the hypothesis space. To make the whole process learnable, we introduce a multimodal meta-learning approach. Specifically, our approach de…

    Submitted 28 February, 2023; originally announced February 2023.

    Comments: International Conference on Learning Representations 2023

  21. arXiv:2302.11215

    cs.LG

    Energy-Based Test Sample Adaptation for Domain Generalization

    Authors: Zehao Xiao, Xiantong Zhen, Shengcai Liao, Cees G. M. Snoek

    Abstract: In this paper, we propose energy-based sample adaptation at test time for domain generalization. Where previous works adapt their models to target domains, we adapt the unseen target samples to source-trained models. To this end, we design a discriminative energy-based model, which is trained on source domains to jointly model the conditional distribution for classification and data distribution f…

    Submitted 22 February, 2023; originally announced February 2023.

    Comments: Accepted by ICLR 2023

  22. arXiv:2302.09027

    cs.CV cs.AI cs.CL cs.MM

    CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension

    Authors: Zhi Zhang, Helen Yannakoudakis, Xiantong Zhen, Ekaterina Shutova

    Abstract: The task of multimodal referring expression comprehension (REC), aiming at localizing an image region described by a natural language expression, has recently received increasing attention within the research community. In this paper, we specifically focus on referring expression comprehension with commonsense knowledge (KB-Ref), a task which typically requires reasoning beyond spatial, visual or…

    Submitted 17 February, 2023; originally announced February 2023.

  23. arXiv:2210.10378

    cs.LG cs.CV

    Variational Model Perturbation for Source-Free Domain Adaptation

    Authors: Mengmeng Jing, Xiantong Zhen, Jingjing Li, Cees G. M. Snoek

    Abstract: We aim for source-free domain adaptation, where the task is to deploy a model pre-trained on source domains to target domains. The challenges stem from the distribution shift from the source to the target domain, coupled with the unavailability of any source data and labeled target data for optimization. Rather than fine-tuning the model by updating the parameters, we propose to perturb the source…

    Submitted 19 October, 2022; originally announced October 2022.

  24. arXiv:2210.06980

    cs.CV

    Probabilistic Integration of Object Level Annotations in Chest X-ray Classification

    Authors: Tom van Sonsbeek, Xiantong Zhen, Dwarikanath Mahapatra, Marcel Worring

    Abstract: Medical image datasets and their annotations are not growing as fast as their equivalents in the general domain. This makes translation from the newest, more data-intensive methods that have made a large impact on the vision field increasingly more difficult and less efficient. In this paper, we propose a new probabilistic latent variable model for disease classification in chest X-ray images. Spe…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: WACV 2023

    MSC Class: 68T07

  25. arXiv:2210.04637

    cs.CV

    Association Graph Learning for Multi-Task Classification with Category Shifts

    Authors: Jiayi Shen, Zehao Xiao, Xiantong Zhen, Cees G. M. Snoek, Marcel Worring

    Abstract: In this paper, we focus on multi-task classification, where related classification tasks share the same label space and are learned simultaneously. In particular, we tackle a new setting, which is more realistic than currently addressed in the literature, where categories shift from training to test data. Hence, individual tasks do not contain complete training data for the categories in the test…

    Submitted 10 October, 2022; originally announced October 2022.

  26. arXiv:2207.09684

    cs.CV

    On the Versatile Uses of Partial Distance Correlation in Deep Learning

    Authors: Xingjian Zhen, Zihang Meng, Rudrasis Chakraborty, Vikas Singh

    Abstract: Comparing the functional behavior of neural network models, whether it is a single network over time or two (or more) networks during or post-training, is an essential step in understanding what they are learning (and what they are not), and for identifying strategies for regularization or efficiency improvements. Despite recent progress, e.g., comparing vision transformers to CNNs, systematic com…

    Submitted 8 November, 2022; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: This paper has been selected as best paper award for ECCV 2022!

    Journal ref: ECCV 2022

  27. arXiv:2207.03367

    cs.CV

    Joint Super-Resolution and Inverse Tone-Mapping: A Feature Decomposition Aggregation Network and A New Benchmark

    Authors: Gang Xu, Yu-chen Yang, Liang Wang, Xian-Tong Zhen, Jun Xu

    Abstract: Joint Super-Resolution and Inverse Tone-Mapping (joint SR-ITM) aims to increase the resolution and dynamic range of low-resolution and standard dynamic range images. Recent networks mainly resort to image decomposition techniques with complex multi-branch architectures. However, the fixed decomposition techniques would largely restrict their power on versatile images. To exploit the potential pow…

    Submitted 7 September, 2023; v1 submitted 7 July, 2022; originally announced July 2022.

    Comments: update the authors info and the article template

  28. arXiv:2204.05737

    cs.CV

    LifeLonger: A Benchmark for Continual Disease Classification

    Authors: Mohammad Mahdi Derakhshani, Ivona Najdenkoska, Tom van Sonsbeek, Xiantong Zhen, Dwarikanath Mahapatra, Marcel Worring, Cees G. M. Snoek

    Abstract: Deep learning models have shown great effectiveness in recognizing findings in medical images. However, they cannot handle the ever-changing clinical environment, which brings newly annotated medical data from different sources. To exploit the incoming streams of data, these models would benefit largely from sequentially learning from new samples, without forgetting the previously obtained knowle…

    Submitted 30 June, 2022; v1 submitted 12 April, 2022; originally announced April 2022.

    MSC Class: 68T07

  29. arXiv:2203.03624

    eess.IV cs.CV

    FCNet: A Convolutional Neural Network for Arbitrary-Length Exposure Estimation

    Authors: Jin Liang, Yuchen Yang, Anran Zhang, Jun Xu, Hui Li, Xiantong Zhen

    Abstract: Photographs captured by digital cameras usually suffer from over- or under-exposure problems. For image exposure enhancement, the tasks of Single-Exposure Correction (SEC) and Multi-Exposure Fusion (MEF) are widely studied in the image processing community. However, current SEC or MEF methods are developed under different motivations and thus ignore the internal correlation between SEC and MEF,…

    Submitted 7 September, 2023; v1 submitted 5 March, 2022; originally announced March 2022.

  30. arXiv:2202.08045

    cs.LG cs.CV

    Learning to Generalize across Domains on Single Test Samples

    Authors: Zehao Xiao, Xiantong Zhen, Ling Shao, Cees G. M. Snoek

    Abstract: We strive to learn a model from a set of source domains that generalizes well to unseen target domains. The main challenge in such a domain generalization scenario is the unavailability of any target domain data during training, resulting in the learned model not being explicitly adapted to the unseen target domains. We propose learning to generalize across domains on single test samples. We lever…

    Submitted 16 February, 2022; originally announced February 2022.

  31. arXiv:2112.13410

    cs.LG cs.AI

    Generative Kernel Continual learning

    Authors: Mohammad Mahdi Derakhshani, Xiantong Zhen, Ling Shao, Cees G. M. Snoek

    Abstract: Kernel continual learning (Derakhshani et al., 2021) has recently emerged as a strong continual learner due to its non-parametric ability to tackle task interference and catastrophic forgetting. Unfortunately its success comes at the expense of an explicit memory to store samples from past tasks, which hampers scalability to continual learning settings with a large number of tasks. In this p…

    Submitted 26 December, 2021; originally announced December 2021.

    Comments: work in progress

  32. arXiv:2112.08181

    cs.LG

    Hierarchical Variational Memory for Few-shot Learning Across Domains

    Authors: Yingjun Du, Xiantong Zhen, Ling Shao, Cees G. M. Snoek

    Abstract: Neural memory enables fast adaptation to new tasks with just a few training samples. Existing memory models store features only from the single last layer, which does not generalize well in presence of a domain shift between training and test distributions. Rather than relying on a flat memory, we propose a hierarchical alternative that stores features at different semantic levels. We introduce a…

    Submitted 20 April, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: 17 pages, 5 figures

    Journal ref: ICLR 2022

  33. arXiv:2111.05820

    cs.LG

    Multi-Task Neural Processes

    Authors: Jiayi Shen, Xiantong Zhen, Marcel Worring, Ling Shao

    Abstract: Neural processes have recently emerged as a class of powerful neural latent variable models that combine the strengths of neural networks and stochastic processes. As they can encode contextual data in the network's function space, they offer a new way to model task relatedness in multi-task learning. To study its potential, we develop multi-task neural processes, a new variant of neural processes…

    Submitted 2 December, 2021; v1 submitted 10 November, 2021; originally announced November 2021.

  34. arXiv:2111.05323

    cs.LG

    Variational Multi-Task Learning with Gumbel-Softmax Priors

    Authors: Jiayi Shen, Xiantong Zhen, Marcel Worring, Ling Shao

    Abstract: Multi-task learning aims to explore task relatedness to improve individual tasks, which is of particular significance in the challenging scenario that only limited data is available for each task. To tackle this challenge, we propose variational multi-task learning (VMTL), a general probabilistic inference framework for learning multiple related tasks. We cast multi-task learning as a variational…

    Submitted 9 November, 2021; originally announced November 2021.

    Comments: 19 pages, 6 figures, accepted by NeurIPS 2021

  35. arXiv:2108.13393

    cs.CV

    Seminar Learning for Click-Level Weakly Supervised Semantic Segmentation

    Authors: Hongjun Chen, Jinbao Wang, Hong Cai Chen, Xiantong Zhen, Feng Zheng, Rongrong Ji, Ling Shao

    Abstract: Annotation burden has become one of the biggest barriers to semantic segmentation. Approaches based on click-level annotations have therefore attracted increasing attention due to their superior trade-off between supervision and annotation cost. In this paper, we propose seminar learning, a new learning paradigm for semantic segmentation with click-level supervision. The fundamental rationale of s…

    Submitted 30 August, 2021; originally announced August 2021.

  36. arXiv:2107.07314

    cs.CV cs.LG eess.IV

    Variational Topic Inference for Chest X-Ray Report Generation

    Authors: Ivona Najdenkoska, Xiantong Zhen, Marcel Worring, Ling Shao

    Abstract: Automating report generation for medical imaging promises to reduce workload and assist diagnosis in clinical practice. Recent work has shown that deep learning models can successfully caption natural images. However, learning from medical data is challenging due to the diversity and uncertainty inherent in the reports written by different radiologists with discrepant expertise and experience. To…

    Submitted 15 July, 2021; originally announced July 2021.

    Comments: To be published in the International Conference on Medical Image Computing and Computer Assisted Intervention 2021

  37. arXiv:2107.05757

    cs.LG cs.AI

    Kernel Continual Learning

    Authors: Mohammad Mahdi Derakhshani, Xiantong Zhen, Ling Shao, Cees G. M. Snoek

    Abstract: This paper introduces kernel continual learning, a simple but effective variant of continual learning that leverages the non-parametric nature of kernel methods to tackle catastrophic forgetting. We deploy an episodic memory unit that stores a subset of samples for each task to learn task-specific classifiers based on kernel ridge regression. This does not require memory replay and systematically…

    Submitted 14 July, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

    Comments: accepted to ICML 2021

  38. arXiv:2106.02960

    cs.CL

    Meta-Learning with Variational Semantic Memory for Word Sense Disambiguation

    Authors: Yingjun Du, Nithin Holla, Xiantong Zhen, Cees G. M. Snoek, Ekaterina Shutova

    Abstract: A critical challenge faced by supervised word sense disambiguation (WSD) is the lack of large annotated datasets with sufficient coverage of words in their diversity of senses. This inspired recent research on few-shot WSD using meta-learning. While such work has successfully applied meta-learning to learn new word senses from very few examples, its performance still lags behind its fully supervis…

    Submitted 5 June, 2021; originally announced June 2021.

    Comments: 15 pages, 5 figures

    Journal ref: ACL-IJCNLP 2021

  39. arXiv:2105.06668

    cs.CV

    Attentional Prototype Inference for Few-Shot Segmentation

    Authors: Haoliang Sun, Xiankai Lu, Haochen Wang, Yilong Yin, Xiantong Zhen, Cees G. M. Snoek, Ling Shao

    Abstract: This paper aims to address few-shot segmentation. While existing prototype-based methods have achieved considerable success, they suffer from uncertainty and ambiguity caused by limited labeled examples. In this work, we propose attentional prototype inference (API), a probabilistic latent variable framework for few-shot segmentation. We define a global latent variable to represent the prototype o…

    Submitted 29 May, 2023; v1 submitted 14 May, 2021; originally announced May 2021.

    Comments: Pattern Recognition Journal

  40. arXiv:2105.04030

    cs.LG

    A Bit More Bayesian: Domain-Invariant Learning with Uncertainty

    Authors: Zehao Xiao, Jiayi Shen, Xiantong Zhen, Ling Shao, Cees G. M. Snoek

    Abstract: Domain generalization is challenging due to the domain shift and the uncertainty caused by the inaccessibility of target domain data. In this paper, we address both challenges with a probabilistic framework based on variational Bayesian inference, by incorporating uncertainty into neural network weights. We couple domain invariance in a probabilistic formula with the variational Bayesian inference…

    Submitted 14 July, 2021; v1 submitted 9 May, 2021; originally announced May 2021.

    Comments: accepted to ICML 2021

  41. arXiv:2105.03781

    cs.LG cs.CV

    MetaKernel: Learning Variational Random Features with Limited Labels

    Authors: Yingjun Du, Haoliang Sun, Xiantong Zhen, Jun Xu, Yilong Yin, Ling Shao, Cees G. M. Snoek

    Abstract: Few-shot learning deals with the fundamental and challenging problem of learning from a few annotated samples, while being able to generalize well on new tasks. The crux of few-shot learning is to extract prior knowledge from related tasks to enable fast adaptation to a new task with a limited amount of data. In this paper, we propose meta-learning kernels with random Fourier features for few-shot…

    Submitted 8 May, 2021; originally announced May 2021.

    Comments: 19 pages,7 figures. arXiv admin note: substantial text overlap with arXiv:2006.06707

  42. arXiv:2105.02626

    cs.CV cs.AI cs.CL cs.LG

    A First Look: Towards Explainable TextVQA Models via Visual and Textual Explanations

    Authors: Varun Nagaraj Rao, Xingjian Zhen, Karen Hovsepian, Mingwei Shen

    Abstract: Explainable deep learning models are advantageous in many situations. Prior work mostly provides unimodal explanations through post-hoc approaches not part of the original system design. Explanation mechanisms also ignore useful textual information present in images. In this paper, we propose MTXNet, an end-to-end trainable multimodal architecture to generate multimodal explanations, which focuses…

    Submitted 28 April, 2021; originally announced May 2021.

    Comments: This paper is done when Xingjian was an intern in Amazon PARS group, summer 2020. This paper is accepted by NAACL-MAI-Workshop, 2021

  43. arXiv:2104.05888

    cs.LG cs.AI cs.CV

    Simpler Certified Radius Maximization by Propagating Covariances

    Authors: Xingjian Zhen, Rudrasis Chakraborty, Vikas Singh

    Abstract: One strategy for adversarially training a robust model is to maximize its certified radius -- the neighborhood around a given training sample for which the model's prediction remains unchanged. The scheme typically involves analyzing a "smoothed" classifier where one estimates the prediction corresponding to Gaussian samples in the neighborhood of each sample in the mini-batch, accomplished in pra…

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: This paper has been accepted by CVPR 2021 as an oral presentation. An introduction video can be found: https://youtu.be/m1ya2oNf5iE

  44. arXiv:2103.10825  [pdf, other]

    eess.IV cs.CV

    Variational Knowledge Distillation for Disease Classification in Chest X-Rays

    Authors: Tom van Sonsbeek, Xiantong Zhen, Marcel Worring, Ling Shao

    Abstract: Disease classification relying solely on imaging data attracts great interest in medical image analysis. Current models could be further improved, however, by also employing Electronic Health Records (EHRs), which contain rich information on patients and findings from clinicians. It is challenging to incorporate this information into disease classification due to the high reliance on clinician inp…

    Submitted 19 March, 2021; originally announced March 2021.

  45. Direct Estimation of Spinal Cobb Angles by Structured Multi-Output Regression

    Authors: Haoliang Sun, Xiantong Zhen, Chris Bailey, Parham Rasoulinejad, Yilong Yin, Shuo Li

    Abstract: The Cobb angle that quantitatively evaluates the spinal curvature plays an important role in the scoliosis diagnosis and treatment. Conventional measurement of these angles suffers from huge variability and low reliability due to intensive manual intervention. However, since there exist high ambiguity and variability around boundaries of vertebrae, it is challenging to obtain Cobb angles automatic…

    Submitted 23 December, 2020; originally announced December 2020.

    Comments: Proceedings of International Conference on Information Processing in Medical Imaging (IPMI 2017)

  46. arXiv:2012.10013  [pdf, other]

    cs.CV

    Flow-based Generative Models for Learning Manifold to Manifold Mappings

    Authors: Xingjian Zhen, Rudrasis Chakraborty, Liu Yang, Vikas Singh

    Abstract: Many measurements or observations in computer vision and machine learning manifest as non-Euclidean data. While recent proposals (like spherical CNN) have extended a number of deep neural network architectures to manifold-valued data, and this has often provided strong improvements in performance, the literature on generative models for manifold data is quite sparse. Partly due to this gap, there…

    Submitted 1 March, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

    Comments: This paper has been accepted by AAAI 2021. A video introduction is on YouTube: https://youtu.be/0r96U0vXsCM The official GitHub repo is: https://github.com/zhenxingjian/Dual_Manifold_GLOW

  47. arXiv:2010.10341  [pdf, other]

    cs.LG

    Learning to Learn Variational Semantic Memory

    Authors: Xiantong Zhen, Yingjun Du, Huan Xiong, Qiang Qiu, Cees G. M. Snoek, Ling Shao

    Abstract: In this paper, we introduce variational semantic memory into meta-learning to acquire long-term knowledge for few-shot learning. The variational semantic memory accrues and stores semantic information for the probabilistic inference of class prototypes in a hierarchical Bayesian framework. The semantic memory is grown from scratch and gradually consolidated by absorbing information from tasks it e…

    Submitted 14 July, 2021; v1 submitted 20 October, 2020; originally announced October 2020.

    Comments: accepted to NeurIPS 2020; code is available in https://github.com/YDU-uva/VSM

  48. arXiv:2007.07645  [pdf, other]

    cs.CV

    Learning to Learn with Variational Information Bottleneck for Domain Generalization

    Authors: Yingjun Du, Jun Xu, Huan Xiong, Qiang Qiu, Xiantong Zhen, Cees G. M. Snoek, Ling Shao

    Abstract: Domain generalization models learn to generalize to previously unseen domains, but suffer from prediction uncertainty and domain shift. In this paper, we address both problems. We introduce a probabilistic meta-learning model for domain generalization, in which classifier parameters shared across domains are modeled as distributions. This enables better handling of prediction uncertainty on unseen…

    Submitted 15 July, 2020; originally announced July 2020.

    Comments: 15 pages, 4 figures, ECCV2020

  49. arXiv:2006.06707  [pdf, other]

    cs.LG stat.ML

    Learning to Learn Kernels with Variational Random Features

    Authors: Xiantong Zhen, Haoliang Sun, Yingjun Du, Jun Xu, Yilong Yin, Ling Shao, Cees Snoek

    Abstract: In this work, we introduce kernels with random Fourier features in the meta-learning framework to leverage their strong few-shot learning ability. We propose meta variational random features (MetaVRF) to learn adaptive kernels for the base-learner, which is developed in a latent variable model by treating the random feature basis as the latent variable. We formulate the optimization of MetaVRF as…

    Submitted 13 August, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: ICML'2020; code is available in: https://github.com/Yingjun-Du/MetaVRF

  50. Conditional Variational Image Deraining

    Authors: Ying-Jun Du, Jun Xu, Xian-Tong Zhen, Ming-Ming Cheng, Ling Shao

    Abstract: Image deraining is an important yet challenging image processing task. Though deterministic image deraining methods are developed with encouraging performance, they are infeasible to learn flexible representations for probabilistic inference and diverse predictions. Besides, rain intensity varies both in spatial locations and across color channels, making this task more difficult. In this paper, w…

    Submitted 8 May, 2020; v1 submitted 23 April, 2020; originally announced April 2020.

    Comments: 14 pages, 11 figures, 5 tables, newly accepted by TIP