Search | arXiv e-print repository

Subgroup Analysis via Model-based Rule Forest

Authors: I-Ling Cheng, Chan Hsu, Chantung Ku, Pei-Ju Lee, Yihuang Kang

Abstract: Machine learning models are often criticized for their black-box nature, raising concerns about their applicability in critical decision-making scenarios. Consequently, there is a growing demand for interpretable models in such contexts. In this study, we introduce Model-based Deep Rule Forests (mobDRF), an interpretable representation learning algorithm designed to extract transparent models from data. By leveraging IF-THEN rules with multi-level logic expressions, mobDRF enhances the interpretability of existing models without compromising accuracy. We apply mobDRF to identify key risk factors for cognitive decline in an elderly population, demonstrating its effectiveness in subgroup analysis and local model optimization. Our method offers a promising solution for developing trustworthy and interpretable machine learning models, particularly valuable in fields like healthcare, where understanding differential effects across patient subgroups can lead to more personalized and effective treatments.

Submitted 27 August, 2024; originally announced August 2024.

arXiv:2404.15293 [pdf, other]

Interactive Manipulation and Visualization of 3D Brain MRI for Surgical Training

Authors: Siddharth Jha, Zichen Gui, Benjamin Delbos, Richard Moreau, Arnaud Leleve, Irene Cheng

Abstract: In modern medical diagnostics, magnetic resonance imaging (MRI) is an important technique that provides detailed insights into anatomical structures. In this paper, we present a comprehensive methodology focusing on streamlining the segmentation, reconstruction, and visualization process of 3D MRI data. Segmentation involves the extraction of anatomical regions with the help of state-of-the-art deep learning algorithms. Then, 3D reconstruction converts segmented data from the previous step into multiple 3D representations. Finally, the visualization stage provides efficient and interactive presentations of both 2D and 3D MRI data. Integrating these three steps, the proposed system is able to augment the interpretability of the anatomical information from MRI scans according to our interviews with doctors. Even though this system was originally designed and implemented as part of human brain haptic feedback simulation for surgeon training, it can also provide experienced medical practitioners with an effective tool for clinical data analysis, surgical planning and other purposes.

Submitted 24 March, 2024; originally announced April 2024.

arXiv:2403.06107 [pdf, other]

Textureless Object Recognition: An Edge-based Approach

Authors: Frincy Clement, Kirtan Shah, Dhara Pancholi, Gabriel Lugo Bustillo, Irene Cheng

Abstract: Textureless object recognition has become a significant task in Computer Vision with the advent of Robotics and its applications in the manufacturing sector. It has been challenging to obtain good accuracy in real time because of their lack of discriminative features and reflectance properties, which makes the techniques for textured object recognition insufficient for textureless objects. A lot of work has been done in the last 20 years, especially in the recent 5 years after the TLess and other textureless datasets were introduced. In this project, by applying image processing techniques we created a robust augmented dataset from an initial, smaller imbalanced dataset. We extracted edge features, feature combinations and RGB images enhanced with feature/feature combinations to create 15 datasets, each with a size of ~340,000. We then trained four classifiers on these 15 datasets to arrive at a conclusion as to which dataset performs the best overall and whether edge features are important for textureless objects. Based on our experiments and analysis, RGB images enhanced with a combination of 3 edge features performed the best compared to all others. Models trained on the dataset with HED edges performed comparatively better than those using other edge detectors like Canny or Prewitt.

Submitted 10 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:1910.14255

arXiv:2403.05658 [pdf, other]

Feature CAM: Interpretable AI in Image Classification

Authors: Frincy Clement, Ji Yang, Irene Cheng

Abstract: Deep Neural Networks have often been called black boxes because of the complex, deep architecture and non-transparency presented by the inner layers. There is a lack of trust in using Artificial Intelligence in critical and high-precision fields such as security, finance, health, and manufacturing industries. A lot of focused work has been done to provide interpretable models, intending to deliver meaningful insights into the thoughts and behavior of neural networks. In our research, we compare the state-of-the-art methods among the Activation-based methods (ABM) for interpreting predictions of CNN models, specifically in the application of Image Classification. We then extend the same for eight CNN-based architectures to compare the differences in visualization and thus interpretability. We introduce a novel technique, Feature CAM, which falls in the perturbation-activation combination, to create fine-grained, class-discriminative visualizations. The resulting saliency maps from our experiments proved to be 3-4 times more human-interpretable than the state-of-the-art in ABM. At the same time, it preserves machine interpretability, which is the average confidence score in classification.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2203.00314 [pdf, other]

VScript: Controllable Script Generation with Visual Presentation

Authors: Ziwei Ji, Yan Xu, I-Tsun Cheng, Samuel Cahyawijaya, Rita Frieske, Etsuko Ishii, Min Zeng, Andrea Madotto, Pascale Fung

Abstract: In order to offer a customized script tool and inspire professional scriptwriters, we present VScript. It is a controllable pipeline that generates complete scripts, including dialogues and scene descriptions, and presents them visually using video retrieval. With an interactive interface, our system allows users to select genres and input starting words that control the theme and development of the generated script. We adopt a hierarchical structure, which first generates the plot, then the script and its visual presentation. A novel approach is also introduced to plot-guided dialogue generation by treating it as an inverse dialogue summarization. The experimental results show that our approach outperforms the baselines on both automatic and human evaluations, especially in genre control.

Submitted 13 October, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

Journal ref: AACL Demo (2022)

arXiv:2102.03982 [pdf]

doi 10.1145/2996296

Subjective and Objective Visual Quality Assessment of Textured 3D Meshes

Authors: Jinjiang Guo, Vincent Vidal, Irene Cheng, Anup Basu, Atilla Baskurt, Guillaume Lavoue

Abstract: Objective visual quality assessment of 3D models is a fundamental issue in computer graphics. Quality assessment metrics may allow a wide range of processes to be guided and evaluated, such as level of detail creation, compression, filtering, and so on. Most computer graphics assets are composed of geometric surfaces on which several texture images can be mapped to make the rendering more realistic. While some quality assessment metrics exist for geometric surfaces, almost no research has been conducted on the evaluation of texture-mapped 3D models. In this context, we present a new subjective study to evaluate the perceptual quality of textured meshes, based on a paired comparison protocol. We introduce both texture and geometry distortions on a set of 5 reference models to produce a database of 136 distorted models, evaluated using two rendering protocols. Based on analysis of the results, we propose two new metrics for visual quality assessment of textured meshes, as optimized linear combinations of accurate geometry and texture quality measurements. These proposed perceptual metrics outperform their counterparts in terms of correlation with human opinion. The database, along with the associated subjective scores, will be made publicly available online.

Submitted 7 February, 2021; originally announced February 2021.

arXiv:2010.03710 [pdf]

Topic Diffusion Discovery Based on Deep Non-negative Autoencoder

Authors: Sheng-Tai Huang, Yihuang Kang, Shao-Min Hung, Bowen Kuo, I-Ling Cheng

Abstract: Researchers have been overwhelmed by the explosion of research articles published by various research communities. Many scholarly websites, search engines, and digital libraries have been created to help researchers identify potential research topics and keep up with recent progress on research of interest. However, it is still difficult for researchers to keep track of research topic diffusion and evolution without spending a large amount of time reviewing numerous relevant and irrelevant articles. In this paper, we consider a novel topic diffusion discovery technique. Specifically, we propose using a Deep Non-negative Autoencoder with information divergence measurement that monitors the evolutionary distance of the topic diffusion to understand how research topics change with time. The experimental results show that the proposed approach is able to identify the evolution of research topics as well as to discover topic diffusions in an online fashion.

Submitted 7 October, 2020; originally announced October 2020.

arXiv:2007.12496 [pdf, other]

Parkinson's Disease Detection with Ensemble Architectures based on ILSVRC Models

Authors: Tahjid Ashfaque Mostafa, Irene Cheng

Abstract: In this work, we explore various neural network architectures using Magnetic Resonance (MR) T1 images of the brain to identify Parkinson's Disease (PD), which is one of the most common neurodegenerative and movement disorders. We propose three ensemble architectures combining some winning Convolutional Neural Network models of ImageNet Large Scale Visual Recognition Challenge (ILSVRC). All of our proposed architectures outperform existing approaches to detect PD from MR images, achieving up to 95% detection accuracy. We also find that when we construct our ensemble architecture using models pretrained on the ImageNet dataset unrelated to PD, the detection performance is significantly better compared to models without any prior training. Our finding suggests a promising direction when no or insufficient training data is available.

Submitted 23 July, 2020; originally announced July 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:2007.00682

arXiv:2007.00682 [pdf, other]

Parkinson's Disease Detection Using Ensemble Architecture from MR Images

Authors: Tahjid Ashfaque Mostafa, Irene Cheng

Abstract: Parkinson's Disease (PD) is one of the major nervous system disorders that affect people over 60. PD can cause cognitive impairments. In this work, we explore various approaches to identify Parkinson's using Magnetic Resonance (MR) T1 images of the brain. We experiment with ensemble architectures combining some winning Convolutional Neural Network models of ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and propose two architectures. We find that detection accuracy increases drastically when we focus on the Gray Matter (GM) and White Matter (WM) regions from the MR images instead of using whole MR images. We achieved an average accuracy of 94.7% using smoothed GM and WM extracts and one of our proposed architectures. We also perform occlusion analysis and determine which brain areas are relevant in the architecture decision making process.

Submitted 1 July, 2020; originally announced July 2020.

arXiv:2001.09631 [pdf, other]

doi 10.1109/LGRS.2020.3010504

An Unsupervised Generative Neural Approach for InSAR Phase Filtering and Coherence Estimation

Authors: Subhayan Mukherjee, Aaron Zimmer, Xinyao Sun, Parwant Ghuman, Irene Cheng

Abstract: Phase filtering and pixel quality (coherence) estimation are critical in producing Digital Elevation Models (DEMs) from Interferometric Synthetic Aperture Radar (InSAR) images, as they remove spatial inconsistencies (residues) and immensely improve the subsequent unwrapping. The large amount of InSAR data facilitates Wide Area Monitoring (WAM) over geographical regions. Advances in parallel computing have accelerated Convolutional Neural Networks (CNNs), giving them advantages over human performance on visual pattern recognition, which makes CNNs a good choice for WAM. Nevertheless, this research is largely unexplored. We thus propose "GenInSAR", a CNN-based generative model for joint phase filtering and coherence estimation, that directly learns the InSAR data distribution. GenInSAR's unsupervised training on satellite and simulated noisy InSAR images outperforms five other related methods in total residue reduction (over 16.5% better on average) with less over-smoothing/artefacts around branch cuts. GenInSAR's Phase and Coherence Root-Mean-Squared-Error and Phase Cosine Error have average improvements of 0.54, 0.07, and 0.05 respectively compared to the related methods.

Submitted 9 August, 2020; v1 submitted 27 January, 2020; originally announced January 2020.

Comments: to be published in a future issue of IEEE Geoscience and Remote Sensing Letters

arXiv:2001.06983 [pdf]

doi 10.1007/978-3-030-04375-9_17

Adaptive Dithering Using Curved Markov-Gaussian Noise in the Quantized Domain for Mapping SDR to HDR Image

Authors: Subhayan Mukherjee, Guan-Ming Su, Irene Cheng

Abstract: High Dynamic Range (HDR) imaging is gaining increased attention due to its realistic content, for not only regular displays but also smartphones. Before sufficient HDR content is distributed, HDR visualization still relies mostly on converting Standard Dynamic Range (SDR) content. SDR images are often quantized, or bit depth reduced, before SDR-to-HDR conversion, e.g. for video transmission. Quantization can easily lead to banding artefacts. In some environments with limited computing and/or memory I/O, the traditional solution using spatial neighborhood information is not feasible. Our method includes noise generation (offline) and noise injection (online), and operates on pixels of the quantized image. We vary the magnitude and structure of the noise pattern adaptively based on the luma of the quantized pixel and the slope of the inverse-tone mapping function. Subjective user evaluations confirm the superior performance of our technique.

Submitted 20 January, 2020; originally announced January 2020.

Comments: 2018 International Conference on Smart Multimedia

arXiv:2001.06961 [pdf, other]

doi 10.1007/978-3-030-27202-9_10

CNN-Based Real-Time Parameter Tuning for Optimizing Denoising Filter Performance

Authors: Subhayan Mukherjee, Navaneeth Kamballur Kottayil, Xinyao Sun, Irene Cheng

Abstract: We propose a novel direction to improve the denoising quality of filtering-based denoising algorithms in real time by predicting the best filter parameter value using a Convolutional Neural Network (CNN). We take the use case of BM3D, the state-of-the-art filtering-based denoising algorithm, to demonstrate and validate our approach. We propose and train a simple, shallow CNN to predict, in real time, the optimum filter parameter value, given the input noisy image. Each training example consists of a noisy input image (training data) and the filter parameter value that produces the best output (training label). Both qualitative and quantitative results using the widely used PSNR and SSIM metrics on the popular BSD68 dataset show that the CNN-guided BM3D outperforms the original, unguided BM3D across different noise levels. Thus, our proposed method is a CNN-based improvement on the original BM3D which uses a fixed, default parameter value for all images.

Submitted 19 January, 2020; originally announced January 2020.

Comments: 2019 International Conference on Image Analysis and Recognition

arXiv:2001.06956 [pdf]

doi 10.1109/ICSENS.2018.8589742

CNN-based InSAR Coherence Classification

Authors: Subhayan Mukherjee, Aaron Zimmer, Xinyao Sun, Parwant Ghuman, Irene Cheng

Abstract: Interferometric Synthetic Aperture Radar (InSAR) imagery based on microwaves reflected off ground targets is becoming increasingly important in remote sensing for ground movement estimation. However, the reflections are contaminated by noise, which distorts the signal's wrapped phase. Demarcation of image regions based on degree of contamination ("coherence") is an important component of the InSAR processing pipeline. We introduce Convolutional Neural Networks (CNNs) to this problem domain and show their effectiveness in improving coherence-based demarcation and reducing misclassifications in completely incoherent regions through intelligent preprocessing of training data. Quantitative and qualitative comparisons prove the superiority of the proposed method over three established methods.

Submitted 19 January, 2020; originally announced January 2020.

Comments: 2018 IEEE SENSORS

arXiv:2001.06954 [pdf]

doi 10.1109/ICSENS.2018.8589920

CNN-based InSAR Denoising and Coherence Metric

Authors: Subhayan Mukherjee, Aaron Zimmer, Navaneeth Kamballur Kottayil, Xinyao Sun, Parwant Ghuman, Irene Cheng

Abstract: Interferometric Synthetic Aperture Radar (InSAR) imagery for estimating ground movement, based on microwaves reflected off ground targets, is gaining increasing importance in remote sensing. However, noise corrupts microwave reflections received at the satellite and contaminates the signal's wrapped phase. We introduce Convolutional Neural Networks (CNNs) to this problem domain and show the effectiveness of autoencoder CNN architectures to learn InSAR image denoising filters in the absence of clean ground truth images, and for artefact reduction in estimated coherence through intelligent preprocessing of training data. We compare our results with four established methods to illustrate the superiority of the proposed method.

Submitted 19 January, 2020; originally announced January 2020.

Comments: 2018 IEEE SENSORS

arXiv:1911.11903 [pdf, other]

doi 10.1007/978-3-030-54407-2_8

Potential of deep features for opinion-unaware, distortion-unaware, no-reference image quality assessment

Authors: Subhayan Mukherjee, Giuseppe Valenzise, Irene Cheng

Abstract: Image Quality Assessment algorithms predict a quality score for a pristine or distorted input image, such that it correlates with human opinion. Traditional methods required a non-distorted "reference" version of the input image to compare with, in order to predict this score. However, recent "No-reference" methods circumvent this requirement by modelling the distribution of clean image features, thereby making them more suitable for practical use. However, the majority of such methods either use hand-crafted features or require training on human opinion scores (supervised learning), which are difficult to obtain and standardise. We explore the possibility of using deep features instead, particularly, the encoded (bottleneck) feature maps of a Convolutional Autoencoder neural network architecture. Also, we do not train the network on subjective scores (unsupervised learning). The primary requirements for an IQA method are monotonic increase in predicted scores with increasing degree of input image distortion, and consistent ranking of images with the same distortion type and content, but different distortion levels. Quantitative experiments using the Pearson, Kendall and Spearman correlation scores on a diverse set of images show that our proposed method meets the above requirements better than the state-of-the-art method (which uses hand-crafted features) for three types of distortions: blurring, noise and compression artefacts. This demonstrates the potential for future research in this relatively unexplored sub-area within IQA.

Submitted 26 November, 2019; originally announced November 2019.

Comments: International Conference on Smart Multimedia (Springer), 16-18 December 2019, San Diego, California, USA

arXiv:1909.03120 [pdf, other]

DeepInSAR: A Deep Learning Framework for SAR Interferometric Phase Restoration and Coherence Estimation

Authors: Xinyao Sun, Aaron Zimmer, Subhayan Mukherjee, Navaneeth Kamballur Kottayil, Parwant Ghuman, Irene Cheng

Abstract: Over the past decade, Interferometric Synthetic Aperture Radar (InSAR) has become a successful remote sensing technique. However, during the acquisition step, microwave reflections received at the satellite are usually disturbed by strong noise, leading to a noisy single-look complex (SLC) SAR image. The quality of their interferometric phase is even worse. InSAR phase filtering is an ill-posed problem and plays a key role in subsequent processing. However, most existing methods require expert supervision or heavy runtime, which limits their usability and scalability for practical uses such as wide-area monitoring and forecasting. In this work, we propose a deep convolutional neural network (CNN) based model, DeepInSAR, to intelligently solve both the phase filtering and coherence estimation problems. We demonstrate our DeepInSAR using both simulated and real data. A teacher-student framework is proposed to deal with the issue that there is no ground truth sample for real-world InSAR data. Quantitative and qualitative comparisons show that DeepInSAR achieves comparable or even better results than its stack-based teacher method on new test datasets while requiring fewer pairs of SLCs, and that it outperforms three other established non-stack-based methods with less running time and no human supervision.

Submitted 27 May, 2020; v1 submitted 6 September, 2019; originally announced September 2019.

Comments: 19 pages

arXiv:1907.06333 [pdf, ps, other]

Myers-Briggs Personality Classification and Personality-Specific Language Generation Using Pre-trained Language Models

Authors: Sedrick Scott Keh, I-Tsun Cheng

Abstract: The Myers-Briggs Type Indicator (MBTI) is a popular personality metric that uses four dichotomies as indicators of personality traits. This paper examines the use of pre-trained language models to predict MBTI personality types based on scraped labeled texts. The proposed model reaches an accuracy of 0.47 for correctly predicting all 4 types and 0.86 for correctly predicting at least 2 types. Furthermore, we investigate the possible uses of a fine-tuned BERT model for personality-specific language generation. This is a task essential for both modern psychology and for intelligent empathetic systems.

Submitted 15 July, 2019; originally announced July 2019.

arXiv:1907.01723 [pdf]

Towards Interpretable Deep Extreme Multi-label Learning

Authors: Yihuang Kang, I-Ling Cheng, Wenjui Mao, Bowen Kuo, Pei-Ju Lee

Abstract: Many Machine Learning algorithms, such as deep neural networks, have long been criticized for being "black boxes", that is, models unable to show how they arrive at a decision without further efforts to interpret them. This problem has raised concerns about trust, safety, nondiscrimination, and other ethical issues in model applications. In this paper, we discuss the machine learning interpretability of a real-world application, eXtreme Multi-label Learning (XML), which involves learning models from annotated data with many pre-defined labels. We propose a two-step XML approach that combines a deep non-negative autoencoder with other multi-label classifiers to tackle different data applications with a large number of labels. Our experimental results show that the proposed approach is able to cope with many-label problems as well as to provide interpretable label hierarchies and dependencies that help us understand how the model recognizes the existence of objects in an image.

Submitted 2 July, 2019; originally announced July 2019.

Comments: 6 pages

arXiv:1905.00469 [pdf]

Fully Automatic Brain Tumor Segmentation using a Normalized Gaussian Bayesian Classifier and 3D Fluid Vector Flow

Authors: Tao Wang, Irene Cheng, Anup Basu

Abstract: Brain tumor segmentation from Magnetic Resonance Images (MRIs) is an important task to measure tumor responses to treatments. However, automatic segmentation is very challenging. This paper presents an automatic brain tumor segmentation method based on a Normalized Gaussian Bayesian classification and a new 3D Fluid Vector Flow (FVF) algorithm. In our method, a Normalized Gaussian Mixture Model (NGMM) is proposed and used to model the healthy brain tissues. A Gaussian Bayesian Classifier is exploited to acquire a Gaussian Bayesian Brain Map (GBBM) from the test brain MR images. The GBBM is further processed to initialize the 3D FVF algorithm, which segments the brain tumor. This algorithm has two major contributions. First, we present an NGMM to model healthy brains. Second, we extend our 2D FVF algorithm to 3D space and use it for brain tumor segmentation. The proposed method is validated on a publicly available dataset.

Submitted 1 May, 2019; originally announced May 2019.

Comments: ICIP 2010

arXiv:1807.06604 [pdf, ps, other]

doi 10.1007/s11517-018-1829-9

A Fast Segmentation-free Fully Automated Approach to White Matter Injury Detection in Preterm Infants

Authors: Subhayan Mukherjee, Irene Cheng, Steven Miller, Jessie Guo, Vann Chau, Anup Basu

Abstract: White Matter Injury (WMI) is the most prevalent brain injury in the preterm neonate, leading to developmental deficits. However, detecting WMI in Magnetic Resonance (MR) images of preterm neonate brains using traditional WM segmentation-based methods is difficult, mainly due to the lack of reliable preterm neonate brain atlases to guide segmentation. Hence, we propose a segmentation-free, fast, unsupervised, atlas-free WMI detection method. We detect the ventricles as blobs using a fast linear Maximally Stable Extremal Regions algorithm. A reference contour equidistant from the blobs and the brain-background boundary is used to identify tissue adjacent to the blobs. Assuming a normal distribution of the gray-value intensity of this tissue, the outlier intensities in the entire brain region are identified as potential WMI candidates. Thereafter, false positives are discriminated using appropriate heuristics. Experiments using an expert-annotated dataset show that the proposed method runs 20 times faster than our earlier work, which relied on time-consuming segmentation of the WM region, without compromising WMI detection accuracy.

Submitted 17 July, 2018; originally announced July 2018.

Journal ref: Medical and Biological Engineering and Computing (Springer), 2018

arXiv:1807.04386 [pdf]

Topic Diffusion Discovery based on Sparseness-constrained Non-negative Matrix Factorization

Authors: Yihuang Kang, Keng-Pei Lin, I-Ling Cheng

Abstract: Due to the recent explosion of text data, researchers have been overwhelmed by an ever-increasing volume of articles produced by different research communities. Various scholarly search websites, citation recommendation engines, and research databases have been created to simplify text search tasks. However, it is still difficult for researchers to identify potential research topics without intensively reviewing a tremendous number of articles published by journals, conferences, meetings, and workshops. In this paper, we consider a novel topic diffusion discovery technique that incorporates sparseness-constrained Non-negative Matrix Factorization with generalized Jensen-Shannon divergence to help understand term-topic evolutions and identify topic diffusions. Our experimental results show that this approach can extract more prominent topics from large article databases, visualize relationships between terms of interest and abstract topics, and further help researchers understand whether given terms/topics have been widely explored or whether new topics are emerging from the literature.

Submitted 11 July, 2018; originally announced July 2018.

arXiv:1806.07489 [pdf, other]

Towards the identification of Parkinson's Disease using only T1 MR Images

Authors: Sara Soltaninejad, Irene Cheng, Anup Basu

Abstract: Parkinson's Disease (PD) is one of the most common types of neurological diseases caused by progressive degeneration of dopaminergic neurons in the brain. Even though there is no fixed cure for this neurodegenerative disease, earlier diagnosis followed by earlier treatment can help patients have a better quality of life. Magnetic Resonance Imaging (MRI) has been one of the most popular diagnostic tools in recent years because it avoids harmful radiation. In this paper, we investigate the plausibility of using MRIs for automatically diagnosing PD. Our proposed method has three main steps: 1) Preprocessing, 2) Feature Extraction, and 3) Classification. The FreeSurfer library is used for the first and second steps. For classification, three main types of classifiers, namely Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM), are applied and their classification ability is compared. The Parkinson's Progression Markers Initiative (PPMI) data set is used to evaluate the proposed method. The proposed system proves to be promising in assisting the diagnosis of PD.

Submitted 19 June, 2018; originally announced June 2018.

Comments: ICSM 2018

arXiv:1806.03695 [pdf, other]

doi 10.1016/j.ultras.2017.11.020

Segmentation of Arterial Walls in Intravascular Ultrasound Cross-Sectional Images Using Extremal Region Selection

Authors: Mehdi Faraji, Irene Cheng, Iris Naudin, Anup Basu

Abstract: Intravascular Ultrasound (IVUS) is an intra-operative imaging modality that facilitates observing and appraising the vessel wall structure of the human coronary arteries. Segmentation of arterial wall boundaries from the IVUS images is not only crucial for quantitative analysis of the vessel walls and plaque characteristics, but is also necessary for generating 3D reconstructed models of the artery. The aim of this study is twofold. Firstly, we investigate the feasibility of using a recently proposed region detector, namely Extremal Region of Extremum Level (EREL) to delineate the luminal and media-adventitia borders in IVUS frames acquired by 20 MHz probes. Secondly, we propose a region selection strategy to label two ERELs as lumen and media based on the stability of their textural information. We extensively evaluated our selection strategy on the test set of a standard publicly available dataset containing 326 IVUS B-mode images. We showed that in the best case, the average Hausdorff Distances (HD) between the extracted ERELs and the actual lumen and media were $0.22$ mm and $0.45$ mm, respectively. The results of our experiments revealed that our selection strategy was able to segment the lumen with $\le 0.3$ mm HD to the gold standard even though the images contained major artifacts such as bifurcations, shadows, and side branches. Moreover, when there was no artifact, our proposed method was able to delineate media-adventitia boundaries with $0.31$ mm HD to the gold standard.
Furthermore, our proposed segmentation method runs in time that is linear in the number of pixels in each frame. Based on the results of this work, by using a 20 MHz IVUS probe with controlled pullback, not only can we now analyze the internal structure of human arteries more accurately, but also segment each frame during the pullback procedure because of the low run time of our proposed segmentation method.

Submitted 10 June, 2018; originally announced June 2018.

Comments: 15 pages, 5 figures, published in Elsevier Ultrasonics

arXiv:1803.04053 [pdf, other]

Learning Local Distortion Visibility From Image Quality Data-sets

Authors: Navaneeth Kamballur Kottayil, Giuseppe Valenzise, Frederic Dufaux, Irene Cheng

Abstract: Accurate prediction of local distortion visibility thresholds is critical in many image and video processing applications. Existing methods require accurate modeling of the human visual system, and are derived through psychophysical experiments with simple, artificial stimuli. These approaches, however, are difficult to generalize to natural images with complex types of distortion. In this paper, we explore a different perspective, and we investigate whether it is possible to learn local distortion visibility from image quality scores. We propose a convolutional neural network based optimization framework to infer local detection thresholds in a distorted image. Our model is trained on multiple quality datasets, and the results are correlated with empirical visibility thresholds collected on complex stimuli in a recent study. Our results are comparable to state-of-the-art mathematical models that were trained on psychovisual data directly. This suggests that it is possible to predict psychophysical phenomena from visibility information embedded in image quality scores.

Submitted 11 March, 2018; originally announced March 2018.

arXiv:1712.07269 [pdf, other]

doi 10.1109/TIP.2017.2778570

Blind High Dynamic Range Quality estimation by disentangling perceptual and noise features in images

Authors: Navaneeth Kamballur Kottayil, Giuseppe Valenzise, Frederic Dufaux, Irene Cheng

Abstract: Assessing the visual quality of High Dynamic Range (HDR) images is an unexplored and interesting research topic that has become relevant with the current boom in HDR technology. We propose a new convolutional neural network based model for No-Reference Image Quality Assessment (NR-IQA) on HDR data. This model predicts the amount and location of noise, the perceptual influence of image pixels on the noise, and the perceived quality of a distorted image without any reference image. The proposed model extracts numerical values corresponding to the noise present in any given distorted image, and the perceptual effects exhibited by a human eye when presented with the same. These two measures are extracted separately yet sequentially and combined in a mixing function to compute the quality of the distorted image perceived by a human eye. Our training process derives the component that computes perceptual effects from a real-world image quality dataset, rather than using results of psychovisual experiments. With the proposed model, we demonstrate state-of-the-art performance for HDR NR-IQA, and our results show performance similar to HDR Full Reference Image Quality Assessment (FR-IQA) algorithms.

Submitted 19 December, 2017; originally announced December 2017.

arXiv:1712.00048 [pdf, other]

doi 10.1109/EMBC.2016.7591611

Investigation of Gaze Patterns in Multi View Laparoscopic Surgery

Authors: Navaneeth Kamballur Kottayil, Rositsa Bogdanova, Irene Cheng, Anup Basu, Bin Zheng

Abstract: Laparoscopic Surgery (LS) is a modern surgical technique whereby the surgery is performed through an incision with tools and a camera, as opposed to conventional open surgery. This promises minimal recovery times and less hemorrhaging. Multi-view LS is the latest development in the field, where the system uses multiple cameras to give the surgeon more information about the surgical site, potentially making the surgery easier. In this publication, we study the gaze patterns of a high-performing subject in a multi-view LS environment and compare them with those of a novice to detect differences in gaze behavior. This was done by conducting a user study with 20 university students with varying levels of expertise in multi-view LS. The subjects performed a laparoscopic task in simulation with three cameras (front/top/side). The subjects were then separated into high and low performers depending on their performance times, and their data was analyzed. Our results show statistically significant differences between the two behaviors. This opens up new areas, from training novices in multi-view LS to making smart displays that show the optimum view depending on the situation.

Submitted 30 November, 2017; originally announced December 2017.

Journal ref: 38th Annual International Conference of the IEEE EMBC, Orlando, FL, 2016, pp. 4031-4034

arXiv:1712.00043 [pdf, other]

doi 10.1007/s11760-016-0873-x

A Color Intensity Invariant Low Level Feature Optimization Framework for Image Quality Assessment

Authors: Navaneeth K. Kottayil, Irene Cheng, Frederic Dufaux, Anup Basu

Abstract: Image Quality Assessment (IQA) algorithms evaluate the perceptual quality of an image using evaluation scores that assess the similarity or difference between two images. We propose a new low-level feature based IQA technique, which applies filter-bank decomposition and center-surround methodology. Differing from existing methods, our model incorporates color intensity adaptation and frequency scaling optimization at each filter-bank level and spatial orientation to extract and enhance perceptually significant features. Our computational model exploits the concept of object detection and encapsulates characteristics proposed in other IQA algorithms in a unified architecture. We also propose a systematic approach to review the evolution of IQA algorithms using unbiased test datasets, instead of looking at individual scores in isolation. Experimental results demonstrate the feasibility of our approach.

Submitted 30 November, 2017; originally announced December 2017.

Journal ref: Signal, Image and Video Processing 10.6 (2016):1169-1176

arXiv:1711.10515 [pdf, other]

doi 10.1109/ICIP.2016.7532308

Highlighting objects of interest in an image by integrating saliency and depth

Authors: Subhayan Mukherjee, Irene Cheng, Anup Basu

Abstract: Stereo images have been captured primarily for 3D reconstruction in the past. However, the depth information acquired from stereo can also be used along with saliency to highlight certain objects in a scene. This approach can be used to make still images more interesting to look at, and highlight objects of interest in the scene. We introduce this novel direction in this paper, and discuss the theoretical framework behind the approach. Even though we use depth from stereo in this work, our approach is applicable to depth data acquired from any sensor modality. Experimental results on both indoor and outdoor scenes demonstrate the benefits of our algorithm.

Submitted 28 November, 2017; originally announced November 2017.

arXiv:1711.10412 [pdf]

doi 10.1109/IVMSPW.2016.7528177

Entropy-difference based stereo error detection

Authors: Subhayan Mukherjee, Irene Cheng, Ram Mohana Reddy Guddeti, Anup Basu

Abstract: Stereo depth estimation is error-prone; hence, effective error detection methods are desirable. Most such existing methods depend on characteristics of the stereo matching cost curve, making them unduly dependent on functional details of the matching algorithm. As a remedy, we propose a novel error detection approach based solely on the input image and its depth map. Our assumption is that the entropy of any point on an image will be significantly higher than the entropy of its corresponding point on the image's depth map. In this paper, we propose a confidence measure, Entropy-Difference (ED), for stereo depth estimates and a binary classification method to identify incorrect depths. Experiments on the Middlebury dataset show the effectiveness of our method. Our proposed stereo confidence measure outperforms 17 existing measures in all aspects except occlusion detection. Established metrics such as precision, accuracy, recall, and area-under-curve are used to demonstrate the effectiveness of our method.

Submitted 28 November, 2017; originally announced November 2017.

Showing 1–29 of 29 results for author: Cheng, I