-
Time-Ordered Ad-hoc Resource Sharing for Independent Robotic Agents
Authors:
Arjo Chakravarty,
Michael X. Grey,
M. A. Viraj J. Muthugala,
Mohan Rajesh Elara
Abstract:
Resource sharing is a crucial part of a multi-robot system. We propose a Boolean satisfiability based approach to resource sharing. Our key contributions are an algorithm for converting any constrained assignment to a weighted-SAT based optimization. We propose a theorem that allows optimal resource assignment problems to be solved via repeated application of a SAT solver. Additionally, we show a way to encode continuous time ordering constraints using Conjunctive Normal Form (CNF). We benchmark our new algorithms and show that they can be used in an ad-hoc setting. We test our algorithms on a fleet of simulated and real world robots and show that the algorithms are able to handle real world situations. Our algorithms and test harnesses are open source and build on Open-RMF's fleet management system.
Submitted 15 August, 2024;
originally announced August 2024.
-
Time-Equivariant Contrastive Learning for Degenerative Disease Progression in Retinal OCT
Authors:
Taha Emre,
Arunava Chakravarty,
Dmitrii Lachinov,
Antoine Rivail,
Ursula Schmidt-Erfurth,
Hrvoje Bogunović
Abstract:
Contrastive pretraining provides robust representations by ensuring their invariance to different image transformations while simultaneously preventing representational collapse. Equivariant contrastive learning, on the other hand, provides representations sensitive to specific image transformations while remaining invariant to others. By introducing equivariance to time-induced transformations, such as disease-related anatomical changes in longitudinal imaging, the model can effectively capture such changes in the representation space. In this work, we propose a Time-equivariant Contrastive Learning (TC) method. First, an encoder embeds two unlabeled scans from different time points of the same patient into the representation space. Next, a temporal equivariance module is trained to predict the representation of a later visit based on the representation from one of the previous visits and the corresponding time interval with a novel regularization loss term while preserving the invariance property to irrelevant image transformations. On a large longitudinal dataset, our model clearly outperforms existing equivariant contrastive methods in predicting progression from intermediate age-related macular degeneration (AMD) to advanced wet-AMD within a specified time-window.
Submitted 15 May, 2024;
originally announced May 2024.
-
Deep Learning Models in Speech Recognition: Measuring GPU Energy Consumption, Impact of Noise and Model Quantization for Edge Deployment
Authors:
Aditya Chakravarty
Abstract:
Recent transformer-based ASR models have achieved word-error rates (WER) below 4%, surpassing human annotator accuracy, yet they demand extensive server resources, contributing to significant carbon footprints. The traditional server-based architecture of ASR also presents privacy concerns, alongside reliability and latency issues due to network dependencies. In contrast, on-device (edge) ASR enhances privacy, boosts performance, and promotes sustainability by effectively balancing energy use and accuracy for specific applications. This study examines the effects of quantization, memory demands, and energy consumption on the performance of various ASR model inference on the NVIDIA Jetson Orin Nano. By analyzing WER and transcription speed across models using FP32, FP16, and INT8 quantization on clean and noisy datasets, we highlight the crucial trade-offs between accuracy, speed, quantization, energy efficiency, and memory needs. We found that changing precision from FP32 to FP16 halves the energy consumption for audio transcription across different models, with minimal performance degradation. A larger model size and number of parameters neither guarantee better resilience to noise, nor predict the energy consumption for a given transcription load. These, along with several other findings, offer novel insights for optimizing ASR systems within energy- and memory-limited environments, crucial for the development of efficient on-device ASR solutions. The code and input data needed to reproduce the results in this article are open source and available at [https://github.com/zzadiues3338/ASR-energy-jetson].
Submitted 2 May, 2024;
originally announced May 2024.
-
3DTINC: Time-Equivariant Non-Contrastive Learning for Predicting Disease Progression from Longitudinal OCTs
Authors:
Taha Emre,
Arunava Chakravarty,
Antoine Rivail,
Dmitrii Lachinov,
Oliver Leingang,
Sophie Riedl,
Julia Mai,
Hendrik P. N. Scholl,
Sobha Sivaprasad,
Daniel Rueckert,
Andrew Lotery,
Ursula Schmidt-Erfurth,
Hrvoje Bogunović
Abstract:
Self-supervised learning (SSL) has emerged as a powerful technique for improving the efficiency and effectiveness of deep learning models. Contrastive methods are a prominent family of SSL that extract similar representations of two augmented views of an image while pushing away others in the representation space as negatives. However, the state-of-the-art contrastive methods require large batch sizes and augmentations designed for natural images that are impractical for 3D medical images. To address these limitations, we propose a new longitudinal SSL method, 3DTINC, based on non-contrastive learning. It is designed to learn perturbation-invariant features for 3D optical coherence tomography (OCT) volumes, using augmentations specifically designed for OCT. We introduce a new non-contrastive similarity loss term that learns temporal information implicitly from intra-patient scans acquired at different times. Our experiments show that this temporal information is crucial for predicting progression of retinal diseases, such as age-related macular degeneration (AMD). After pretraining with 3DTINC, we evaluated the learned representations and the prognostic models on two large-scale longitudinal datasets of retinal OCTs where we predict the conversion to wet-AMD within a six-month interval. Our results demonstrate that each component of our contributions is crucial for learning meaningful representations useful in predicting disease progression from longitudinal volumetric scans.
Submitted 13 May, 2024; v1 submitted 28 December, 2023;
originally announced December 2023.
-
From Concept to Field Tests: Accelerated Development of Multi-AUV Missions Using a High-Fidelity Faster-than-Real-Time Simulator
Authors:
Timothy R. Player,
Arjo Chakravarty,
Mabel M. Zhang,
Ben Yair Raanan,
Brian Kieft,
Yanwu Zhang,
Brett Hobson
Abstract:
We designed and validated a novel simulator for efficient development of multi-robot marine missions. To accelerate development of cooperative behaviors, the simulator models the robots' operating conditions with moderately high fidelity and runs significantly faster than real time, including acoustic communications, dynamic environmental data, and high-resolution bathymetry in large worlds. The simulator's ability to exceed a real-time factor (RTF) of 100 has been stress-tested with a robust continuous integration suite and was used to develop a multi-robot field experiment.
Submitted 17 November, 2023;
originally announced November 2023.
-
Pretrained Deep 2.5D Models for Efficient Predictive Modeling from Retinal OCT
Authors:
Taha Emre,
Marzieh Oghbaie,
Arunava Chakravarty,
Antoine Rivail,
Sophie Riedl,
Julia Mai,
Hendrik P. N. Scholl,
Sobha Sivaprasad,
Daniel Rueckert,
Andrew Lotery,
Ursula Schmidt-Erfurth,
Hrvoje Bogunović
Abstract:
In the field of medical imaging, 3D deep learning models play a crucial role in building powerful predictive models of disease progression. However, the size of these models presents significant challenges, both in terms of computational resources and data requirements. Moreover, achieving high-quality pretraining of 3D models proves to be even more challenging. To address these issues, hybrid 2.5D approaches provide an effective solution for utilizing 3D volumetric data efficiently using 2D models. Combining 2D and 3D techniques offers a promising avenue for optimizing performance while minimizing memory requirements. In this paper, we explore 2.5D architectures based on a combination of convolutional neural networks (CNNs), long short-term memory (LSTM), and Transformers. In addition, leveraging the benefits of recent non-contrastive pretraining approaches in 2D, we enhanced the performance and data efficiency of 2.5D techniques even further. We demonstrate the effectiveness of architectures and associated pretraining on a task of predicting progression to wet age-related macular degeneration (AMD) within a six-month period on two large longitudinal OCT datasets.
Submitted 25 July, 2023;
originally announced July 2023.
-
Morph-SSL: Self-Supervision with Longitudinal Morphing to Predict AMD Progression from OCT
Authors:
Arunava Chakravarty,
Taha Emre,
Oliver Leingang,
Sophie Riedl,
Julia Mai,
Hendrik P. N. Scholl,
Sobha Sivaprasad,
Daniel Rueckert,
Andrew Lotery,
Ursula Schmidt-Erfurth,
Hrvoje Bogunović
Abstract:
The lack of reliable biomarkers makes predicting the conversion from intermediate to neovascular age-related macular degeneration (iAMD, nAMD) a challenging task. We develop a Deep Learning (DL) model to predict the future risk of conversion of an eye from iAMD to nAMD from its current OCT scan. Although eye clinics generate vast amounts of longitudinal OCT scans to monitor AMD progression, only a small subset can be manually labeled for supervised DL. To address this issue, we propose Morph-SSL, a novel Self-supervised Learning (SSL) method for longitudinal data. It uses pairs of unlabelled OCT scans from different visits and involves morphing the scan from the previous visit to the next. The Decoder predicts the transformation for morphing and ensures a smooth feature manifold that can generate intermediate scans between visits through linear interpolation. Next, the Morph-SSL trained features are input to a Classifier which is trained in a supervised manner to model the cumulative probability distribution of the time to conversion with a sigmoidal function. Morph-SSL was trained on unlabelled scans of 399 eyes (3570 visits). The Classifier was evaluated with a five-fold cross-validation on 2418 scans from 343 eyes with clinical labels of the conversion date. The Morph-SSL features achieved an AUC of 0.766 in predicting the conversion to nAMD within the next 6 months, outperforming the same network when trained end-to-end from scratch or pre-trained with popular SSL methods. Automated prediction of the future risk of nAMD onset can enable timely treatment and individualized AMD management.
Submitted 17 April, 2023;
originally announced April 2023.
-
e-Inu: Simulating A Quadruped Robot With Emotional Sentience
Authors:
Abhiruph Chakravarty,
Jatin Karthik Tripathy,
Sibi Chakkaravarthy S,
Aswani Kumar Cherukuri,
S. Anitha,
Firuz Kamalov,
Annapurna Jonnalagadda
Abstract:
Quadruped robots are currently used in industrial robotics as mechanical aids to automate several routine tasks. However, the use of such a robot in a domestic setting is still largely a research topic. This paper discusses the understanding and virtual simulation of such a robot capable of detecting and understanding human emotions, generating its gait, and responding via sounds and expression on a screen. To this end, we use a combination of reinforcement learning and software engineering concepts to simulate a quadruped robot that can understand emotions, navigate through various terrains and detect sound sources, and respond to emotions using audio-visual feedback. This paper aims to establish the framework of simulating a quadruped robot that is emotionally intelligent and can primarily respond to audio-visual stimuli using motor or audio response. The emotion detection from speech was not as performant as ERANNs or Zeta Policy learning, but still reached an accuracy of 63.5%. The video emotion detection system produced results that are almost on par with the state of the art, with an accuracy of 99.66%. Due to its "on-policy" learning process, the PPO algorithm was extremely rapid to learn, allowing the simulated dog to demonstrate a remarkably seamless gait across the different cadences and variations. This enabled the quadruped robot to respond to generated stimuli, allowing us to conclude that it functions as predicted and satisfies the aim of this work.
Submitted 3 January, 2023;
originally announced January 2023.
-
Learning Spatio-Temporal Model of Disease Progression with NeuralODEs from Longitudinal Volumetric Data
Authors:
Dmitrii Lachinov,
Arunava Chakravarty,
Christoph Grechenig,
Ursula Schmidt-Erfurth,
Hrvoje Bogunovic
Abstract:
Robust forecasting of the future anatomical changes inflicted by an ongoing disease is an extremely challenging task that is out of grasp even for experienced healthcare professionals. Such a capability, however, is of great importance since it can improve patient management by providing information on the speed of disease progression already at the admission stage, or it can enrich the clinical trials with fast progressors and avoid the need for control arms by the means of digital twins. In this work, we develop a deep learning method that models the evolution of age-related disease by processing a single medical scan and providing a segmentation of the target anatomy at a requested future point in time. Our method represents a time-invariant physical process and solves a large-scale problem of modeling temporal pixel-level changes utilizing NeuralODEs. In addition, we demonstrate the approaches to incorporate the prior domain-specific constraints into our method and define temporal Dice loss for learning temporal objectives. To evaluate the applicability of our approach across different age-related diseases and imaging modalities, we developed and tested the proposed method on the datasets with 967 retinal OCT volumes of 100 patients with Geographic Atrophy, and 2823 brain MRI volumes of 633 patients with Alzheimer's Disease. For Geographic Atrophy, the proposed method outperformed the related baseline models in the atrophy growth prediction. For Alzheimer's Disease, the proposed method demonstrated remarkable performance in predicting the brain ventricle changes induced by the disease, achieving the state-of-the-art result on TADPOLE challenge.
Submitted 8 November, 2022;
originally announced November 2022.
-
TINC: Temporally Informed Non-Contrastive Learning for Disease Progression Modeling in Retinal OCT Volumes
Authors:
Taha Emre,
Arunava Chakravarty,
Antoine Rivail,
Sophie Riedl,
Ursula Schmidt-Erfurth,
Hrvoje Bogunović
Abstract:
Recent contrastive learning methods achieved state-of-the-art in low label regimes. However, the training requires large batch sizes and heavy augmentations to create multiple views of an image. With non-contrastive methods, the negatives are implicitly incorporated in the loss, allowing different images and modalities as pairs. Although the meta-information (i.e., age, sex) in medical imaging is abundant, the annotations are noisy and prone to class imbalance. In this work, we exploited already existing temporal information (different visits from a patient) in a longitudinal optical coherence tomography (OCT) dataset using temporally informed non-contrastive loss (TINC) without increasing complexity and need for negative pairs. Moreover, our novel pair-forming scheme can avoid heavy augmentations and implicitly incorporates the temporal information in the pairs. Finally, these representations learned from the pretraining are more successful in predicting disease progression where the temporal information is crucial for the downstream task. More specifically, our model outperforms existing models in predicting the risk of conversion within a time frame from intermediate age-related macular degeneration (AMD) to the late wet-AMD stage.
Submitted 30 June, 2022;
originally announced June 2022.
-
Training and pattern recognition by an opto-magnetic neural network
Authors:
A. Chakravarty,
J. H. Mentink,
S. Semin,
A. V. Kimel,
Th. Rasing
Abstract:
Neuromorphic computing aims to mimic the architecture of the human brain to carry out computational tasks that are challenging and much more energy consuming for standard hardware. Despite progress in several fields of physics and engineering, the realization of artificial neural networks which combine high operating speeds with fast and low-energy adaptability remains a challenge. Here we demonstrate an opto-magnetic neural network capable of learning and classification of digitized 3x3 characters exploiting local storage in the magnetic material. Using picosecond laser pulses, we find that micrometer sized synapses absorb well below 100 picojoule per synapse per laser pulse, with favorable scaling to smaller spatial dimensions. We thus succeeded in combining the speed and low-dissipation of optical networks with the low-energy adaptability and non-volatility of magnetism, providing a promising approach to fast and energy-efficient neuromorphic computing.
Submitted 29 September, 2021; v1 submitted 23 March, 2021;
originally announced March 2021.
-
A Two-Stage Multiple Instance Learning Framework for the Detection of Breast Cancer in Mammograms
Authors:
Sarath Chandra K,
Arunava Chakravarty,
Nirmalya Ghosh,
Tandra Sarkar,
Ramanathan Sethuraman,
Debdoot Sheet
Abstract:
Mammograms are commonly employed in the large scale screening of breast cancer which is primarily characterized by the presence of malignant masses. However, automated image-level detection of malignancy is a challenging task given the small size of the mass regions and difficulty in discriminating between malignant, benign mass and healthy dense fibro-glandular tissue. To address these issues, we explore a two-stage Multiple Instance Learning (MIL) framework. A Convolutional Neural Network (CNN) is trained in the first stage to extract local candidate patches in the mammograms that may contain either a benign or malignant mass. The second stage employs a MIL strategy for an image-level benign vs. malignant classification. A global image-level feature is computed as a weighted average of patch-level features learned using a CNN. Our method performed well on the task of localization of masses with an average Precision/Recall of 0.76/0.80 and achieved an average AUC of 0.91 on the image-level classification task using a five-fold cross-validation on the INbreast dataset. Restricting the MIL only to the candidate patches extracted in Stage 1 led to a significant improvement in classification performance in comparison to a dense extraction of patches from the entire mammogram.
Submitted 24 April, 2020;
originally announced April 2020.
-
Learning Decision Ensemble using a Graph Neural Network for Comorbidity Aware Chest Radiograph Screening
Authors:
Arunava Chakravarty,
Tandra Sarkar,
Nirmalya Ghosh,
Ramanathan Sethuraman,
Debdoot Sheet
Abstract:
Chest radiographs are primarily employed for the screening of cardio, thoracic and pulmonary conditions. Machine learning based automated solutions are being developed to reduce the burden of routine screening on Radiologists, allowing them to focus on critical cases. While recent efforts demonstrate the use of an ensemble of deep convolutional neural networks (CNN), they do not take disease comorbidity into consideration, thus lowering their screening performance. To address this issue, we propose a Graph Neural Network (GNN) based solution to obtain ensemble predictions which models the dependencies between different diseases. A comprehensive evaluation of the proposed method demonstrated its potential by improving the performance over the standard ensembling technique across a wide range of ensemble constructions. The best performance was achieved using the GNN ensemble of DenseNet121 with an average AUC of 0.821 across thirteen disease comorbidities.
Submitted 24 April, 2020;
originally announced April 2020.
-
A Systematic Search over Deep Convolutional Neural Network Architectures for Screening Chest Radiographs
Authors:
Arka Mitra,
Arunava Chakravarty,
Nirmalya Ghosh,
Tandra Sarkar,
Ramanathan Sethuraman,
Debdoot Sheet
Abstract:
Chest radiographs are primarily employed for the screening of pulmonary and cardio-/thoracic conditions. Being undertaken at primary healthcare centers, they require the presence of an on-premise reporting Radiologist, which is a challenge in low and middle income countries. This has inspired the development of machine learning based automation of the screening process. While recent efforts demonstrate a performance benchmark using an ensemble of deep convolutional neural networks (CNN), our systematic search over multiple standard CNN architectures identified single candidate CNN models whose classification performances were found to be at par with ensembles. Over 63 experiments spanning 400 hours, executed on a 11:3 FP32 TensorTFLOPS compute system, we found the Xception and ResNet-18 architectures to be consistent performers in identifying co-existing disease conditions with an average AUC of 0.87 across nine pathologies. We conclude on the reliability of the models by assessing their saliency maps generated using the randomized input sampling for explanation (RISE) method and qualitatively validating them against manual annotations locally sourced from an experienced Radiologist. We also draw a critical note on the limitations of the publicly available CheXpert dataset primarily on account of disparity in class distribution in training vs. testing sets, and unavailability of sufficient samples for few classes, which hampers quantitative reporting due to sample insufficiency.
Submitted 24 April, 2020;
originally announced April 2020.
-
Supervised learning of an opto-magnetic neural network with ultrashort laser pulses
Authors:
A. Chakravarty,
J. H. Mentink,
C. S. Davies,
K. T. Yamada,
A. V. Kimel,
Th. Rasing
Abstract:
The explosive growth of data and its related energy consumption is pushing the need to develop energy-efficient brain-inspired schemes and materials for data processing and storage. Here, we demonstrate experimentally that Co/Pt films can be used as artificial synapses by manipulating their magnetization state using circularly-polarized ultrashort optical pulses at room temperature. We also show an efficient implementation of supervised perceptron learning on an opto-magnetic neural network, built from such magnetic synapses. Importantly, we demonstrate that the optimization of synaptic weights can be achieved using a global feedback mechanism, such that the learning does not rely on external storage or additional optimization schemes. These results suggest there is high potential for realizing artificial neural networks using optically-controlled magnetization in technologically relevant materials that can learn not only fast but also energy-efficiently.
Submitted 28 May, 2019; v1 submitted 4 November, 2018;
originally announced November 2018.
-
A Deep Learning based Joint Segmentation and Classification Framework for Glaucoma Assesment in Retinal Color Fundus Images
Authors:
Arunava Chakravarty,
Jayanthi Sivswamy
Abstract:
Automated Computer Aided diagnostic tools can be used for the early detection of glaucoma to prevent irreversible vision loss. In this work, we present a Multi-task Convolutional Neural Network (CNN) that jointly segments the Optic Disc (OD), Optic Cup (OC) and predicts the presence of glaucoma in color fundus images. The CNN utilizes a combination of image appearance features and structural features obtained from the OD-OC segmentation to obtain a robust prediction. The use of fewer network parameters and the sharing of the CNN features for multiple related tasks ensures the good generalizability of the architecture, allowing it to be trained on small training sets. The cross-testing performance of the proposed method on an independent validation set acquired using a different camera and image resolution was found to be good with an average dice score of 0.92 for OD, 0.84 for OC and AUC of 0.95 on the task of glaucoma classification illustrating its potential as a mass screening tool for the early detection of glaucoma.
Submitted 29 July, 2018;
originally announced August 2018.