-
U-Net-based Models for Skin Lesion Segmentation: More Attention and Augmentation
Authors:
Pooya Mohammadi Kazaj,
MohammadHossein Koosheshi,
Ali Shahedi,
Alireza Vafaei Sadr
Abstract:
According to WHO[1], since the 1970s, diagnosis of melanoma skin cancer has been more frequent. However, if detected early, the 5-year survival rate for melanoma can increase to 99 percent. In this regard, skin lesion segmentation can be pivotal in monitoring and treatment planning. In this work, ten models and four augmentation configurations are trained on the ISIC 2016 dataset. Performance and overfitting are compared using five metrics. Our results show that the U-Net-Resnet50 and the R2U-Net achieve the highest metric values, along with two data augmentation scenarios. We also investigate CBAM and AG blocks in the U-Net architecture, which enhance segmentation performance at a meager computational cost. In addition, we propose using pyramid, AG, and CBAM blocks in a sequence, which significantly surpasses the results of using the two individually. Our experiments also show that models that exploit attention modules successfully overcome common skin lesion segmentation problems. Lastly, in the spirit of reproducible research, we make our models and code publicly available.
Submitted 28 October, 2022;
originally announced October 2022.
-
Learning to Detect Interesting Anomalies
Authors:
Alireza Vafaei Sadr,
Bruce A. Bassett,
Emmanuel Sekyi
Abstract:
Anomaly detection algorithms are typically applied to static, unchanging data features hand-crafted by the user. But how does a user systematically craft good features for anomalies that have never been seen? Here we couple deep learning with active learning -- in which an Oracle iteratively labels small amounts of data selected algorithmically over a series of rounds -- to automatically and dynamically improve the data features for efficient outlier detection. This approach, AHUNT, shows excellent performance on MNIST, CIFAR10, and Galaxy-DESI data, significantly outperforming both standard anomaly detection and active learning algorithms with static feature spaces. Beyond improved performance, AHUNT also allows the number of anomaly classes to grow organically in response to the Oracle's evaluations. Extensive ablation studies explore the impact of the Oracle question selection strategy and the loss function on performance. We illustrate how the dynamic anomaly class taxonomy represents another step towards fully personalized rankings of different anomaly classes that reflect a user's interests, allowing the algorithm to learn to ignore statistically significant but uninteresting outliers (e.g., noise). This should prove useful in the era of massive astronomical datasets serving diverse sets of users who can only review a tiny subset of the incoming data.
Submitted 28 October, 2022;
originally announced October 2022.
-
Recommendations on test datasets for evaluating AI solutions in pathology
Authors:
André Homeyer,
Christian Geißler,
Lars Ole Schwen,
Falk Zakrzewski,
Theodore Evans,
Klaus Strohmenger,
Max Westphal,
Roman David Bülow,
Michaela Kargl,
Aray Karjauv,
Isidre Munné-Bertran,
Carl Orge Retzlaff,
Adrià Romero-López,
Tomasz Sołtysiński,
Markus Plass,
Rita Carvalho,
Peter Steinbach,
Yu-Chia Lan,
Nassim Bouteldja,
David Haber,
Mateo Rojas-Carulla,
Alireza Vafaei Sadr,
Matthias Kraft,
Daniel Krüger,
Rutger Fick
, et al. (5 additional authors not shown)
Abstract:
Artificial intelligence (AI) solutions that automatically extract information from digital histology images have shown great promise for improving pathological diagnosis. Prior to routine use, it is important to evaluate their predictive performance and obtain regulatory approval. This assessment requires appropriate test datasets. However, compiling such datasets is challenging and specific recommendations are missing.
A committee of various stakeholders, including commercial AI developers, pathologists, and researchers, discussed key aspects and conducted extensive literature reviews on test datasets in pathology. Here, we summarize the results and derive general recommendations for the collection of test datasets.
We address several questions: Which and how many images are needed? How to deal with low-prevalence subsets? How can potential bias be detected? How should datasets be reported? What are the regulatory requirements in different countries?
The recommendations are intended to help AI developers demonstrate the utility of their products and to help regulatory agencies and end users verify reported performance measures. Further research is needed to formulate criteria for sufficiently representative test datasets so that AI solutions can operate with less user intervention and better support diagnostic workflows in the future.
Submitted 21 April, 2022;
originally announced April 2022.
-
IMDb data from Two Generations, from 1979 to 2019; Part one, Dataset Introduction and Preliminary Analysis
Authors:
M. Bahraminasr,
A. Vafaei Sadr
Abstract:
IMDb, a user-regulated portal and one of the most-visited websites, has provided an opportunity to create an enormous database. Analyzing the information on the Internet Movie Database (IMDb), whether related to the movies themselves or provided by users, would help reveal the factors that determine each movie's success. Because a comprehensive dataset was lacking, we set out to create a compendious dataset for later analysis using statistical methods and machine learning models. It comprises various information provided on IMDb, such as rating data, genre, cast and crew, MPAA rating certificate, parental guide details, related movie information, posters, etc., for over 79k titles, which is the largest dataset to date. The present paper is the first in a series pursuing these goals. It describes the created dataset and presents a preliminary analysis, including some trends in the data, a demographic analysis of IMDb scores, and their relation to genre and MPAA rating certificate.
Submitted 6 September, 2020; v1 submitted 28 May, 2020;
originally announced May 2020.
-
Inpainting via Generative Adversarial Networks for CMB data analysis
Authors:
Alireza Vafaei Sadr,
Farida Farsian
Abstract:
In this work, we propose a new method to inpaint the CMB signal in regions masked out following a point source extraction process. We adopt a modified Generative Adversarial Network (GAN) and compare different combinations of internal (hyper-)parameters and training strategies. We assess the results using a suitable $\mathcal{C}_r$ variable that estimates how well the CMB power spectrum is recovered. We consider a test set where one point source is masked out in each sky patch with an extension of 1.83 $\times$ 1.83 square degrees, which, in our gridding, corresponds to 64 $\times$ 64 pixels. The GAN is optimized for estimating performance on Planck 2018 total intensity simulations. The training makes the GAN effective in reconstructing a masked area of about 1500 pixels with $1\%$ error down to angular scales of about 5 arcminutes.
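The patch geometry quoted in this abstract can be checked with a few lines of arithmetic. The sketch below only restates the numbers given above (1.83 degrees per side, 64 pixels per side, about 1500 masked pixels, 5 arcminutes); the variable names are illustrative and not taken from the paper.

```python
# Patch geometry as stated in the abstract.
patch_side_deg = 1.83      # side length of one sky patch, in degrees
pixels_per_side = 64       # grid resolution along one side

# Angular size of one pixel, in arcminutes (1 degree = 60 arcminutes).
arcmin_per_pixel = patch_side_deg * 60.0 / pixels_per_side

# Fraction of the patch covered by a mask of about 1500 pixels.
total_pixels = pixels_per_side ** 2
masked_fraction = 1500 / total_pixels

# How many pixels span the quoted 5 arcminute angular scale.
pixels_at_5_arcmin = 5.0 / arcmin_per_pixel

print(f"{arcmin_per_pixel:.3f} arcmin per pixel")
print(f"{masked_fraction:.1%} of the patch masked")
print(f"5 arcmin spans about {pixels_at_5_arcmin:.1f} pixels")
```

Under these numbers, one pixel covers roughly 1.7 arcminutes, so the 5 arcminute scale corresponds to about three pixels.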
Submitted 21 April, 2020; v1 submitted 8 April, 2020;
originally announced April 2020.
-
A Flexible Framework for Anomaly Detection via Dimensionality Reduction
Authors:
Alireza Vafaei Sadr,
Bruce A. Bassett,
Martin Kunz
Abstract:
Anomaly detection is challenging, especially for large datasets in high dimensions. Here we explore a general anomaly detection framework based on dimensionality reduction and unsupervised clustering. We release DRAMA, a general Python package that implements this framework with a wide range of built-in options. We test DRAMA on a wide variety of simulated and real datasets, in up to 3000 dimensions, and find it robust and highly competitive with commonly-used anomaly detection algorithms, especially in high dimensions. The flexibility of the DRAMA framework allows for significant optimization once some examples of anomalies are available, making it ideal for online anomaly detection, active learning and highly unbalanced datasets.
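As a rough illustration of the general idea (dimensionality reduction followed by unsupervised clustering), the sketch below reduces the data with PCA, clusters it with a basic k-means, and scores each point by its distance to the nearest sufficiently populated cluster centre. This is not the DRAMA API: the function names, the choice of PCA and k-means, the `min_cluster_size` rule, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project centred data onto its leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def kmeans(Z, k, n_iter=50, seed=0):
    """Plain Lloyd iterations; returns cluster centres and final labels."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = Z[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
    return centers, dist.argmin(axis=1)

def anomaly_scores(X, n_components=2, k=3, min_cluster_size=5, seed=0):
    """Score = distance to the nearest well-populated cluster centre.

    Tiny clusters are ignored so that an isolated outlier which captures
    its own centre still receives a high score (an illustrative choice).
    Assumes at least one cluster has min_cluster_size or more members.
    """
    Z = pca_reduce(X, n_components)
    centers, labels = kmeans(Z, k, seed=seed)
    sizes = np.bincount(labels, minlength=k)
    good = centers[sizes >= min_cluster_size]
    dist = np.linalg.norm(Z[:, None, :] - good[None, :, :], axis=2)
    return dist.min(axis=1)

# Toy data (hypothetical): two 50-dimensional Gaussian blobs plus one far outlier.
rng = np.random.default_rng(1)
blob_a = rng.normal(0.0, 1.0, size=(100, 50))
blob_b = rng.normal(5.0, 1.0, size=(100, 50))
outlier = np.full((1, 50), 30.0)
X = np.vstack([blob_a, blob_b, outlier])

scores = anomaly_scores(X)
print("most anomalous index:", int(np.argmax(scores)))
```

On this toy set the injected point (index 200) receives the largest score. In practice, the reduction method, the clustering method, and the distance metric are exactly the kind of options such a framework would leave configurable.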
Submitted 9 September, 2019;
originally announced September 2019.
-
DeepSource: Point Source Detection using Deep Learning
Authors:
A. Vafaei Sadr,
Etienne E. Vos,
Bruce A. Bassett,
Zafiirah Hosenie,
N. Oozeer,
Michelle Lochner
Abstract:
Point source detection at low signal-to-noise is challenging for astronomical surveys, particularly in radio interferometry images where the noise is correlated. Machine learning is a promising solution, allowing the development of algorithms tailored to specific telescope arrays and science cases. We present DeepSource - a deep learning solution - that uses convolutional neural networks to achieve these goals. DeepSource enhances the Signal-to-Noise Ratio (SNR) of the original map and then uses dynamic blob detection to detect sources. Trained and tested on two sets of 500 simulated 1 deg x 1 deg MeerKAT images with a total of 300,000 sources, DeepSource is essentially perfect in both purity and completeness down to SNR = 4 and outperforms PyBDSF in all metrics. For uniformly-weighted images it achieves a Purity x Completeness (PC) score at SNR = 3 of 0.73, compared to 0.31 for the best PyBDSF model. For natural weighting we find a smaller improvement of ~40% in the PC score at SNR = 3. If instead we ask where either purity or completeness first drops to 90%, we find that DeepSource reaches this value at SNR = 3.6, compared to 4.3 for PyBDSF (natural weighting). A key advantage of DeepSource is that it can learn to optimally trade off purity and completeness for any science case under consideration. Our results show that deep learning is a promising approach to point source detection in astronomical images.
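To make the Purity x Completeness (PC) score concrete, here is a minimal sketch using the usual definitions: purity is the fraction of detections that are real sources, and completeness is the fraction of real sources that are detected. The function names and the example counts are hypothetical; only the two PC values 0.73 and 0.31 come from the abstract.

```python
def purity(true_pos, false_pos):
    """Fraction of reported detections that correspond to real sources."""
    detections = true_pos + false_pos
    return true_pos / detections if detections else 0.0

def completeness(true_pos, false_neg):
    """Fraction of real sources that were detected."""
    sources = true_pos + false_neg
    return true_pos / sources if sources else 0.0

def pc_score(true_pos, false_pos, false_neg):
    """Product of purity and completeness; 1.0 means a perfect catalogue."""
    return purity(true_pos, false_pos) * completeness(true_pos, false_neg)

# Hypothetical counts: 80 real sources found, 20 spurious detections, 20 missed.
example_pc = pc_score(true_pos=80, false_pos=20, false_neg=20)
print(f"example PC score: {example_pc:.2f}")

# Ratio of the two PC scores reported for uniform weighting at SNR = 3.
reported_ratio = 0.73 / 0.31
print(f"reported ratio: {reported_ratio:.2f}")
```

Because the score is a product, it penalizes a detector that buys completeness with many false positives (or purity with many misses), which is why it works as a single summary of the trade-off described above.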
Submitted 7 July, 2018;
originally announced July 2018.