-
Skin cancer diagnosis using NIR spectroscopy data of skin lesions in vivo using machine learning algorithms
Authors:
Flavio P. Loss,
Pedro H. da Cunha,
Matheus B. Rocha,
Madson Poltronieri Zanoni,
Leandro M. de Lima,
Isadora Tavares Nascimento,
Isabella Rezende,
Tania R. P. Canuto,
Luciana de Paula Vieira,
Renan Rossoni,
Maria C. S. Santos,
Patricia Lyra Frasson,
Wanderson Romão,
Paulo R. Filgueiras,
Renato A. Krohling
Abstract:
Skin lesions are classified as benign or malignant. Among the malignant ones, melanoma is a very aggressive cancer and the major cause of deaths. Early diagnosis of skin cancer is therefore highly desirable. In the last few years, there has been a growing interest in computer-aided diagnosis (CAD) using mostly image and clinical data of the lesion. These sources of information are limited by their inability to provide information on the molecular structure of the lesion. NIR spectroscopy may provide an alternative source of information for automated CAD of skin lesions. The most commonly used techniques and classification algorithms in spectroscopy are Principal Component Analysis (PCA), Partial Least Squares - Discriminant Analysis (PLS-DA), and Support Vector Machines (SVM). Nonetheless, there is a growing interest in applying modern machine and deep learning (MDL) techniques to spectroscopy. One of the main limitations in applying MDL to spectroscopy is the lack of public datasets. Since, as far as we know, there is no public dataset of NIR spectral data for skin lesions, a new dataset named NIR-SC-UFES has been collected, annotated and analyzed, generating the gold standard for classification of NIR spectral data for skin cancer. Next, the machine learning algorithms XGBoost, CatBoost, LightGBM, and a 1D-convolutional neural network (1D-CNN) were investigated to classify cancer and non-cancer skin lesions. Experimental results indicate that the best performance was obtained by LightGBM with pre-processing using standard normal variate (SNV) and feature extraction, providing values of 0.839 for balanced accuracy, 0.851 for recall, 0.852 for precision, and 0.850 for F-score. The obtained results represent first steps in CAD of skin lesions, aiming at the automated triage of patients with skin lesions in vivo using NIR spectral data.
Submitted 2 January, 2024;
originally announced January 2024.
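The standard normal variate (SNV) pre-processing named in the abstract above can be illustrated with a minimal sketch. This is the textbook row-wise SNV, not the authors' code; the function name `snv` and the toy spectra are assumptions made only for illustration.

```python
import numpy as np


def snv(spectra: np.ndarray) -> np.ndarray:
    """Row-wise SNV: center and scale each spectrum by its own mean and std."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


# Hypothetical toy data: 3 spectra measured at 5 wavelengths.
X = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],   # same shape as the first row, doubled intensity
    [0.5, 0.7, 0.6, 0.9, 0.8],
])
X_snv = snv(X)
```

Because each spectrum is normalized independently, the first two rows map to the same SNV spectrum, which shows how SNV removes multiplicative intensity differences between measurements.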
-
Bayesian artificial brain with ChatGPT
Authors:
Renato A. Krohling
Abstract:
This paper investigates the mathematical problem-solving capabilities of Chat Generative Pre-Trained Transformer (ChatGPT) in the case of Bayesian reasoning. The study draws inspiration from Zhu & Gigerenzer's 2006 research, which posed the question: Can children reason the Bayesian way? To answer this question, they presented a set of 10 Bayesian reasoning problems. Their results revealed that children's ability to reason effectively using Bayesian principles is contingent upon a well-structured information representation. In this paper, we present the same set of 10 Bayesian reasoning problems to ChatGPT. Remarkably, the results demonstrate that ChatGPT provides the right solutions to all problems.
Submitted 28 August, 2023;
originally announced August 2023.
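To make the role of information representation in Bayesian reasoning concrete, here is a small sketch that solves one problem in two formats: conditional probabilities and natural frequencies. The numbers are hypothetical and are not taken from the 10 problems used in the paper; the function names are likewise illustrative.

```python
def posterior_from_probabilities(prior: float, hit_rate: float,
                                 false_alarm_rate: float) -> float:
    """Bayes' rule: P(H | positive) from a prior and two conditional probabilities."""
    p_positive = prior * hit_rate + (1.0 - prior) * false_alarm_rate
    return prior * hit_rate / p_positive


def posterior_from_counts(true_positives: int, false_positives: int) -> float:
    """Same quantity in natural frequencies: a ratio of two counts."""
    return true_positives / (true_positives + false_positives)


# Hypothetical problem: 10 of 100 people have a condition; 8 of those 10 test
# positive; 9 of the 90 without the condition also test positive.
p_prob = posterior_from_probabilities(prior=0.10, hit_rate=0.80,
                                      false_alarm_rate=0.10)
p_freq = posterior_from_counts(true_positives=8, false_positives=9)
```

Both calls evaluate to 8/17 (about 0.47); the frequency format reaches it with a single division, which is the kind of well-structured representation the abstract refers to.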
-
1D Convolutional neural networks and machine learning algorithms for spectral data classification with a case study for Covid-19
Authors:
Breno Aguiar Krohling,
Renato A Krohling
Abstract:
Machine and deep learning algorithms have increasingly been applied to solve problems in various areas of knowledge. Among these areas, Chemometrics has benefited from the application of these algorithms in spectral data analysis. Commonly, algorithms such as Support Vector Machines and Partial Least Squares are applied to spectral datasets to perform classification and regression tasks. In this paper, we present a 1D convolutional neural network (1D-CNN) and evaluate its effectiveness on spectral data obtained from spectroscopy. In most cases, the spectrum signals are noisy and present overlap among classes. First, we perform extensive experiments comparing the 1D-CNN to machine learning algorithms and to standard algorithms used in Chemometrics on spectral data classification for the best-known datasets available in the literature. Next, spectral samples of the SARS-CoV-2 virus, which causes COVID-19, recently collected via spectroscopy, were used as a case study. Experimental results indicate superior performance of the 1D-CNN over machine learning algorithms and standard algorithms, obtaining an average accuracy of 96.5%, specificity of 98%, and sensitivity of 94%. The promising results indicate the feasibility of using 1D-CNNs in automated systems to diagnose COVID-19 and other viral diseases in the future.
Submitted 24 January, 2023;
originally announced January 2023.
-
Development of a hybrid method for stock trading based on TOPSIS, EMD and ELM
Authors:
Elivelto Ebermam,
Helder Knidel,
Renato A. Krohling
Abstract:
Deciding when to buy or sell a stock is not an easy task because the market is hard to predict, being influenced by political and economic factors. Thus, methodologies based on computational intelligence have been applied to this challenging problem. In this work, the stocks are ranked every day by the technique for order preference by similarity to ideal solution (TOPSIS) using technical analysis criteria, and the most suitable stock is selected for purchase. Even so, the market may not be favorable to purchase on certain days, or TOPSIS may make an incorrect selection. To improve the selection, another method should be used. So, a hybrid model composed of empirical mode decomposition (EMD) and extreme learning machine (ELM) is proposed. The EMD decomposes the series into several sub-series, and thus the main component (trend) is extracted. This component is processed by the ELM, which predicts the next element of the component. If the value predicted by the ELM is greater than the last value, then the purchase of the stock is confirmed. The method was applied to a universe of 50 stocks in the Brazilian market. The selection made by TOPSIS showed promising results when compared to random selection and to the return generated by the Bovespa index. Confirmation with the EMD-ELM hybrid model was able to increase the percentage of profitable trades.
Submitted 14 June, 2022;
originally announced June 2022.
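The daily TOPSIS ranking step described in the abstract above can be sketched as follows. This is the classical TOPSIS procedure with vector normalization, not the authors' implementation; the criteria, weights, and stock values are hypothetical, and the sketch does not guard against degenerate inputs such as an all-zero criterion column.

```python
import numpy as np


def topsis(matrix, weights, benefit):
    """Return the closeness coefficient of each alternative (higher is better)."""
    m = np.asarray(matrix, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                  # weights sum to 1
    benefit = np.asarray(benefit, dtype=bool)

    norm = m / np.linalg.norm(m, axis=0)             # vector-normalize each criterion
    v = norm * w                                     # weighted normalized matrix

    # Ideal and anti-ideal points: max for benefit criteria, min for cost criteria.
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti_ideal = np.where(benefit, v.min(axis=0), v.max(axis=0))

    d_pos = np.linalg.norm(v - ideal, axis=1)        # distance to the ideal
    d_neg = np.linalg.norm(v - anti_ideal, axis=1)   # distance to the anti-ideal
    return d_neg / (d_pos + d_neg)


# Hypothetical example: 3 stocks scored on 2 made-up criteria.
# Column 0 is a benefit criterion, column 1 is a cost criterion.
stocks = [[0.9, 0.1],
          [0.5, 0.5],
          [0.1, 0.9]]
scores = topsis(stocks, weights=[0.5, 0.5], benefit=[True, False])
best_stock = int(np.argmax(scores))
```

In this toy case the first stock coincides with the ideal point, so it receives the highest closeness coefficient and would be the candidate passed on to a confirmation step.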
-
Exploring Advances in Transformers and CNN for Skin Lesion Diagnosis on Small Datasets
Authors:
Leandro M. de Lima,
Renato A. Krohling
Abstract:
Skin cancer is one of the most common types of cancer in the world. Different computer-aided diagnosis systems have been proposed to tackle skin lesion diagnosis, most of them based on deep convolutional neural networks. However, recent advances in computer vision, notably Transformer-based networks, have achieved state-of-the-art results in many tasks. We explore and evaluate advances in computer vision architectures, training methods and multimodal feature fusion for the skin lesion diagnosis task. Experiments show that PiT ($0.800 \pm 0.006$), CoaT ($0.780 \pm 0.024$) and ViT ($0.771 \pm 0.018$) backbone models with MetaBlock fusion achieved state-of-the-art results for the balanced accuracy metric on the PAD-UFES-20 dataset.
Submitted 30 May, 2022;
originally announced May 2022.
-
Solving integer multi-objective optimization problems using TOPSIS, Differential Evolution and Tabu Search
Authors:
Renato A. Krohling,
Erick R. F. A. Schneider
Abstract:
This paper presents a method to solve non-linear integer multiobjective optimization problems. First, the problem is formulated using the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS). Next, the Differential Evolution (DE) algorithm in its three versions (standard DE, DE best and DEGL) is used as the optimizer. Since the solutions found by the DE algorithms are continuous, the Tabu Search (TS) algorithm is employed to find integer solutions during the optimization process. Experimental results show the effectiveness of the proposed method.
Submitted 5 April, 2022;
originally announced April 2022.
-
A visualization tool for data analysis on higher education dropout: a case study at UFES
Authors:
Pedro P. Ladeira,
Leandro M. de Lima,
Renato A. Krohling
Abstract:
Through the analysis of cultural, socioeconomic and academic performance aspects, it is possible to map the profile of students and their motivations to drop out. This article aims to create a computational tool for data visualization that allows drawing the profile of students to support managers of educational institutions in the definition of dropout avoidance policies. We present a method to treat data collected by higher education institutions over the years, analyze them to understand dropout, and provide that information to the university and the general public. Eight questions were proposed to clarify dropout at the Federal University of Espírito Santo, Brazil. The questions were answered through a dashboard that helps to understand the causes of dropout. It is expected that this tool can be used by other educational institutions to draw student profiles, contributing to a possible resolution of the problem.
Submitted 29 January, 2022;
originally announced January 2022.
-
Beyond Visual Image: Automated Diagnosis of Pigmented Skin Lesions Combining Clinical Image Features with Patient Data
Authors:
José G. M. Esgario,
Renato A. Krohling
Abstract:
Skin cancer is considered one of the most common types of cancer in several countries. Due to the difficulty and subjectivity of the clinical diagnosis of skin lesions, Computer-Aided Diagnosis systems are being developed to assist experts in performing more reliable diagnoses. The clinical analysis and diagnosis of skin lesions rely not only on visual information but also on the context information provided by the patient. This work addresses the problem of detecting pigmented skin lesions from smartphone-captured images. In addition to the features extracted from images, patient context information was collected to provide a more accurate diagnosis. The experiments showed that the combination of visual features with context information improved the final results. Experimental results are very promising and comparable to experts.
Submitted 25 January, 2022;
originally announced January 2022.
-
A Smartphone based Application for Skin Cancer Classification Using Deep Learning with Clinical Images and Lesion Information
Authors:
Breno Krohling,
Pedro B. C. Castro,
Andre G. C. Pacheco,
Renato A. Krohling
Abstract:
Over the last decades, the incidence of skin cancer, melanoma and non-melanoma, has increased at a continuous rate. In particular for melanoma, the deadliest type of skin cancer, early detection is important to improve patient prognosis. Recently, deep neural networks (DNNs) have become viable for skin cancer detection. In this work, we present a smartphone-based application to assist in skin cancer detection. This application is based on a Convolutional Neural Network (CNN) trained on clinical images and patient demographics, both collected from smartphones. Also, as skin cancer datasets are imbalanced, we present an approach based on the mutation operator of the Differential Evolution (DE) algorithm to balance the data. Besides providing a flexible tool to assist doctors in the skin cancer screening phase, the method obtains promising results, with a balanced accuracy of 85% and a recall of 96%.
Submitted 28 April, 2021;
originally announced April 2021.
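The abstract above mentions balancing data with the mutation operator of Differential Evolution but does not give the exact scheme. The sketch below is therefore an assumption: it applies the classic DE/rand/1 mutation, v = x_r1 + F (x_r2 - x_r3), to feature vectors of a minority class to synthesize extra samples. The function name, the scale factor `f`, and the toy data are all hypothetical.

```python
import numpy as np


def de_mutation_oversample(minority, n_new, f=0.5, seed=0):
    """Create n_new synthetic samples from a minority class via DE/rand/1 mutation."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 3:
        raise ValueError("DE/rand/1 mutation needs at least 3 distinct samples")

    synthetic = np.empty((n_new, minority.shape[1]))
    for i in range(n_new):
        # Pick three different minority samples at random.
        r1, r2, r3 = rng.choice(n, size=3, replace=False)
        # Base sample plus a scaled difference of the other two.
        synthetic[i] = minority[r1] + f * (minority[r2] - minority[r3])
    return synthetic


# Hypothetical minority-class feature vectors (4 samples, 2 features).
minority = np.array([[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.3, 0.2]])
synthetic = de_mutation_oversample(minority, n_new=4)
balanced_minority = np.vstack([minority, synthetic])
```

A smaller `f` keeps synthetic samples closer to existing ones; whether the paper applies the operator to raw features, to patient data, or in another space is not stated in the abstract.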
-
Discovering an Aid Policy to Minimize Student Evasion Using Offline Reinforcement Learning
Authors:
Leandro M. de Lima,
Renato A. Krohling
Abstract:
High dropout rates in tertiary education expose a lack of efficiency that causes frustration of expectations and financial waste. Predicting students at risk is not enough to avoid student dropout. Usually, an appropriate aid action must be discovered and applied at the proper time for each student. To tackle this sequential decision-making problem, we propose a decision support method for the selection of aid actions for students using offline reinforcement learning, to help decision-makers effectively avoid student dropout. Additionally, a discretization of the student's state space applying two different clustering methods is evaluated. Our experiments using logged data of real students show, through off-policy evaluation, that the method should achieve roughly 1.0 to 1.5 times as much cumulative reward as the logged policy. So, it is feasible to help decision-makers apply appropriate aid actions and, possibly, reduce student dropout.
Submitted 20 April, 2021;
originally announced April 2021.
-
Recent advances in deep learning applied to skin cancer detection
Authors:
Andre G. C. Pacheco,
Renato A. Krohling
Abstract:
Skin cancer is a major public health problem around the world. Its early detection is very important to improve patient prognosis. However, the lack of qualified professionals and of medical instruments are significant issues in this field. In this context, over the past few years, deep learning models applied to automated skin cancer detection have become a trend. In this paper, we present an overview of the recent advances reported in this field, as well as a discussion of the challenges and opportunities for improvement in the current models. In addition, we present some important aspects regarding the use of these models on smartphones and indicate future directions we believe the field will take.
Submitted 6 December, 2019;
originally announced December 2019.
-
The impact of patient clinical information on automated skin cancer detection
Authors:
Andre G. C. Pacheco,
Renato A. Krohling
Abstract:
Skin cancer is one of the most common types of cancer around the world. For this reason, over the past years, different approaches have been proposed to assist in detecting it. Nonetheless, most of them are based only on dermoscopy images and do not take into account the patient's clinical information. In this work, first, we present a new dataset that contains clinical images, acquired from smartphones, and patient clinical information of the skin lesions. Next, we introduce a straightforward approach to combine the clinical data and the images using different well-known deep learning models. These models are applied to the presented dataset using only the images and combining them with the patient clinical information. We present a comprehensive study to show the impact of the clinical data on the final predictions. The results obtained by combining both sets of information show a general improvement of around 7% in the balanced accuracy for all models. In addition, the statistical test indicates significant differences between the models with and without the combination of both types of data. The improvement achieved shows the potential of using patient clinical information in skin cancer detection and indicates that this piece of information is important to leverage skin cancer detection systems.
Submitted 16 September, 2019;
originally announced September 2019.
-
Deep Learning for Classification and Severity Estimation of Coffee Leaf Biotic Stress
Authors:
J. G. M. Esgario,
R. A. Krohling,
J. A. Ventura
Abstract:
Biotic stress consists of damage to plants caused by other living organisms. Efficient control of biotic agents such as pests and pathogens (viruses, fungi, bacteria, etc.) is closely related to the concept of agricultural sustainability. Agricultural sustainability promotes the development of new technologies that allow the reduction of environmental impacts, greater accessibility to farmers and, consequently, increased productivity. The use of computer vision with deep learning methods allows the early and correct identification of the stress-causing agent, so that corrective measures can be applied as soon as possible to mitigate the problem. The objective of this work is to design an effective and practical system capable of identifying and estimating the stress severity caused by biotic agents on coffee leaves. The proposed approach consists of a multi-task system based on convolutional neural networks. In addition, we have explored the use of data augmentation techniques to make the system more robust and accurate. The experimental results obtained for classification as well as for severity estimation indicate that the proposed system might be a suitable tool to assist both experts and farmers in the identification and quantification of biotic stresses in coffee plantations.
Submitted 26 July, 2019;
originally announced July 2019.
-
A smartphone application to detection and classification of coffee leaf miner and coffee leaf rust
Authors:
Giuliano L. Manso,
Helder Knidel,
Renato A. Krohling,
Jose A. Ventura
Abstract:
Generally, the identification and classification of plant diseases and/or pests are performed by an expert. One of the problems facing coffee farmers in Brazil is crop infestation, particularly by leaf rust Hemileia vastatrix and leaf miner Leucoptera coffeella. The progression of the diseases and/or pests occurs spatially and temporally. So, it is very important to automatically identify the degree of severity. The main goal of this article is the development of a method, and its implementation as an app, that allows the detection of foliar damage from images of coffee leaves captured using a smartphone, the identification of whether it is rust or leaf miner, and in turn the calculation of its severity degree. The method consists of identifying a leaf in the image and separating it from the background using a segmentation algorithm. In the segmentation process, various types of backgrounds for the image are tested using the HSV and YCbCr color spaces. In the segmentation of foliar damage, the Otsu algorithm and the iterative threshold algorithm, in the YCgCr color space, have been used and compared to k-means. Next, features of the segmented foliar damage are calculated. For the classification, an artificial neural network trained with extreme learning machine has been used. The results obtained show the feasibility and effectiveness of the approach to identify and classify foliar damage, and to automatically calculate the severity. The results obtained are very promising according to experts.
Submitted 19 March, 2019;
originally announced April 2019.
-
Application of Genetic Algorithms to the Multiple Team Formation Problem
Authors:
Jose G. M. Esgario,
Iago E. da Silva,
Renato A. Krohling
Abstract:
Allocating people to multiple projects is an important issue considering the efficiency of groups from the point of view of social interaction. In this paper, based on previous works, the Multiple Team Formation Problem (MTFP) based on sociometric techniques is formulated as an optimization problem taking into account the social interaction among team members. To solve the resulting optimization problem, we propose a Genetic Algorithm, due to the NP-hard nature of the problem. Social cohesion is an important issue that directly impacts the productivity of the work environment. So, maintaining an appropriate level of cohesion keeps a group together, which will bring positive impacts on the results of a project. The aim of the proposal is to ensure the best possible effectiveness from the point of view of social interaction. In this way, the presented algorithm serves as a decision-making tool for managers to build teams of people in multiple projects. In order to analyze the performance of the proposed method, computational experiments with benchmarks were performed and compared with the exhaustive method. The results are promising and show that the algorithm generally obtains near-optimal results within a short computational time.
Submitted 8 March, 2019;
originally announced March 2019.
-
Ranking of classification algorithms in terms of mean-standard deviation using A-TOPSIS
Authors:
Andre G. C. Pacheco,
Renato A. Krohling
Abstract:
In classification problems, when multiple algorithms are applied to different benchmarks, a difficult issue arises: how can we rank the algorithms? In machine learning it is common to run the algorithms several times and then calculate statistics in terms of means and standard deviations. In order to compare the performance of the algorithms, it is very common to employ statistical tests. However, these tests may also present limitations, since they consider only the means and not the standard deviations of the obtained results. In this paper, we present the so-called A-TOPSIS, based on TOPSIS (Technique for Order Preference by Similarity to Ideal Solution), to solve the problem of ranking and comparing classification algorithms in terms of means and standard deviations. We use two case studies to illustrate A-TOPSIS for ranking classification algorithms, and the results show the suitability of A-TOPSIS to rank the algorithms. The presented approach is general and can be applied to compare the performance of stochastic algorithms in machine learning. Finally, to encourage researchers to use A-TOPSIS for ranking algorithms, we also present in this work an easy-to-use A-TOPSIS web framework.
Submitted 22 October, 2016;
originally announced October 2016.
-
TODIM and TOPSIS with Z-numbers
Authors:
R. A. Krohling,
Artem dos Santos,
A. G. C. Pacheco
Abstract:
In this paper, we present an approach that is able to handle Z-numbers in the context of Multi-Criteria Decision Making (MCDM) problems. Z-numbers are composed of two parts: the first is a restriction on the values that can be assumed, and the second is the reliability of the information. As human beings, we communicate with other people by means of natural language, using sentences like: the journey time from home to university takes about half an hour, very likely. First, Z-numbers are converted to fuzzy numbers using a standard procedure. Next, Z-TODIM and Z-TOPSIS are presented as direct extensions of fuzzy TODIM and fuzzy TOPSIS, respectively. The proposed methods are applied to two case studies and compared with the standard approach using crisp values. The results obtained show the feasibility of the approach. In addition, a graphical interface was built to handle both methods, Z-TODIM and Z-TOPSIS, allowing ease of use for users in other areas of knowledge.
Submitted 19 September, 2016;
originally announced September 2016.
-
An approach to dealing with missing values in heterogeneous data using k-nearest neighbors
Authors:
Davi E. N. Frossard,
Igor O. Nunes,
Renato A. Krohling
Abstract:
Techniques such as clustering, neural networks and decision making usually rely on algorithms that are not well suited to deal with missing values. However, real-world data frequently contain such cases. The simplest solution is to either substitute the missing values with a best-guess value or disregard them completely. Unfortunately, both approaches can lead to biased results. In this paper, we propose a technique for dealing with missing values in heterogeneous data using imputation based on the k-nearest neighbors algorithm. It can handle real (which we refer to as crisp henceforward), interval and fuzzy data. The effectiveness of the algorithm is tested on several datasets and the numerical results are promising.
Submitted 13 August, 2016;
originally announced August 2016.
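The k-nearest neighbors imputation idea in the abstract above can be sketched for the crisp (real-valued) case only. This is a generic kNN imputer, not the authors' algorithm: the distance (root mean squared difference over shared observed features), the donor rule, and the toy data are assumptions, and interval and fuzzy data are not covered.

```python
import numpy as np


def knn_impute(data, k=3):
    """Fill NaNs in each row with the mean of its k nearest complete-enough rows."""
    x = np.asarray(data, dtype=float)
    filled = x.copy()
    for i in range(len(x)):
        missing = np.isnan(x[i])
        if not missing.any():
            continue
        candidates = []
        for j in range(len(x)):
            if j == i:
                continue
            shared = ~missing & ~np.isnan(x[j])
            # A donor must share an observed feature and know every missing one.
            if not shared.any() or np.isnan(x[j][missing]).any():
                continue
            dist = np.sqrt(np.mean((x[i, shared] - x[j, shared]) ** 2))
            candidates.append((dist, j))
        candidates.sort()
        neighbors = [j for _, j in candidates[:k]]
        if neighbors:
            filled[i, missing] = x[neighbors][:, missing].mean(axis=0)
    return filled


# Hypothetical toy data: the second row is missing its third feature.
data = np.array([
    [1.0, 2.0, 3.0],
    [1.1, 2.1, np.nan],
    [1.0, 2.0, 3.2],
    [9.0, 9.0, 9.0],
])
filled = knn_impute(data, k=2)
```

With `k=2` the two closest rows are the first and third, so the missing entry becomes the mean of 3.0 and 3.2; the distant fourth row does not influence the result.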