Search Results (6,824)

Search Parameters:
Keywords = SVM

16 pages, 3921 KiB  
Article
Predicting Antidiabetic Peptide Activity: A Machine Learning Perspective on Type 1 and Type 2 Diabetes
by Kaida Cai, Zhe Zhang, Wenzhou Zhu, Xiangwei Liu, Tingqing Yu and Wang Liao
Int. J. Mol. Sci. 2024, 25(18), 10020; https://doi.org/10.3390/ijms251810020 (registering DOI) - 18 Sep 2024
Abstract
Diabetes mellitus (DM) presents a critical global health challenge, characterized by persistent hyperglycemia and associated with substantial economic and health-related burdens. This study employs advanced machine-learning techniques to improve the prediction and classification of antidiabetic peptides, with a particular focus on differentiating those effective against type 1 diabetes (T1DM) from those targeting type 2 diabetes (T2DM). We combine feature selection with classification methods, including logistic regression, support vector machines (SVM), and adaptive boosting (AdaBoost), to classify antidiabetic peptides based on key features. Lasso-penalized feature selection identifies the peptide characteristics that most strongly influence antidiabetic activity, establishing a robust foundation for future peptide design. A comprehensive evaluation of logistic regression, SVM, and AdaBoost shows that AdaBoost consistently outperforms the other methods, making it the most effective of the three for classifying antidiabetic peptides. This research underscores the potential of machine learning for the systematic evaluation of bioactive peptides and contributes to the advancement of peptide-based therapies for diabetes management.
(This article belongs to the Special Issue Machine Learning in Disease Diagnosis and Treatment)
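The Lasso feature-selection step can be illustrated with a minimal, pure-Python sketch. It assumes the squared-error form of the Lasso, (1/(2n))·||y - Xb||^2 + λ·||b||_1, solved by cyclic coordinate descent with soft-thresholding. The abstract does not state the exact penalized model (for example, a Lasso-penalized logistic regression) or how λ was tuned, so the toy data and λ = 0.5 are hypothetical. Features whose coefficients stay non-zero are the "selected" ones.

```python
def soft_threshold(value: float, lam: float) -> float:
    """Shrink value toward zero by lam; values inside [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Minimise (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # (1/n) * sum of squared values in each column
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iters):
        for j in range(p):
            if col_sq[j] == 0.0:
                beta[j] = 0.0  # an all-zero feature carries no information
                continue
            # correlation of feature j with the partial residual that excludes feature j
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


def selected_features(beta, tol=1e-8):
    """Indices of features whose coefficients survived the L1 penalty."""
    return [j for j, b in enumerate(beta) if abs(b) > tol]


# Toy data (hypothetical): the target depends only on feature 0.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
beta = lasso_coordinate_descent(X, y, lam=0.5)
print(beta, selected_features(beta))  # [1.5, 0.0] [0]
```

The L1 penalty both shrinks the true coefficient (2.0 becomes 1.5) and zeroes the irrelevant one. In practice a library routine such as scikit-learn's `Lasso`, or `LogisticRegression(penalty="l1", solver="liblinear")` for a classifier, would replace this loop.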

25 pages, 3397 KiB  
Article
Hyperspectral Imaging for Phenotyping Plant Drought Stress and Nitrogen Interactions Using Multivariate Modeling and Machine Learning Techniques in Wheat
by Frank Gyan Okyere, Daniel Kingsley Cudjoe, Nicolas Virlet, March Castle, Andrew Bernard Riche, Latifa Greche, Fady Mohareb, Daniel Simms, Manal Mhada and Malcolm John Hawkesford
Remote Sens. 2024, 16(18), 3446; https://doi.org/10.3390/rs16183446 (registering DOI) - 17 Sep 2024
Abstract
Accurate detection of drought stress in plants is essential for water use efficiency and agricultural output. Hyperspectral imaging (HSI) provides a non-invasive method for plant phenotyping, allowing long-term monitoring of plant health owing to its sensitivity to subtle changes in leaf constituents. The broad spectral range of HSI enables the development of different vegetation indices (VIs) to analyze plant trait responses to multiple stresses, such as combined nutrient and drought stress. However, known VIs may underperform when plants are subjected to multiple stresses. This study presents new VIs, used in tandem with machine learning models, to identify drought stress in wheat plants under varying nitrogen (N) levels. A pot wheat experiment was set up in the glasshouse with four treatments: well-watered high-N (WWHN), well-watered low-N (WWLN), drought-stress high-N (DSHN), and drought-stress low-N (DSLN). In addition to watering the plants according to the experimental design, photosynthetic rate (Pn) and stomatal conductance (gs), which are used to assess plant drought stress, were measured regularly and served as the ground truth data for this study. The proposed VIs, together with known VIs, were used to train three classification models to classify plants by drought status: support vector machines (SVM), random forest (RF), and deep neural networks (DNN). The proposed VIs achieved an accuracy above 0.94 across all models, and performance increased further when they were combined with known VIs. The combined VIs were then used to train three regression models to predict the stomatal conductance and photosynthetic rate of plants. The random forest regression model performed best, suggesting that it could be used as a stand-alone tool to forecast gs and Pn and to track drought stress in wheat.
This study shows that combining hyperspectral data with machine learning can effectively monitor and predict drought stress in crops, especially under varying nitrogen conditions.
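The abstract does not give formulas for the proposed VIs, but many known VIs follow a normalized-difference pattern, for example NDVI = (NIR - Red) / (NIR + Red). The sketch below computes such an index from one hyperspectral pixel by picking the bands nearest to chosen wavelengths. The band centres (800 nm and 670 nm) and the spectrum are assumptions for illustration, not the paper's proposed indices.

```python
def reflectance_at(wavelengths_nm, reflectances, target_nm):
    """Return the reflectance of the band whose centre is closest to target_nm."""
    idx = min(range(len(wavelengths_nm)), key=lambda i: abs(wavelengths_nm[i] - target_nm))
    return reflectances[idx]


def normalized_difference(band_a, band_b):
    """Generic normalized-difference index (a - b) / (a + b); in [-1, 1] for non-negative inputs."""
    denom = band_a + band_b
    if denom == 0:
        raise ValueError("band_a + band_b must be non-zero")
    return (band_a - band_b) / denom


def ndvi_from_spectrum(wavelengths_nm, reflectances, nir_nm=800.0, red_nm=670.0):
    """NDVI-style index from one pixel's spectrum; the band centres are assumptions."""
    nir = reflectance_at(wavelengths_nm, reflectances, nir_nm)
    red = reflectance_at(wavelengths_nm, reflectances, red_nm)
    return normalized_difference(nir, red)


# Hypothetical leaf spectrum (wavelength in nm, reflectance as a fraction).
wl = [650, 670, 700, 800, 850]
refl = [0.05, 0.04, 0.10, 0.45, 0.50]
print(round(ndvi_from_spectrum(wl, refl), 3))  # 0.837
```

Index values computed this way for each plant would then form the feature vectors passed to the SVM, RF, or DNN classifiers.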

17 pages, 8104 KiB  
Article
Potential Plasma Proteins (LGALS9, LAMP3, PRSS8 and AGRN) as Predictors of Hospitalisation Risk in COVID-19 Patients
by Thomas McLarnon, Darren McDaid, Seodhna M. Lynch, Eamonn Cooper, Joseph McLaughlin, Victoria E. McGilligan, Steven Watterson, Priyank Shukla, Shu-Dong Zhang, Magda Bucholc, Andrew English, Aaron Peace, Maurice O’Kane, Martin Kelly, Manav Bhavsar, Elaine K. Murray, David S. Gibson, Colum P. Walsh, Anthony J. Bjourson and Taranjit Singh Rai
Biomolecules 2024, 14(9), 1163; https://doi.org/10.3390/biom14091163 (registering DOI) - 17 Sep 2024
Viewed by 165
Abstract
Background: The COVID-19 pandemic, caused by the novel coronavirus SARS-CoV-2, has posed unprecedented challenges to healthcare systems worldwide. Here, we identified proteomic and genetic signatures for improved prognosis, which is vital for COVID-19 research. Methods: We investigated the proteomic and genomic profiles of COVID-19-positive patients (n = 400 for proteomics, n = 483 for genomics), focusing on differential regulation between hospitalised and non-hospitalised COVID-19 patients. The predictive capabilities of the signatures were tested using independent machine learning models such as Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR). Results: This study identified 224 differentially expressed proteins involved in various inflammatory and immunological pathways in hospitalised COVID-19 patients compared to non-hospitalised COVID-19 patients. LGALS9 (p-value < 0.001), LAMP3 (p-value < 0.001), PRSS8 (p-value < 0.001), and AGRN (p-value < 0.001) were identified as the most statistically significant proteins. Several hundred rsIDs were queried across the top 10 significant signatures, identifying three significant SNPs in the FSTL3 gene that correlated with hospitalisation status. Conclusions: Our study not only identified key signatures of COVID-19 patients with worsened health but also demonstrated their predictive capabilities as potential biomarkers, suggesting that they play a key role in the worsened health effects caused by COVID-19.

18 pages, 1556 KiB  
Article
Bayesian Optimized Machine Learning Model for Automated Eye Disease Classification from Fundus Images
by Tasnim Bill Zannah, Md. Abdulla-Hil-Kafi, Md. Alif Sheakh, Md. Zahid Hasan, Taslima Ferdaus Shuva, Touhid Bhuiyan, Md. Tanvir Rahman, Risala Tasin Khan, M. Shamim Kaiser and Md Whaiduzzaman
Computation 2024, 12(9), 190; https://doi.org/10.3390/computation12090190 - 16 Sep 2024
Viewed by 304
Abstract
Eye diseases are disorders that damage the tissues and related structures of the eye. They take various forms, ranging from minor, short-lived conditions to permanent blindness. Cataracts, glaucoma, and diabetic retinopathy are all eye diseases that can cause vision loss if not detected and treated early. Automated classification of these diseases from fundus images can enable quicker diagnoses and interventions. Our research aims to create a robust model, BayeSVM500, for eye disease classification to enhance medical technology and improve patient outcomes. In this study, we develop models to classify fundus images accurately. We start by preprocessing fundus images using contrast enhancement, normalization, and resizing. We then leverage several state-of-the-art pre-trained deep convolutional neural network models, including VGG16, VGG19, ResNet50, EfficientNet, and DenseNet, to extract deep features. To reduce feature dimensionality, we employ techniques such as principal component analysis (PCA), feature agglomeration, correlation analysis, variance thresholding, and feature importance rankings. Using these refined features, we train various traditional machine learning models as well as ensemble methods. Our best model, named BayeSVM500, is a Support Vector Machine classifier trained on EfficientNet features reduced to 500 dimensions via PCA, achieving 93.65 ± 1.05% accuracy. Bayesian hyperparameter optimization further improved performance to 95.33 ± 0.60%. Through comprehensive feature engineering and model optimization, we demonstrate highly accurate eye disease classification from fundus images, comparable or superior to previous benchmarks.
(This article belongs to the Special Issue Deep Learning Applications in Medical Imaging)
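The PCA step that reduces the deep features before the SVM can be sketched in plain Python. This minimal version extracts only the first principal component by power iteration on the sample covariance matrix; reaching 500 dimensions would mean repeating it with deflation, and a real pipeline would more likely call a library routine such as scikit-learn's `PCA(n_components=500)`. The 2-D "features" are hypothetical stand-ins for EfficientNet embeddings.

```python
import math


def mean_center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of already mean-centred rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]


def top_component(cov, n_iters=500):
    """Leading eigenvector of a covariance matrix by power iteration."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(n_iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate case: no variance along v
        v = [x / norm for x in w]
    return v


def project(centered, component):
    """Score of each row on the given component (one reduced dimension)."""
    return [sum(a * b for a, b in zip(r, component)) for r in centered]


# Hypothetical 2-D features lying on the line y = x.
features = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
centered = mean_center(features)
pc1 = top_component(covariance(centered))
print([round(c, 4) for c in pc1])                     # [0.7071, 0.7071]
print([round(s, 4) for s in project(centered, pc1)])  # [-1.4142, 0.0, 1.4142]
```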

17 pages, 9162 KiB  
Article
Estimating Cadmium Concentration in Agricultural Soils with ZY1-02D Hyperspectral Data: A Comparative Analysis of Spectral Transformations and Machine Learning Models
by Junwei Lv, Jing Geng, Xuanhong Xu, Yong Yu, Huajun Fang, Yifan Guo and Shulan Cheng
Agriculture 2024, 14(9), 1619; https://doi.org/10.3390/agriculture14091619 - 15 Sep 2024
Viewed by 258
Abstract
The accumulation of cadmium (Cd) in agricultural soils presents a significant threat to crop safety, underscoring the need for effective monitoring and management of soil Cd levels. Despite technological advancements, accurately monitoring soil Cd concentrations using satellite hyperspectral technology remains challenging, particularly in efficiently extracting spectral information. In this study, a total of 304 soil samples were collected from agricultural soils surrounding a tungsten mine in the Xiancha River basin, Jiangxi Province, Southern China. Using hyperspectral data from the ZY1-02D satellite, this research developed a comprehensive framework that evaluates the predictive accuracy of nine spectral transformations across four modeling approaches for estimating soil Cd concentrations. The spectral transformation methods included four logarithmic and reciprocal transformations, two derivative transformations, and three baseline correction and normalization transformations. The four models used to predict soil Cd were partial least squares regression (PLSR), support vector machine (SVM), bidirectional recurrent neural networks (BRNN), and random forest (RF). The results indicated that these spectral transformations markedly enhanced the absorption and reflection features of the spectral curves, accentuating key peaks and troughs. Compared with the original spectral curves, the transformed spectra correlated notably more strongly with soil Cd content, particularly after derivative transformations. The combination of the first derivative (FD) transformation with the RF model yielded the highest accuracy (R² = 0.61, RMSE = 0.37 mg/kg, MAE = 0.21 mg/kg). Furthermore, across multiple spectral transformations, the RF model was better suited to modeling soil Cd content than the other models.
Overall, this research highlights the substantial potential of ZY1-02D satellite hyperspectral data for detecting soil heavy metals and provides a framework that integrates optimal spectral transformations and modeling techniques to estimate soil Cd content.
(This article belongs to the Section Digital Agriculture)
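The first derivative (FD) transformation and the three accuracy metrics reported above can be sketched as follows. The derivative here is a simple forward difference over wavelength, which is one common choice; the abstract does not say which difference scheme or smoothing the authors applied. R² is computed as 1 - SS_res/SS_tot, which some studies replace with the squared Pearson correlation. All input values are hypothetical.

```python
import math


def first_derivative(wavelengths_nm, reflectance):
    """Forward-difference first derivative (FD) of a spectrum; returns len - 1 values."""
    return [
        (reflectance[i + 1] - reflectance[i]) / (wavelengths_nm[i + 1] - wavelengths_nm[i])
        for i in range(len(wavelengths_nm) - 1)
    ]


def regression_metrics(observed, predicted):
    """Return (R2, RMSE, MAE); R2 here is 1 - SS_res / SS_tot."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    return r2, rmse, mae


# Hypothetical spectrum sampled every 10 nm.
print([round(v, 4) for v in first_derivative([400, 410, 420], [0.10, 0.20, 0.50])])  # [0.01, 0.03]
# Hypothetical observed vs. predicted soil Cd (mg/kg).
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))  # (0.8, 0.5, 0.25)
```

RMSE and MAE share the units of the target (mg/kg for soil Cd), whereas R² is dimensionless.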

21 pages, 9422 KiB  
Article
GNSS-IR Soil Moisture Retrieval Using Multi-Satellite Data Fusion Based on Random Forest
by Yao Jiang, Rui Zhang, Bo Sun, Tianyu Wang, Bo Zhang, Jinsheng Tu, Shihai Nie, Hang Jiang and Kangyi Chen
Remote Sens. 2024, 16(18), 3428; https://doi.org/10.3390/rs16183428 - 15 Sep 2024
Viewed by 216
Abstract
The accuracy and reliability of soil moisture retrieval based on Global Positioning System (GPS) single-satellite Signal-to-Noise Ratio (SNR) data are low because of spatial and temporal differences between satellites. Therefore, this paper proposes a Random Forest (RF)-based multi-satellite data fusion method for Global Navigation Satellite System Interferometric Reflectometry (GNSS-IR) soil moisture retrieval, which uses the RF model’s Mean Decrease Impurity (MDI) algorithm to adaptively assign arc weights and fuse all available satellite data into accurate retrieval results. The effectiveness of the proposed method was then validated using GPS data from Plate Boundary Observatory (PBO) network sites P041 and P037, as well as data collected in Lamasquere, France. A Support Vector Machine (SVM) model, a Radial Basis Function (RBF) neural network model, and a Convolutional Neural Network (CNN) model were introduced for accuracy comparison. The results indicated that the proposed method had the best retrieval performance at the three sites, with Root Mean Square Error (RMSE) values of 0.032, 0.028, and 0.003 cm³/cm³, Mean Absolute Error (MAE) values of 0.025, 0.022, and 0.002 cm³/cm³, and correlation coefficients (R) of 0.94, 0.95, and 0.98, respectively. The proposed soil moisture retrieval model therefore demonstrates strong robustness and generalization capability, providing a reference for high-precision, real-time monitoring of soil moisture.
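The arc-weighting idea can be sketched as an importance-weighted average. In scikit-learn, the MDI importances of a fitted random forest are exposed as `feature_importances_`; how the authors map those scores to individual satellite arcs is not described in the abstract, so treating each arc's retrieval as one feature and normalizing its MDI score into a weight is an assumption. The arc IDs and values are hypothetical.

```python
def importance_weights(importances):
    """Normalise non-negative importance scores so that they sum to 1."""
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    return {arc: score / total for arc, score in importances.items()}


def fuse_retrievals(retrievals, importances):
    """Importance-weighted average of per-arc soil moisture retrievals (cm^3/cm^3)."""
    weights = importance_weights({arc: importances[arc] for arc in retrievals})
    return sum(weights[arc] * value for arc, value in retrievals.items())


# Hypothetical MDI scores and per-arc retrievals; arc IDs are illustrative only.
mdi = {"G01-asc": 3.0, "G05-desc": 1.0}
theta = {"G01-asc": 0.20, "G05-desc": 0.30}
print(round(fuse_retrievals(theta, mdi), 3))  # 0.225
```

Only arcs present in `retrievals` are renormalized, so an arc missing on a given day does not leave the weights summing to less than 1.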

18 pages, 6254 KiB  
Article
Rice Yield Estimation Using Machine Learning and Feature Selection in Hilly and Mountainous Chongqing, China
by Li Fan, Shibo Fang, Jinlong Fan, Yan Wang, Linqing Zhan and Yongkun He
Agriculture 2024, 14(9), 1615; https://doi.org/10.3390/agriculture14091615 - 14 Sep 2024
Viewed by 366
Abstract
To investigate effective techniques for estimating rice production in hilly and mountainous areas, we collected field-level yield data, agro-meteorological data, and Sentinel-2/MSI remote sensing data in Chongqing, China, between 2020 and 2023. The integral values of vegetation indices from the rice green-up to heading–filling stages were determined using the Newton–trapezoidal integration method. Using correlation analysis and permutation feature importance analysis, the effects of agro-meteorological variables and vegetation index integrals on rice yield were assessed. The selected features were then combined with three machine learning techniques, namely random forest (RF), support vector machine (SVM), and partial least squares regression (PLSR), to create six rice yield estimation models. The results showed that combined vegetation indices were more effective than indices from individual development phases. Specifically, the correlation coefficients between rice yield and the integral values of eight vegetation indices from the green-up to heading–filling stages were all above 0.65. The predictive capability of the models was then evaluated after introducing agro-meteorological factors as new independent variables alongside the vegetation indices. The results showed that the performance of PLSR remained stable, while the prediction accuracies of SVM and RF improved by 13% to 21.5%. After feature selection, the inversion performance of all three machine learning models improved, with the RF model coupled with variables selected by permutation feature importance analysis achieving the best inversion, characterized by a coefficient of determination of 0.85, a root mean square error of 529.1 kg/hm², and a mean relative error of 5.63%.
This study provides technical support for improving the accuracy of remote sensing-based crop yield estimation in hilly and mountainous regions, facilitating precise agricultural management and informing agricultural decision making.
(This article belongs to the Special Issue Applications of Remote Sensing in Agricultural Soil and Crop Mapping)
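The vegetation-index integral can be sketched with the composite trapezoidal rule, which we assume is what "Newton–trapezoidal integration" refers to (the trapezoidal rule is the first-order closed Newton–Cotes formula). The version below accepts unevenly spaced observation dates, as Sentinel-2 acquisitions are often irregular because of cloud cover. The dates and index values are hypothetical.

```python
def trapezoid_integral(days, values):
    """Integrate a vegetation-index time series over time with the trapezoidal rule."""
    if len(days) != len(values) or len(days) < 2:
        raise ValueError("need at least two (day, value) pairs of equal length")
    total = 0.0
    for i in range(1, len(days)):
        dt = days[i] - days[i - 1]
        if dt <= 0:
            raise ValueError("days must be strictly increasing")
        total += 0.5 * (values[i] + values[i - 1]) * dt  # area of one trapezoid
    return total


# Hypothetical NDVI observations (day of season, index value) from green-up to heading-filling.
print(round(trapezoid_integral([0, 10, 30], [0.2, 0.6, 0.8]), 6))  # 18.0
```

The resulting integral is a single number per field and per index, which can then serve as one input feature for the RF, SVM, or PLSR yield models.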

16 pages, 1081 KiB  
Article
Optimized Machine Learning Classifiers for Symptom-Based Disease Screening
by Auba Fuster-Palà, Francisco Luna-Perejón, Lourdes Miró-Amarante and Manuel Domínguez-Morales
Computers 2024, 13(9), 233; https://doi.org/10.3390/computers13090233 - 14 Sep 2024
Viewed by 432
Abstract
This work presents a disease detection classifier based on symptoms encoded by their severity. The model is presented as part of a solution to the saturation of the healthcare system, aiding in the initial screening stage. An open-source dataset is pre-processed and used to train and test various machine learning models, including SVM, RFs, KNN, and ANNs. A three-phase optimization process is developed to obtain the best classifier: first, the dataset is pre-processed; second, a grid search over several hyperparameter variations is performed for each classifier; and finally, the best models obtained are subjected to additional filtering processes. The best-performing model, selected based on performance and execution time, is a KNN with 2 neighbors, which achieves an accuracy and F1 score of over 98%. These results demonstrate the effectiveness of the evaluated models and their improvement over previous studies, particularly in terms of accuracy. Although the ANN model has a longer execution time than KNN, it is retained in this work because of its potential to handle more complex datasets in a real clinical context.
(This article belongs to the Special Issue Future Systems Based on Healthcare 5.0 for Pandemic Preparedness 2024)
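A KNN classifier with 2 neighbors, as selected above, can be sketched in a few lines of standard-library Python. With k = 2 the two neighbors often disagree, so some tie-breaking rule is needed; the rule here (prefer the nearest tied neighbor) is our assumption, and libraries may resolve ties differently. The symptom-severity vectors and disease labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=2):
    """Majority vote among the k nearest training points (Euclidean distance).

    Ties, which are common with k = 2, go to the label of the nearest tied neighbour.
    """
    ranked = sorted((math.dist(x, query), i) for i, x in enumerate(train_X))
    neighbours = [train_y[i] for _, i in ranked[:k]]  # ordered nearest-first
    counts = Counter(neighbours)
    best = max(counts.values())
    for label in neighbours:
        if counts[label] == best:
            return label


# Hypothetical severity-encoded symptom vectors (e.g. fever, cough, sneezing).
train_X = [[0, 0, 1], [0, 0, 2], [3, 3, 0], [3, 2, 0]]
train_y = ["flu", "flu", "allergy", "allergy"]
print(knn_predict(train_X, train_y, [0, 0, 1.5]))  # flu
print(knn_predict(train_X, train_y, [3, 3, 0.1]))  # allergy
```

KNN has no training phase, so its cost is paid at prediction time by comparing the query against every stored case.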

20 pages, 3457 KiB  
Article
Non-Invasive Endometrial Cancer Screening through Urinary Fluorescent Metabolome Profile Monitoring and Machine Learning Algorithms
by Monika Švecová, Katarína Dubayová, Anna Birková, Peter Urdzík and Mária Mareková
Cancers 2024, 16(18), 3155; https://doi.org/10.3390/cancers16183155 - 14 Sep 2024
Viewed by 306
Abstract
Endometrial cancer is becoming increasingly common, highlighting the need for improved diagnostic methods that are both effective and non-invasive. This study investigates the use of urinary fluorescence spectroscopy as a potential diagnostic tool for endometrial cancer. Urine samples were collected from endometrial cancer patients (n = 77), patients with benign uterine tumors (n = 23), and control gynecological patients attending regular checkups or follow-ups (n = 96). These samples were analyzed using synchronous fluorescence spectroscopy to measure the total fluorescent metabolome profile, and specific fluorescence ratios were created to differentiate between control, benign, and malignant samples. These spectral markers demonstrated potential clinical applicability with AUC as high as 80%. Partial Least Squares Discriminant Analysis (PLS-DA) was employed to reduce data dimensionality and enhance class separation. Additionally, machine learning models, including Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM), and Stochastic Gradient Descent (SGD), were utilized to distinguish between controls and endometrial cancer patients. PLS-DA achieved an overall accuracy of 79% and an AUC of 90%. These promising results indicate that urinary fluorescence spectroscopy, combined with advanced machine learning models, has the potential to revolutionize endometrial cancer diagnostics, offering a rapid, accurate, and non-invasive alternative to current methods.
(This article belongs to the Special Issue Image Analysis and Machine Learning in Cancers)
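The AUC values quoted for the fluorescence ratios can be read as a probability: the chance that a randomly chosen cancer patient's score exceeds a randomly chosen control's score. The sketch below computes AUC directly from that Mann–Whitney definition, counting ties as one half. It assumes a higher ratio indicates cancer (if the direction is reversed, the AUC becomes 1 minus this value); the scores are hypothetical.

```python
def auc_mann_whitney(positive_scores, negative_scores):
    """ROC AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical fluorescence-ratio scores for cancer patients (positive) and controls (negative).
print(round(auc_mann_whitney([0.9, 0.8, 0.4], [0.5, 0.3]), 4))  # 0.8333
```

The pairwise loop is O(n·m), which is negligible for cohorts of the size described here (77 patients and 96 controls).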

17 pages, 4081 KiB  
Article
Chemical Fractions and Magnetic Simulation Based on Machine Learning for Trace Metals in a Sedimentary Column of Lake Taihu
by Hui Xiao, Tong Ke, Liming Chen, Dehu Li, Wanru Yang, Xin Qian, Long Chen, Ligang Deng and Huiming Li
Water 2024, 16(18), 2604; https://doi.org/10.3390/w16182604 - 14 Sep 2024
Viewed by 217
Abstract
In this study, the chemical fractions (CFs) of trace metals (TMs) and multiple magnetic parameters were analysed in a sedimentary column from the centre of Lake Taihu. The sedimentary column, 53 cm in length, was dated using ²¹⁰Pb and ¹³⁷Cs to be 124 years old. The surface layers of the column contained significantly higher concentrations of Cd, Co, Cu, Pb, Sb, Ti, and Zn than the middle and bottom layers. The sedimentary core contained a substantial amount of ferrimagnetic minerals. Most of the TMs were present in the residual fraction, except for Mn and Pb. The chemical fractions of Cd varied most significantly with depth. The pollution load index (PLI) indicated moderate TM pollution levels in the region, whereas the risk assessment code (RAC) classified Mn as heavily polluted. Multiple linear regression (MLR) and three machine learning models, random forest (RF), support vector machine (SVM), and XGBoost (1.7.7.1), were used to simulate the RAC and total concentrations of TMs, using physical and chemical indicators and magnetic parameters of the sediments as input variables. The MLR model outperformed RF, SVM, and XGBoost in simulating the CFs and total concentrations of most TMs in the sedimentary column, with R² values of up to 0.668 and 0.87, respectively. The SHapley Additive exPlanations (SHAP) method revealed that χarm/χ is the dominant factor influencing the RAC of As in the XGBoost models. For the RAC of Co and Cu in the RF models, C% and N% made greater contributions.
(This article belongs to the Section Water Quality and Contamination)
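The pollution load index mentioned above is commonly defined as the geometric mean of the contamination factors, CF = C_measured / C_background, taken over all metals considered; values above 1 indicate pollution relative to background. A minimal sketch follows. The background (reference) values the study used are not given in the abstract, so the concentrations below are hypothetical.

```python
import math


def contamination_factor(concentration, background):
    """CF = measured concentration / background (reference) concentration."""
    if background <= 0:
        raise ValueError("background concentration must be positive")
    return concentration / background


def pollution_load_index(concentrations, backgrounds):
    """PLI = geometric mean of the contamination factors of all metals."""
    cfs = [contamination_factor(concentrations[m], backgrounds[m]) for m in concentrations]
    # Geometric mean via logs avoids overflow/underflow when many factors are multiplied.
    return math.exp(sum(math.log(cf) for cf in cfs) / len(cfs))


# Hypothetical concentrations and background values (mg/kg).
measured = {"Cd": 0.4, "Zn": 200.0}
background = {"Cd": 0.1, "Zn": 100.0}
print(round(pollution_load_index(measured, background), 4))  # 2.8284
```

Because the index is a geometric mean, a single strongly enriched metal raises the PLI less than it would raise an arithmetic average of the factors.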

38 pages, 2067 KiB  
Article
A Multi-Strategy Enhanced Hybrid Ant–Whale Algorithm and Its Applications in Machine Learning
by Chenyang Gao, Yahua He and Yuelin Gao
Mathematics 2024, 12(18), 2848; https://doi.org/10.3390/math12182848 - 13 Sep 2024
Viewed by 256
Abstract
Based on the principles of biomimicry, evolutionary algorithms (EAs) have been widely applied across diverse domains to tackle practical challenges. However, the inherent limitations of these algorithms call for further refinement to strike a balance between global exploration and local exploitation. Thus, this paper introduces a novel multi-strategy enhanced hybrid algorithm called MHWACO, which integrates the Whale Optimization Algorithm (WOA) and Ant Colony Optimization (ACO). First, MHWACO employs Gaussian perturbation optimization for individual initialization. Next, depending on transition probabilities, each individual undertakes either localized exploration based on the refined WOA or global prospecting anchored in the Golden Sine Algorithm (Golden-SA). Inspired by the collaborative behavior of ant colonies, a Flight Ant (FA) strategy is proposed to guide unoptimized individuals toward potential global optimal solutions. Finally, the Gaussian scatter search (GSS) strategy is activated during periods of low population activity, balancing global exploration and local exploitation. Moreover, the efficacy of Support Vector Regression (SVR) and random forest (RF) as regression models depends heavily on parameter selection. In response, we devised the MHWACO-SVM and MHWACO-RF models to refine parameter selection and applied them to various real-world problems such as stock prediction, housing estimation, disease forecasting, fire prediction, and air quality monitoring. Experimental comparisons against 9 newly proposed intelligent optimization algorithms and 9 enhanced algorithms across 34 benchmark test functions and the CEC2022 benchmark suite highlight the notable superiority and efficacy of MSWOA in addressing global optimization problems.
Finally, the proposed MHWACO-SVM and MHWACO-RF models outperform other regression models across key metrics such as the Mean Bias Error (MBE), Coefficient of Determination (R²), Mean Absolute Error (MAE), Explained Variance Score (EVS), and Median Absolute Error (MEAE).
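For orientation, the sketch below implements only the base Whale Optimization Algorithm on which MHWACO builds. It uses the standard encircling update X' = X* - A·|C·X* - X|, a random-whale exploration step when |A| ≥ 1, and the logarithmic spiral update around the best solution. It does not include MHWACO's Gaussian perturbation initialization, Golden-SA step, Flight Ant strategy, or Gaussian scatter search. The sphere objective, bounds, population size, and seed are illustrative assumptions.

```python
import math
import random


def sphere(x):
    """Simple benchmark objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)


def whale_optimization(objective, dim, lower, upper, n_whales=20, n_iters=200, seed=0):
    """Base WOA: encircling prey, random-whale search, and spiral bubble-net updates."""
    rng = random.Random(seed)
    whales = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_whales)]
    best = min(whales, key=objective)[:]
    best_val = objective(best)
    b = 1.0  # spiral shape constant
    for t in range(n_iters):
        a = 2.0 - 2.0 * t / n_iters  # decreases linearly from 2 toward 0
        for i, x in enumerate(whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: exploit around the best whale; otherwise explore around a random whale
                ref = best if abs(A) < 1.0 else whales[rng.randrange(n_whales)]
                new = [ref[d] - A * abs(C * ref[d] - x[d]) for d in range(dim)]
            else:
                l = rng.uniform(-1.0, 1.0)  # spiral position parameter
                new = [abs(best[d] - x[d]) * math.exp(b * l) * math.cos(2 * math.pi * l) + best[d]
                       for d in range(dim)]
            new = [min(max(v, lower), upper) for v in new]  # keep within the search bounds
            whales[i] = new
            val = objective(new)
            if val < best_val:
                best, best_val = new[:], val
    return best, best_val


best, best_val = whale_optimization(sphere, dim=3, lower=-5.0, upper=5.0)
print(best_val)  # a small value close to 0
```

In an MHWACO-SVM-style setup, the objective would instead be a cross-validated error of an SVR as a function of its hyperparameters; that wiring is not shown here.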
21 pages, 5094 KiB  
Article
Parameter Optimization of a Surface Mechanical Rolling Treatment Process to Improve the Surface Integrity and Fatigue Property of FV520B Steel by Machine Learning
by Yongxin Zhou, Zheng Xing, Qianduo Zhuang, Jiao Sun and Xingrong Chu
Materials 2024, 17(18), 4505; https://doi.org/10.3390/ma17184505 - 13 Sep 2024
Viewed by 368
Abstract
Surface integrity is a critical factor affecting the fatigue resistance of materials. A surface mechanical rolling treatment (SMRT) process can effectively improve the surface integrity of a material, thus enhancing its fatigue property. In this paper, analysis of variance (ANOVA) and signal-to-noise ratio (SNR) analysis are performed on an orthogonal experimental design with SMRT parameters as variables and surface integrity indicators as optimization objectives, and a support vector machine-active learning (SVM-AL) model is proposed based on machine learning theory. The entire model includes three rounds of AL. In each round, the SMRT parameter sets with high relative average deviation and high output values in cross-validation are selected for supplementary experiments. The results show that the prediction accuracy and generalization ability of the SVM-AL model are significantly improved compared to the support vector machine (SVM) model. A fatigue test was also carried out, and the fatigue property of the SMRT specimens predicted by the SVM-AL model was higher than that of the other specimens.
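One round of the candidate selection in the active-learning loop can be sketched as follows, under loudly stated assumptions. Each candidate SMRT parameter set is scored by the k models trained during cross-validation; the relative average deviation (mean absolute deviation divided by the mean) of those predictions measures model disagreement. The ranking rule (deviation first, then mean predicted output) and the candidate IDs are hypothetical, not the authors' exact criterion.

```python
def relative_average_deviation(values):
    """Mean absolute deviation divided by the absolute mean."""
    mean = sum(values) / len(values)
    if mean == 0:
        raise ValueError("relative deviation is undefined for a zero mean")
    return sum(abs(v - mean) for v in values) / len(values) / abs(mean)


def select_candidates(cv_predictions, n_select):
    """Pick candidate parameter sets for the next round of experiments.

    cv_predictions maps a (hypothetical) candidate ID to the predictions of the
    k cross-validation models. Candidates are ranked by relative average deviation,
    with ties broken by the mean predicted output (both higher first).
    """
    scored = []
    for cid, preds in cv_predictions.items():
        scored.append((relative_average_deviation(preds), sum(preds) / len(preds), cid))
    scored.sort(reverse=True)
    return [cid for _, _, cid in scored[:n_select]]


# Hypothetical predictions of three cross-validation models for three candidates.
preds = {"p1": [10.0, 10.0, 10.0], "p2": [8.0, 12.0, 10.0], "p3": [5.0, 15.0, 10.0]}
print(select_candidates(preds, 2))  # ['p3', 'p2']
```

The selected parameter sets would then be run experimentally, added to the training data, and the SVM retrained before the next round.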

21 pages, 13840 KiB  
Article
Estimating Forest Gross Primary Production Using Machine Learning, Light Use Efficiency Model, and Global Eddy Covariance Data
by Zhenkun Tian, Yingying Fu, Tao Zhou, Chuixiang Yi, Eric Kutter, Qin Zhang and Nir Y. Krakauer
Forests 2024, 15(9), 1615; https://doi.org/10.3390/f15091615 - 13 Sep 2024
Viewed by 312
Abstract
Forests play a vital role in atmospheric CO₂ sequestration among terrestrial ecosystems, mitigating the greenhouse effect induced by human activity in a changing climate. The LUE (light use efficiency) model is a popular algorithm for calculating terrestrial GPP (gross primary production) that is based on physiological mechanisms and is easy to implement. Different versions have been applied for many years to simulate the GPP of different ecosystem types at regional or global scales. To estimate forest GPP using different approaches, we implemented five LUE models (EC-LUE, VPM, GLO-PEM, CASA, and C-Fix) in deciduous broadleaf (DBF), evergreen broadleaf (EBF), evergreen needleleaf (ENF), and mixed (MF) forests, using the FLUXNET2015 dataset, remote sensing observations, and Köppen–Geiger climate zones. We then fused these models using an RF (random forest) and an SVM (support vector machine) to further improve GPP estimation. Our results indicated that under a unified parameterization scheme, EC-LUE and VPM yielded the best performance in simulating GPP variations, followed by GLO-PEM, CASA, and C-Fix, while MODIS also demonstrated reliable GPP estimation ability. Model fusion across forest types and flux sites indicated that the RF captured more of the magnitude of GPP variation than the SVM, with higher R² and lower RMSE. Both RF and SVM were validated using cross-validation for all forest types and flux sites, showing that the RF and SVM improved the accuracy of the GPP simulation by 28% and 27%, respectively.
(This article belongs to the Section Forest Ecology and Management)
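The LUE models listed above share a common structure: GPP = ε_max × FPAR × PAR, down-regulated by environmental scalars for temperature and water. The models differ in how those scalars are defined and combined (some take the minimum of the scalars rather than their product). The sketch uses the product form with a TEM/VPM-style temperature scalar that equals 1 at the optimum temperature and falls to 0 at the limits. All parameter values are hypothetical, not the study's unified parameterization.

```python
def temperature_scalar(t, t_min, t_opt, t_max):
    """TEM/VPM-style temperature scalar: 1 at t_opt, 0 at or beyond t_min / t_max (deg C)."""
    if t <= t_min or t >= t_max:
        return 0.0
    num = (t - t_min) * (t - t_max)
    return num / (num - (t - t_opt) ** 2)


def lue_gpp(eps_max, fpar, par, t_scalar, w_scalar):
    """Generic LUE form: GPP = eps_max * FPAR * PAR * f(T) * f(W).

    Units follow the inputs, e.g. eps_max in g C/MJ and PAR in MJ/m^2/day give g C/m^2/day.
    """
    return eps_max * fpar * par * t_scalar * w_scalar


print(temperature_scalar(20.0, 0.0, 20.0, 40.0))  # 1.0
print(temperature_scalar(10.0, 0.0, 20.0, 40.0))  # 0.75
# Hypothetical daily inputs for one forest pixel.
print(lue_gpp(2.0, 0.5, 10.0, 1.0, 0.8))  # 8.0
```

Per-model GPP estimates computed this way are the kind of inputs that the RF and SVM fusion step combines.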

19 pages, 4248 KiB  
Article
Predicting Leukoplakia and Oral Squamous Cell Carcinoma Using Interpretable Machine Learning: A Retrospective Analysis
by Salem Shamsul Alam, Saif Ahmed, Taseef Hasan Farook and James Dudley
Oral 2024, 4(3), 386-404; https://doi.org/10.3390/oral4030032 - 13 Sep 2024
Viewed by 455
Abstract
Purpose: The purpose of this study is to assess the effectiveness of the best-performing interpretable machine learning models in diagnosing leukoplakia and oral squamous cell carcinoma (OSCC). Methods: A total of 237 patient cases were analysed, including information on patient demographics, lesion characteristics, and lifestyle factors such as age, gender, tobacco use, and lesion size. The dataset was preprocessed, normalised, and then split into training and testing sets. The following models were tested: K-Nearest Neighbours (KNN), Logistic Regression, Naive Bayes, Support Vector Machine (SVM), and Random Forest. Performance was assessed using overall accuracy, Kappa score, and class-specific precision, recall, and F1 score. SHAP (SHapley Additive exPlanations) was used to interpret the Random Forest model and determine each feature's contribution to the predictions. Results: The Random Forest model had the best overall accuracy (93%) and Kappa score (0.90). For OSCC, it had a precision of 0.91, a recall of 1.00, and an F1 score of 0.95. For leukoplakia without dysplasia, the model had a precision of 1.00, a recall of 0.78, and an F1 score of 0.88. For leukoplakia with dysplasia, the precision was 0.91, the recall was 1.00, and the F1 score was 0.95. The top three features influencing the prediction of leukoplakia with dysplasia are buccal mucosa localisation, age greater than 60 years, and larger lesion size. For leukoplakia without dysplasia, the key features are gingival localisation, larger lesion size, and tongue localisation. For OSCC, gingival localisation, floor-of-mouth localisation, and buccal mucosa localisation are the most influential features. Conclusions: The Random Forest model outperformed the other machine learning models in diagnosing oral cancer and potentially malignant oral lesions, with higher accuracy and interpretability. The machine learning models struggled to identify dysplastic changes. Using SHAP improves understanding of feature importance, facilitating early diagnosis and possibly reducing mortality rates. Notably, the model indicated that lesions on the floor of the mouth were highly unlikely to be dysplastic and instead had one of the highest probabilities of being OSCC.
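The reported metrics (overall accuracy, Kappa, and per-class precision, recall, and F1) can all be derived from a multi-class confusion matrix. The sketch below shows that arithmetic in plain Python, assuming "Kappa score" means Cohen's kappa, κ = (p_o − p_e) / (1 − p_e). The function names and the confusion-matrix counts are hypothetical and invented for illustration; they are not the study's data.

```python
def per_class_metrics(cm, labels):
    """Precision, recall and F1 per class from a square confusion matrix,
    where cm[i][j] counts cases of true class i predicted as class j."""
    k = len(cm)
    results = {}
    for c in range(k):
        tp = cm[c][c]
        predicted_c = sum(cm[r][c] for r in range(k))  # column total
        actual_c = sum(cm[c])                           # row total
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results[labels[c]] = (precision, recall, f1)
    return results


def accuracy_and_kappa(cm):
    """Overall accuracy p_o and Cohen's kappa (p_o - p_e) / (1 - p_e),
    where p_e is the agreement expected from the row/column totals."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(k)) / n
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical counts for three classes (rows = true, columns = predicted).
labels = ["OSCC", "leukoplakia without dysplasia", "leukoplakia with dysplasia"]
cm = [[10, 0, 0],
      [2, 7, 0],
      [0, 0, 9]]
for name, (p, r, f) in per_class_metrics(cm, labels).items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
print(accuracy_and_kappa(cm))
```

Kappa is lower than raw accuracy because it discounts the agreement expected by chance from the class frequencies.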

17 pages, 3327 KiB  
Article
Explainable Machine Learning Model to Accurately Predict Protein-Binding Peptides
by Sayed Mehedi Azim, Aravind Balasubramanyam, Sheikh Rabiul Islam, Jinglin Fu and Iman Dehzangi
Algorithms 2024, 17(9), 409; https://doi.org/10.3390/a17090409 - 12 Sep 2024
Viewed by 318
Abstract
Enzymes play key roles in the biological functions of living organisms, serving as catalysts for, and regulators of, biochemical reaction pathways. Recent studies suggest that peptides are promising molecules for modulating enzyme function, owing to their large chemical diversity and well-established methods for library synthesis. Experimental approaches to identifying protein-binding peptides are time-consuming and costly, so there is a demand for a fast and accurate computational approach to this problem. A further challenge in developing such an approach is the lack of a large and reliable dataset. In this study, we develop a new machine learning approach called PepBind-SVM to predict protein-binding peptides. To build this model, we extract different sequential and physicochemical features from peptides and use a Support Vector Machine (SVM) as the classifier. We train the model on a dataset that we also introduce in this study. PepBind-SVM achieves 92.1% prediction accuracy, outperforming other classifiers at predicting protein-binding peptides.
(This article belongs to the Special Issue Machine Learning for Pattern Recognition (2nd Edition))
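The pipeline described above (sequence-derived features fed to an SVM) can be illustrated with a small, dependency-free sketch. It is not PepBind-SVM: it assumes amino acid composition (AAC) as the only feature set and swaps in a linear SVM trained with Pegasos-style stochastic sub-gradient descent on the hinge loss, rather than whatever kernel and solver the authors used. The bias is folded in as a constant feature, so it is regularised along with the weights. All function names, hyperparameters (`lam`, `epochs`), and peptide sequences are hypothetical.

```python
import random

# The 20 standard amino acids; AAC is one simple sequence-derived feature set.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_features(peptide):
    """Amino acid composition of a non-empty peptide, plus a constant 1.0
    so that the bias is learned as an ordinary (regularised) weight."""
    seq = peptide.upper()
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS] + [1.0]


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the L2-regularised
    hinge loss. Labels in y must be +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                   # decaying step size
            violates = y[i] * dot(w, X[i]) < 1      # margin check on current w
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
            if violates:                            # hinge-loss sub-gradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, peptide):
    """Return +1 (predicted binder) or -1 (predicted non-binder)."""
    return 1 if dot(w, aac_features(peptide)) >= 0 else -1


# Hypothetical toy data: +1 = "binder", -1 = "non-binder" (invented sequences).
train = [("KKRKRLLK", 1), ("RRKKLRKK", 1), ("KRKRKRAL", 1),
         ("DDEEDLED", -1), ("EEDDEAED", -1), ("DEDEDEAL", -1)]
X = [aac_features(p) for p, _ in train]
y = [label for _, label in train]
w = train_linear_svm(X, y)
print([predict(w, p) for p, _ in train])
```

A linear SVM keeps the example short and inspectable (each weight maps to one residue); a real study would typically add physicochemical descriptors, tune the regularisation by cross-validation, and may prefer a kernel SVM from an established library.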
