The objective of the study was to develop a computational model for predicting the level of prevalence of mastitis in dairy sheep farms. Data for the construction of the model were obtained from a large Greece-wide field study of 111 farms. Unsupervised learning methodology was applied to cluster the data into two clusters based on 18 variables: 17 independent variables (related to health management practices applied in farms and climatological data at the locations of the farms) and the level of prevalence of subclinical mastitis as the target value. The K-means tool showed the highest significance for the classification of farms into two clusters for the construction of the computational model: the median (interquartile range) prevalence of subclinical mastitis among farms in the two clusters was 20.0% (15.8%) and 30.0% (16.0%) (p = 0.002). Supervised learning tools were subsequently used to predict the level of prevalence of the infection: decision trees, k-NN, neural networks, and support vector machines. For each of these, combinations of hyperparameters were employed; 83 models were produced, and 4150 assessments were made in total. A computational model obtained by means of support vector machines (kernel: 'linear', regularization parameter C = 3) was selected. Thereafter, the model was assessed against the prevalence of subclinical mastitis in 373 records from sheep flocks unrelated to the ones employed for the selection of the model; the model was evaluated for correct classification of the data in each of 373 sets, each of which included a test (prediction) subset with one record that referred to the farm under assessment. The median (interquartile range) prevalence of the infection in farms classified by the model in each of the two categories was 10.4% (5.5%) and 36.3% (9.7%) (p < 0.0001).
The overall accuracy of the model for the results presented by the K-means tool was 94.1%; for the estimation of the level of prevalence (<25.0%/≥25.0%) in the farms, it was 96.3%. The findings of this study indicate that machine learning algorithms can be usefully employed in predicting the level of subclinical mastitis in dairy sheep farms. This can facilitate setting up appropriate health management measures for interventions in the farms.
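The evaluation described above, with one test record per set and a prevalence cut-off of 25.0%, can be read as a leave-one-out-style scheme; the following is a minimal sketch under that assumption, not the procedure of the study. The function names, the toy feature values, and the toy prevalence values are hypothetical, and the nearest-centroid rule is a deliberately simple stand-in for the selected support vector machine model.

```python
from statistics import mean

THRESHOLD = 25.0  # prevalence cut-off (%) quoted in the abstract


def prevalence_label(prevalence_pct):
    """Return 1 for prevalence >= 25.0% and 0 for prevalence < 25.0%."""
    return 1 if prevalence_pct >= THRESHOLD else 0


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (NOT the SVM of the study).

    Assigns x to the class whose mean feature vector is closest.
    """
    centroids = {}
    for label in set(train_y):
        rows = [row for row, y in zip(train_X, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return min(centroids, key=lambda label: sq_dist(centroids[label], x))


def leave_one_out_accuracy(X, y, predict=nearest_centroid_predict):
    """Hold out each record once, train on the rest, and score the hits."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            hits += 1
    return hits / len(X)


# Hypothetical toy data: two made-up features per farm, prevalence in %.
toy_X = [[1.0, 0.2], [1.2, 0.1], [0.9, 0.3], [3.0, 1.1], [3.2, 0.9], [2.9, 1.2]]
toy_prevalence = [8.0, 12.0, 20.0, 30.0, 40.0, 25.0]
toy_y = [prevalence_label(p) for p in toy_prevalence]

accuracy = leave_one_out_accuracy(toy_X, toy_y)
print(f"Leave-one-out accuracy on toy data: {accuracy:.3f}")
```

Any classifier with the same `predict(train_X, train_y, x)` signature could be passed in place of the stand-in, for example a wrapper around a linear-kernel support vector machine with C = 3.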
Keywords: machine learning; mastitis; prediction; sheep; support vector machines.