The specific objective of the present study was to develop computational models for predicting the quality of bulk-tank milk in dairy sheep and goat farms. Our hypothesis was that specific variables related to the health management applied in the farm can be used to predict values related to milk quality, specifically fat content, protein content, fat and protein content combined, somatic cell counts, and total bacterial counts. Bulk-tank milk from 325 sheep and 119 goat farms was collected and evaluated by established techniques for analysis of fat and protein content, for somatic cell counting, and for total bacterial counting. Subsequently, computational models were constructed for the prediction of five target values: (a) fat content, (b) protein content, (c) fat and protein content combined, (d) somatic cell counts, and (e) total bacterial counts, through the use of 21 independent variables related to factors prevalent in the farm. Five machine learning tools were employed: decision trees (18 different models evaluated), random forests (16 models), XGBoost (240 models), k-nearest neighbours (72 models), and neural networks (576 models); in total, 9220 evaluations were performed. For each target value, the tools with the lowest mean absolute percentage error (MAPE) among the five tools tested were selected. In sheep farms, for the prediction of protein content, k-nearest neighbours was selected (MAPE: 3.95%); for the prediction of fat and protein content combined, neural networks were selected (6.00%); and for the prediction of somatic cell counts, random forests and k-nearest neighbours were selected (6.55%); no tool provided useful predictions for fat content or for total bacterial counts.
In goat farms, for the prediction of protein content, k-nearest neighbours was selected (MAPE: 6.17%); for the prediction of somatic cell counts, random forests and k-nearest neighbours were selected (4.93% and 5.00%, respectively); and for the prediction of total bacterial counts, neural networks were selected (8.33%); no tool provided useful predictions for fat content or for fat and protein content combined. The results of the study will be of interest to farmers, as well as to professionals; the findings will also be useful to dairy processing factories. With such models, it will be possible to obtain a distance-aware, rapid, quantitative estimation of the milk output from sheep and goat farms with sufficient data attributes. It will thus become easier to monitor and improve milk quality at the farm level as part of the dairy production chain. Moreover, the findings can support the setup of relevant and appropriate measures and interventions in dairy sheep and goat farms.
Keywords: artificial intelligence; goat; health management; k-nearest neighbour; mastitis; milk composition; milk fat; milk protein; neural network; prediction; random forests; sheep; somatic cell counts; total bacterial counts.
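The selection criterion described in the abstract, choosing for each target value the tool with the lowest MAPE, can be illustrated with a minimal sketch. This is not the study's implementation: the function names `mape` and `rank_tools`, the tool labels, and all numeric values below are hypothetical and serve only to show the calculation, assuming MAPE is computed as the mean of |actual − predicted| / |actual|, expressed as a percentage.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def rank_tools(actual, predictions_by_tool):
    """Return (tool, MAPE) pairs sorted from lowest to highest MAPE."""
    scores = {
        tool: mape(actual, preds)
        for tool, preds in predictions_by_tool.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical protein content values (%) for four farms; not study data.
    measured = [5.6, 5.9, 6.1, 5.4]
    # Hypothetical predictions from three of the tools named in the abstract.
    predictions = {
        "k-nearest neighbours": [5.5, 6.0, 6.0, 5.6],
        "random forests": [5.9, 5.5, 6.5, 5.0],
        "neural networks": [5.2, 6.4, 5.7, 5.9],
    }
    for tool, score in rank_tools(measured, predictions):
        print(f"{tool}: MAPE = {score:.2f}%")
```

The first entry of the ranked list corresponds to the tool that would be selected for that target value. Returning the full ranking rather than a single winner makes it easy to report more than one tool when their errors are close.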