Dystocia, or difficult calving, in cattle is detrimental to the health of the afflicted cows and has a negative economic impact on the dairy industry. The goal of this study was to create a data-driven tool for predicting the calving difficulty of non-heifer cows using only input variables that are known before insemination. Unlike past studies, we excluded input variables that can be known only during or after insemination, such as birth weight and gestation length. This makes the model suitable for informing mating decisions that could reduce the incidence of difficult calvings or mitigate their consequences. We used a dataset of 131,527 calving records of Holstein cattle, from which we derived a total of 274 features comprising phenotypic measurements and estimated breeding values. The class distribution in the dataset was 96.7% normal calvings and 3.3% difficult calvings. We used gradient boosted trees (XGBoost) as the learning model and a bagging ensemble approach to deal with the extreme class imbalance. The model achieved an average area under the ROC curve of 0.73 on unseen test data. Using feature importance analysis, we identified a number of features with high discriminatory value for calving difficulty, including maternal and paternal breeding values and past phenotypic measurements of the cow.
Keywords: Bovine reproduction; Difficult calving; Machine learning; Obstetrics; Precision livestock farming.
Copyright © 2022 The Authors. Published by Elsevier B.V. All rights reserved.
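As an illustration of the bagging approach to class imbalance mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes one common variant, in which every bag keeps all minority-class records and adds an equal-sized random draw from the majority class; the abstract does not state how the bags were built. The base learner `CentroidScorer` is a hypothetical stand-in for XGBoost, chosen so that the example runs with the standard library alone, and the data are synthetic with roughly 3.3% positives. All function and class names are illustrative.

```python
import random
from statistics import mean


def make_data(n, seed):
    """Synthetic stand-in data: about 3.3% positives, two shifted Gaussian features."""
    rng = random.Random(seed)
    y = [1 if i % 30 == 0 else 0 for i in range(n)]
    X = [[rng.gauss(1.5 * t, 1.0), rng.gauss(1.5 * t, 1.0)] for t in y]
    return X, y


def balanced_bags(labels, n_bags, rng):
    """Yield index lists: all minority rows plus an equal-sized sample of majority rows."""
    pos = [i for i, t in enumerate(labels) if t == 1]
    neg = [i for i, t in enumerate(labels) if t == 0]
    for _ in range(n_bags):
        yield pos + rng.sample(neg, len(pos))


class CentroidScorer:
    """Hypothetical base learner (NOT XGBoost): scores by distance to class centroids."""

    def fit(self, X, y):
        self.c1 = [mean(col) for col in zip(*[x for x, t in zip(X, y) if t == 1])]
        self.c0 = [mean(col) for col in zip(*[x for x, t in zip(X, y) if t == 0])]
        return self

    def score(self, x):
        d1 = sum((a - b) ** 2 for a, b in zip(x, self.c1))
        d0 = sum((a - b) ** 2 for a, b in zip(x, self.c0))
        # Closer to the positive centroid gives a score nearer to 1.
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5


def fit_ensemble(X, y, n_bags=10, seed=0, make_model=CentroidScorer):
    """Train one base learner per balanced bag."""
    rng = random.Random(seed)
    return [
        make_model().fit([X[i] for i in idx], [y[i] for i in idx])
        for idx in balanced_bags(y, n_bags, rng)
    ]


def predict_ensemble(models, x):
    """Average the scores of all bag-level models."""
    return mean(m.score(x) for m in models)


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


X_train, y_train = make_data(3000, seed=1)
X_test, y_test = make_data(1500, seed=2)
models = fit_ensemble(X_train, y_train, n_bags=10)
auc = roc_auc(y_test, [predict_ensemble(models, x) for x in X_test])
print(f"test AUC on synthetic data: {auc:.2f}")
```

Because each bag is balanced, no single base learner is dominated by the majority class, while averaging over bags lets the ensemble use most of the majority-class records. The AUC printed here reflects only the synthetic data and has no bearing on the 0.73 reported in the abstract.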