Principal component analysis (PCA) is a variable reduction method used on over-parameterized data sets with a vast number of variables and a limited number of observations, such as Dairy Herd Improvement (DHI) data, to select subsets of variables that describe the largest amount of variance. Cluster analysis (CA) segregates objects, in this case dairy herds, into groups based upon similarity in multiple characteristics simultaneously. This project aimed to apply PCA to identify the subset of most meaningful DHI variables and to apply CA to discover groupings of dairy herds with similar performance characteristics. Year 2011 DHI data were obtained for 557 Upper Midwest herds with a test-day mean of ≥200 cows (assumed mostly freestall housed) that remained on test for the entire year. The PCA reduced an initial list of 22 variables to 16. The average distance method of CA grouped farms based on best goodness of fit determined by the minimum cophenetic distance. Six groups provided the optimal fit. Descriptive statistics for the 16 variables were computed per group. Based on observed means, groups 1, 2, and 6 demonstrated the best performance in most variables, including energy-corrected milk, linear somatic cell score (log of somatic cell count), dry period intramammary infection cure rate, new intramammary infection risk, risk of subclinical intramammary infection at first test, age at first calving, days in milk (DIM), and Transition Cow Index. Groups 3, 4, and 5 demonstrated the worst mean performance in most of the PCA-selected variables, including DIM, age at first calving, risk of subclinical intramammary infection at first test, and dry period intramammary infection cure rate. Groups 4 and 5 also had the worst mean herd performance in energy-corrected milk, Transition Cow Index, linear somatic cell score, and new intramammary infection risk. Further investigation will be conducted to reveal patterns of management associated with herd categorization.
The PCA and CA should be used when describing the multivariate performance of dairy herds and whenever working with over-parameterized data sets, such as DHI databases.
Keywords: Dairy Herd Improvement data; cluster analysis; principal component analysis.
Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
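The two-step workflow summarized in the abstract (PCA-based variable reduction followed by average-linkage cluster analysis) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names, the 80% variance threshold, and the rule of keeping the highest-loading variable on each retained component are all hypothetical, since the abstract does not give the selection rule. The cophenetic correlation coefficient is reported here as a common goodness-of-fit diagnostic for a dendrogram; it is not necessarily the exact cophenetic criterion used in the study.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import pdist


def standardize(X):
    """Center each column and scale it to unit sample variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_select_variables(X, var_threshold=0.8):
    """Return column indices chosen by a PCA-based rule (hypothetical).

    Assumed rule: retain enough principal components to reach
    `var_threshold` of total variance, then keep the not-yet-chosen
    variable with the largest absolute loading on each component.
    """
    corr = np.corrcoef(standardize(X), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    n_pc = min(int(np.searchsorted(explained, var_threshold)) + 1, len(eigvals))
    selected = []
    for k in range(n_pc):
        for j in np.argsort(-np.abs(eigvecs[:, k])):
            if int(j) not in selected:
                selected.append(int(j))
                break
    return sorted(selected)


def cluster_herds(X, n_clusters):
    """Average-linkage hierarchical clustering on standardized data.

    Returns the cluster label of each row (herd) and the cophenetic
    correlation coefficient of the dendrogram.
    """
    dists = pdist(standardize(X), metric="euclidean")
    tree = linkage(dists, method="average")
    coph_corr, _ = cophenet(tree, dists)
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return labels, coph_corr


def group_means(X, labels):
    """Mean of every variable within each cluster, keyed by cluster label."""
    return {int(g): X[labels == g].mean(axis=0) for g in np.unique(labels)}
```

With a herd-by-variable array `X` (rows are herds, columns are DHI variables), one might call `sel = pca_select_variables(X)`, then `labels, fit = cluster_herds(X[:, sel], n_clusters=6)`, and finally `group_means(X[:, sel], labels)` to obtain per-group descriptive means. Note that `fcluster` with `criterion="maxclust"` yields at most the requested number of clusters.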