Machine Learning and Advanced Statistical Modeling Can Identify Key Quality Management Practices That Affect Postpasteurization Contamination of Fluid Milk

J Food Prot. 2021 Sep 1;84(9):1496-1511. doi: 10.4315/JFP-20-431.

Abstract

Spoilage of high-temperature, short-time (HTST)- and vat-pasteurized fluid milk due to the introduction of gram-negative bacteria postpasteurization remains a challenge for the dairy industry. Although processing facility-level practices (e.g., sanitation practices) are known to impact the frequency of postpasteurization contamination (PPC), the relative importance of different practices is not well defined, thereby affecting the ability of facilities to select intervention targets that reduce PPC and provide the greatest return on investment. Thus, the goal of this study was to use an existing longitudinal data set of bacterial spoilage indicators obtained for pasteurized fluid milk samples collected from 23 processing facilities between July 2015 and November 2017 (with three to five samplings per facility) and data from a survey on fluid milk quality management practices, to identify factors associated with PPC and rank their relative importance. This ranking was accomplished using two separate approaches: multimodel inference and conditional random forest. Data preprocessing for multimodel inference analysis showed (i) nearly all factors were significantly associated with PPC when assessed individually using univariable logistic regression and (ii) numerous pairs of factors were strongly associated with each other (Cramer's V ≥ 0.80). Multimodel inference and conditional random forest analyses identified similar drivers associated with PPC; factors identified as most important based on these analyses included cleaning and sanitation practices, activities related to good manufacturing practices, container type (a proxy for different filling equipment), in-house finished product testing, and designation of a quality department, indicating potential targets for reducing PPC.
In addition, this study illustrates how machine learning approaches can be used with highly correlated and unbalanced data, as is typical of food safety and quality data sets, to facilitate improved data analyses and decision making.
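
The preprocessing step that screens pairs of factors by Cramer's V can be illustrated with a minimal sketch. It assumes the standard uncorrected definition, V = sqrt(chi-square / (n * (min(r, c) - 1))); the abstract does not state which software or which variant of the statistic was used. The function names, factor names, and survey responses below are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cramers_v(x, y):
    """Uncorrected Cramer's V for two equal-length sequences of categorical labels."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    row_totals = Counter(x)
    col_totals = Counter(y)
    observed = Counter(zip(x, y))  # cell counts; absent cells count as 0
    k = min(len(row_totals), len(col_totals))
    if n == 0 or k < 2:
        raise ValueError("each factor needs at least two observed levels")
    # Pearson chi-square statistic over every cell of the contingency table.
    chi2 = 0.0
    for r, r_tot in row_totals.items():
        for c, c_tot in col_totals.items():
            expected = r_tot * c_tot / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))


def strongly_associated_pairs(factors, threshold=0.80):
    """Return (factor_a, factor_b, V) for every pair with V >= threshold."""
    flagged = []
    for a, b in combinations(factors, 2):
        v = cramers_v(factors[a], factors[b])
        if v >= threshold:
            flagged.append((a, b, v))
    return flagged


# Invented survey responses for six facilities (illustration only).
factors = {
    "has_quality_dept": ["yes", "yes", "no", "no", "yes", "no"],
    "in_house_testing": ["yes", "yes", "no", "no", "yes", "no"],
    "container_type": ["plastic", "paper", "plastic", "paper", "plastic", "paper"],
}

flagged = strongly_associated_pairs(factors)
```

With these invented responses, only the pair `has_quality_dept` and `in_house_testing` is flagged, because the two factors take identical values at every facility. Pairs flagged this way carry largely redundant information, which is the kind of correlation among predictors that the abstract describes.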

Keywords: Dairy; Machine learning; Postpasteurization contamination; Quality management; Spoilage bacteria.

MeSH terms

  • Animals
  • Bacteria
  • Dairying
  • Food Contamination* / analysis
  • Machine Learning
  • Milk*