Regression tools for chemical release modeling: An additive manufacturing case study

J Occup Environ Hyg. 2025 Jan 13:1-11. doi: 10.1080/15459624.2024.2447320. Online ahead of print.

Abstract

Chemical release data are essential for chemical risk assessments that characterize potential exposures arising from industrial processes. Often, these data are unknown or unavailable and must be estimated. A case study of volatile organic compound releases during extrusion-based additive manufacturing is used here to explore the viability of various regression methods for predicting chemical releases to inform chemical assessments. The methods assessed include linear least squares, Least Absolute Shrinkage and Selection Operator (LASSO) and ridge regression, classification and regression tree, random forest, and neural network models. Secondary data describing polymer extrusion in multiple applications are curated into a dataset to support regression modeling, with each approach run using its default parameters. Synthetic data generation is used to evaluate whether adding noise to the dataset improves regression. On a common test set, all methods predicted up to 98% of test samples within 10% error. How well this performance was maintained as the number and type of regression features varied depended on the model type: with smaller numbers of features, linear methods and neural network analysis predicted the most test samples within 10% error, while tree-based approaches could accommodate larger numbers of features. The number and type of features can be important when the goal is chemical-specific release prediction. Including release data from related processes generally improved test set predictions across all models, whereas the use of synthetic data, as implemented here, produced smaller increases in the number of test samples predicted within 10% error. Future work should focus on improving access to primary data and on optimizing models to maximize predictive performance for environmental releases in support of chemical risk assessment.
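The abstract's headline criterion, the share of test samples predicted within 10% error, and its noise-based synthetic data can be illustrated with a short sketch. This is not the authors' implementation: the function names `fraction_within_tolerance` and `augment_with_noise` are hypothetical, relative error is assumed to be |predicted − observed| / |observed| (samples with an observed value of zero are skipped), and the multiplicative Gaussian noise on features is one simple, assumed way to generate noisy synthetic rows.

```python
import random


def fraction_within_tolerance(y_true, y_pred, tol=0.10):
    """Share of samples whose relative error |pred - obs| / |obs| is <= tol.

    Assumption: samples with an observed value of 0 are skipped, because
    relative error is undefined for them.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    if not pairs:
        raise ValueError("no samples with nonzero observed values")
    hits = sum(1 for t, p in pairs if abs(p - t) / abs(t) <= tol)
    return hits / len(pairs)


def augment_with_noise(X, y, n_copies=1, rel_sd=0.05, seed=0):
    """Append n_copies noisy replicates of every row (hypothetical scheme).

    Each feature value is scaled by (1 + e) with e ~ N(0, rel_sd); targets
    are copied unchanged. The original rows are kept first, in order.
    """
    rng = random.Random(seed)
    X_aug = [list(row) for row in X]
    y_aug = list(y)
    for _ in range(n_copies):
        for row, target in zip(X, y):
            X_aug.append([v * (1 + rng.gauss(0, rel_sd)) for v in row])
            y_aug.append(target)
    return X_aug, y_aug


if __name__ == "__main__":
    observed = [1.0, 2.0, 4.0, 10.0]
    predicted = [1.05, 2.5, 3.7, 10.0]  # relative errors: 5%, 25%, 7.5%, 0%
    print(fraction_within_tolerance(observed, predicted))  # 0.75
```

In the usage example, three of the four illustrative predictions fall within 10% of the observed value, giving 0.75. A real workflow would fit each regression model on the (optionally augmented) training set and report this fraction on a held-out test set.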

Keywords: Decision tree; extrusion; neural network; process read-across; process releases; synthetic data.