Parallel data analysis was investigated as a way to speed up variable selection and the development of predictive models for beer quality control. The dataset comprised near-infrared (NIR) spectra from 60 beer samples, with the primitive extract (original concentration) of each sample as the reference value. The dataset was distributed to Raspberry Pi 3 Model B devices connected to a network running a Machine Learning service. With more than four devices working in parallel, the time needed to find the model with the best linear regression coefficient (0.999) and the lowest root mean square error of cross-validation (RMSECV, 0.216) was reduced by 57% compared with a single desktop computer. Thus, parallel processing can significantly reduce the time needed to identify the best-fitting model during variable selection.
Keywords: Methods of Parallelism; Partial Least Squares; Variable selection; WebService.
Copyright © 2021 Elsevier Ltd. All rights reserved.
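
As an illustration of the idea summarized above, the following minimal sketch scores candidate variables in parallel and keeps the one with the lowest RMSECV. It rests on several assumptions that are not taken from the paper: a one-variable ordinary least squares fit stands in for Partial Least Squares, leave-one-out is assumed as the cross-validation scheme, local threads stand in for the networked Raspberry Pi devices, and the toy data and function names (`fit_line`, `rmsecv`, `select_best_variable`) are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a simple stand-in for PLS)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


def rmsecv(xs, ys):
    """Leave-one-out root mean square error of cross-validation."""
    squared_errors = []
    for i in range(len(xs)):
        # Refit the model without sample i, then predict sample i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        squared_errors.append((ys[i] - (intercept + slope * xs[i])) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def select_best_variable(spectra, y, workers=4):
    """Score every spectral column concurrently; return (index, RMSECV)."""
    columns = [list(col) for col in zip(*spectra)]
    # Each worker thread plays the role of one device in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda col: rmsecv(col, y), columns))
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical toy data: 6 samples x 3 variables; column 1 is linear in y.
y = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
spectra = [
    [0.3, 1.0, 0.9],
    [0.1, 1.1, 0.2],
    [0.4, 1.2, 0.7],
    [0.2, 1.3, 0.1],
    [0.5, 1.4, 0.8],
    [0.1, 1.5, 0.3],
]
best_index, best_score = select_best_variable(spectra, y)
```

Because CPython threads do not run pure-Python arithmetic truly in parallel, this sketch shows only how the candidate evaluations can be partitioned among workers. A measurable speedup, such as the one reported in the abstract, would require separate processes or separate machines.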