Artificial intelligence-based prediction of lycopene content in raw tomatoes using physicochemical attributes

Phytochem Anal. 2023 Oct;34(7):729-744. doi: 10.1002/pca.3185. Epub 2022 Nov 11.

Abstract

Introduction: Lycopene consumption reduces risk and incidence of cancer and cardiovascular diseases. Tomatoes are a rich source of phytochemical compounds including lycopene as a major constituent. Lycopene estimation using high-performance liquid chromatography is time-consuming and expensive.

Objective: To develop artificial intelligence models for prediction of lycopene in raw tomatoes using 14 different physicochemical parameters including salinity, total dissolved solids (TDS), electrical conductivity (EC), firmness, pH, total soluble solids (TSS), titratable acidity (TA), colour values on Hunter scale (L, a, b), total phenolic content (TPC), total flavonoid content (TFC) and antioxidant activity (AOA).

Material and methods: Post-harvest data were acquired from more than 100 raw tomatoes stored for 15 days. Linear multivariate regression (LMVR), principal component regression (PCR) and partial least squares regression (PLSR) models were developed after splitting the data set into training and test sets. The models were trained using 10-fold cross-validation (CV).
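The evaluation workflow described here (a train/test split, then 10-fold CV on the training portion) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the three predictors and the 80/20 split ratio are hypothetical, and the CV metrics are pooled over out-of-fold predictions, which the abstract does not specify. Only the linear multivariate regression is shown; a PCR or PLSR model would take the place of `fit` and `predict`.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Xty = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(p)]
    return solve(XtX, Xty)

def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]

def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def r2(y, yhat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def kfold_predict(X, y, k=10, seed=0):
    """Out-of-fold predictions: each sample is predicted by a model
    fitted on the other k-1 folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    oof = [0.0] * len(y)
    for held in folds:
        held_set = set(held)
        train = [i for i in idx if i not in held_set]
        coef = fit([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(held, predict(coef, [X[i] for i in held])):
            oof[i] = p
    return oof

# Synthetic stand-in data (assumption): 120 samples, 3 hypothetical predictors.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(120)]
y = [50 + 8 * r[0] + 4 * r[1] - 2 * r[2] + rng.gauss(0, 5) for r in X]

# Hypothetical 80/20 split; the rows are already in random order.
n_train = 96
X_tr, y_tr, X_te, y_te = X[:n_train], y[:n_train], X[n_train:], y[n_train:]

oof = kfold_predict(X_tr, y_tr, k=10)
r2_cv, rmse_cv = r2(y_tr, oof), rmse(y_tr, oof)
coef = fit(X_tr, y_tr)  # final model refitted on all training data
rmse_test = rmse(y_te, predict(coef, X_te))
print(f"R2 (CV)={r2_cv:.2f}  RMSE (CV)={rmse_cv:.2f}  RMSE (Test)={rmse_test:.2f}")
```

Keeping the test set out of the CV loop means RMSE (Test) estimates performance on tomatoes the model never saw during fitting or model selection.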

Results: Principal component analysis showed a strong positive association between lycopene, colour value 'a', TPC, TFC and AOA. For the best LMVR model, the R2 (CV), root mean square error (RMSE) (CV) and RMSE (Test) were 0.70, 8.48 and 9.69, respectively. The PCR model gave an R2 (CV) of 0.59, an RMSE (CV) of 8.91 and an RMSE (Test) of 10.17, while the PLSR model gave an R2 (CV) of 0.60, an RMSE (CV) of 9.10 and an RMSE (Test) of 10.11.
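For reference, the two reported metrics have the standard definitions below, where y_i is the measured lycopene content of sample i, ŷ_i the model prediction, ȳ the mean of the measured values and n the number of samples. The abstract does not say whether the CV values were pooled over all held-out predictions or averaged across folds, so treat this as the general form only. RMSE carries the same units as the lycopene measurement, which the abstract does not give.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
```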

Conclusion: The results of the present study, together with epidemiological studies, suggest that fully ripened tomatoes are the most beneficial for consumption to ensure the recommended daily intake of lycopene.

Keywords: artificial intelligence; linear multivariate regression; lycopene content; partial least squares regression; post-harvest quality; principal component regression; tomato fruit.