Visual nutrition analysis: leveraging segmentation and regression for food nutrient estimation

Yaping Zhao; Ping Zhu; Yizhang Jiang; Kaijian Xia

doi:10.3389/fnut.2024.1469878

Front Nutr. 2024 Dec 17:11:1469878. doi: 10.3389/fnut.2024.1469878. eCollection 2024.

Authors

Yaping Zhao¹,², Ping Zhu²,³, Yizhang Jiang¹, Kaijian Xia²,³

Affiliations

¹ School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu, China.
² Changshu Key Laboratory of Medical Artificial Intelligence and Big Data, Suzhou, Jiangsu, China.
³ Department of Scientific Research, The Changshu Affiliated Hospital of Soochow University, Suzhou, Jiangsu, China.

Abstract

Introduction: Nutrition is closely related to health. A balanced diet not only meets the body's needs for various nutrients but also helps prevent many chronic diseases. However, because most people lack systematic nutritional knowledge, they often find it difficult to assess the nutritional content of food accurately. Image-based nutritional evaluation can help here, so we aim to predict the nutritional content of dishes directly from images. Most related research estimates the volume or area of food through image segmentation and then calculates its nutritional content from the food category. However, this approach usually lacks ground-truth nutritional content labels as a reference, making it difficult to verify the accuracy of the predictions.

Methods: To address this issue, we combined segmentation and regression tasks. Because the Nutrition5k dataset contains detailed nutritional content labels but no segmentation labels, we annotated it manually with segmentation labels. Based on the annotated data, we developed a nutritional content prediction model that performs segmentation first and regression afterward. Specifically, we first applied a UNet model to segment the food, then used a backbone network to extract features and enhanced their representational capacity with a Squeeze-and-Excitation structure. Finally, the extracted features were passed through several fully connected layers to predict weight, calories, fat, carbohydrates, and protein content.
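The Squeeze-and-Excitation step mentioned in the Methods can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function name `se_block`, the weight layout, and the omission of biases are assumptions made for illustration, and a real model would use a deep learning framework with learned weights. The sketch only shows the general mechanism: each channel is averaged (squeeze), the channel averages pass through two small fully connected layers with ReLU and sigmoid (excitation), and each channel is rescaled by its resulting gate.

```python
import math


def se_block(feature_map, w1, w2):
    """Illustrative Squeeze-and-Excitation over a list of channels.

    feature_map: list of C channels, each an H x W list of floats.
    w1: hidden x C weight matrix (reduction layer, biases omitted).
    w2: C x hidden weight matrix (expansion layer, biases omitted).
    All names and shapes here are hypothetical, not taken from the paper.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]

    # Excitation, part 1: fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w1
    ]

    # Excitation, part 2: fully connected layer followed by sigmoid,
    # producing one gate in (0, 1) per channel.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]

    # Scale: multiply every value in a channel by that channel's gate.
    return [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
```

With all-zero weights, every gate equals sigmoid(0) = 0.5, so each channel is simply halved; learned weights would instead emphasize some channels over others.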

Results and discussion: Our model achieved an average percentage mean absolute error (PMAE) of 17.06% across these five components. All manually annotated segmentation labels are available at https://doi.org/10.6084/m9.figshare.26252048.v1.
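The abstract does not define PMAE. The sketch below assumes the definition commonly used with Nutrition5k: the mean absolute error for a component divided by the mean ground-truth value of that component, expressed as a percentage. The function names `pmae` and `average_pmae` are hypothetical, and the numbers in the usage note are made up for illustration only.

```python
def pmae(y_true, y_pred):
    """Percentage MAE: 100 * MAE / mean(y_true) (assumed definition)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    mean_true = sum(y_true) / len(y_true)
    return 100.0 * mae / mean_true


def average_pmae(targets, predictions):
    """Per-component PMAE and their unweighted mean.

    targets, predictions: dicts mapping a component name (for example
    "calories") to a list of per-dish values.
    """
    scores = {
        name: pmae(targets[name], predictions[name]) for name in targets
    }
    return scores, sum(scores.values()) / len(scores)
```

For example, with toy ground-truth calories of 100 and 200 and predictions of 110 and 180, the MAE is 15 and the mean ground truth is 150, so `pmae([100, 200], [110, 180])` gives 10.0.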

Keywords: Nutrition5k; deep learning; image segmentation; nutrition estimation; regression.

Associated data

figshare/10.6084/m9.figshare.26252048.v1

Grants and funding

The authors declare that financial support was received for the research, authorship, and/or publication of this article. This work was supported in part by the National Natural Science Foundation of China (No. 62171203), the Jiangsu Graduate Workstation Project in Jiangnan University, the Jiangsu Province “333 Project” High-level Talent Cultivation Subsidized Project, the Suzhou Key Supporting Subjects, Health Informatics (No. SZFCXK202147), the Changshu Science and Technology Program (Nos. CS202015 and CS202246), and the Changshu Key Laboratory of Medical Artificial Intelligence and Big Data (Nos. CYZ202301 and CS202314).