IngredSAM: Open-World Food Ingredient Segmentation via a Single Image Prompt

Leyi Chen; Bowen Wang; Jiaxin Zhang

doi:10.3390/jimaging10120305

J Imaging. 2024 Nov 26;10(12):305. doi: 10.3390/jimaging10120305.

Authors

Leyi Chen¹, Bowen Wang², Jiaxin Zhang³

Affiliations

¹ College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, China.
² D3 Center, Osaka University, 2-1, Yamadaoka, Osaka 5650871, Japan.
³ Architecture and Design College, Nanchang University, No. 999, Xuefu Avenue, Honggutan New District, Nanchang 330031, China.

Abstract

Food semantic segmentation is an important task in computer vision, particularly for food image analysis. Because foods are complex and highly varied, supervised methods struggle to handle this task effectively. We therefore introduce IngredSAM, a novel approach to open-world food ingredient semantic segmentation that extends the capabilities of the Segment Anything Model (SAM). Using visual foundation models (VFMs) and prompt engineering, IngredSAM matches discriminative semantic features between a single clean image prompt of specific ingredients and open-world images, and uses those matches to guide the generation of accurate segmentation masks in real-world scenarios. This method addresses the difficulty that traditional supervised models have with the diverse appearances and class imbalance of food ingredients. Our framework requires no training and outperforms previous state-of-the-art methods by 2.85% and 6.01% on the FoodSeg103 and UECFoodPix datasets, respectively. IngredSAM demonstrates a successful application of one-shot, open-world segmentation, paving the way for downstream applications such as improved nutritional analysis and consumer dietary trend monitoring.
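
The feature-matching idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_point_prompts`, the use of cosine similarity, and the top-k selection rule are all assumptions made for illustration. The features are assumed to have been extracted already by some visual foundation model. The sketch scores each patch of an open-world image against the features of a clean ingredient image and returns the best-matching patch locations, which could then serve as point prompts for a promptable segmenter such as SAM.

```python
import numpy as np

def select_point_prompts(prompt_feats, target_feats, k=3):
    """Hypothetical helper: pick the k target patches most similar to the prompt.

    prompt_feats: (N, D) features from the clean ingredient image.
    target_feats: (H, W, D) patch features from the open-world image.
    Returns a list of k (row, col) patch coordinates, best match first,
    and the (H, W) similarity map.
    """
    h, w, d = target_feats.shape
    # L2-normalise so the dot product equals cosine similarity.
    p = prompt_feats / (np.linalg.norm(prompt_feats, axis=1, keepdims=True) + 1e-8)
    t = target_feats.reshape(-1, d)
    t = t / (np.linalg.norm(t, axis=1, keepdims=True) + 1e-8)
    sim = t @ p.T                 # (H*W, N) patch-to-prompt similarities
    score = sim.max(axis=1)       # best match per target patch (an assumed rule)
    top = np.argsort(-score)[:k]  # indices of the k highest-scoring patches
    points = [(int(i // w), int(i % w)) for i in top]
    return points, score.reshape(h, w)
```

In practice, the patch coordinates would need to be rescaled to pixel coordinates before being passed to a segmenter. That step depends on the feature extractor's patch size and is omitted here.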

Keywords: SAM; open-world food segmentation; visual prompting.

Grants and funding

24K20795/JSPS KAKENHI