Food semantic segmentation is an important task in computer vision and artificial intelligence, particularly for food image analysis. Because food is complex and highly varied, supervised methods struggle to handle this task effectively. We therefore introduce IngredSAM, a novel approach to open-world food ingredient semantic segmentation that extends the capabilities of the Segment Anything Model (SAM). Using visual foundation models (VFMs) and prompt engineering, IngredSAM matches discriminative semantic features between a single clean image prompt of a specific ingredient and open-world images, and uses these matches to guide the generation of accurate segmentation masks in real-world scenarios. This addresses the difficulty that traditional supervised models have with the diverse appearances and class imbalance of food ingredients. Without any training process, our framework outperforms previous state-of-the-art methods by 2.85% and 6.01% on the FoodSeg103 and UECFoodPix datasets, respectively. IngredSAM demonstrates a successful application of one-shot, open-world segmentation, paving the way for downstream applications such as improved nutritional analysis and consumer dietary trend monitoring.
Keywords: SAM; open-world food segmentation; visual prompting.
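
The feature-matching idea summarized in the abstract can be illustrated with a minimal sketch. This is not the IngredSAM implementation: it assumes that patch-level features have already been extracted by some visual foundation model, and the function name `match_point_prompts`, the patch size of 14, and the top-k selection rule are all hypothetical choices made for illustration only.

```python
import numpy as np

def match_point_prompts(prompt_feats, target_feats, grid_hw, patch_size=14, top_k=3):
    """Pick target-image patches that best match the ingredient prompt.

    prompt_feats: (P, D) features of ingredient patches from the clean prompt image.
    target_feats: (H*W, D) patch features of the open-world image, row-major order.
    grid_hw:      (H, W) size of the target patch grid.
    Returns (points, scores): (top_k, 2) pixel coordinates (x, y) of patch centers
    and the matching cosine-similarity scores.
    All names and defaults here are illustrative assumptions.
    """
    h, w = grid_hw
    if target_feats.shape[0] != h * w:
        raise ValueError("target_feats must have H*W rows")

    eps = 1e-8  # guards against division by zero for all-zero features
    p = prompt_feats / (np.linalg.norm(prompt_feats, axis=1, keepdims=True) + eps)
    t = target_feats / (np.linalg.norm(target_feats, axis=1, keepdims=True) + eps)

    # Cosine similarity between every target patch and every prompt patch.
    sim = t @ p.T                      # shape (H*W, P)
    score = sim.max(axis=1)            # best prompt match per target patch

    # Indices of the top_k most similar target patches, best first.
    best = np.argsort(-score)[:top_k]
    rows, cols = np.divmod(best, w)

    # Convert grid indices to pixel coordinates of each patch center.
    xs = cols * patch_size + patch_size // 2
    ys = rows * patch_size + patch_size // 2
    return np.stack([xs, ys], axis=1), score[best]
```

Under these assumptions, the returned coordinates could be passed as positive point prompts to a promptable segmenter such as SAM, which would then produce the mask for the matched ingredient.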