Abstract
Maintaining a healthy lifestyle has become increasingly challenging in today’s sedentary society, marked by poor eating habits. To address this issue, both national and international organisations have made numerous efforts to promote healthier diets and increased physical activity. However, implementing these recommendations in daily life can be difficult, as they are often generic and not tailored to individuals. This study presents the AI4Food-NutritionDB database, the first nutrition database that incorporates food images and a nutrition taxonomy based on recommendations by national and international health authorities. The database offers a multi-level categorisation, comprising 6 nutritional levels, 19 main categories (e.g., “Meat”), 73 subcategories (e.g., “White Meat”), and 893 specific food products (e.g., “Chicken”). AI4Food-NutritionDB opens the door to new food computing approaches in terms of food intake frequency, quality, and categorisation. We also present a standardised experimental protocol and benchmark including three tasks based on the nutrition taxonomy (i.e., category, subcategory, and final product recognition). These resources are available to the research community, including our deep learning models trained on AI4Food-NutritionDB, which can serve as pre-trained models, achieving accurate recognition results for challenging food image databases. All these resources are available on GitHub (https://github.com/BiDAlab/AI4Food-NutritionDB).
1 Introduction
The Double Burden of Malnutrition (DBM) is defined as the coexistence of both undernutrition and overweight, a global issue affecting populations across all regions worldwide. According to the World Health Organization (WHO), an estimated 39% of the adult population is currently overweight, and by 2030 this percentage is expected to reach 50%. In addition, Non-Communicable Diseases (NCDs) have spread over the last century, mainly due to poor eating behaviours and a lack of physical activity, among other factors. These NCDs, such as diabetes, cardiovascular problems, or cancer, result in millions of annual deaths, emphasising the critical need for healthy dietary planning to mitigate these risks [26, 66].
Historically, strategies for promoting healthier diets have been based on recommendations tailored to the general population. National and international organisations have created food pyramids, which serve as guidelines for daily dietary choices across various food groups [40, 58]. Figure 1 provides a graphical representation of a typical food pyramid, categorising food intake into 6 nutritional levels based on intake frequency. Personalising these general recommendations [25, 28], together with the rapid deployment of smart devices and Artificial Intelligence (AI) methods [20, 68], is expected to revolutionise the promotion of healthier lifestyles.
In addition, health-related information, such as nutrition and physical activity, can easily be acquired through smartphones and wearable devices today [2, 5]. As a result, a large amount of data has been generated in recent years, and its analysis, often referred to as food computing, can provide valuable insights. Food computing encompasses the acquisition and analysis of heterogeneous food data to address various food-related issues across domains such as medicine, biology, gastronomy, and agronomy [44]. For instance, a popular and user-friendly way of recording a person’s eating habits consists of taking pictures of the food consumed. Consequently, millions of food images have been shared through social networks, and new computer vision applications based on food detection and recognition have emerged [3, 17, 46, 47].
Current studies are limited in scope, primarily focusing on the pure application of computer vision techniques to food images, e.g., the task of food recognition. The AI4Food-NutritionDB database, in contrast, intends to bridge the gap between the resources generated by computer vision experts and the guidance provided by nutrition experts. First, we introduce a food image database that incorporates a nutrition taxonomy, including the nutritional levels of the popular food pyramid and their subcategories, among others. Second, we introduce a benchmark considering novel deep learning models to automatically assess several nutritional aspects and scenarios (i.e., intra- and inter-database). A graphical diagram of the proposed study is shown in Fig. 2.
In this article, we also present our interdisciplinary framework named AI4Food [54], which aims to reduce the gap between computer scientists and nutritionists. Our overall objective is to foster a new generation of technologies focused on modelling users’ habits, including food diet and physical activity patterns. AI4Food-NutritionDB comprises a database and a taxonomy aimed at improving current methods and resources for research in food computing. One of the primary objectives of the AI4Food framework is to create a configurable software environment capable of generating synthetic diets, including food images, to simulate different user profiles depending on lifestyles and eating behaviours. This functionality has two potential applications, among many others: i) the automatic proposal of healthy diets to the final user, which can be frequently updated depending on the specific user’s habits, and ii) the automatic and continuous analysis of the user’s eating habits from the food pictures taken, providing recommendations to improve them. To achieve this goal, this article focuses on the generation of a food nutrition image database that includes a taxonomy derived from international nutritional guidelines.
The main contributions of the present study are:
-
AI4Food-NutritionDB database. To the best of our knowledge, this is the first nutrition database that combines food images with a nutrition taxonomy. The proposed taxonomy includes four different levels of categorisation, i.e., 6 nutritional levels (see Fig. 1), 19 main categories (e.g., the family of vertebrate animals such as “Meat”), 73 subcategories (e.g., specific products such as “White Meat”), and 893 final food products (e.g., final products such as “Chicken”). In addition, each subcategory is defined by a type of dish (e.g., “Appetizer” and “Main Dish”) regarding its healthiness and food quantity. Figure 3 provides a graphical description of the database and its associated taxonomy. AI4Food-NutritionDB opens the door to new food computing approaches in terms of food intake frequency, quality, and categorisation.
-
Proposal of a standard experimental protocol and benchmark, including different recognition tasks (category, subcategory, and final product). The experimental protocol considered comprises both intra- and inter-database scenarios, ensuring robust evaluation.
-
Free release to the research community of the described datasets, protocols, and multiple deep learning models. These models can serve as pre-trained models, achieving accurate recognition results when applied to other challenging food databases. All these resources are publicly available in our GitHub repository.
The remainder of the article is organised as follows. State-of-the-art studies related to food computing and food image databases are presented in Section 2. Section 3 explains the design of the AI4Food-NutritionDB food image database. Section 4 describes the proposed standard experimental protocol and benchmark results carried out on the AI4Food-NutritionDB database using deep learning techniques. Finally, conclusions and future studies are drawn up in Section 5.
2 Related works
2.1 Food computing
Food computing has become a very active topic in recent years, applying computational approaches to food-related areas. Methods based on computer vision, data mining, or machine learning, among others, have been used to analyse large amounts of food images obtained from the Internet, social platforms, and smartphones. In general, food computing considers a wide range of tasks, including food segmentation, recognition, and recommendation, with applications in various fields, for instance, in health, biology, or agriculture [1, 44].
Among these tasks, food recognition is one of the most popular in the literature. This task consists of detecting and classifying food images using different techniques. Traditional approaches rely on visual features such as shape, colour, and texture for food product detection [59]. Scale Invariant Feature Transform (SIFT), Histogram of Oriented Gradients (HOG), and Local Binary Patterns (LBP) are some popular descriptors used in the literature as feature extractors [61]. For classification, Support Vector Machine (SVM) and K-Nearest Neighbour (KNN) algorithms are the most common ones to differentiate food products and categories [51]. However, traditional approaches are ineffective on challenging databases, where deep learning techniques have shown better performance [42, 47]. Specifically, complex architectures based on Convolutional Neural Networks (CNNs) perform feature extraction and classification jointly, achieving accuracy (Acc.) rates above 80% [4, 34]. For instance, Min et al. [47] used the Squeeze-and-Excitation Network (SENet) architecture [31], achieving a high inter-database generalisation capacity with a 91.45% Top-1 Acc. on the VireoFood-172 database. They also considered in [46, 47] other deep learning architectures, for instance, the Stacked Global-Local Attention Network (SGLANet) and the Progressive Region Enhancement Network (PRENet). Experiments were carried out on the ISIA Food-500 and Food2K databases [46, 47], achieving Top-1 Acc. results of 64.74% and 83.62%, respectively.
It is important to highlight that most published studies focus on food recognition at the final food product level (using as labels the name of the dish, for example, “Pasta alla Norma”) or the main category (e.g., “Fast Food"). However, in the present article, we analyse the task of food recognition based on the proposed nutrition taxonomy (6 nutritional levels, 19 main categories, 73 subcategories, and 893 final food products) as this is needed for many real applications, particularly those related to healthy dietary practices. In addition, each subcategory is defined by a type of dish (e.g., “Appetizer” and “Main Dish”) regarding its healthiness and food quantity.
2.2 Food image databases
Many food image databases have emerged in recent years, including a wide range of food products from various world areas. These databases are categorised based on the three different acquisition protocols found in the literature (i.e., self-collected, web scraping, and combination). Table 1 provides a summary of these databases and their metadata, including the protocol used, the number of classes and images, and the world region:
-
Self-collected: this consists of taking food images with a camera or smartphone in controlled or semi-controlled environments. Although databases such as PFID [13], UNIMIB2015 [16], or UNIMIB2016 [17] include a large variety of food products, the number of total images is relatively low (< 20K images) due to the extensive manual process. Similarly, UNICT-FD889 [23] and UNICT-FD1200 [24] are two databases with 889 and 1,200 final food products, respectively, which represent food dishes from different parts of the world and nationalities (e.g., English, Japanese, Indian, and Italian, among others). In addition, FruitVeg-81 [64] contains more than 15,000 fruit and vegetable images, whereas Mixed-Dish [21] is a food image database of 164 Asian food products. Finally, F4H [9] and Food-Pics Extended [7] are two databases captured in a controlled scenario with a plain background.
-
Web Scraping: web scraping techniques are employed to acquire large amounts of food images. In contrast to self-collected protocols, thousands of food images can be easily captured from social and web platforms. Some databases focus on food products from specific regions of the world, for instance, traditional Japanese and Chinese dishes (e.g., Food50 [35], UECFood-256 [37], VireoFood-251 [11]), while others include dishes from Europe and North America (e.g., Food-101 [8], UPMC Food-101 [65]). Additionally, TurkishFoods-15 [29], KenyanFood13 [33], and VIPER-FoodNet [39] are three food image databases from Turkey, Kenya, and the United States, respectively. Other databases, such as Instagram 800k [53], Food500 [43], ISIA Food-200 [45], and FoodX-251 [36], include food dishes from several regions of the world. Similarly to FruitVeg-81, VegFru [30] contains only fruit and vegetable images. ISIA Food-500 [46] is a database with 500 final food products and, lastly, Food2K [47] is a recent database with around 1M food images organised into 2K food products.
-
Combination: this consists of creating new food image databases by combining data from existing ones. For instance, Food201-Segmented [49] is derived from the Food-101 database, supplemented with food tags obtained through crowd-sourcing. Food-11 [57] is a database created primarily from three different databases (Food-101, UECFOOD-100, and UECFOOD-256) and grouped into 11 main food categories. The Multi-Attribute Food-121 (MAFood-121) [3] database covers the 11 most popular cuisines in the world according to Google Trends (such as French, Mexican, or Vietnamese cuisines) and comprises 121 final food products and more than 21K food images.
To summarise, various food image databases have been presented in the literature considering different acquisition approaches and conditions. However, none of them have previously incorporated a nutrition taxonomy that assesses the quality, quantity, and intake frequency of foods based on images. The database proposed in this study offers a nutritional categorisation that facilitates the development of a new generation of food computing algorithms that foster its use in various food-related areas.
3 AI4Food-NutritionDB database
The proposed AI4Food-NutritionDB is the first nutrition database that considers food images and the nutrition taxonomy. This taxonomy includes four different levels of categorisation, i.e., 6 nutritional levels (see Fig. 1), 19 main categories (e.g., the family of vertebrate animals such as “Meat”), 73 subcategories (e.g., specific products such as “White Meat”), and 893 final food products (e.g., final products such as “Chicken”). In addition, each subcategory is defined by a type of dish (e.g., “Appetizer” and “Main Dish”) considering factors related to healthiness and food quantity. Figure 3 provides a graphical description of the database. AI4Food-NutritionDB has been built by combining food images from seven different databases, encompassing food products from all over the world. We provide next all the information regarding the source databases (Section 3.1) and the construction process of the AI4Food-NutritionDB (Section 3.2).
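To make the four-level structure concrete, the sketch below encodes a tiny fragment of the taxonomy as a nested Python dictionary. The names come from the examples above ("Meat", "White Meat", "Chicken"); the field names, the extra products, and the nutritional level values are illustrative assumptions, not the actual database format.

```python
# Illustrative fragment of the four-level AI4Food-NutritionDB taxonomy.
# Structure and nutritional level values are hypothetical; only the
# example names appear in the text.
TAXONOMY = {
    "Meat": {                         # main category (1 of 19)
        "White Meat": {               # subcategory (1 of 73)
            "nutritional_level": 4,   # hypothetical level, for illustration
            "dish_type": "Main Dish",
            "products": ["Chicken", "Turkey"],  # final products (893 in total)
        },
        "Red Meat": {
            "nutritional_level": 5,   # hypothetical level, for illustration
            "dish_type": "Main Dish",
            "products": ["Beef Steak"],
        },
    },
}


def find_product(taxonomy, product):
    """Return (category, subcategory, dish_type) for a final food product."""
    for category, subcategories in taxonomy.items():
        for subcategory, info in subcategories.items():
            if product in info["products"]:
                return category, subcategory, info["dish_type"]
    return None
```

With this layout, every final product inherits its subcategory's nutritional level and dish type, which is exactly the property the recognition benchmark in Section 4 exploits at three levels of granularity.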
3.1 Source food image databases
Seven state-of-the-art food image databases were selected to construct our database. These databases encompass various world regions and exhibit different characteristics.
3.1.1 UECFood-256 [37]
UECFood-256 contains 256 food products and more than 30K Japanese food images from different platforms such as Bing Image Search, Flickr, and Twitter (web scraping acquisition). In addition, they employed Amazon Mechanical Turk (AMT) for image selection and labelling.
3.1.2 Food-101 [8]
This database comprises over 100K food images and 101 unique food products from various world regions. All the images were sourced from the FoodSpotting application, a social platform where individuals uploaded and shared food images.
3.1.3 Food-11 [57]
Singla et al. analysed eating behaviour in the United States to construct a database comprising some of the most commonly consumed food groups. They defined 11 general food categories based on United States Department of Agriculture (USDA) guidelines, including bread, dairy products, dessert, eggs, fried food, meat, noodle/pasta, rice, seafood, soups, and vegetables/fruits. They also combined three different databases (Food-101, UECFood-100, and UECFood-256) and two social platforms (Flickr and Instagram) to accumulate more than 16K food images.
3.1.4 FruitVeg-81 [64]
Many state-of-the-art food image databases include few fruit or vegetable food products. As a distinctive feature, this database focuses on these otherwise underrepresented groups. The FruitVeg-81 database contains 81 different fruit and vegetable food products acquired following the self-collected acquisition protocol.
3.1.5 MAFood-121 [3]
Considering the 11 most popular cuisines in the world (according to Google Trends), Aguilar et al. released the MAFood-121 database. This database contains 121 unique food products and around 21K food images grouped into 10 main categories (bread, eggs, fried food, meat, noodle/pasta, rice, seafood, soup, dumpling, and vegetables). They utilised the combination acquisition protocol, using three state-of-the-art public databases (Food-101, UECFood-256, and TurkishFoods-15) and a private one.
3.1.6 ISIA Food-500 [46]
ISIA Food-500 is a database released in 2020. All food images (around 400K) are organised into 500 different food products and were acquired from Google, Baidu, and Bing search engines, including both Western and Eastern cuisines. Following a similar approach to other databases, they categorised all food products into 11 major groups, including meat, cereal, vegetables, seafood, fruits, dairy products, bakery products, fat products, pastries, drinks, and eggs.
3.1.7 VIPER-FoodNet [39]
Similar to the Food-11 database, VIPER-FoodNet is an 82-food-product database, selected based on the most commonly consumed items in the United States from the What We Eat In America (WWEIA) database. All the images were obtained through web scraping, specifically from Google Images.
As a result, the proposed AI4Food-NutritionDB initially comprises 1,152 food products with 586,914 food images. This diverse database represents traditional dishes from several world areas, such as Food-101 with Western dishes, UECFood-256 with traditional Japanese dishes, and VIPER-FoodNet with typical food dishes from the United States. In addition, the ISIA Food-500 database has 500 food products from various countries, and the FruitVeg-81 database also includes fruit and vegetable images from several world regions. Finally, it is important to highlight that the Food-11 and MAFood-121 databases were created from some of the previous databases. As a result, post-processing was conducted to remove duplicated images.
3.2 Food product categorisation
Each of the 1,152 food products obtained in the previous stage is individually processed for classification into the following levels: i) nutritional level, ii) category, iii) subcategory, and iv) type of dish. Table 2 summarises information from the AI4Food-NutritionDB database, including the levels, the number of products, and the type of dish for each subcategory. For completeness, we provide in Fig. 4 a graphical representation of the categories, subcategories, and nutritional levels considered in AI4Food-NutritionDB. Each subcategory features one or two food images labelled with its corresponding nutritional level, and subcategories are further grouped into main categories.
3.2.1 Process of categorisation
Three different stages are considered to classify each food product into subcategories, categories, and types of dishes. In the initial stage, the food product’s taxonomy is extracted using the FoodOn ontology [22], which provides supercategories and subcategories for corresponding food products. Although this ontology comprises over 9K food products, a high percentage of the analysed items are not covered by FoodOn, particularly those translated from their original language (e.g., the Danish dish æbleflæsk). The second stage involves querying the food term within the TasteAtlas web platform, which contains around 10K traditional dishes from around the world. In addition, this platform includes metadata such as ingredients, dish type, or food region.
Finally, in the third stage, each final food product is classified into subcategories and categories, and all examined food terms go through a review and unification process. This step involves eliminating terms that do not comply with established criteria and merging those with similar characteristics. The outcome of this process yields a collection of 893 final food products.
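The three stages above can be sketched as a simple lookup cascade. In the sketch, the two external sources (FoodOn, TasteAtlas) are replaced by hypothetical local tables and the review/unification step by a synonym map; all entries and mappings are illustrative, whereas the real pipeline queries the actual services and involves manual expert review.

```python
# Hypothetical stand-ins for the two external sources; entries are
# illustrative only.
FOODON = {"chicken": ("Meat", "White Meat")}        # stage 1 source
TASTEATLAS = {"aebleflaesk": ("Meat", "Red Meat")}  # stage 2 fallback

# Stage 3: review and unification, merging terms with similar
# characteristics under a single canonical name.
SYNONYMS = {"roast chicken": "chicken"}


def categorise(term):
    """Resolve a food term to a (category, subcategory) pair, or None."""
    term = SYNONYMS.get(term, term)   # unify similar terms first
    if term in FOODON:                # stage 1: FoodOn ontology
        return FOODON[term]
    if term in TASTEATLAS:            # stage 2: TasteAtlas query
        return TASTEATLAS[term]
    return None                       # left for manual expert review
```

Terms that fall through both lookups correspond to the ambiguous cases that, as described below, are resolved manually by the nutrition experts.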
3.2.2 Nutritional level
The nutritional level indicates the intake frequency for a specific food product and is determined by the popular nutritional pyramids proposed by national and international organisations, such as the United States Department of Agriculture (USDA) pyramid [40] and the Spanish Society of Community Nutrition (SENC) pyramid [58]. Figure 1 provides a graphical representation of the typical food pyramid considered for AI4Food-NutritionDB, based on 6 different nutritional levels. A lower nutritional level (at the pyramid’s top) implies limited consumption, whereas a higher level (at the pyramid’s bottom) denotes greater intake.
Most food products align with the different nutritional levels proposed in the pyramid, allowing a direct nutritional level assignment. However, some food products or subcategories, e.g., “Fried Vegetables” or “Rice and Fish”, are not directly contemplated by the pyramid. For all these ambiguous cases, the nutrition experts of the AI4Food framework have manually defined the appropriate nutritional level.
3.2.3 Dish type
Following a similar process to the nutritional level assignment, each subcategory is assigned a dish type to differentiate it from others that can be found during a meal, since the quantity of each dish varies significantly. Seven different types of dishes are defined, following the guidelines established in [50]:
-
Main Dish: this type of dish represents most of the subcategories defined and includes both first and second courses.
-
Appetizer: this dish is usually consumed before the main dish and is typically smaller in quantity. “Pâté”, “Cheese”, and “Other Types of Bread” are included in it.
-
Snack: similar to an appetizer, but consumed at any time of the day. This type comprises all the “Salty Snack” subcategories and “Sauce”.
-
Dessert: usually consumed at the end of a meal, dessert often consists of sweet food products. In this study, the “Fruits” and “Toast” subcategories, along with the “Sweet Products” category, are included in the Dessert dish type.
-
Side Dish: served alongside main dishes; examples include “Fries” and “Side Dish Salad”.
-
Bread: this basic food product is usually eaten alongside main dishes. In this case, only the “Bread” subcategory is included.
-
Drinks: this is represented by the “Drinks” products.
As a result, the AI4Food-NutritionDB database comprises 558,676 food images grouped into 6 nutritional levels, 19 main categories, 73 subcategories, and 893 food products as depicted in Table 2.
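The subcategory-to-dish-type assignment described above reduces to a simple mapping with a "Main Dish" default. The sketch below lists only the assignments explicitly mentioned in the text (the full mapping covers all 73 subcategories), and the default behaviour for unlisted subcategories is our assumption based on the statement that the Main Dish type "represents most of the subcategories".

```python
# Partial subcategory-to-dish-type mapping, taken from the examples in
# the text; the full database covers all 73 subcategories.
DISH_TYPE = {
    "Pâté": "Appetizer",
    "Cheese": "Appetizer",
    "Other Types of Bread": "Appetizer",
    "Sauce": "Snack",
    "Fruits": "Dessert",
    "Toast": "Dessert",
    "Fries": "Side Dish",
    "Side Dish Salad": "Side Dish",
    "Bread": "Bread",
    "Drinks": "Drinks",
}


def dish_type(subcategory):
    # Assumption: unlisted subcategories default to "Main Dish", which
    # represents most of the subcategories defined.
    return DISH_TYPE.get(subcategory, "Main Dish")
```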
4 AI4Food-NutritionDB benchmark
This section describes the proposal of a standard experimental protocol and benchmark evaluation of AI4Food-NutritionDB, based on the nutrition taxonomy (category, subcategory, and final product). First, the deep learning recognition systems are described in Section 4.1. Then, Section 4.2 describes the proposed experimental protocol. Finally, Sections 4.3 and 4.4 provide the recognition results achieved on intra- and inter-database scenarios, respectively.
In addition, we provide the complete experimental protocol, benchmark evaluation, and pre-trained models, all of which are available on our GitHub repository. The repository contains detailed documentation and instructions for reproducing our experiments and scenarios.
4.1 Proposed food recognition systems
The proposed food recognition systems utilise state-of-the-art CNN architectures, namely Xception [15] and EfficientNetV2 [62]. These architectures have been selected due to their outstanding performance in computer vision tasks such as food recognition, deepfake detection, and image classification in general [12, 48, 63]. First, the Xception approach is inspired by Inception [60], replacing Inception modules with depthwise separable convolutions. Second, the EfficientNetV2 approach is an optimised model within the EfficientNet family of architectures, able to achieve better results with fewer parameters compared to other models on challenging databases like ImageNet [38].
In this study, we follow the same training approach considered in [63], using a model pre-trained on ImageNet, where the last fully-connected layers are replaced to match the number of classes specific to each experiment. First, all the weights of the model up to the fully-connected layers are frozen, and the new layers are trained for 10 epochs. Subsequently, the entire network is trained for 50 more epochs, choosing the best-performing model in terms of validation accuracy. The same configuration is used for all experiments: an Adam optimiser based on binary cross-entropy with a learning rate of \(10^{-3}\), and \(\beta _1\) and \(\beta _2\) of 0.9 and 0.999, respectively. In addition, training and testing are performed with an image size of 224\(\times \)224. The experimental protocol was executed with the aid of an NVIDIA GeForce RTX 4090 GPU, utilising the Keras library.
4.2 Experimental protocol
For reproducibility reasons, we adopt the same experimental protocol considered in the collected databases, dividing them into development and test subsets following each corresponding subdivision. In addition, the development subset is further divided into train and validation subsets. However, three of the collected databases (FruitVeg-81, UECFood-256, and Food-101) do not contain this division. In such cases, we employ a similar procedure as presented in [63]: around 80% of the images comprise the development subset, which is in turn split into train (around 80%) and validation (around 20%) subsets, while the remaining images (around 20%) form the test subset. It is important to remark that no images are duplicated across the three subsets (train, validation, and test) in any of the seven databases. Similarly to [47], Top-1 (Top-1 Acc.) and Top-5 classification accuracy (Top-5 Acc.) are used as evaluation metrics.
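Top-k accuracy counts a prediction as correct whenever the true label is among the k classes with the highest predicted scores. A minimal dependency-free implementation of this metric might look as follows (function name and signature are ours):

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is within the top-k scores.

    scores: list of per-class score lists (one row per sample).
    labels: list of true class indices, aligned with scores.
    """
    correct = 0
    for row, label in zip(scores, labels):
        # Indices of the k highest-scoring classes for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            correct += 1
    return correct / len(labels)
```

Top-1 Acc. is the usual classification accuracy, while Top-5 Acc. is more forgiving for fine-grained taxonomies such as the 893-product level, where visually similar classes often share the top of the score ranking.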
4.3 Intra-database results
Three different scenarios are considered for the intra-database evaluation of the AI4Food-NutritionDB database. Each scenario represents a different level of granularity defined by the number of categories (19), subcategories (73), and final products (893). Table 3 summarises the performances obtained for the different intra-database scenarios and deep learning architectures considered in the AI4Food-NutritionDB. For completeness, we also include the results achieved on the individual databases included in AI4Food-NutritionDB, highlighting in bold the best results for each dataset and nutrition taxonomy level. This allows us to also assess the model’s performance across the different subsets.

Regarding the whole AI4Food-NutritionDB database, the category scenario shows the best results, obtaining 77.74% Top-1 Acc. and 97.78% Top-5 Acc. for Xception, and 82.04% Top-1 Acc. and 98.45% Top-5 Acc. for EfficientNetV2. However, the performance significantly drops as the granularity becomes finer for both architectures. For example, for the EfficientNetV2 architecture, the Top-1 Acc. decreases from 82.04% to 77.66% and 66.28% for the subcategory (73 classes) and product (893 classes) analyses, respectively. This decrease is mainly due to the similarity in appearance among different subcategories (e.g., “White Meat” and “Red Meat”), final products (e.g., “Pizza Carbonara” and “Pizza Pugliese”), or even the same food cooked in several manners (e.g., “Baked Salmon” and “Cured Salmon”). Regarding each specific dataset, FruitVeg-81 shows the best results in general for both deep learning architectures, classifying almost perfectly the different fruits and vegetables (over 98% Top-1 and Top-5 Acc. for all categorisation scenarios). In contrast, the VIPER-FoodNet dataset obtains the worst results in each categorisation scenario, as its images sometimes contain food products with mixed ingredients (e.g., different types of beans, meat, and pasta).
Finally, in terms of the deep learning architecture, EfficientNetV2 outperforms Xception in all scenarios (category, subcategory, product) of the AI4Food-NutritionDB for both Top-1 Acc. and Top-5 Acc. metrics. These results highlight the potential of the state-of-the-art EfficientNetV2 architecture for the nutrition taxonomy proposed in the present article.
4.4 Inter-database results
To assess the generalisation ability of our deep learning models pre-trained on the proposed AI4Food-NutritionDB, we include an inter-database scenario using the challenging VireoFood-251 food image database [11], an extended version of VireoFood-172 [10]. This database comprises over 169K food images distributed across 251 Chinese food plates, which were not included in AI4Food-NutritionDB. In this experiment, we consider two different scenarios based on the training process. First, we consider the Xception and EfficientNetV2 architectures pre-trained only on ImageNet [38]. Second, we consider both architectures pre-trained on AI4Food-NutritionDB. In the latter case, three different models are considered, each trained at a different level of granularity following our proposed nutrition taxonomy (category, subcategory, and final product). In order to reproduce the same experimental protocol proposed by the authors, we only train the last fully-connected layers of each pre-trained model for 30 epochs, freezing the rest of the model. Table 4 shows the final test results obtained in each scenario for the final product categorisation (i.e., 251 Chinese food plates). Again, the best performances are marked in bold for each deep learning model. The results indicate that our models pre-trained on AI4Food-NutritionDB improve the performance in terms of both Top-1 and Top-5 Acc. compared with the models pre-trained only on ImageNet. For instance, for the Xception architecture, the model pre-trained on AI4Food-NutritionDB achieves 82.10% Top-1 and 95.71% Top-5 Acc. for the final product categorisation, much better than the 58.91% Top-1 and 83.78% Top-5 Acc. obtained with the ImageNet model. For the EfficientNetV2 architecture, the results are even better, with 88.80% Top-1 Acc. and 98.07% Top-5 Acc.
Therefore, the proposed deep learning models trained with the proposed AI4Food-NutritionDB can effectively serve as reliable pre-trained models, achieving accurate recognition results with unseen food databases.
5 Conclusion and future study
This article presents AI4Food-NutritionDB, the first database with a nutrition taxonomy and over 560K food images. Furthermore, we propose a standardised experimental protocol and benchmark for the AI4Food-NutritionDB, utilising food recognition systems based on two state-of-the-art architectures. Our evaluation encompasses both intra- and inter-database scenarios across different food recognition levels. Finally, we show that our models pre-trained on AI4Food-NutritionDB can improve state-of-the-art food recognition systems in challenging scenarios. Our contribution facilitates the development of novel food computing approaches that foster a better understanding of what we eat.
This study opens several future research lines, including improving the database by incorporating new taxonomy levels from nutritional experts (e.g., based on the nutritional composition of the ingredients or of the prepared food at hand). In addition, behavioural habits (e.g., physical activity, sleep quality) are key factors strongly related to the impact of nutrition on our health [55]. Future studies will benefit significantly from incorporating comprehensive multimodal models of user habits towards personalised interventions adapted to individual characteristics and needs [54]. For instance, new studies could focus on the impact of glucose from food intake on metabolic health or the impact of sleep quality on dietary patterns [27]. We also plan to integrate statistical [32] and human-readable food descriptors through recent Large Language Models (LLMs) [19] into our framework to improve both the classification rates and the interpretability of our models [6].
References
Acharya B, Ghosh A, Panda S et al (2023) Automated Plant Recognition System with Geographical Position Selection for Medicinal Plants. Adv Multimed 2023. https://doi.org/10.1155/2023/3974346
Acien A, Morales A, Vera-Rodriguez R, et al (2020) Smartphone Sensors for Modeling Human-Computer Interaction: General Outlook and Research Datasets for User Authentication. In: Proc. IEEE conference on computers, software, and applications, pp 1273–1278. https://doi.org/10.1109/COMPSAC48688.2020.00-81
Aguilar E, Bolaños M, Radeva P (2017) Food Recognition using Fusion of Classifiers Based on CNNs. In: Proc. international conference on image analysis and processing, Springer, pp 213–224. https://doi.org/10.1007/978-3-319-68548-9_20
Aguilar E, Remeseiro B, Bolaños M et al (2018) Grab, Pay, and Eat: Semantic Food Detection for Smart Restaurants. IEEE Trans Multimed 20(12):3266–3275. https://doi.org/10.1109/TMM.2018.2831627
Badshah S, Khan AA, Hussain S et al (2021) What Users Really Think about the Usability of Smartphone Applications: Diversity based Empirical Investigation. Multimed Tools Appl 80:9177–9207. https://doi.org/10.1007/s11042-020-10099-x
Barredo Arrieta A, Díaz-Rodríguez N, Del Ser J et al (2020) Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI. Inf Fusion 58:82–115. https://doi.org/10.1016/j.inffus.2019.12.012
Blechert J, Lender A, Polk S et al (2019) Food-Pics_Extended - An Image Database for Experimental Research on Eating and Appetite: Additional Images, Normative Ratings and an Updated Review. Front Psychol 10:307. https://doi.org/10.3389/fpsyg.2019.00307
Bossard L, Guillaumin M, Van Gool L (2014) Food-101 – Mining Discriminative Components with Random Forests. In: Fleet D, Pajdla T, Schiele B et al (eds) Proc. European Conference on Computer Vision, pp 446–461. https://doi.org/10.1007/978-3-319-10599-4_29
Charbonnier L, van Meer F, van der Laan LN et al (2016) Standardized Food Images: A Photographing Protocol and Image Database. Appetite 96:166–173. https://doi.org/10.1016/j.appet.2015.08.041
Chen J, Ngo CW (2016) Deep-based Ingredient Recognition for Cooking Recipe Retrieval. In: Proc. ACM international conference on multimedia, pp 32–41. https://doi.org/10.1145/2964284.2964315
Chen J, Zhu B, Ngo CW et al (2021) A Study of Multi-Task and Region-Wise Deep Learning for Food Ingredient Recognition. IEEE Trans Image Process 30:1514–1526. https://doi.org/10.1109/TIP.2020.3045639
Chen L, Li S, Bai Q et al (2021) Review of Image Classification Algorithms Based on Convolutional Neural Networks. Remote Sens 13(22):4712. https://doi.org/10.3390/rs13224712
Chen M, Dhingra K, Wu W et al (2009) PFID: Pittsburgh Fast-Food Image Dataset. In: Proc. IEEE International Conference on Image Processing, pp 289–292. https://doi.org/10.1109/ICIP.2009.5413511
Chen X, Zhou H, Zhu Y et al (2017) ChineseFoodNet: A Large-Scale Image Dataset for Chinese Food Recognition. arXiv:1705.02743. https://doi.org/10.48550/arXiv.1705.02743
Chollet F (2017) Xception: Deep Learning with Depthwise Separable Convolutions. In: Proc. Conference on Computer Vision and Pattern Recognition, pp 1251–1258. https://doi.org/10.1109/CVPR.2017.195
Ciocca G, Napoletano P, Schettini R (2015) Food Recognition and Leftover Estimation for Daily Diet Monitoring. In: Proc. International Conference on Image Analysis and Processing, pp 334–341. https://doi.org/10.1007/978-3-319-23222-5_41
Ciocca G, Napoletano P, Schettini R (2016) Food Recognition: A New Dataset, Experiments, and Results. IEEE J Biomed Health Informa 21(3):588–598. https://doi.org/10.1109/JBHI.2016.2636441
Ciocca G, Napoletano P, Schettini R (2017) Learning CNN-based Features for Retrieval of Food Images. In: Battiato S, Farinella GM, Leo M et al (eds) Proc. New Trends in Image Analysis and Processing, pp 426–434. https://doi.org/10.1007/978-3-319-70742-6_41
Deandres-Tame I, Tolosana R, Vera-Rodriguez R et al (2024) How Good is ChatGPT at Face Biometrics? A First Look into Recognition, Soft Biometrics, and Explainability. IEEE Access pp 1–1. https://doi.org/10.1109/ACCESS.2024.3370437
Delgado-Mohatar O, Tolosana R, Fierrez J et al (2020) Blockchain in the Internet of Things: Architectures and Implementation. In: Proc. IEEE conference on computers, software, and applications, pp 1072–1077. https://doi.org/10.1109/COMPSAC48688.2020.0-131
Deng L, Chen J, Sun Q et al (2019) Mixed-Dish Recognition with Contextual Relation Networks. In: Proc. ACM international conference on multimedia, pp 112–120. https://doi.org/10.1145/3343031.3351147
Dooley DM, Griffiths EJ, Gosal GS et al (2018) FoodOn: A Harmonized Food Ontology to Increase Global Food Traceability, Quality Control and Data Integration. npj Science of Food 2(1):1–10. https://doi.org/10.1038/s41538-018-0032-6
Farinella GM, Allegra D, Stanco F (2014) A Benchmark Dataset to Study the Representation of Food Images. In: Proc. European conference on computer vision, Springer, pp 584–599. https://doi.org/10.1007/978-3-319-16199-0_41
Farinella GM, Allegra D, Moltisanti M et al (2016) Retrieval and Classification of Food Images. Comput Biol Med 77:23–39. https://doi.org/10.1016/j.compbiomed.2016.07.006
Fierrez-Aguilar J, Garcia-Romero D, Ortega-Garcia J et al (2005) Adapted User-Dependent Multimodal Biometric Authentication Exploiting General Information. Pattern Recogn Lett 26(16):2628–2639. https://doi.org/10.1016/j.patrec.2005.06.008
Finkelstein EA, Khavjou OA, Thompson H et al (2012) Obesity and Severe Obesity Forecasts through 2030. Am J Prev Med 42(6):563–570. https://doi.org/10.1016/j.amepre.2011.10.026
Fontana JM, Farooq M, Sazonov E (2021) Detection and Characterization of Food Intake by Wearable Sensors. In: Wearable Sensors, pp 541–574. https://doi.org/10.1016/B978-0-12-819246-7.00020-6
Galbally J, Plamondon R, Fierrez J et al (2012) Synthetic On-line Signature Generation. Part I: Methodology and Algorithms. Pattern Recognition 45:2610–2621. https://doi.org/10.1016/j.patcog.2011.12.011
Güngör C, Baltacı F, Erdem A et al (2017) Turkish Cuisine: A Benchmark Dataset with Turkish Meals for Food Recognition. In: Proc. Signal Processing and Communications Applications Conference, pp 1–4. https://doi.org/10.1109/SIU.2017.7960494
Hou S, Feng Y, Wang Z (2017) VegFru: A Domain-Specific Dataset for Fine-Grained Visual Categorization. In: Proc. IEEE international conference on computer vision, pp 541–549. https://doi.org/10.1109/ICCV.2017.66
Hu J, Shen L, Sun G (2018) Squeeze-and-Excitation Networks. In: Proc. Conference on Computer Vision and Pattern Recognition, pp 7132–7141. https://doi.org/10.1109/CVPR.2018.00745
Huertas-Tato J, Martin A, Fierrez J et al (2022) Fusing CNNs and Statistical Indicators to Improve Image Classification. Inf Fusion 79:174–187. https://doi.org/10.1016/j.inffus.2021.09.012
Jalal M, Wang K, Jefferson S et al (2019) Scraping Social Media Photos Posted in Kenya and Elsewhere to Detect and Analyze Food Types. In: Proc. international workshop on multimedia assisted dietary management, pp 50–59. https://doi.org/10.1145/3347448.3357170
Jiang S, Min W, Liu L et al (2020) Multi-Scale Multi-View Deep Feature Aggregation for Food Recognition. IEEE Trans Image Process 29:265–276. https://doi.org/10.1109/TIP.2019.2929447
Joutou T, Yanai K (2009) A Food Image Recognition System with Multiple Kernel Learning. In: Proc. IEEE international conference on image processing, pp 285–288. https://doi.org/10.1109/ICIP.2009.5413400
Kaur P, Sikka K, Wang W et al (2019) Foodx-251: A Dataset for Fine-Grained Food Classification. arXiv:1907.06167. https://doi.org/10.48550/arXiv.1907.06167
Kawano Y, Yanai K (2014) Automatic Expansion of a Food Image Dataset Leveraging Existing Categories with Domain Adaptation. In: Proc. ECCV workshop on transferring and adapting source knowledge in computer vision, pp 3–17. https://doi.org/10.1007/978-3-319-16199-0_1
Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet Classification with Deep Convolutional Neural Networks. Adv Neural Inf Process Syst 25. https://doi.org/10.1145/3065386
Mao R, He J, Shao Z et al (2021) Visual Aware Hierarchy Based Food Recognition. In: Proc. International Conference on Pattern Recognition, pp 571–598. https://doi.org/10.1007/978-3-030-68821-9_47
Marcus J (2014) Nutrition Basics: What is Inside Food, How it Functions and Healthy Guidelines. Culinary Nutrition pp 1–50. https://doi.org/10.1016/B978-0-12-391882-6.00001-7
Matsuda Y, Hoashi H, Yanai K (2012) Recognition of Multiple-Food Images by Detecting Candidate Regions. In: Proc. IEEE International Conference on Multimedia and Expo, pp 25–30. https://doi.org/10.1109/ICME.2012.157
McAllister P, Zheng H, Bond R et al (2018) Combining Deep Residual Neural Network Features with Supervised Machine Learning Algorithms to Classify Diverse Food Image Datasets. Comput Biol Med 95:217–233. https://doi.org/10.1016/j.compbiomed.2018.02.008
Merler M, Wu H, Uceda-Sosa R et al (2016) Snap, Eat, RepEat: A Food Recognition Engine for Dietary Logging. In: Proc. international workshop on multimedia assisted dietary management, pp 31–40. https://doi.org/10.1145/2986035.2986036
Min W, Jiang S, Liu L et al (2019) A Survey on Food Computing. ACM Comput Surv 52(5):1–36. https://doi.org/10.1145/3329168
Min W, Liu L, Luo Z et al (2019b) Ingredient-Guided Cascaded Multi-Attention Network for Food Recognition. In: Proc. ACM international conference on multimedia, pp 1331–1339. https://doi.org/10.1145/3343031.3350948
Min W, Liu L, Wang Z et al (2020) ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network. In: Proc. ACM international conference on multimedia, pp 393–401. https://doi.org/10.1145/3394171.3414031
Min W, Wang Z, Liu Y et al (2023) Large Scale Visual Food Recognition. IEEE Trans Pattern Anal Mach Intell 45(8):9932–9949. https://doi.org/10.1109/TPAMI.2023.3237871
Morales R, Quispe J, Aguilar E (2023) Exploring Multi-food Detection Using Deep Learning-based Algorithms. In: 2023 IEEE 13th International Conference on Pattern Recognition Systems (ICPRS), pp 1–7. https://doi.org/10.1109/ICPRS58416.2023.10179037
Myers A, Johnston N, Rathod V et al (2015) Im2Calories: Towards an Automated Mobile Vision Food Diary. In: Proc. IEEE international conference on computer vision, pp 1233–1241. https://doi.org/10.1109/ICCV.2015.146
Popovski G, Seljak BK, Eftimov T (2019) FoodBase Corpus: A New Resource of Annotated Food Entities. Database 2019:baz121. https://doi.org/10.1093/database/baz121
Pouladzadeh P, Shirmohammadi S, Yassine A (2014) Using Graph Cut Segmentation for Food Calorie Measurement. In: Proc. IEEE international symposium on medical measurements and applications, pp 1–6. https://doi.org/10.1109/MeMeA.2014.6860137
Qiu J, Lo FPW, Sun Y et al (2019) Mining Discriminative Food Regions for Accurate Food Recognition. In: Proc. British machine vision conference, p 158. https://doi.org/10.48550/arXiv.2207.03692
Rich J, Haddadi H, Hospedales TM (2016) Towards Bottom-up Analysis of Social Food. In: Proc. international conference on digital health conference, pp 111–120. https://doi.org/10.1145/2896338.2897734
Romero-Tapiador S, Lacruz-Pleguezuelos B, Tolosana R et al (2023a) AI4FoodDB: A Database for Personalized e-Health Nutrition and Lifestyle through Wearable Devices and Artificial Intelligence. Database 2023:baad049. https://doi.org/10.1093/database/baad049
Romero-Tapiador S, Tolosana R, Morales A et al (2023) AI4Food-NutritionFW: A Novel Framework for the Automatic Synthesis and Analysis of Eating Behaviours. IEEE Access 11:112199–112211. https://doi.org/10.1109/ACCESS.2023.3322770
Sahoo D, Hao W, Ke S et al (2019) FoodAI: Food Image Recognition Via Deep Learning for Smart Food Logging. In: Proc. ACM SIGKDD international conference on knowledge discovery & data mining, pp 2260–2268. https://doi.org/10.1145/3292500.3330734
Singla A, Yuan L, Ebrahimi T (2016) Food/Non-Food Image Classification and Food Categorization Using Pre-Trained GoogLeNet Model. In: Proc. international workshop on multimedia assisted dietary management, pp 3—11. https://doi.org/10.1145/2986035.2986039
Sociedad Española De Nutrición Comunitaria (2016) Guías Alimentarias para la Población Española (SENC, Diciembre 2016); la Nueva Pirámide de la Alimentación Saludable. Nutrición hospitalaria 33(8):1–48. https://doi.org/10.20960/nh.827
Subhi MA, Ali SH, Mohammed MA (2019) Vision-Based Approaches for Automatic Food Recognition and Dietary Assessment: A Survey. IEEE Access 7:35370–35381. https://doi.org/10.1109/ACCESS.2019.2904519
Szegedy C, Liu W, Jia Y et al (2015) Going Deeper with Convolutions. In: Proc. IEEE conference on computer vision and pattern recognition, pp 1–9. https://doi.org/10.1109/CVPR.2015.7298594
Tammachat N, Pantuwong N (2014) Calories Analysis of Food Intake Using Image Recognition. In: Proc. international conference on information technology and electrical engineering, pp 1–4. https://doi.org/10.1109/ICITEED.2014.7007901
Tan M, Le Q (2021) EfficientNetV2: Smaller Models and Faster Training. In: Proc. international conference on machine learning, pp 10096–10106. https://doi.org/10.48550/arXiv.2104.00298
Tolosana R, Romero-Tapiador S, Vera-Rodriguez R et al (2022) DeepFakes Detection Across Generations: Analysis of Facial Regions, Fusion, and Performance Evaluation. Eng Appl Artif Intell 110:104673. https://doi.org/10.1016/j.engappai.2022.104673
Waltner G, Schwarz M, Ladstätter S, et al (2017) Personalized Dietary Self-Management using Mobile Vision-based Assistance. In: Proc. workshop on multimedia assisted dietary management, pp 385–393. https://doi.org/10.1007/978-3-319-70742-6_36
Wang X, Kumar D, Thome N et al (2015) Recipe Recognition with Large Multimodal Food Dataset. In: Proc. IEEE international conference on multimedia & expo workshops, pp 1–6. https://doi.org/10.1109/ICMEW.2015.7169757
World Health Organization (2016) The Double Burden of Malnutrition: Policy Brief. World Health Organization, Tech. rep
Xu R, Herranz L, Jiang S et al (2015) Geolocalized Modeling for Dish Recognition. IEEE Trans Multimed 17(8):1187–1199. https://doi.org/10.1109/TMM.2015.2438717
Zayed SM, Attiya GM, El-Sayed A et al (2023) A Review Study on Digital Twins with Artificial Intelligence and Internet of Things: Concepts, Opportunities, Challenges, Tools and Future Scope. Multimed Tools Appl. https://doi.org/10.1007/s11042-023-15611-7
Funding
Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This study has been supported by projects: AI4FOOD-CM (Y2020/TCS6654), FACINGLCOVID-CM (PD2022-004-REACT-EU), INTER-ACTION (PID2021-126521OB-I00 MICINN/FEDER), and HumanCAIC (TED2021-131787BI00 MICINN).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Romero-Tapiador, S., Tolosana, R., Morales, A. et al. Leveraging automatic personalised nutrition: food image recognition benchmark and dataset based on nutrition taxonomy. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19161-4