Critical Reviews in Food Science and Nutrition

ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/bfsn20

Applicability of machine learning techniques in food intake assessment: A systematic review

Larissa Oliveira Chaves, Ana Luiza Gomes Domingos, Daniel Louzada Fernandes, Fabio Ribeiro Cerqueira, Rodrigo Siqueira-Batista & Josefina Bressan

To cite this article: Larissa Oliveira Chaves, Ana Luiza Gomes Domingos, Daniel Louzada
Fernandes, Fabio Ribeiro Cerqueira, Rodrigo Siqueira-Batista & Josefina Bressan (2021):
Applicability of machine learning techniques in food intake assessment: A systematic review,
Critical Reviews in Food Science and Nutrition, DOI: 10.1080/10408398.2021.1956425

To link to this article: https://doi.org/10.1080/10408398.2021.1956425

Published online: 29 Jul 2021.

Larissa Oliveira Chaves (a), Ana Luiza Gomes Domingos (a), Daniel Louzada Fernandes (b), Fabio Ribeiro Cerqueira (c), Rodrigo Siqueira-Batista (d, e), and Josefina Bressan (a)

(a) Department of Nutrition and Health, Universidade Federal de Viçosa, Viçosa, Brazil; (b) Department of Informatics, Universidade Federal de Viçosa, Viçosa, Brazil; (c) Department of Production Engineering, Universidade Federal Fluminense, Petropolis, Brazil; (d) Department of Medicine and Nursing, Universidade Federal de Viçosa, Viçosa, Brazil; (e) School of Medicine of the Faculdade Dinâmica do Vale do Piranga, Ponte Nova, Brazil

ABSTRACT
The evaluation of food intake is important in scientific research and clinical practice to understand the relationship between diet and health conditions of an individual or a population. Large volumes of data are generated daily in the health sector. In this context, Artificial Intelligence (AI) tools have been increasingly used, for example through the application of Machine Learning (ML) algorithms to extract useful information, find patterns, and predict diseases. This systematic review aimed to identify studies that used ML algorithms to assess food intake in different populations. A literature search was conducted using five electronic databases, and 36 studies met all criteria and were included. According to the results, there has been a growing interest in the use of ML algorithms in the area of nutrition in recent years. Supervised learning algorithms were the most used, and the most widely used method of nutritional assessment was the food frequency questionnaire. We observed a trend toward the use of data analysis programs such as R and WEKA. The use of ML in nutrition is recent and challenging. Therefore, more studies relating these themes are encouraged, to support the development of food reeducation programs and public policies.

KEYWORDS
Food intake; diet; artificial intelligence; machine learning; supervised and unsupervised algorithms; computational tools

CONTACT Larissa Oliveira Chaves
© 2021 Taylor & Francis Group, LLC

Introduction

Unhealthy food intake is an important behavioral risk factor for many noncommunicable diseases (NCD), including cardiovascular disease (CVD), cancer, and diabetes mellitus (Dao et al. 2019). For this reason, evaluating the food intake of an individual or a population plays an important role in the search for answers and for a better understanding of the relationship between diet and the onset of these diseases, with the aim of making dietary recommendations more effective and promoting public policies (Shim, Oh, and Kim 2014; Rupasinghe, Perera, and Wickramaratne 2020). Several methods of food intake evaluation are available, and each has its own strengths and weaknesses. The most commonly used are the Food Frequency Questionnaire (FFQ), 24-hour recall (R24H), food record, and food history. However, the imprecision of these methods poses a major challenge to understanding the relationship between diet and NCD (Shim, Oh, and Kim 2014).

In this sense, information technologies are being increasingly used in several areas, including health, to support scientific research and clinical decision-making (Ma and Chen 2019). As healthcare institutions generate and store large volumes of data daily, clinical decisions should be based not only on the intuition and experience of the healthcare professional, but also on the knowledge stored over time in their databases (Singh, Singh, and Pandi-Jai 2018). Thus, motivated by the need to extract useful information from the collected data, Machine Learning (ML) algorithms have been applied with the potential to generate knowledge that can help improve the quality of service provided to patients and reduce morbidity, mortality, length of stay, and hospital costs, thereby improving patients' quality of life (Ma and Chen 2019).

ML is a subarea of Artificial Intelligence (AI) whose objective is to detect and extract hidden information, patterns, and specific data that were previously unidentified in large volumes of data because they were difficult to investigate manually or cannot be detected with conventional statistical methods (Assari, Azimi, and Taghva 2017). This knowledge discovery can be automatic or semi-automatic, through algorithms that detect and extract information in datasets quickly and accurately (Siqueira-Batista and Silva 2019). ML algorithms can be classified as supervised, unsupervised, semi-supervised, and reinforcement learning approaches and have wide application in several fields, such as space technology, criminal investigation, bioinformatics, economics, and business, among others (Kodati, Vivekanandam, and Ravi 2019). Those algorithms have been applied in the health area to help define policies of prevention, prediction, diagnosis, early prognosis of diseases, and appropriate and effective treatments (Al-Maqaleh and Abdullah 2017; Babu et al. 2017).

It is noteworthy that using ML algorithms in the health field is still challenging, as health professionals need to
accurately interpret information derived from the algorithms in a clinical setting and epidemiological studies. In addition, there are still studies to be carried out involving the application of ML algorithms in the area of nutrition, especially in food intake. Therefore, this review can serve as a guide for health professionals interested in food intake and data science, and can inform and assist other researchers in the development of new studies. This systematic review aimed to identify and analyze original articles that applied ML algorithms of supervised and unsupervised approaches to assess food consumption in different populations.

Methods

Protocol and registration

This review was conducted in accordance with Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) (Liberati et al. 2009) and was registered in the PROSPERO database (www.crd.york.ac.uk/prospero/), registration number CRD42020198633.

Literature search

Two authors (L.O.C and A.L.G.D) independently searched for original articles that used ML algorithms to evaluate food intake using the following electronic databases: MEDLINE (PubMed, www.pubmed.com), Lilacs (www.lilacs.bvsalud.org), Science Direct (www.sciencedirect.com), SciELO (www.scielo.org) and Google Scholar (https://scholar.google.com.br/). The following descriptors were used as a search strategy in titles and abstracts: (“machine learning” OR “deep learning” OR “data mining” OR “unsupervised learning”) AND (“food intake” OR “diet” OR “food pattern” OR “dietary pattern” OR “food frequency consumption” OR “food questionnaire”) NOT review.

The search strategy was not restricted by publication year or language. The search was conducted between July 1st and 6th, 2020. A reverse search was conducted to identify relevant articles cited in the selected studies.

Eligibility criteria

The following criteria were applied for the inclusion of studies: (1) original research; (2) studies in humans; (3) evaluation of food consumption; (4) use of ML algorithms. The following exclusion criteria were applied: (1) non-original publications, such as reviews, letters, book chapters, case reports, abstracts, and comments; (2) animal studies; (3) in vitro studies; (4) research on the composition and importance of a specific food; (5) research that did not use ML algorithms to assess food consumption.

Study selection and data collection process

The studies were selected independently by two authors (L.O.C and A.L.G.D) in three phases: analysis of titles, abstracts, and full texts. After reading the selected studies, the authors compared the compiled data to ensure their integrity and reliability. Divergent decisions were resolved by consensus or by consultation with a third author (D.L.F). For each included study, the following information was extracted: authors, year of publication, the country where the research was developed, the objective of the study, characteristics of the participants, method of food intake evaluation, ML approach and algorithms, and computational tools.

Data analysis

All studies selected in this article are summarized in Table 1 according to their main characteristics. The studies were organized chronologically by year of publication, starting with the first published study. The year of publication, location of the study, methods for assessing food consumption, and ML approaches and algorithms along with the computational tools used were considered the main characteristics of this systematic review.

The performance of a meta-analysis was not justified due to the heterogeneity among the studies included. Therefore, according to the Cochrane manual, the authors performed a systematic review (Higgins and Green 2011).

Results

Study selection

A total of 252 studies were identified through the searches in the databases. After the removal of 49 duplicate studies, 203 unique records remained, of which 133 studies were excluded based on their titles and abstracts because they were considered irrelevant: 72 did not use ML algorithms to evaluate food consumption, 45 were animal studies, 11 were not original articles, 3 did not use ML algorithms and 2 were in vitro studies. The remaining 70 studies were reviewed and evaluated in full for eligibility, and 36 met all the criteria adopted for this systematic review and were thus included (Figure 1).

Description of included studies

Overview of the number of publications in relation to the year and countries of studies

The first publication analyzed in this review was in 2008, with only nine publications between 2008 and 2016. From 2017 there was an increase in research in this area, with 27 studies published between 2017 and 2020 (up to the time of this review) as presented in Table 2 and Figure 2.

Of the 36 studies included in this review, 38.9% were conducted in North America, 30.6% in Europe, 25% in Asia, and 2.8% in Oceania. Table 2 shows the distribution of publications in relation to the countries where the studies were conducted. Only one study used data from 12 countries, but they were not mentioned (Yu et al. 2020).

The heatmap in Figure 3 illustrates the relationship of publications from 2008 to 2020 with the countries where the
Table 1. Characteristics of the studies that applied ML algorithms to assess food intake in different populations.

| Reference/Country | Objective | Population characteristics | Method for assessing food intake | ML approach | ML algorithms | Computational tools |
|---|---|---|---|---|---|---|
| Hearty and Gibney (2008), Ireland | Assess ML algorithms to predict HEI based on food intake | 1,379 men and women; Age: 18 to 64 years | Registration of daily intake for 7 days, with the times of each meal; HEI divided into quintiles 1 and 5 | Supervised | ANN; Decision Tree | SPSS |
| De Cos Juez et al. (2009), Spain | Develop mathematical method to predict BMD of women in post-menopause, according to nutritional variables | 305 healthy postmenopausal women; Age: 50 to 69 years | Specific questionnaire taken from a food history questionnaire, type FFQ | Supervised | MARS | Does not contain this information |
| Ordóñez et al. (2009), Spain | Determine factors that influence the BMD, for the development of specific prevention programs | 305 healthy postmenopausal women; Age: 49 to 69 years | DHQ (FFQ type) | Supervised | SVM; CARTs | Does not contain this information |
| Zenitani, Nishiuchi, and Kiuchi (2010), Japan | Analyze and extract food patterns | 634 employees of a Tokyo electricity company | Auto Meal Record System (purchase of meals in 2 company cafeterias for 1 year); Employee ID, date/time of purchase, quantity and price | Supervised | Multiple Linear Regression | SAS |
| De Cos Juez et al. (2011), Spain | Develop BMD prediction method for post-menopausal women, according to nutritional data | 200 healthy post-menopausal women; Age: 50 to 69 years | DHQ (FFQ type) | Supervised | MLP; GA | R |
| Lazarou et al. (2012), Republic of Cyprus | Applying the ML approach to ascertain eating habits related to childhood obesity | 1,140 children; Age: 9 to 13 years | FFQ semi-quantitative; FGFQ: 15 food groups; SEBBQ: 8 psychological aspects of eating; SDHQ: 19 eating habits | Supervised | Decision Tree | WEKA |
| Silvera et al. (2014), United States | Examine the interactions between diet, lifestyle, and medical factors with risks of esophageal and gastric cancer subtypes | 1,095 cases and 687 controls, men and women; Age: 30 to 79 years | FFQ validated | Supervised | Decision Tree | CART |
| Zeevi et al. (2015), Israel | Quantify the PPGR, characterize its variability and identify associated factors | 900 healthy men and women; Age: 18 to 70 years | Own questionnaire; Registration on smartphone of meals for 7 days | Supervised | Stochastic Gradient Boosting Regression | Does not contain this information |
| Giabbanelli and Adams (2016), United Kingdom | Evaluate food intake by ML and investigate its performance to predict and make food recommendations | 4,156 (2,083 adults and 2,073 children); NDNS data 2008-2012 | Registration in NDNS journals | Supervised | Decision Tree | WEKA |
| Dipnall et al. (2017), United States | Develop a method to evaluate the risk of depression using GSEM, comparing ML with statistical analysis | 5,546 men and women; Age: 18 to 80 years (Data from NHANES 2009-2010) | 1st interview: R24H; 2nd interview: by phone | Supervised | GSEM probit model; RL | STATA |
| Kanerva et al. (2018), Finland | Explore sociodemographic and lifestyle risk factors related to overweight using ML and LR | 6,258 men and women; Age: 25 to 74 years | FFQ validated and self-administered | Supervised | RF; LR | R |

(continued)
Table 1. Continued.

| Reference/Country | Objective | Population characteristics | Method for assessing food intake | ML approach | ML algorithms | Computational tools |
|---|---|---|---|---|---|---|
| Mezgec and Seljak (2017), Slovenia | Introducing a new approach to food and beverage image detection and recognition | 225,953 images, 512 × 512 pixels, from 520 classes of food and beverage images | Food and beverages from dietary evaluation system of PD Nutrition; Food and beverage image search on Google | Supervised | NutriNet | Google Custom Search API |
| Mutter et al. (2017), China | Understand the interaction between nutritional and socioeconomic risk factors with anemia | 2,849 men and women; Age: 20 to 87 years; Five-year follow-up: 1,262 individuals | Three consecutive days food record, including one day of the weekend | Unsupervised | SOM | R |
| Silva et al. (2018), United States/Canada | Introducing new mobile system that enables automated image-based food recognition and evaluation for dietary intervention | Creation of application to evaluate diet through food images and perform interventions | Image capture of food by smartphone | Supervised | SVM; Inception V3 | Google Colaboratory |
| Easton, Sicilia, and Stephens (2019), Mexico | Investigate whether specific food groups can predict and classify adults with obesity, diabetes or both | 11,385 men and women; Age: 20 to 69 years | FFQ applied by interviewer | Supervised | Naïve Bayes | Does not contain this information |
| Forman, Goldstein, Zhang, et al. (2019), United States | Evaluate the feasibility, effectiveness, and acceptability of OT, food lapses and weight loss | 43 men and women; Age: 18 to 65 years; With overweight or obesity | Registration feeds in the WW; Registration of lapses and triggers in OT for 8 weeks | Supervised | OT: set of algorithms (Logitboost, Bagging, Random Subspace, RF, and Bayes Network) | Does not contain this information |
| Guan et al. (2018), Australia | Identify food choices for meals of overweight and obese volunteers for a weight loss test | 433 (116 men and 317 women); Average age of 43 years; With overweight or obesity | Food history applied by a nutritionist | Unsupervised | Apriori; Hierarchical Clustering | R |
| Jia et al. (2019), United States | Develop an AI-based algorithm to automatically detect food items from images | Free living individuals, application users. No sampling | “eButton” application | Supervised | CNN | Does not contain this information |
| Rosso and Giabbanelli (2018), United Kingdom | Assess whether national surveys can be simplified by recording food intake only | 4,156 children and adults of both sexes; Age: 18 months and over (NDNS data 2008-2012) | Evaluated daily by means of home measurements and the weight of food and beverages, for a few days | Supervised | Decision Tree | WEKA |
| Shiao et al. (2018a), United States | Examine five folate pathway genes and food parameters related to CRC, measuring HEI in families | 106 men and women (53 CRC patients and 53 family members and friends); Age: 18 to 80 years | FFQ; HEI | Supervised | LR; GR Elastic Net | SAS |
| Shiao et al. (2018b), United States | Examine healthy eating predictors, using HEI and GI, in families with CRC | 106 men and women (53 CRC patients and 53 family members and friends); Age: 18 to 80 years | FFQ; HEI | Supervised | LR; GR Elastic Net | SAS |
| Shiokawa, Date, and Kikuchi (2018), Japan | Describe a KPCA to extract useful information from metabolic profile | 386 NMR datasets and 386 ICP-OES of urine samples collected | 309 nutritional sets of daily food intake records obtained from studies | Supervised | RF | R and MATLAB |
| Panaretos et al. (2018), Greece | Compare statistical and ML analyses in the association of food standards and CVD risk | 2,020 men and women; Adults and elderly | FFQ semi-quantitative, validated; Questionnaire (EPIC)-Greek | Supervised | k-NN; RF | R |
| Faruqui et al. (2019), United States | Predict daily glucose levels in T2D patients based on diet, physical activity, weight and glucose level the previous day | 10 adults; Overweight or obesity and T2D | Smartphone application for food intake monitoring | Supervised | LSTM | Does not contain this information |
| Forman, Goldstein, Crochiere, et al. (2019), United States | Examine the effectiveness of weight loss, participant satisfaction and the frequency of OT lapses. To verify the precision of interventions by OT algorithm | 181 men and women; Age: 18 to 70 years; Overweight or obesity | Smartphone applications for 10 weeks | Supervised | OT: set of algorithms (Logitboost, Bagging, Random Subspace, RF, Bayes Net) | R |
| Hamad et al. (2019), United States | Identify new and existing SNAP participants and examine differences in socio-demographic, health, nutrition and food purchasing behavior | PSID: 21,806 men and women. Age: 18 years or older of the waves from 1999 to 2013; Food APS: 4,775 families, including 4,548 children and 9,607 adults (men and women). Held between 2012 and 2013 | FAH; FAFH; HEI | Supervised | Lasso Regression | R |
| Shao et al. (2019), China | To investigate the effects of smoking, diet and physical activity on the risk of CVD and to propose a new model of risk analysis | 23,682 (12,605 men and 11,077 women); Over 50 years of age | Application to investigate the consumption of alcohol, meat, milk, vegetables and fruit | Supervised | Decision Tree; RF | WEKA |
| Yu et al. (2020), 12 countries | Demonstrate the use of ML methods to find food groups related to BC | 31,551 men and women; 8,320 BC cases and 23,231 non-BC cases | FFQ validated | Supervised | Decision Tree | R |
| Burgermaster et al. (2020), United States | Investigate applicability of an application for dietary recommendations in individuals with T2D compared with the specialist | 4 patients with T2D | Smartphone application (photos of meals with description and ingredients for 30 days) | Supervised | Decision Tree | R |
| He et al. (2020), United Kingdom | Analyze the causal relationship between lifestyle factors, health indicators and diet | 100,000 men and women; Over 18 years of age | Average quantity in kg of cereals, fruits, vegetables and cheese consumption, per year | Supervised | GTM; GPLVM | MATLAB |
| Kwon et al. (2020), South Korea | Identify subgroups in the population based on nutritional factors and risk factors for ALM decrease using ML | 10,863 men and women; Over 40 years old | R24H applied by nutrition | Supervised and Unsupervised | K-means; Multivariate Logistic Regression | R |
| Jiang et al. (2020), Denmark | Identify associations between diets and anthropometry | 23,195 women and 20,595 men; Age: 50 to 64 years | FFQ | Unsupervised | Compass-2 | Does not contain this information |
| Xu et al. (2020), United States | Assess the risks of metabolic syndrome in adults with depression, treated with antipsychotics and examine the ability of DQI to predict cardiometabolic risks | 106 men and women non-diabetic with depression and 106 men and women non-depressed controls. Age: 18 to 25 years | Own questionnaire; Hedonic facial scales and horizontal lines: research of likes and dislikes, level of likes and dislikes and level of satisfaction; DQI | Supervised | Multivariate Regressions | STATA |

(continued)
Table 1. Continued.

| Reference/Country | Objective | Population characteristics | Method for assessing food intake | ML approach | ML algorithms | Computational tools |
|---|---|---|---|---|---|---|
| Bodnar et al. (2020), United States | Estimate associations between fruit and vegetable intake in relation to adverse pregnancy outcomes | 7,572 women | FFQ | Supervised | Super Learner with TMLE | Does not contain this information |
| Narziev et al. (2020), South Korea | Build ML model for depression detection and classification using smartphone | 21 students: no depression (n = 5), light (n = 6), moderate (n = 6) and severe (n = 4). For 4 weeks | STDD smartphone application | Supervised | SVM; RF | WEKA |
| Iwendi et al. (2020), China | Recommending diets using deep learning in medical data, to detect which food should be administered | 30 men and women of hospitals | Food intake data available from the hospital database | Supervised | MLP, RNN, GRU, LSTM, LR, Naïve Bayes, RF | Google Colaboratory |

Abbreviations: AI: Artificial Intelligence; ALM: Appendicular Lean Mass; ANN: Artificial Neural Networks; BC: Bladder Cancer; BMD: Bone Mineral Density; CARTs: Classification and Regression Trees; CNN: Convolutional Neural Network; CRC: Colorectal Cancer; CVD: Cardiovascular Disease; DHQ: Diet History Questionnaire; DQI: Diet Quality Indexes; EPIC: European Prospective Investigation into Cancer and Nutrition; FAH: Food at Home; FAFH: Food Away From Home; FFQ: Food Frequency Questionnaire; FGFQ: Food Groups Frequency Questionnaire; GA: Genetic Algorithms; GI: Glycemic Index; GPLVM: Gaussian Process Latent Variable Model; GR: Generalized Regression; GRU: Gated Recurrent Units; GSEM: Generalized Structural Equation Model; GTM: Generative Topological Mapping; HEI: Healthy Eating Index; ICP-OES: Inductively Coupled Plasma Optical Emission Spectrometry; k-NN: k-Nearest Neighbors; KPCA: Kernel Principal Component Analysis; LR: Logistic Regression; LSTM: Long Short-Term Memory; MARS: Multivariate Adaptive Regression Splines; ML: Machine Learning; MLP: Multilayer Perceptron; NDNS: National Diet and Nutrition Survey; NHANES: National Health and Nutrition Examination Survey; NMR: Nuclear Magnetic Resonance; OT: On Track; PCA: Principal Component Analysis; PPGR: Postprandial Glycemic Response; PSID: Panel Study of Income Dynamics; RF: Random Forest; RNN: Recurrent Neural Network; R24H: 24-h recall; SDHQ: Short Dietary Habits Questionnaire; SEBBQ: Short Eating Habits Behaviors & Beliefs Questionnaire; SNAP: Supplemental Nutrition Assistance Program; SOM: Self-Organizing Map; STDD: Short-Term Depression Detector; SVM: Support Vector Machine; T2D: Type 2 Diabetes; TMLE: Targeted Maximum Likelihood Estimation; WW: Weight Watchers.
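Decision Tree is the algorithm that recurs most often in Table 1. As a rough illustration of the idea behind such models, and not a reproduction of any included study, the sketch below finds the single threshold on one numeric variable that minimizes weighted Gini impurity, which is the kind of split a classification tree applies repeatedly. The variable (weekly servings from a hypothetical FFQ item), the outcome labels, and the function names `gini` and `best_split` are all invented for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split.

    If no split improves on the unsplit node, the threshold is None.
    """
    best = (None, gini(labels))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Synthetic data, invented for illustration only: weekly servings reported on
# a hypothetical FFQ item, with a made-up binary outcome for each participant.
servings = [0, 1, 2, 3, 7, 8, 10, 14]
outcome = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]

threshold, impurity = best_split(servings, outcome)
print(f"split at <= {threshold} servings, weighted Gini = {impurity}")
```

A full tree would apply `best_split` recursively to each resulting subset and across all available variables; tools such as WEKA and R, named in Table 1, provide complete implementations.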
Figure 1. Flowchart of the study selection process, according to PRISMA recommendation.
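The selection counts reported in the Results can be checked with simple arithmetic. The sketch below only re-adds the figures stated in the text; the variable names are illustrative, and the number of articles excluded at the full-text stage is derived here by subtraction rather than quoted from the review.

```python
# Counts as reported in the study selection paragraph of the Results.
identified = 252
duplicates = 49
excluded_on_title_abstract = {
    "did not use ML to evaluate food consumption": 72,
    "animal studies": 45,
    "not original articles": 11,
    "did not use ML algorithms": 3,
    "in vitro studies": 2,
}
included = 36

unique_records = identified - duplicates
excluded_total = sum(excluded_on_title_abstract.values())
full_text_reviewed = unique_records - excluded_total
# Derived by subtraction; this figure is not stated in the text itself.
excluded_at_full_text = full_text_reviewed - included

print("unique records:", unique_records)
print("excluded on title/abstract:", excluded_total)
print("reviewed in full:", full_text_reviewed)
print("excluded at full text (derived):", excluded_at_full_text)
```

The first three printed values match the 203 unique records, 133 exclusions, and 70 full-text reviews reported in the text.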

studies were conducted. For a better visual analysis of these results, the web tool ColorBrewer 2.0 was used to choose a color palette that could also be differentiated by colorblind individuals.

Population characteristics

As shown in Table 2, five studies (13.9%) did not describe the population characteristics for gender and age. Four studies (11.1%) included only women, and 25 studies (69.4%) included individuals of both sexes. Regarding the age group, 20 studies (55.6%) were conducted with adult and elderly population, three studies (8.3%) with adults, two studies (5.6%) with children and adults, and only one study (2.8%) with children (Lazarou et al. 2012). It was observed that the studies included mostly individuals with overweight or obesity (11.1%) and with a diagnosis of cancer (11.1%), followed by studies with postmenopausal women (8.3%) and
Table 2. Characteristics of the population and publications in relation to the year and countries in which the studies were conducted.

| Characteristics | References |
|---|---|
| Publications in relation to the year | |
| First: 2008 | Hearty and Gibney 2008 |
| 2008–2016 | Hearty and Gibney 2008; De Cos Juez et al. 2009; Ordóñez et al. 2009; Zenitani, Nishiuchi, and Kiuchi 2010; De Cos Juez et al. 2011; Lazarou et al. 2012; Silvera et al. 2014; Zeevi et al. 2015; Giabbanelli and Adams 2016 |
| 2017–2020 | Dipnall et al. 2017; Kanerva et al. 2018; Mezgec and Seljak 2017; Mutter et al. 2017; Silva et al. 2018; Easton, Sicilia, and Stephens 2019; Forman, Goldstein, Zhang, et al. 2019; Guan et al. 2018; Jia et al. 2019; Rosso and Giabbanelli 2018; Shiao et al. 2018a; Shiao et al. 2018b; Shiokawa, Date, and Kikuchi 2018; Panaretos et al. 2018; Faruqui et al. 2019; Forman, Goldstein, Crochiere, et al. 2019; Hamad et al. 2019; Shao et al. 2019; Yu et al. 2020; Burgermaster et al. 2020; He et al. 2020; Kwon et al. 2020; Jiang et al. 2020; Xu et al. 2020; Bodnar et al. 2020; Narziev et al. 2020; Iwendi et al. 2020 |
| Publications in relation to the countries | |
| Australia | Guan et al. 2018 |
| China | Mutter et al. 2017; Shao et al. 2019; Iwendi et al. 2020 |
| Denmark | Jiang et al. 2020 |
| Finland | Kanerva et al. 2018 |
| Greece | Panaretos et al. 2018 |
| Ireland | Hearty and Gibney 2008 |
| Israel | Zeevi et al. 2015 |
| Japan | Zenitani, Nishiuchi, and Kiuchi 2010; Shiokawa, Date, and Kikuchi 2018 |
| Mexico | Easton, Sicilia, and Stephens 2019 |
| Republic of Cyprus | Lazarou et al. 2012 |
| Slovenia | Mezgec and Seljak 2017 |
| South Korea | Kwon et al. 2020; Narziev et al. 2020 |
| Spain | De Cos Juez et al. 2009; Ordóñez et al. 2009; De Cos Juez et al. 2011 |
| United Kingdom | Giabbanelli and Adams 2016; Rosso and Giabbanelli 2018; He et al. 2020 |
| United States | Silvera et al. 2014; Dipnall et al. 2017; Forman, Goldstein, Zhang, et al. 2019; Shiao et al. 2018a; Shiao et al. 2018b; Faruqui et al. 2019; Forman, Goldstein, Crochiere, et al. 2019; Hamad et al. 2019; Burgermaster et al. 2020; Xu et al. 2020; Bodnar et al. 2020 |
| United States and Canada | Silva et al. 2018 |
| Population characteristics | |
| Adults | Faruqui et al. 2019; Xu et al. 2020; Narziev et al. 2020 |
| Adult and elderly | Hearty and Gibney 2008; De Cos Juez et al. 2009; Ordóñez et al. 2009; De Cos Juez et al. 2011; Silvera et al. 2014; Zeevi et al. 2015; Dipnall et al. 2017; Kanerva et al. 2018; Mutter et al. 2017; Easton, Sicilia, and Stephens 2019; Forman, Goldstein, Zhang, et al. 2019; Guan et al. 2018; Shiao et al. 2018a; Shiao et al. 2018b; Panaretos et al. 2018; Forman, Goldstein, Crochiere, et al. 2019; Shao et al. 2019; He et al. 2020; Kwon et al. 2020; Jiang et al. 2020 |
| Anemia | Mutter et al. 2017 |
| Both sexes | Hearty and Gibney 2008; Zenitani, Nishiuchi, and Kiuchi 2010; Lazarou et al. 2012; Silvera et al. 2014; Zeevi et al. 2015; Giabbanelli and Adams 2016; Dipnall et al. 2017; Kanerva et al. 2018; Mutter et al. 2017; Easton, Sicilia, and Stephens 2019; Forman, Goldstein, Zhang, et al. 2019; Guan et al. 2018; Rosso and Giabbanelli 2018; Shiao et al. 2018a; Shiao et al. 2018b; Panaretos et al. 2018; Faruqui et al. 2019; Forman, Goldstein, Crochiere, et al. 2019; Shao et al. 2019; Yu et al. 2020; He et al. 2020; Kwon et al. 2020; Jiang et al. 2020; Xu et al. 2020; Narziev et al. 2020 |
| Cancer | Silvera et al. 2014; Shiao et al. 2018a; Shiao et al. 2018b; Yu et al. 2020 |
| Children | Lazarou et al. 2012 |
| Children and adults | Giabbanelli and Adams 2016; Rosso and Giabbanelli 2018 |
| Depression | Xu et al. 2020; Bodnar et al. 2020 |
| Did not describe | Jia et al. 2019; Shiokawa, Date, and Kikuchi 2018; Hamad et al. 2019; Iwendi et al. 2020; Burgermaster et al. 2020 |
| Overweight and/or obesity | Kanerva et al. 2018; Forman, Goldstein, Zhang, et al. 2019; Guan et al. 2018; Forman, Goldstein, Crochiere, et al. 2019 |
| Postmenopausal women | De Cos Juez et al. 2009; Ordóñez et al. 2009; De Cos Juez et al. 2011 |
| Type 2 diabetes mellitus | Burgermaster et al. 2020 |
| Two or more noncommunicable diseases | Easton, Sicilia, and Stephens 2019; Faruqui et al. 2019 |
| Women | De Cos Juez et al. 2009; Ordóñez et al. 2009; De Cos Juez et al. 2011; Bodnar et al. 2020 |
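The percentages quoted alongside the categories of Table 2 are shares of the 36 included studies, rounded to one decimal place. A minimal sketch of that calculation follows; the helper name `share` and the dictionary labels are illustrative, while the counts are those listed in Table 2.

```python
TOTAL_STUDIES = 36  # number of studies included in the review


def share(n, total=TOTAL_STUDIES):
    """Percentage of the included studies, rounded to one decimal place."""
    return round(100 * n / total, 1)


# Study counts per category, as listed in Table 2.
population_counts = {
    "only women": 4,
    "both sexes": 25,
    "adults": 3,
    "children and adults": 2,
    "children": 1,
    "population not described": 5,
}

for label, n in population_counts.items():
    print(f"{label}: {n}/{TOTAL_STUDIES} = {share(n)}%")
```

Because categories can overlap (a study may appear under both an age group and a health condition), the shares are not expected to sum to 100%.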

individuals diagnosed with depression (5.6%), anemia (2.8%) and type 2 diabetes mellitus (2.8%). Only two studies (5.6%) investigated a population with two or more NCD. The other studies did not report the population characteristics. Note that most of the studies included in this review involved a population composed of both sexes, adults and elderly, with overweight, obesity, or cancer.

Methods of food intake evaluation
A total of 13 studies (36.1%) used the FFQ as a method to evaluate food consumption, followed by ten studies (27.8%) that used smartphone/software applications. Five studies (13.9%) used other types of questionnaires, four studies (11.1%) used the food registry, two studies (5.6%) used the R24H applied by a trained interviewer, and only one study (2.8%) used hedonic scales. Four of the selected studies (11.1%) used food intake data recorded in database systems and/or studies already published. One study (2.8%), aimed at detecting and recognizing food and beverage images, used Google as a search tool (Table 3).

ML algorithms and computational tools
The studies included in this review used different ML algorithms, and the most used ones were in the supervised learning category. Of the 36 studies included, 32 studies (88.9%) used supervised approach algorithms, of which 23 studies (63.9%) were of the classification type and 14 studies (38.9%) of the regression type. The most used classification algorithms were based on Decision Trees, with 13 studies (36.1%), and on Artificial Neural Networks, with 6 studies

Figure 2. Number of publications per year.

Figure 3. Heatmap relating the number of articles published in the countries where the studies were conducted.

(16.7%). We found only four studies (11.1%) in which unsupervised approaches were applied. The clustering algorithms found were Hierarchical clustering, k-means, and Self-Organizing Maps, while the Apriori algorithm was the only association rule procedure encountered. Only one study (2.8%) applied both ML approaches, supervised and unsupervised (Kwon et al. 2020) (Table 3).

Nine studies (25%) did not report the computational tools used. Among the others, ten studies (27.8%) used the statistical program R, five studies (13.9%) used the WEKA program, three studies (8.3%) used SAS, two studies (5.6%) used STATA, two studies (5.6%) used Google Colaboratory, one study (2.8%) used CART, one study (2.8%) used MATLAB, one study (2.8%) used SPSS, and one study (2.8%) used the Google Custom Search API. Only one study (2.8%) used two computational tools, R and MATLAB (Table 3).

Discussion
Overview of growth in studies and publications involving ML and nutrition

To our knowledge, this was the first systematic review to analyze the application of different ML algorithms, under supervised and unsupervised approaches, to evaluate food intake in healthy and unhealthy individuals. Currently, there is considerable scientific interest in the use of those algorithms due to their high predictive performance on large volumes of data in fields such as agriculture, transport, finance, criminal justice, and health (Cutillo et al. 2020). In the health area, ML algorithms have great potential to improve patient outcomes from clinical research to hospital care, helping in the diagnosis and prediction of diseases (Cutillo et al. 2020; Fernandes and Filho 2019).

This growth in studies and publications involving the application of ML algorithms in health is confirmed in our review by the significant increase in studies that addressed ML and food intake from 2016 onwards, with the start of the peak in 2017. Of the 36 studies included, 27 were published between 2017 and mid-2020. Moreover, we observed an increase since 2011 in studies related to AI in the health area, with peaks also from the year 2017 onwards (data not shown). This observation emphasizes that the use of computational methods is not restricted to nutrition; there is a trend in the use of AI in the health area in general.

Even though there has been a recent increase in the number of publications addressing the various applications of ML algorithms in nutrition, the use of artificial intelligence approaches in other areas has been under discussion for

Table 3. Characteristics of included studies in relation to the method of assessing food intake and the type of algorithm and computational tools.
Assessment methods of food intake
Food consumption data recorded in database systems and / Silvera et al. 2014; Giabbanelli and Adams 2016; Shiokawa, Date, and Kikuchi 2018; Iwendi et
or studies already published al. 2020
Food Frequency Questionnaires De Cos Juez et al. 2009; Ordóñez et al. 2009; De Cos Juez et al. 2011; Lazarou et al. 2012;
Silvera et al. 2014; Kanerva et al. 2018; Easton, Sicilia, and Stephens 2019; Shiao et al. 2018a;
Shiao et al. 2018b; Panaretos et al. 2018; Yu et al. 2020; Jiang et al. 2020; Bodnar et al. 2020
Food registry Hearty and Gibney 2008; Mutter et al. 2017; Rosso and Giabbanelli 2018; He et al. 2020
Google as a search tool Mezgec and Seljak 2017
Hedonic scales Xu et al. 2020
24-hour recall Dipnall et al. 2017; Kwon et al. 2020
Other types of questionnaires Lazarou et al. 2012; Zeevi et al. 2015; Guan et al. 2018; Hamad et al. 2019; Xu et al. 2020
Smartphone / software applications Zenitani, Nishiuchi, and Kiuchi 2010; Zeevi et al. 2015; Silva et al. 2018; Forman, Goldstein,
Zhang, et al. 2019; Jia et al. 2019; Faruqui et al. 2019; Forman, Goldstein, Crochiere, et al.
2019; Shao et al. 2019; Burgermaster et al. 2020; Narziev et al. 2020
Supervised approach algorithms
Type: classification Hearty and Gibney 2008; Ordóñez et al. 2009; De Cos Juez et al. 2011; Lazarou et al. 2012;
Silvera et al. 2014; Giabbanelli and Adams 2016; Kanerva et al. 2018; Mezgec and Seljak 2017;
Silva et al. 2018; Easton, Sicilia, and Stephens 2019; Forman, Goldstein, Zhang, et al. 2019; Jia
et al. 2019; Rosso and Giabbanelli 2018; Shiokawa, Date, and Kikuchi 2018; Panaretos et al.
2018; Faruqui et al. 2019; Forman, Goldstein, Crochiere, et al. 2019; Shao et al. 2019; Yu et al.
2020; Burgermaster et al. 2020; Bodnar et al. 2020; Narziev et al. 2020; Iwendi et al. 2020
Type: regression De Cos Juez et al. 2009; Zenitani, Nishiuchi, and Kiuchi 2010; Zeevi et al. 2015; Dipnall et al.
2017; Kanerva et al. 2018; Forman, Goldstein, Zhang, et al. 2019; Shiao et al. 2018a; Shiao et
al. 2018b; Forman, Goldstein, Crochiere, et al. 2019; Hamad et al. 2019; He et al. 2020; Kwon
et al. 2020; Xu et al. 2020; Iwendi et al. 2020
Artificial Neural Networks Hearty and Gibney 2008; De Cos Juez et al. 2011; Mezgec and Seljak 2017; Silva et al. 2018; Jia
et al. 2019; Faruqui et al. 2019
Decision Trees Hearty and Gibney 2008; Ordóñez et al. 2009; Lazarou et al. 2012; Silvera et al. 2014; Giabbanelli
and Adams 2016; Kanerva et al. 2018; Rosso and Giabbanelli 2018; Shiokawa, Date, and
Kikuchi 2018; Panaretos et al. 2018; Shao et al. 2019; Yu et al. 2020; Burgermaster et al. 2020;
Narziev et al. 2020
Unsupervised approach algorithms
Apriori Guan et al. 2018; Jiang et al. 2020
Hierarchical Clustering Guan et al. 2018
K-means Kwon et al. 2020
Self-Organizing Map Mutter et al. 2017; Jiang et al. 2020
Computational tools
CART Silvera et al. 2014
Did not inform De Cos Juez et al. 2009; Ordóñez et al. 2009; Zeevi et al. 2015; Easton, Sicilia, and Stephens
2019; Forman, Goldstein, Zhang, et al. 2019; Jia et al. 2019; Faruqui et al. 2019; Jiang et al.
2020; Bodnar et al. 2020
Google Colaboratory Iwendi et al. 2020; Silva et al. 2018
Google Custom Search API Mezgec and Seljak 2017
MATLAB He et al. 2020
R De Cos Juez et al. 2011; Kanerva et al. 2018; Mutter et al. 2017; Guan et al. 2018; Panaretos et
al. 2018; Forman, Goldstein, Crochiere, et al. 2019; Hamad et al. 2019; Yu et al. 2020;
Burgermaster et al. 2020; Kwon et al. 2020
SAS Zenitani, Nishiuchi, and Kiuchi 2010; Shiao et al. 2018a; Shiao et al. 2018b
SPSS Hearty and Gibney 2008
STATA Dipnall et al. 2017; Xu et al. 2020
Statistical Program R and MATLAB Shiokawa, Date, and Kikuchi 2018
WEKA Lazarou et al. 2012; Giabbanelli and Adams 2016; Rosso and Giabbanelli 2018; Shao et al. 2019;
Narziev et al. 2020

many years (Smallwood and Sondik 1973). One of the possible explanations for the increase in the use of ML algorithms in nutrition and health in general in recent years is precisely the search for more accurate procedures to meet the needs of professionals in their daily decision-making activities, treatment options, and reduction of health costs (Reis et al. 2017). In the long term, it is believed that ML approaches will benefit professionals in diverse fields by offering objective suggestions and ways to improve the efficiency, reliability, and accuracy of processes.

Influence of regionalization on food consumption

According to the results achieved, a small diversity of countries investigating food intake using ML algorithms was noted. Most studies were conducted in North America, followed by Europe, Asia, and Oceania, and no studies were developed in Central America, South America, or Africa. It is known that food intake is highly influenced by the region in which we live, so it is important that countries can conduct their own research to understand the eating behavior of individuals in the same region (Latha and Thegaleesan 2019).

The food intake pattern is directly influenced by social, cultural, and economic factors (Savage, Bambrick, and Gallegos 2020). In addition, the characteristics of the population in terms of customs, level of education, knowledge about healthy eating, workplace, and circle of family and friends also have a major impact on food choice and habits (Latha and Thegaleesan 2019). These differences can be observed between different countries or between regions within the same country (Vasileska and Rechkoska 2012).

In this review, an interesting result is that the United States was the country that developed the largest number of studies involving ML and food consumption. It is believed that this great interest in research on nutrition is related to the low quality of the diet consumed and also to the reduction of physical activity practices of its inhabitants, which has been worsening since the 1980s (Popkin, Adair, and Ng 2012). Economic development and increasing urbanization in developed countries, such as the United States, brought benefits and negative consequences for lifestyle and dietary patterns, which include quantitative and qualitative changes in the diet. This more industrialized dietary pattern includes an increase in the consumption of high-calorie foods, refined carbohydrates, and saturated fats of animal origin, in addition to a reduction in the intake of complex carbohydrates, fibers, vitamins, and minerals (Vasileska and Rechkoska 2012).

It is essential to point out that cultural and behavioral factors are also susceptible to change and that the circle of family and friends is extremely important in the correct choice of food. In addition, an increasing number of individuals have been eating outside their homes, which further increases the consumption of processed and ultra-processed foods since access to healthy options is often limited in many places, including at work and in school environments (Latha and Thegaleesan 2019).

Methods of food intake evaluation
The adequate and reliable evaluation of food intake in scientific research is important to understand the association between diet and the health conditions of an individual or a population. It has also been useful in predicting NCD (Vucic et al. 2009). However, an accurate assessment of food intake remains a major challenge, as it is subject to bias, and no method is considered the gold standard. The most commonly used methods to assess food intake are food histories, food records, R24H, and FFQ (Vucic et al. 2009; Shim, Oh, and Kim 2014).

In this review, the FFQ was the method most often used by the selected studies to evaluate food consumption. This method is considered one of the simplest, cheapest, fastest, and easiest to administer and process, and allows for long-term food evaluation (Chmurzynska et al. 2018). It contains a defined list of about 100 to 150 food items and options for the usual frequency of consumption over the period consulted. In some cases, portion sizes are also investigated, but little information is collected on the additional characteristics of the food consumed (Shim, Oh, and Kim 2014). Despite this methodological limitation, the FFQ has been widely used in epidemiological studies since the 1990s. It is important to note that the FFQ should be developed specifically for each study and research group because diet can be influenced by ethnicity, culture, and economic status, among others (Shim, Oh, and Kim 2014; Rupasinghe, Perera, and Wickramaratne 2020).

The R24H and food registration have some important limitations that can influence their choices, such as collecting information for a specific period, usually for short-term intake. Thus, to measure the average intake, several R24H or food records are necessary. Repeated measurement requires resources and time and can influence respondents’ food intake, improving the quality of the diet, or lead them to change or omit information intentionally (Rupasinghe, Perera, and Wickramaratne 2020). The R24H is conducted by interview and usually requires 20 to 30 minutes, and the information depends on the interviewees’ memory and the interviewer’s skills. On the other hand, food recording is a method that takes more time to obtain accurate data, and respondents must undergo prior training. Therefore, a high level of motivation becomes necessary. Also, each questionnaire requires a thorough review to ensure that all reported data are correct (Shim, Oh, and Kim 2014; Rupasinghe, Perera, and Wickramaratne 2020).

However, both methods to evaluate food intake also have common strengths, such as being easy to apply, covering a wide variety of foods, consisting of open questions that allow the collection of detailed information on consumption, and being usable to estimate the average consumption of a given population. Moreover, food registration does not depend on the individual’s memory since the information is self-reported when the food is consumed (Shim, Oh, and Kim 2014; Chmurzynska et al. 2018).

As seen above, the most used methods nowadays have many limitations, including memory dependency, understanding of food portions, literacy, and training of interviewers. Motivated by the development of reliable evaluation methods, technology emerges as a viable solution to current methodological deficiencies, with the potential to improve adherence, communication, and data quality (Sharp and Allman-Farinelli 2014). As a result, a large number of studies that used mobile applications were found in our review. In recent years, mobile devices have been used to evaluate individual and group diets in real time, incorporating their daily food routines. The easy access and interactive features of these applications, such as setting goals and dietary lapses, allow users to monitor the diet and trigger healthier behaviors. Mobile applications have demonstrated validity and reliability similar to conventional methods. In addition, the use of the application feeds continuous progress data to be used in future studies (Chmurzynska et al. 2018; Ahn et al. 2019).

ML algorithms and computational tools
ML is a subarea of AI whose objective is to develop algorithms that give computers or computer systems the ability to learn specific knowledge, behavior, or patterns automatically or semi-automatically from examples or informed observations (Michalski, Carbonell, and Mitchell 2013). ML approaches can be of four types: supervised, unsupervised, semi-supervised, and reinforcement learning. Here, we will discuss the first two approaches, which were the ones found in the studies selected in this review.

Supervised learning approach
In situations where supervised learning is applied, one has prior knowledge of the values of the output variable, i.e., the

classes or labels represented by categorical or continuous values of the input dataset used, which is composed of registers (instances) and variables (attributes). Therefore, the objective of supervised learning is to learn, employing algorithms for this type of task, a mapping function that best approximates the relationship between input data and observable output so that, when new instances are available, the output can be predicted with considerable accuracy (Pedregosa et al. 2011).

This learning process works in the following way: first, the dataset is split into two parts, training and test data. A predictive model is then built based on an algorithm that uses the training set so that the resulting model learns patterns by associating the input data values with the output labels. After the training, the model receives the test set, which was left out of the previous step, and applies the knowledge learned from previous experiences (training data) to it so that the accuracy, sensitivity, specificity, and other important statistical measures are calculated to evaluate the predictive power of the model (Dey 2016). Thus, together with the performance metrics, the model’s ability to generalize, that is, to predict labels for instances not seen during training, is evaluated (Michalski, Carbonell, and Mitchell 2013).

Interestingly, it was observed that most of the studies included in this review, totaling 32 out of 36, used some supervised learning algorithm. Below, we discuss these algorithms, which are of the classification and regression type, and the reasons why they were the most used in the studies included in this review.

Classification algorithms. The classification algorithms are used when the goal is to map the input variables to a specific categorical class. It is common to find in the literature works that apply some of the supervised algorithms based on Decision Trees (DT), Artificial Neural Networks (ANN), Naïve Bayes (NB), Support Vector Machine (SVM), Logistic Regression, and k-Nearest Neighbor (kNN) (Khan et al. 2010). In this review, we provide further details of the algorithms based on DT and ANN, as they are the most common approaches in the selected studies.

DT or derivatives from this approach are quite popular classification algorithms used to build predictive models (Rajput et al. 2011). This type of technique expresses the possible results of a series of choices related to attributes and classes through rules. Each tree is represented through a structure with nodes and branches, and each non-leaf node in the tree is a decision rule. A DT usually uses the top-down approach, i.e., from the root node to the leaves (Rajput et al. 2011). It starts with a single root (parent node), representing the most important attribute in the dataset according to an impurity metric. Between the root and the child nodes, which represent other attributes, the branches or edges connect these nodes and represent the possible values of the attribute analyzed by the predecessor or parent node. Finally, after traversing the tree, one reaches the leaf nodes representing the target, i.e., the predicted classes (Dey 2016).

It is believed that the vast demand for this type of algorithm in the reviewed studies is due to its advantages, especially: fast construction of the predictive model; fast classification of new instances; no need for normalization or standardization in the preprocessing phase; and simplicity in understanding and interpreting the rules generated, even for non-specialist users, as the resulting tree provides a consolidated view of the classification logic (Khan et al. 2010; Rajput et al. 2011).

According to Yu et al. (2020), the application of the DT algorithm in their study strongly contributed to the high accuracy found in the proposed classification, indicating that ML can adequately deal with missing data and measurements of complex investigation. The investigators concluded that the DT algorithm provided an effective approach to identify some food groups related to bladder cancer risk.

Another very powerful and frequently applied supervised ML approach is ANN. ANN algorithms are inspired by the operating structure of the biological neural system concerning the ability to learn from data and improve performance according to what was learned through operations such as parallel calculations for data processing and knowledge representation (Tan, Steinbach, and Kumar 2006).

An ANN is built from a set of processing units, also known as neurons, linked by weighted connections or synaptic weights responsible for the propagation of attribute values between the neurons in the layers (Tan, Steinbach, and Kumar 2006; Michalski, Carbonell, and Mitchell 2013). A neuron is a component that calculates the weighted sum of the values received as input, applies an activation function, and passes the result forward to the next layer. The intermediate layers, if any, between the input and output layers, are known as hidden layers. The value propagation process continues until reaching the output layer with the predicted response (Tan, Steinbach, and Kumar 2006; Michalski, Carbonell, and Mitchell 2013).

The main advantage of implementing ANN-based algorithms is the high capacity to learn from large volumes of data, whether structured or not, and in diverse applications (e.g., speech recognition, machine translation, and image caption generation, among many others). However, ANN has some disadvantages, such as its high computational cost and physical memory use. Moreover, its training is relatively slow, and the results learned are difficult for users to interpret (Khan et al. 2010).

For many of the classification algorithms, including ANN, it is known to be difficult to understand and explain in simple terms how the predictions were made. When built, these models are called black-box models. In the health area, this kind of model is even more challenging because professionals will be apprehensive about making decisions, especially those related to death risk, without a firm understanding of how the algorithm came to the predicted recommendation (Khan et al. 2010).

A study by Silva et al. (2018) trained an effective food classification model in a food image dataset and found that neural networks achieved an overall performance of 87.2% (with 90.0% sensitivity and specificity of 84.4%). When the

model was trained based on this food image dataset, it achieved a precision of 65.5% (with a sensitivity of 59.0% and a specificity of 72.0%). They concluded that the main contribution of neural networks is that they automatically learn features through convolutional layers, with high performance and accuracy.

In this context, we strongly believe that the frequent and broad use of DT in the studies addressed in this systematic review was due to the speed of training and, mainly, to the clear explanation DT provides for the results found by the model (Lundberg and Lee 2017).

Regression algorithms. Regression algorithms are used in situations where the aim is to map the input variables to an output with a continuous value, i.e., any numerical value between two limits (Kan et al. 2019). Note that regression algorithms as well as classification algorithms were widely used in the studies selected in this review.

The objective of regression is to define the parameter values of a mathematical equation that defines y (the output to be predicted) as a function of variables x (input variables) so that the error between the fitted curve and all the data points is minimized. This equation, the final model, can then be used to predict the result for new instances. In general, a model fits the data well if the differences between the observed values and the predicted values are small and unbiased (Pedregosa et al. 2011).

Among the many forms of regression described in the literature (Multiple Linear Regression, Lasso Regression, among others), one must select the technique that best explains the data to be analyzed. The best way to verify this is by applying different regression models and comparing their performance in predicting new instances (Goldstein, Navar, and Carter 2017). For regression, we use as a measure of performance the error calculated between the model obtained (curve) and the points belonging to the training set. Thus, the smaller the prediction error, the better the final performance of the model (Goldstein, Navar, and Carter 2017).

The study by Pagamunici et al. (2014) aimed to develop a gluten-free granola of high nutritional value and evaluate it during storage using ML techniques such as multivariate analysis and simple linear regression. Over the storage period analyzed, a positive correlation was observed between appearance and general acceptance, and the product remained stable in relation to these parameters. The results of this study demonstrate the high contribution and effectiveness of the application of regression analysis. These analyses enabled the presentation of an innovative predictive report that gives greater prediction accuracy and a better-tuned model to identify significant predictors.

It is important to mention that some of the algorithms used in classification problems, such as DT, Random Forest, SVM and ANN, also work as regression algorithms. However, those algorithms are modified to adapt to the desired output type, in this case, a numeric value, not a categorical label (Rodriguez-Galiano et al. 2015).

Unsupervised learning approach
Unlike supervised approaches, the algorithms of unsupervised learning are used to explore unlabeled data, i.e., when instances have no associated value or category. As a result, the unsupervised learning algorithms do not aim to make predictions but, instead, to find potentially useful hidden structures and patterns that humans can interpret and that allow a better description and understanding of the data (Tan, Steinbach, and Kumar 2006).

In this approach, the task of the ML algorithms is not to find the right output from the input data but to explore the data and find clusters or make inferences according to the similarities, patterns, and differences found when evaluating the attributes of the instances, without any previous training (Tan, Steinbach, and Kumar 2006). The motivation to use this approach is its ability to provide initial insights that can then be used for testing scientific hypotheses and as a starting point for further analysis.

Unsupervised learning tasks are typically to find underlying groups (clusters) in the data and/or reveal important association rules (Dey 2016). In our review, only four studies applied unsupervised ML approaches, of which three used clustering algorithms and one used association rule algorithms.

Clustering algorithms. The most common task in unsupervised learning is clustering. In this case, the unlabeled data are analyzed and organized in clusters by their similarities or dissimilarities (Tan, Steinbach, and Kumar 2006). The measurement of how similar or dissimilar the instances are to each other is done using a proximity calculation, such as the Euclidean distance (Pedregosa et al. 2011). The goal is to create a clustering (a set of clusters) where instances in the same cluster are very similar to each other (each cluster is cohesive), while instances in distinct clusters are highly dissimilar (i.e., clusters are well-separated from each other) (Zheng et al. 2019). In a sense, clustering algorithms reveal hidden categories, i.e., each cluster can be thought of as a class of its instances (Ghorbani and Ghousi 2019).

The k-means algorithm is the most widely used clustering algorithm. To apply this procedure, it is necessary to give as input to the algorithm the number k of clusters sought. Initially, k centroids (center points) are randomly defined. Then, in each following iteration, every instance is associated with its closest centroid, and each centroid is redefined according to the grouped instances (typically, the new centroid location will be the mean point of the cluster). The redefinition of centroids and the resulting association of instances continue throughout multiple iterations until the centroids no longer change (Jain 2010).

According to the study by Kwon et al. (2020), the application of the k-means algorithm was fundamental to extract important and hidden information, such as the relationship between total energy and protein intake, which was difficult to distinguish with conventional analyses. The k-means algorithm contributed to the proper formation of clusters and the comparison of risk factors between them. Cluster-specific risk factors were found to include high consumption

of fat and smoking in the men’s cluster and low consump- potentially relevant associations or regularities between items
tion of carbohydrates, protein, fat, and alcohol consumption (or attribute values) of the instances (Lakshmi and Vadivu
in the women’s cluster. 2017). The following implication can represent rules: X → Y,
Self-Organizing Map (SOM) is another well-known clus- where X is called rule antecedent and Y is called consequent.
tering technique that was found in the studies selected in The most well-known association rule algorithm is
this review. SOM is a particular type of unsupervised neural Apriori. Initially, it identifies the frequent individual items,
network, where neurons are arranged in a 2-dimensional i.e., those whose number of occurrences in the dataset is
grid. Throughout the iterations, the neurons gradually agglu- greater than or equal to a threshold called minimum support. In
tinate around regions presenting high density of data points. the second iteration, the algorithm seeks frequent pairs of
Therefore, regions with many neurons can be interpreted as items containing the frequent individual items of the previ-
clusters (Fernandes and Filho 2019). ous iteration, taking the same minimum support into
The study by Mutter et al. (2017) used the SOM algo- account. Similarly, in the third iteration, the algorithm
rithm to highlight the inherent natural heterogeneity of determines the frequent item triplets containing the pairs
nutritional profiles and how they are associated with inci- found in iteration 2. Apriori keeps augmenting the sets of
dent anemia in a population setting, showing how nutri- frequent items in each following iteration until no changes
tional and economic differences between northern and are detected. Finally, the resulting frequent item sets are
southern Jiangsu predict differences in incident anemia. The used to build the association rules that unveil trends in the
authors highlighted the excellent contribution of this algo- dataset. To evaluate the putative rules, a minimum confi-
rithm for the complete separation between training and dence value is considered. The confidence value measures
evaluation data, being one of the strengths of the SOM the force of the implication described by the rule, i.e., it
approach, as its architecture avoids overfitting. Transparency measures how often items in Y appear in instances that con-
is another strong point of the SOM approach, where the tain X. Rules generated with a confidence value below the
process of defining subgroups and investigating their profiles minimum are discarded. Additionally, the lift measure can
is guided by the user and open to constructive criticism be used to evaluate the degree of correlation between X and
from other observers. Y in a rule (Lakshmi and Vadivu 2017).
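The Apriori procedure described above (frequent item sets grown one item per iteration under a minimum support, then rules X → Y filtered by a minimum confidence and scored with lift) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used by Jiang et al. (2020) or by any other reviewed study: the food-item transactions, the thresholds `min_support` and `min_confidence`, and all function names are hypothetical.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search: grow itemsets by one item per iteration."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support]
    found = {}
    while level:
        for s in level:
            found[s] = support(s, transactions)
        # Candidates of size k+1 are unions of two frequent k-itemsets.
        size = len(level[0]) + 1
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Apriori pruning: every k-subset of a candidate must be frequent.
        level = [c for c in candidates
                 if all(frozenset(sub) in found for sub in combinations(c, size - 1))
                 and support(c, transactions) >= min_support]
    return found

def association_rules(found, min_confidence):
    """Build rules X -> Y and keep those that reach the confidence threshold."""
    rules = []
    for itemset, sup in found.items():
        for r in range(1, len(itemset)):
            for x in map(frozenset, combinations(itemset, r)):
                y = itemset - x
                confidence = sup / found[x]   # how often Y appears when X does
                lift = confidence / found[y]  # above 1 suggests positive correlation
                if confidence >= min_confidence:
                    rules.append((set(x), set(y), sup, confidence, lift))
    return rules

# Hypothetical transactions: foods reported together by five respondents.
transactions = [
    frozenset({"fruit", "vegetables", "fish"}),
    frozenset({"fruit", "vegetables"}),
    frozenset({"red meat", "soft drinks"}),
    frozenset({"fruit", "vegetables", "red meat"}),
    frozenset({"soft drinks", "red meat", "fish"}),
]
found = frequent_itemsets(transactions, min_support=0.4)
rules = association_rules(found, min_confidence=0.8)
for x, y, sup, conf, lift in rules:
    print(f"{sorted(x)} -> {sorted(y)}  "
          f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

Lowering `min_support` lets rarer item combinations through at the cost of more candidates to count, which is the usual trade-off when tuning this kind of analysis.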
Another clustering technique found in the studies The study by Jiang et al. (2020) used the Apriori algorithm
included in this review was Hierarchical clustering, where to reveal correlations between dietary factors and anthropo-
data are partitioned successively, producing a hierarchical metric changes in middle-aged Danish citizens. This study
representation of the group. Hierarchical methods require a successfully identified subgroups that shared similar dietary,
matrix containing metrics of distance between clusters, this lifestyle, and anthropometric profiles. The authors mention
matrix is known as a matrix of similarities between groups. that this algorithm effectively contributes to the evaluation
Distance methods between groups are used to calculate of eating habits assessed by food frequency questionnaires,
proximity values between groups, such as the Euclidean dis- and that was able to retrieve known association rules, such
tance. Through the analysis of the dendogram (diagram that as the beneficial role of fruits and red meats in relation to
shows the hierarchy and the relationship of the clusters in a changes in waist circumference in both sexes.
structure) it is possible to infer the number of suitable clus- As demonstrated in this review, supervised learning algo-
ters. Hierarchical clustering generally falls into two types: rithms, whether of classification or regression, are more
agglomerative, with a bottom-up approach, in this case, all widely used for data mining. This is because supervised
elements start separately and are grouped in stages, one by learning is a much more objective task when compared to
one to form clusters, and the divisive, with a top-down unsupervised learning which has an exploratory characteris-
approach where all elements start together in a single cluster tic. Additionally, for the same reason, supervised learning
and divisions are performed recursively as the hierarchy is models are easier to validate and there are more validation
descended. As with the agglomerative method, we choose metrics available. Also, the possibility of classifying future
instances with the resulting model is of broad application
the optimal number of clusters from all possible
(Tan, Steinbach, and Kumar 2006).
combinations.
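As a rough illustration of the agglomerative (bottom-up) variant described above, the sketch below merges the two closest clusters step by step using the Euclidean distance. The intake profiles, the function names, and the choice of single linkage (the distance between the closest members of two clusters) are assumptions made for this example; the reviewed studies may have used other linkage criteria and software.

```python
import math

# Hypothetical data: (fruit, red meat) servings per day for six participants.
PROFILES = [(3.0, 0.5), (2.8, 0.6), (3.2, 0.4), (0.5, 2.0), (0.7, 2.2), (0.4, 1.9)]


def euclidean(a, b):
    """Euclidean distance between two intake profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_distance(c1, c2, points):
    """Single linkage: distance between the closest pair of members."""
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)


def agglomerative(points, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain.

    Returns the final clusters (lists of point indices) and the merge
    history, which holds the information a dendrogram would display.
    """
    clusters = [[i] for i in range(len(points))]  # every element starts alone
    history = []
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return clusters, history


clusters, history = agglomerative(PROFILES, n_clusters=2)
for left, right, distance in history:
    print(left, "+", right, "merged at distance", round(distance, 3))
```

Reading the merge distances in `history` from first to last plays the role of inspecting a dendrogram: a large jump between consecutive merge distances suggests a natural place to stop merging.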
Guan et al. (2018) applied hierarchical clustering to explore food choices at meals in a sample of overweight and obese participants. The algorithm allowed the identification of food clusters closely related to meals, based on reported foods and item frequencies in the screening of dietary data. According to the authors, these results can aid the development of strategies to improve food choices and promote behavior change at the individual level through a deeper understanding of these choices.

On the other hand, it is not always possible to perform supervised learning, as it is common for only unlabeled data to be available. In such cases, unsupervised learning can be of great utility, which is why this field of machine learning is also of great importance (Tan, Steinbach, and Kumar 2006). Exploratory analyses make it possible to obtain initial insights into the behavior of the data, thus facilitating research that does not yet have a final objective outlined.

Association rules algorithm. Association rules comprise another type of unsupervised learning. However, instead of clustering instances, this technique aims to discover the

Computational tools

The large volume of data generated daily in health care, including different types of medical records, whether electronic or paper, has led to a change in traditional forms of data analysis. Historically, the most used programs in the medical field are SAS, SPSS, and STATA. However, some difficulties may arise, such as updating or adding datasets of different types and sources, or handling unstructured data such as text and images (Fernandes and Filho 2019).

In this work, the change in data analysis tools was noted, as the vast majority of studies used the R software. This software is an open-source, multi-platform, and free statistical environment created by Ross Ihaka and Robert Gentleman in 1997 (Matloff 2009). R has become so popular because it has a wide variety of integrated functions, packages, and libraries that perform tasks ranging from simple to more complex, such as applying statistical tests and ML algorithms (Murrell 2005; Matloff 2009).

Therefore, there was a growing trend to use the R software instead of classic statistical software, especially among researchers outside the field of computing, such as those in health, since R is more accepted by the scientific community and is a more complete program.

The use of WEKA in the studies included in this review is also noteworthy. Its high adoption is probably due to its simplicity, to the fact that it contains many ML libraries implemented and ready to use, and, very importantly, to the fact that it can be used without prior programming knowledge.

Strengths

This systematic review had several strengths, including the fact that it is the first systematic review to analyze the application of different ML algorithms to evaluate food intake in healthy and unhealthy individuals. In addition, we included all studies regardless of the characteristics of the populations studied, the type of study, the language, and the year of publication. This decision allowed a broader search to identify and include all studies investigating food intake with the application of different ML algorithms. Another strong point was the inclusion of studies that identified food and beverage images to evaluate food consumption. This inclusion allowed a more comprehensive review of ML applications in health, focusing on nutrition, highlighting the growth in the use of these algorithms in recent years and the countries involved in this research. In addition, the main methods used to evaluate food intake, the main ML algorithms, and the computational tools employed were presented.

Conclusion

This review summarizes the latest information on the use of different ML algorithms to evaluate food intake. It can serve as a guide for health professionals who want to work in the area of AI. The results show that there is currently a great and growing interest in the use of ML algorithms in the area of nutrition, as evidenced mainly by a significant increase in publications in recent years.

In addition, it is also noted that supervised learning algorithms, more precisely those based on Decision Trees, were the most used. The more frequent use of DT is possibly because they are fast to apply, simple to understand, and produce results that are easy to interpret and explain. There was also a change in the use of computational tools for statistical analysis, with a tendency to use other software, such as R, which is more complete, instead of the classic statistical programs.

Regarding the assessment of food intake, we observed that the FFQ was the most used method, since it allows a long-term evaluation and is simple, fast, and easy to administer and process. However, despite the importance of investigating food intake in each population and the potential of ML algorithms for this purpose, there was little diversity in the countries involved in the studies analyzed. We therefore encourage studies that apply ML algorithms to the investigation of food intake in each country, as the problems faced by different regions require different levels of research and intervention for the development of food reeducation programs and specific public policies. Furthermore, health professionals should understand that the use of ML is a collaborative activity, combining professional experience with data analysis and processing in order to facilitate decision making in planning and delivering health care.

We would like to stress that machine learning methods other than the ones cited here have been applied to food intake. However, we focused on the most frequently applied ML procedures to provide an overview of the main ML methods used in relevant publications in recent years.

We suggest that researchers who use machine learning techniques in their studies mention broader search terms, such as machine learning, deep learning, and data mining, in their texts, and not just the specific names of techniques, in order to expand their visibility and make their articles easier to identify with search engines.

Author contributions

L.O.C., A.L.G.D., D.L.F., and J.B. designed the study. L.O.C. and A.L.G.D. selected and reviewed the articles and extracted the data. L.O.C., A.L.G.D., and D.L.F. analyzed and interpreted the data and drafted the manuscript. J.B., R.D-B., and F.R.C. improved the manuscript and critically revised the scientific content. All authors read and approved the final manuscript.

Conflict of interest

The authors have no relevant interests to declare.

Funding

This work was supported by the Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG), Belo Horizonte, Brazil; the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Brasilia, Brazil; and the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Brasilia, Brazil.
ORCID

Larissa Oliveira Chaves http://orcid.org/0000-0002-6962-2284
Ana Luiza Gomes Domingos http://orcid.org/0000-0001-7010-0574
Daniel Louzada Fernandes http://orcid.org/0000-0002-6548-294X
Fabio Ribeiro Cerqueira http://orcid.org/0000-0003-1325-2592
Rodrigo Siqueira-Batista http://orcid.org/0000-0002-3661-1570
Josefina Bressan http://orcid.org/0000-0002-4993-9436

References

Ahn, J. S., D. W. Kim, J. Kim, H. Park, and J. E. Lee. 2019. Development of a smartphone application for dietary self-monitoring. Frontiers in Nutrition 6:1–12. doi:10.3389/fnut.2019.00149.
Al-Maqaleh, B. M., and A. M. G. Abdullah. 2017. Intelligent predictive system using classification techniques for heart disease diagnosis. International Journal of Computer Science Engineering (IJCSE) 6 (6):145–51.
Assari, R., P. Azimi, and M. R. Taghva. 2017. Heart disease diagnosis using data mining techniques. International Journal of Economics & Management Sciences 06 (03):1–5. doi:10.4172/2162-6359.1000415.
Babu, S., E. M. Vivek, K. P. Famina, K. Fida, P. Aswathi, M. Shanid, and M. Hena. 2017. Heart disease diagnosis using data mining technique. Electronics, Communication and Aerospace Technology (ICECA), International Conference 1:750–3.
Bodnar, L. M., A. R. Cartus, S. I. Kirkpatrick, K. P. Himes, E. H. Kennedy, H. N. Simhan, W. A. Grobman, J. Y. Duffy, R. M. Silver, S. Parry, et al. 2020. Machine learning as a strategy to account for dietary synergy: An illustration based on dietary intake and adverse pregnancy outcomes. The American Journal of Clinical Nutrition 111 (6):1235–43. doi:10.1093/ajcn/nqaa027.
Burgermaster, M., J. H. Son, P. G. Davidson, A. M. Smaldone, G. Kuperman, D. J. Feller, K. G. Burt, M. E. Levine, D. J. Albers, C. Weng, et al. 2020. A new approach to integrating patient-generated data with expert knowledge for personalized goal setting: A pilot study. International Journal of Medical Informatics 139:104158. doi:10.1016/j.ijmedinf.2020.104158.
Chmurzynska, A., M. A. Mlodzik-Czyzewska, A. M. Malinowska, J. Czarnocinska, and D. J. Wiebe. 2018. Use of a smartphone application can improve assessment of high-fat food consumption in overweight individuals. Nutrients 10 (11):1692–12. doi:10.3390/nu10111692.
ColorBrewer. 2020. Color Brewer 2.0: Color advice for cartography. Accessed 2020. Available at https://colorbrewer2.org/.
Cutillo, C. M., K. R. Sharma, L. Foschini, S. Kundu, M. Mackintosh, and K. D. Mand. 2020. Machine intelligence in healthcare-perspectives on trustworthiness, explainability, usability, and transparency. NPJ Digital Medicine 3:47. doi:10.1038/s41746-020-0254-2.
Dao, M. C., A. F. Subar, M. Warthon-Medina, J. E. Cade, T. Burrows, R. K. Golley, N. G. Forouhi, M. Pearce, and B. A. Holmes. 2019. Dietary assessment toolkits: An overview. Public Health Nutrition 22 (3):404–418. doi:10.1017/S1368980018002951.
De Cos Juez, F. J., F. S. Lasheras, P. J. G. Nieto, and M. A. S. Suarez. 2009. A new data mining methodology applied to the modelling of the influence of diet and lifestyle on the value of bone mineral density in postmenopausal women. International Journal of Computer Mathematics 86 (10-11):1878–87. doi:10.1080/00207160902783557.
De Cos Juez, F. J., M. A. Suarez-Suarez, F. S. Lasheras, and A. Murcia-Mazón. 2011. Application of neural networks to the study of the influence of diet and lifestyle on the value of bone mineral density in post-menopausal women. Mathematical and Computer Modelling 54 (7-8):1665–1670. doi:10.1016/j.mcm.2010.11.069.
Dey, A. 2016. Machine learning algorithms: A review. International Journal of Computer Science and Information Technologies 7 (3):1174–1179.
Dipnall, J. F., J. A. Pasco, M. Berk, L. J. Williams, S. Dodd, F. N. Jacka, and D. Meyer. 2017. Getting RID of the blues: Formulating a Risk Index for Depression (RID) using structural equation modeling. Australian & New Zealand Journal of Psychiatry 51 (11):1121–13. doi:10.1177/0004867417726860.
Easton, J. F., H. R. Sicilia, and C. R. Stephens. 2019. Classification of diagnostic subcategories for obesity and diabetes based on eating patterns. Nutrition & Dietetics: The Journal of the Dietitians Association of Australia 76 (1):104–109. doi:10.1111/1747-0080.12495.
Faruqui, S. H. A., Y. Du, R. Meka, A. Alaeddini, C. Li, S. Shirinkam, and J. Wang. 2019. Development of a deep learning model for dynamic forecasting of blood glucose level for type 2 diabetes mellitus: Secondary analysis of a randomized controlled trial. JMIR mHealth and uHealth 7 (11):e14452. doi:10.2196/14452.
Fernandes, F. T., and A. D. P. C. Filho. 2019. Data mining and machine learning perspectives for occupational safety and health. Revista Brasileira de Saúde Ocupacional 44:e13. doi:10.1590/2317-6369000019418.
Forman, E. M., S. P. Goldstein, R. J. Crochiere, M. L. Butryn, A. S. Juarascio, F. Z. Zhang, and G. D. Foster. 2019. Randomized controlled trial of OnTrack, a just-in-time adaptive intervention designed to enhance weight loss. Translational Behavioral Medicine 6:1–13.
Forman, E. M., S. P. Goldstein, F. Zhang, B. C. Evans, S. M. Manasse, M. L. Butryn, A. S. Juarascio, P. Abichandani, G. J. Martin, and G. D. Foster. 2019. OnTrack: Development and feasibility of a smartphone app designed to predict and prevent dietary lapses. Translational Behavioral Medicine 9 (2):236–245. doi:10.1093/tbm/iby016.
Ghorbani, R., and R. Ghousi. 2019. Predictive data mining approaches in medical diagnosis: A review of some diseases prediction. International Journal of Data and Network Science 3:47–70. doi:10.5267/j.ijdns.2019.1.003.
Giabbanelli, P. J., and J. Adams. 2016. Identifying small groups of foods that can predict achievement of key dietary recommendations: Data mining of the UK National Diet and Nutrition Survey, 2008-12. Public Health Nutrition 19 (9):1543–1551. doi:10.1017/S1368980016000185.
Goldstein, B. A., A. M. Navar, and R. E. Carter. 2017. Moving beyond regression techniques in cardiovascular risk prediction: Applying machine learning to address analytic challenges. European Heart 38 (23):1805–1814.
Guan, V. X., Y. C. Probst, E. P. Neale, M. J. Batterham, and L. C. Tapsell. 2018. Identifying usual food choices at meals in overweight and obese study volunteers: Implications for dietary advice. The British Journal of Nutrition 120 (4):472–480. doi:10.1017/S0007114518001587.
Hamad, R., Z. S. Templeton, L. Schoemaker, M. Zhao, and J. Bhattacharya. 2019. Comparing demographic and health characteristics of new and existing SNAP recipients: Application of a machine learning algorithm. The American Journal of Clinical Nutrition 109 (4):1164–1172. doi:10.1093/ajcn/nqy355.
He, X., B. R. Matam, S. Bellary, G. Ghosh, and A. K. Chattopadhyay. 2020. CHD risk minimization through lifestyle control: Machine learning gateway. Scientific Reports 10 (1):4090. doi:10.1038/s41598-020-60786-w.
Hearty, A. P., and M. J. Gibney. 2008. Analysis of meal patterns with the use of supervised data mining techniques-artificial neural networks and decision trees. The American Journal of Clinical Nutrition 88 (6):1632–42. doi:10.3945/ajcn.2008.26619.
Higgins, J. P. T., and S. Green. 2011. Cochrane handbook for systematic reviews of interventions. West Sussex, England: John Wiley & Sons.
Iwendi, C., S. Khan, J. H. Anajemba, A. K. Bashir, and F. Noor. 2020. Realizing an efficient IoMT-assisted patient diet recommendation system through machine learning model. IEEE Access 8:28462–28474. doi:10.1109/ACCESS.2020.2968537.
Jain, A. K. 2010. Data clustering: 50 years beyond k-means. Pattern Recognition Letters 31 (8):651–666. doi:10.1016/j.patrec.2009.09.011.
Jia, W., Y. Li, R. Qu, T. Baranowski, L. E. Burke, H. Zhang, Y. Bai, J. M. Mancino, G. Xu, Z.-H. Mao, et al. 2019. Automatic food detection in egocentric images using artificial intelligence technology. Public Health Nutrition 22 (7):1168–1179. doi:10.1017/S1368980018000538.
Jiang, L., K. Audouze, J. A. R. Herrera, L. H. Angquist, S. K. Kjaerulff, J. M. G. Izarzugaza, A. Tjønneland, J. Halkjaer, K. Overvad, T. I. A. Sørensen, et al. 2020. Conflicting associations between dietary patterns and changes of anthropometric traits across subgroups of middle-aged women and men. Clinical Nutrition (Edinburgh, Scotland) 39 (1):265–275. doi:10.1016/j.clnu.2019.02.003.
Kan, H. J., H. Kharrazi, H.-Y. Chang, D. Bodycombe, K. Lemke, and J. P. Weiner. 2019. Exploring the use of machine learning for risk adjustment: A comparison of standard and penalized linear regression models in predicting health care costs in older adults. PLoS One 14 (3):e0213258. doi:10.1371/journal.pone.0213258.
Kanerva, N., J. Kontto, M. Erkkola, J. Nevalainen, and S. Mannisto. 2018. Suitability of random forest analysis for epidemiological research: Exploring sociodemographic and lifestyle-related risk factors of overweight in a cross-sectional design. Scandinavian Journal of Public Health 46 (5):557–564. doi:10.1177/1403494817736944.
Khan, A., B. Baharudin, L. H. Lee, and K. Khan. 2010. A review of machine learning algorithms for text-documents classification. Journal of Advances in Information Technology 1 (1):4–20.
Kodati, S., R. Vivekanandam, and G. Ravi. 2019. Comparative analysis of clustering algorithms with heart disease data sets using data mining weka tool. Soft Computing and Signal Processing 111–117.
Kwon, Y.-J., H. S. Kim, D.-H. Jung, and J.-K. Kim. 2020. Cluster analysis of nutritional factors associated with low muscle mass index in middle-aged and older adults. Clinical Nutrition (Edinburgh, Scotland) 39 (11):3369–3376. doi:10.1016/j.clnu.2020.02.024.
Lakshmi, K. S., and G. Vadivu. 2017. Extracting association rules from medical health records using multi-criteria decision analysis. Procedia Computer Science 115:290–95.
Latha, R., and T. Thegaleesan. 2019. Complexity of food choice and statistical techniques. International Journal of Innovative Studies in Sociology and Humanities 4 (2):90–95.
Lazarou, C., M. Karaolis, A.-L. Matalas, and D. B. Panagiotakos. 2012. Dietary patterns analysis using data mining method. An application to data from the CYKIDS study. Computer Methods and Programs in Biomedicine 108 (2):706–714. doi:10.1016/j.cmpb.2011.12.011.
Liberati, A., D. G. Altman, J. Tetzlaff, C. Mulrow, P. C. Gøtzsche, J. P. A. Ioannidis, M. Clarke, P. J. Devereaux, J. Kleijnen, and D. Moher. 2009. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: Explanation and elaboration. Research Methods & Reporting 339:b2700. doi:10.1136/bmj.b2700.
Lundberg, S. M., and S.-I. Lee. 2017. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 4765–4774.
Ma, S., and X. Chen. 2019. A data mining approach to predict risk of cardiovascular. AIP Conference Proceedings.
Matloff, N. 2009. The art of R programming: A tour of statistical software design. 373 p.
Mezgec, S., and B. K. Seljak. 2017. NutriNet: A deep learning food and drink image recognition system for dietary assessment. Nutrients 9 (7):657. doi:10.3390/nu9070657.
Michalski, R. S., J. G. Carbonell, and T. M. Mitchell. 2013. Machine learning: An artificial intelligence approach. Springer Science & Business Media.
Murrell, P. 2005. R graphics. 1st ed. Chapman and Hall/CRC. 328 p.
Mutter, S., A. E. Casey, S. Zhen, Z. Shi, and V.-P. Mäkinen. 2017. Multivariable analysis of nutritional and socio-economic profiles shows differences in incident anemia for Northern and Southern Jiangsu in China. Nutrients 9 (10):1153. doi:10.3390/nu9101153.
Narziev, N., H. Goh, K. Toshnazarov, S. A. Lee, K.-M. Chung, and Y. Noh. 2020. STDD: Short-term depression detection with passive sensing. Sensors 20 (5):1396. doi:10.3390/s20051396.
Ordóñez, C., J. M. Matías, J. F. De Cos Juez, and P. J. García. 2009. Machine learning techniques applied to the determination of osteoporosis incidence in post-menopausal women. Mathematical and Computer Modelling 50 (5-6):673–679. doi:10.1016/j.mcm.2008.12.024.
Pagamunici, L., M. De Souza, A. H. P. Gohara, A. K. Silvestre, A. A. F. Visentainer, J. V. De Souza, N. E. Gomes, and S. T. M. Matsushita. 2014. Multivariate study and regression analysis of gluten-free granola. Food Science and Technology 34 (1):127–134. doi:10.1590/S0101-20612014005000005.
Panaretos, D., E. Koloverou, A. C. Dimopoulos, G.-M. Kouli, M. Vamvakari, G. Tzavelas, C. Pitsavos, and D. B. Panagiotakos. 2018. A comparison of statistical and machine-learning techniques in evaluating the association between dietary patterns and 10-year cardiometabolic risk (2002-2012): The ATTICA study. The British Journal of Nutrition 120 (3):326–334. doi:10.1017/S0007114518001150.
Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, and V. Dubourg. 2011. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research 12:2825–2830.
Popkin, B. M., L. S. Adair, and S. W. Ng. 2012. Global nutrition transition and the pandemic of obesity in developing countries. Nutrition Reviews 70 (1):3–21. doi:10.1111/j.1753-4887.2011.00456.x.
Rajput, A., R. P. Aharwal, M. Dubey, S. Saxena, and M. Raghuvanshi. 2011. J48 and JRIP rules for e-governance data. International Journal of Computer Science and Security 5 (2):201–207.
Reis, R., H. Peixoto, J. Machado, and A. Abelha. 2017. Machine learning in nutritional follow-up research. Open Computer Science 7 (1):41–45. doi:10.1515/comp-2017-0008.
Rodriguez-Galiano, V., M. Sanchez-Castillo, M. Chica-Olmo, and M. Chica-Rivas. 2015. Machine learning predictive models for mineral prospectivity: An evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geology Reviews 71:804–818. doi:10.1016/j.oregeorev.2015.01.001.
Rosso, N., and P. Giabbanelli. 2018. Accurately inferring compliance to five major food guidelines through simplified surveys: Applying data mining to the UK National Diet and Nutrition Survey. JMIR Public Health and Surveillance 4 (2):e56. doi:10.2196/publichealth.9536.
Rupasinghe, W. S. W. A., H. T. S. Perera, and N. M. J. Wickramaratne. 2020. A comprehensive review on dietary assessment methods in epidemiological research. Public Health Nutrition 3 (1):204–211.
Savage, A., H. Bambrick, and D. Gallegos. 2020. From garden to store: Local perspectives of changing food and nutrition security in a Pacific Island country. Food Security 12 (6):1331–1348.
Shao, Z., C. Chen, W. Li, H. Ren, and W. Chen. 2019. Assessment of the risk factors in the daily life of stroke patients based on an optimized decision tree. Technology and Health Care: Official Journal of the European Society for Engineering and Medicine 27 (S1):317–S329. doi:10.3233/THC-199030.
Sharp, D. B., and M. Allman-Farinelli. 2014. The feasibility and validity of mobile phones to assess dietary intake. Nutrition (Burbank, Los Angeles County, Calif.) 30 (11-12):1257–1266. doi:10.1016/j.nut.2014.02.020.
Shiao, S. P. K., J. Grayson, A. Lie, and C. H. Yu. 2018a. Personalized nutrition: Genes, diet, and related interactive parameters as predictors of cancer in multiethnic colorectal cancer families. Nutrients 10 (6):795. doi:10.3390/nu10060795.
Shiao, S. P. K., J. Grayson, A. Lie, and C. H. Yu. 2018b. Predictors of the healthy eating index and glycemic index in multi-ethnic colorectal cancer families. Nutrients 10 (6):674. doi:10.3390/nu10060674.
Shim, J.-S., K. Oh, and H. C. Kim. 2014. Dietary assessment methods in epidemiologic studies. Epidemiology and Health 36:e2014009. doi:10.4178/epih/e2014009.
Shiokawa, Y., Y. Date, and J. Kikuchi. 2018. Application of kernel principal component analysis and computational machine learning to exploration of metabolites strongly associated with diet. Scientific Reports 8 (1):3426. doi:10.1038/s41598-018-20121-w.
Silva, B. V. R., M. G. Rad, J. Cui, M. Mccabe, and K. Pan. 2018. A mobile-based diet monitoring system for obesity management. Journal of Health & Medical Informatics 9 (2):1–20.
Silvera, S. A. N., S. T. Mayne, M. D. Gammon, T. L. Vaughan, W.-H. Chow, J. A. Dubin, R. Dubrow, J. L. Stanford, A. B. West, H. Rotterdam, et al. 2014. Diet and lifestyle factors and risk of subtypes of esophageal and gastric cancers: Classification tree analysis. Annals of Epidemiology 24 (1):50–57. doi:10.1016/j.annepidem.2013.10.009.
Singh, P., S. Singh, and G. S. Pandi-Jai. 2018. Effective heart disease prediction system using data mining techniques. International Journal of Nanomedicine 13:121–124. doi:10.2147/IJN.S124998.
Siqueira-Batista, R., and E. Silva. 2019. Notas sobre os fundamentos matemáticos da Inteligência Artificial. Revista De Ciência, Tecnologia e Inovação 4:44–54.
Smallwood, R. D., and E. J. Sondik. 1973. The optimal control of partially observable Markov processes over a finite horizon. Operations Research 21 (5):1071–1088. doi:10.1287/opre.21.5.1071.
Tan, P. N., M. Steinbach, and V. Kumar. 2006. Introduction to data mining. São Carlos: Pearson Education.
Vasileska, A., and G. Rechkoska. 2012. Global and regional food consumption patterns and trends. Procedia - Social and Behavioral Sciences 44:363–369. doi:10.1016/j.sbspro.2012.05.040.
Vucic, V., M. Glibetic, R. Novakovic, J. Ngo, D. Ristic-Medic, J. Tepsic, M. Ranic, L. Serra-Majem, and M. Gurinovic. 2009. Dietary assessment methods used for low-income populations in food consumption surveys: A literature review. British Journal of Nutrition 101 (S2):S95–S101. doi:10.1017/S0007114509990626.
Xu, R., B. E. Blanchard, J. M. McCaffrey, S. Woolley, L. M. L. Corso, and V. B. Duffy. 2020. Food liking-based diet quality indexes (DQI) generated by conceptual and machine learning explained variability in cardiometabolic risk factors in young adults. Nutrients 12 (4):882. doi:10.3390/nu12040882.
Yu, E. Y. W., A. Wesselius, C. Sinhart, A. Wolk, M. C. Stern, X. Jiang, L. Tang, J. Marshall, E. Kellen, P. van den Brandt, et al. 2020. A data mining approach to investigate food groups related to incidence of bladder cancer in the bladder cancer epidemiology and nutritional determinants international study. The British Journal of Nutrition 124 (6):611–619. doi:10.1017/S0007114520001439.
Zeevi, D., T. Korem, N. Zmora, D. Israeli, D. Rothschild, A. Weinberger, O. Ben-Yacov, D. Lador, T. Avnit-Sagi, M. Lotan-Pompan, et al. 2015. Personalized nutrition by prediction of glycemic responses. Cell 163 (5):1079–1094. doi:10.1016/j.cell.2015.11.001.
Zenitani, S., H. Nishiuchi, and T. Kiuchi. 2010. Smart-card-based automatic meal record system intervention tool for analysis using data mining approach. Nutrition Research (New York, N.Y.) 30 (4):261–270. doi:10.1016/j.nutres.2010.04.003.
Zheng, Q., H. Delingette, K. Fung, S. E. Petersen, and N. Ayache. 2019. Unsupervised shape and motion analysis of 3822 cardiac 4D MRI of UK Biobank. Preprint submitted to arXiv.
