Script: Data Processing

Uploaded by

x9zdyv9yj9

SLIDE 3: INTRODUCTION

Life expectancy is the average number of years a person is expected to
live. It depends on many factors, such as health, income, education,
and access to medical care.

This project aims to explore two central questions:

1. Which variables have the greatest impact on life expectancy?

2. Can we predict life expectancy using available data?

To answer these questions, public data from the World Health
Organization is used.

SLIDE 4: DATASET OVERVIEW

The dataset covers 193 countries between 2000 and 2015 and includes
22 variables, grouped as follows:

• Demographic: population, child and adult deaths
• Economic: GDP, income level
• Health: vaccination rates, alcohol use, HIV/AIDS
• Social: years of schooling, development status

The goal is to study how these relate to life expectancy, the main
variable we want to explain and predict.

SLIDE 5: DATA PREPROCESSING

Before starting the analysis, the dataset was cleaned and prepared.

First, I checked the structure and missing values. The year 2015 had
too many missing values, so I removed it from the dataset.
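
A check like this one can be sketched in a few lines of pandas. This is only an illustration on a tiny made-up table: the column names (`Country`, `Year`, `Alcohol`, `GDP`) and all values are assumptions, not the real WHO file.

```python
import numpy as np
import pandas as pd

# Tiny made-up table; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Year": [2014, 2015, 2014, 2015],
    "Alcohol": [1.2, np.nan, 3.4, np.nan],
    "GDP": [500.0, np.nan, 900.0, 950.0],
})

# Share of missing cells per year, averaged over all non-key columns.
value_cols = df.columns.difference(["Country", "Year"])
missing_by_year = df[value_cols].isna().groupby(df["Year"]).mean().mean(axis=1)

# Drop the year judged too incomplete (2015 in the text).
df_clean = df[df["Year"] != 2015].reset_index(drop=True)
```

Looking at the missing share per year before dropping anything makes the decision to remove a whole year easy to justify on a slide.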

Next, I filled the missing values:

• For numerical variables, I used each country's median.
• For categorical variables, I used each country's mode.

Some variables were missing for entire countries, so I filled those
with the global average across all countries.
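
The two-level filling strategy can be sketched as follows. This is a minimal illustration, not the code used in the project: the helper `impute_by_country` and the column names are hypothetical, and the global fallback is assumed to be the mean of the originally observed values.

```python
import numpy as np
import pandas as pd

def impute_by_country(df, group_col="Country"):
    """Fill gaps with the per-country median/mode, then a global fallback."""
    out = df.copy()
    for col in out.columns.drop(group_col):
        if pd.api.types.is_numeric_dtype(out[col]):
            # Per-country median; stays NaN when a country has no values at all.
            medians = out.groupby(group_col)[col].transform("median")
            out[col] = out[col].fillna(medians)
            # Fallback (assumption): global mean of the originally observed values.
            out[col] = out[col].fillna(df[col].mean())
        else:
            # Per-country mode (most frequent value), if the country has one.
            modes = out.groupby(group_col)[col].agg(
                lambda s: s.mode().iloc[0] if not s.mode().empty else np.nan
            )
            out[col] = out[col].fillna(out[group_col].map(modes))
    return out

# Made-up example: country C has no GDP values at all.
raw = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "C", "C"],
    "GDP": [1.0, np.nan, 3.0, 10.0, np.nan, np.nan, np.nan],
    "Status": ["Developing", None, "Developing", "Developed",
               "Developed", None, "Developing"],
})
filled = impute_by_country(raw)
```

Filling within each country first keeps a country's own level, and the global value is only used where a country offers nothing to go on.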

SLIDE 6: EXPLORATORY DATA ANALYSIS

I started by making scatter plots of each variable against life
expectancy, which showed the general trends. I also computed a
correlation matrix to find the variables most strongly related to life
expectancy in a linear way.

I plotted the 8 most correlated variables and noticed that schooling
and income composition contained zero values that looked incorrect. I
replaced these using values from nearby years and removed the countries
that had no valid data.
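
Both steps, ranking variables by correlation and repairing suspicious zeros, can be sketched in pandas. The helper names (`top_correlated`, `repair_zeros`), the column names and the data are hypothetical, and linear interpolation between neighbouring years is an assumption about what "values from nearby years" means.

```python
import numpy as np
import pandas as pd

def top_correlated(df, target, k=8):
    """Rank numeric columns by absolute Pearson correlation with the target."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    return corr.abs().sort_values(ascending=False).head(k)

def repair_zeros(df, col, group_col="Country", time_col="Year"):
    """Treat zeros as missing and fill them from neighbouring years per country."""
    out = df.sort_values([group_col, time_col]).copy()
    out[col] = out[col].replace(0, np.nan)
    out[col] = out.groupby(group_col)[col].transform(
        lambda s: s.interpolate(limit_direction="both")
    )
    # Countries with no valid value at all stay NaN and are dropped.
    return out.dropna(subset=[col])

# Made-up numbers for illustration only.
demo = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "Year": [2000, 2001, 2002, 2000, 2001, 2002],
    "Schooling": [10.0, 0.0, 12.0, 0.0, 0.0, 0.0],
    "LifeExpectancy": [60.0, 61.0, 63.0, 50.0, 51.0, 52.0],
})
repaired = repair_zeros(demo, "Schooling")
ranking = top_correlated(repaired.drop(columns="Year"), "LifeExpectancy", k=8)
```

In the demo, country A's zero is replaced by the value between its two neighbours, while country B, which has only zeros, is removed.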

I also noticed a strange bimodal shape in the plot of adult mortality
vs. life expectancy. I could not find a clear reason for it, so for
prediction I decided to remove the lower cluster. (add several of
these plots)

Then I focused on the two most correlated variables (schooling and
income) and saw a fairly clear linear trend between each of them and
life expectancy.

I also found some interesting results in plots grouped by region. For
better visualization, I chose a representative country from each major
region:

In this case, the distributions were clearly different between
countries. Schooling and income both vary by country and are both
strongly related to life expectancy: the more developed a country is,
the higher its schooling, income, and life expectancy.

Another clear pattern appeared when grouping by development status
(developed or developing countries).

For the 5 representative countries, I saw a slight increase in life
expectancy over time, although there were some outliers.

Next, I tested four types of models (linear, quadratic, cubic, and
logarithmic) to see which worked best for each variable, since until
then I had only looked at linear correlations. I used R² to compare
the models for each variable. I also removed outliers from some
variables (BMI, Diphtheria, Polio, and Adult Mortality) to improve
the fits.
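
The comparison of functional forms can be sketched with NumPy alone. This is an illustrative version on synthetic data, not the project's code: the helper names are hypothetical and the logarithmic form is assumed to be y = a + b·ln(x).

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination of fitted values y_hat against y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def compare_forms(x, y):
    """Fit four functional forms by least squares and return R^2 for each."""
    scores = {}
    for name, degree in [("linear", 1), ("quadratic", 2), ("cubic", 3)]:
        coeffs = np.polyfit(x, y, degree)
        scores[name] = r_squared(y, np.polyval(coeffs, x))
    # Logarithmic form y = a + b*ln(x); only defined for strictly positive x.
    if np.all(x > 0):
        coeffs = np.polyfit(np.log(x), y, 1)
        scores["logarithmic"] = r_squared(y, np.polyval(coeffs, np.log(x)))
    return scores

# Synthetic data that follows a logarithmic curve exactly.
x = np.linspace(1.0, 20.0, 50)
y = 50.0 + 8.0 * np.log(x)
scores = compare_forms(x, y)
best = max(scores, key=scores.get)
```

One caveat when reading such a table: among the nested polynomial forms, in-sample R² can never go down as the degree goes up, so a cubic will always score at least as well as a line on the data it was fitted to.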

Because of this, the correlation rankings changed, so I selected the
top 8 variables again and created new final plots using the best model
for each one.

SLIDE 7: PREDICTIVE MODEL

The objective of this stage was to build a regression model that
predicts life expectancy from the eight most correlated variables,
each transformed using its best-fitting functional form (linear,
quadratic, cubic, or logarithmic).

Methodology:
• Selected the 8 variables with the highest R² from the previous
  analysis.
• Applied the best transformation to each variable to maximize model
  performance.
• Split the dataset randomly into 80% training and 20% testing
  subsets.
• Trained a multiple regression model on the transformed variables.
• Evaluated the model using MSE and RMSE on the test set.
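
The split, fit and evaluation steps above can be sketched with NumPy. Everything here is synthetic: the three predictors, their coefficients and the noise level are made up, and ordinary least squares via `numpy.linalg.lstsq` stands in for whatever regression routine was actually used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the transformed predictors (all values are made up).
n, p = 500, 3
X = rng.normal(size=(n, p))
true_beta = np.array([2.0, -1.0, 0.5])
y = 70.0 + X @ true_beta + rng.normal(scale=1.0, size=n)

# Random 80/20 split of the row indices.
idx = rng.permutation(n)
n_train = int(0.8 * n)
train, test = idx[:n_train], idx[n_train:]

# Ordinary least squares with an intercept column, fitted on the training rows.
A_train = np.column_stack([np.ones(n_train), X[train]])
beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)

# Evaluate on the held-out 20%.
A_test = np.column_stack([np.ones(len(test)), X[test]])
residuals = y[test] - A_test @ beta
mse = np.mean(residuals ** 2)
rmse = np.sqrt(mse)
```

Because the error is measured on rows the model never saw, the RMSE here reflects predictive accuracy rather than how closely the model memorised the training data.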

Model performance:

• R² = 0.8663, Adjusted R² = 0.8658: The model explains 86.6% of the
  variance in life expectancy, a strong result in the context of
  health and development data.
• RMSE = 3.72 years: The typical prediction error is about 3.7 years,
  which is reasonable given that life expectancy values typically
  range from 40 to 85 years.
• Statistical significance: The F-statistic and its p-value
  (< 2.2e−16) confirm the model's global significance.
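
For reference, these metrics have the standard definitions below, where n is the number of observations, p the number of predictors, y_i the observed values, ŷ_i the predictions and ȳ the mean of the observed values. The symbols are introduced here for illustration and do not come from the slides.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1},
\qquad
\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
```

The adjusted R² penalises extra predictors, so a gap as small as the one between 0.8663 and 0.8658 indicates that the number of observations is large compared with the number of predictors.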

Most impactful variables:

• Significant predictors (with low p-values):
  ◦ HIV/AIDS: strong negative effect (−4.08)
  ◦ Income composition of resources: strong positive effect (+27.25)
  ◦ Adult Mortality: negative influence (−0.014)
  ◦ Other significant variables: Percentage expenditure, Diphtheria,
    Polio, BMI.
• Schooling was not significant (p = 0.604), likely due to
  multicollinearity or loss of variation after transformation.
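
One common way to check a multicollinearity suspicion like this is the variance inflation factor (VIF). The sketch below is not part of the original analysis: the `vif` helper is hypothetical and the three columns are synthetic stand-ins in which "income" is built to be nearly collinear with "schooling".

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (rows = observations)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        # Regress column j on all other columns plus an intercept.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic stand-ins (made up): income is almost a linear function of schooling.
rng = np.random.default_rng(1)
schooling = rng.normal(12.0, 3.0, size=300)
income = 0.05 * schooling + rng.normal(0.0, 0.02, size=300)
hiv = rng.normal(2.0, 1.0, size=300)
factors = vif(np.column_stack([schooling, income, hiv]))
```

A VIF near 1 means a predictor is almost independent of the others, while values above roughly 5 to 10 are often read as a sign that its coefficient and p-value are unreliable.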

So, to answer the two initial questions: for the first one, the most
impactful variables are mainly income composition, schooling, HIV/AIDS,
and immunization.

For the second question, on prediction: yes, we can predict life
expectancy from this data with relatively high accuracy.
