Biostatistics II. Final Assignment


Here’s the final assignment for Biostatistics II.

Complete the exercises below and upload them to the Final Assignment folder. Please create
a separate folder inside it with your name and upload the answers as separate files. The
files should be named as follows: name, last name, task.
The deadline is June 7th at 23:59.

1) Confounding Task:
Using the provided dataset, you are required to determine whether age is a confounder and
calculate the association between diabetes and sugar consumption (in grams per day),
adjusted for age if needed.
Instructions:
1. Examine the dataset to understand the variables: "Age" (in years),
"Sugar_Consumption" (in grams per day), and "Has_Diabetes" (0 for no, 1 for yes).
2. Determine if age is a confounder in the association between diabetes and sugar
consumption.
3. If age is a confounder, calculate the adjusted association between diabetes and sugar
consumption.
4. Present your findings and methodology used in a clear and concise manner. Perform
tasks in R, and present R code.
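One possible way to approach instructions 2 and 3 is sketched below. It is only an illustration under several assumptions that are not part of the task: the data are simulated (the real file is the one with your ID), diabetes is treated as the outcome in a logistic regression, and confounding is judged by the change-in-estimate rule with a conventional 10% threshold. The column names match those listed in instruction 1.

```r
# Simulated stand-in for the course dataset (hypothetical values, same column names).
set.seed(1)
n <- 500
Age <- round(runif(n, 20, 80))
Sugar_Consumption <- 30 + 0.5 * Age + rnorm(n, sd = 10)  # age influences the exposure
Has_Diabetes <- rbinom(n, 1, plogis(-6 + 0.06 * Age + 0.02 * Sugar_Consumption))
dat <- data.frame(Age, Sugar_Consumption, Has_Diabetes)

# Crude model: diabetes on sugar consumption only.
crude <- glm(Has_Diabetes ~ Sugar_Consumption, family = binomial, data = dat)
# Adjusted model: the same association, additionally controlling for age.
adjusted <- glm(Has_Diabetes ~ Sugar_Consumption + Age, family = binomial, data = dat)

b_crude <- coef(crude)["Sugar_Consumption"]
b_adj <- coef(adjusted)["Sugar_Consumption"]

# Change-in-estimate on the log-odds scale (assumed convention: relative to the adjusted value).
pct_change <- 100 * abs(b_crude - b_adj) / abs(b_adj)
pct_change

# Adjusted odds ratios with Wald 95% confidence intervals.
exp(cbind(OR = coef(adjusted), confint.default(adjusted)))
```

If the percentage change exceeds the chosen threshold, the adjusted odds ratio would be the one to report. Checking that age is associated with both sugar consumption and diabetes (for example with `cor.test()` and a separate `glm()`) would complement this comparison.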

If you have any questions regarding the task, kindly email: [email protected]

Dataset:
Use the dataset from the "Confounding" folder with your ID in the filename.
Your student ID can be found in the student ID table.

2) Missing Data Task:


Using the provided dataset, you are required to determine whether the missing data follow the
Missing at Random (MAR) or Missing Not at Random (MNAR) mechanism. If MAR, you
should impute the missing values using the Multiple Imputation by Chained Equations
(MICE) method and estimate the association between asbestos exposure and lung cancer. If
MNAR, you should perform a worst/best scenario analysis.
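A first look at the missingness mechanism could be sketched as follows. The data here are simulated, and the choice of "asbestos" as the variable with missing values is an assumption made only for illustration. Note that observed data can show whether missingness depends on the observed variables (which argues against data being missing completely at random and is consistent with MAR), but they cannot by themselves rule out MNAR; that part of the argument has to rest on subject-matter reasoning.

```r
# Simulated stand-in for the course dataset (hypothetical values, same column names).
set.seed(2)
n <- 400
dat <- data.frame(
  smoking = rbinom(n, 1, 0.3),
  age = round(rnorm(n, 55, 10)),
  asbestos = rbinom(n, 1, 0.2)
)
dat$lung_cancer <- rbinom(
  n, 1,
  plogis(-3 + 1.0 * dat$smoking + 1.2 * dat$asbestos + 0.02 * (dat$age - 55))
)

# Assumption for the example: asbestos is missing more often in older people.
is_missing <- rbinom(n, 1, plogis(-2 + 0.08 * (dat$age - 55))) == 1
dat$asbestos[is_missing] <- NA

# How many values are missing in each column?
colSums(is.na(dat))

# Model the missingness indicator on the fully observed variables.
dat$asbestos_missing <- as.integer(is.na(dat$asbestos))
miss_model <- glm(asbestos_missing ~ smoking + age + lung_cancer,
                  family = binomial, data = dat)
summary(miss_model)$coefficients
```

A clearly non-zero coefficient for an observed variable (here, age by construction) suggests that missingness is related to observed data.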

Instructions:
1. Examine the dataset to understand the variables: "smoking" (0 for no, 1 for yes), "age"
(in years), "asbestos" (0 for no exposure, 1 for exposure), and "lung_cancer" (0 for no, 1 for
yes).
2. Determine if the missing data follows the Missing at Random (MAR) or Missing Not at
Random (MNAR) mechanism.
3. If MAR, impute the missing values using the Multiple Imputation by Chained Equations
(MICE) method.
4. Estimate the association between asbestos exposure and lung cancer after
imputation.
5. If MNAR, perform worst/best scenario analysis to estimate the association between
asbestos exposure and lung cancer.
6. Present your findings and methodology used in a clear and concise manner. Perform
tasks in R, and present R code.
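The two branches in the instructions above might look roughly like this in R. This is a sketch, not a prescribed solution: the data are simulated, "asbestos" is assumed to be the only variable with missing values, the default imputation method chosen by `mice()` is used, and the worst/best scenarios are defined here as filling the missing exposure so that the association is pushed as high or as low as possible. The helper `fill_and_fit()` is hypothetical.

```r
library(mice)

# Simulated stand-in for the course dataset (hypothetical values, same column names).
set.seed(3)
n <- 400
dat <- data.frame(
  smoking = rbinom(n, 1, 0.3),
  age = round(rnorm(n, 55, 10)),
  asbestos = rbinom(n, 1, 0.2)
)
dat$lung_cancer <- rbinom(
  n, 1,
  plogis(-3 + 1.0 * dat$smoking + 1.2 * dat$asbestos + 0.02 * (dat$age - 55))
)
dat$asbestos[rbinom(n, 1, plogis(-2 + 0.08 * (dat$age - 55))) == 1] <- NA

# MAR branch: multiple imputation, model fitted in each imputed dataset, results pooled.
imp <- mice(dat, m = 5, seed = 3, printFlag = FALSE)
fit <- with(imp, glm(lung_cancer ~ asbestos + smoking + age, family = binomial))
pooled <- summary(pool(fit), conf.int = TRUE, exponentiate = TRUE)
pooled

# MNAR branch: fill missing exposure with extreme values and refit (hypothetical helper).
fill_and_fit <- function(data, value_cases, value_noncases) {
  d <- data
  na <- is.na(d$asbestos)
  d$asbestos[na & d$lung_cancer == 1] <- value_cases
  d$asbestos[na & d$lung_cancer == 0] <- value_noncases
  m <- glm(lung_cancer ~ asbestos + smoking + age, family = binomial, data = d)
  unname(exp(coef(m)["asbestos"]))
}

# Scenario pushing the odds ratio up: missing cases exposed, missing non-cases unexposed.
or_high <- fill_and_fit(dat, value_cases = 1, value_noncases = 0)
# Scenario pushing the odds ratio down: the reverse assignment.
or_low <- fill_and_fit(dat, value_cases = 0, value_noncases = 1)
c(lowest = or_low, highest = or_high)
```

The two extreme odds ratios bracket what the association could be under any pattern of the unobserved exposure values, which is why this kind of analysis is often reported as a range.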

If you have any questions regarding the task, kindly email: [email protected]

Dataset:
Use the dataset from the "Missing Data" folder with your ID in the filename.

3) Simple and Multiple Linear Regressions Task

Using the Boston Housing dataset from “mlbench” library, you are required to perform
simple and multiple linear regression analyses using built-in R statistical functions, and write
a short technical report on your findings.

Instructions:
1. Before starting this task, please apply the following transformation to the data so that
each of you works with different data and may reach different results and
conclusions. To do so, run the code below with the seed (for the random number
generator) set to your individual student number, e.g. 336245 for Imam, 410860 for
Maria, etc. (replace “123” with your own ID). If you are unsure, please let me know.

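library(mlbench)  # provides the BostonHousing dataset
library(dplyr)
data("BostonHousing")
# Keep crim (the outcome) and five predictors.
BostonHousing <- BostonHousing %>% select("crim", "indus", "rm", "age", "tax", "ptratio")
set.seed(123)  # replace 123 with your own student ID
# Add an independent uniform(-2, 10) shift, rounded to 2 decimals, to every predictor value.
df <- cbind(BostonHousing[1], apply(BostonHousing[2:6], c(1, 2),
  function(x) {x + round(runif(1, -2, 10), 2)}))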

2. First, give a brief introduction describing the dataset being analysed, using the R help
or Google to explain the variables: how many records (rows) and how many variables
(columns) the data contain, and what these variables represent. Comment on whether
any values of the quantitative variables are surprising (you are advised to produce a
boxplot to support your comment).
3. Fit a multiple linear regression model with “per capita crime rate” as the outcome,
using all the available predictors. Comment on the regression coefficients
(null hypothesis and interpretation) and p-values for all the variables that are
statistically significant at the 5% level. If none of the variables is statistically
significant at the 5% level, comment on the one with the smallest p-value.
4. Choose the predictor with the smallest p-value (if several predictors share the
same smallest value, choose any one of them) and fit a simple linear regression
model using it. Comment on the regression coefficient and p-value for this single
predictor.
5. Compare the R-squared and adjusted R-squared between multiple and simple linear
regressions and comment on your findings.
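The workflow in steps 2 to 5 could be sketched as follows. This is an outline under a few assumptions rather than a model answer: the seed 123 is the placeholder from the task (your own ID goes there), the columns are selected with base R subsetting instead of dplyr so that only "mlbench" is needed, and the boxplot is drawn on standardized values so that all variables share one axis.

```r
library(mlbench)
data("BostonHousing")

# Same transformation as in step 1, with base R column selection.
bh <- BostonHousing[, c("crim", "indus", "rm", "age", "tax", "ptratio")]
set.seed(123)  # placeholder; use your own student ID
df <- cbind(bh[1], apply(bh[2:6], c(1, 2),
  function(x) {x + round(runif(1, -2, 10), 2)}))

# Step 2: size of the data, summaries, and a boxplot to spot surprising values.
dim(df)
summary(df)
boxplot(scale(df), main = "Standardized variables")

# Step 3: multiple linear regression of crim on all available predictors.
full <- lm(crim ~ ., data = df)
coefs <- summary(full)$coefficients
coefs

# Step 4: simple linear regression on the predictor with the smallest p-value.
p_vals <- coefs[-1, "Pr(>|t|)"]  # drop the intercept row
best <- names(which.min(p_vals))
simple <- lm(reformulate(best, response = "crim"), data = df)
summary(simple)$coefficients

# Step 5: compare R-squared and adjusted R-squared between the two models.
comparison <- data.frame(
  model = c("multiple", "simple"),
  r_squared = c(summary(full)$r.squared, summary(simple)$r.squared),
  adj_r_squared = c(summary(full)$adj.r.squared, summary(simple)$adj.r.squared)
)
comparison
```

Because the simple model is nested in the multiple one, its R-squared cannot be larger; adjusted R-squared penalizes the extra predictors and is therefore the more informative of the two for this comparison.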

If you have any questions regarding the task, kindly email: [email protected]

4) Life Tables Task


Using the available databases of age-specific death rates, e.g.

- http://demogr.nes.ru/en/demogr_indicat/data_description (RusFMD)

- https://tochno.st/datasets

- https://www.mortality.org/

or any others, please construct the life tables.


The following variables should be presented in the life table:
l(x), q(x), p(x), d(x), L(x), T(x), e(x)
Use any a(x) you wish (based on the formulas from the presentations).
The life table can be constructed in Excel or R (all the formulas for your calculations
should be visible).
The task is individual; please choose countries and/or regions that are unique to each student.
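A minimal sketch of the calculations in R is given below. The death rates are invented for illustration and must be replaced with real data from one of the sources above. The formulas are the commonly used ones for an abridged period life table and are an assumption here, since the presentation formulas are not reproduced in this text; a(x) is set to half the interval width, and the last age group is treated as open-ended.

```r
# Hypothetical age-specific death rates m(x) per person-year; replace with real data.
age <- c(0, 1, 5, 15, 30, 45, 60, 75)  # start of each age interval
n <- c(diff(age), NA)                  # interval width; the last interval is open
mx <- c(0.006, 0.0004, 0.0003, 0.0015, 0.004, 0.010, 0.030, 0.100)
k <- length(age)

# a(x): average years lived in the interval by those dying in it (assumed n / 2).
ax <- n / 2
ax[k] <- 1 / mx[k]

# q(x): probability of dying in the interval; everyone dies in the open interval.
qx <- n * mx / (1 + (n - ax) * mx)
qx[k] <- 1
px <- 1 - qx  # p(x): probability of surviving the interval

# l(x): survivors to exact age x, starting from a radix of 100000.
lx <- 100000 * c(1, cumprod(px[-k]))
dx <- lx * qx  # d(x): deaths in the interval

# L(x): person-years lived in the interval.
Lx <- n * (lx - dx) + ax * dx
Lx[k] <- lx[k] / mx[k]

Tx <- rev(cumsum(rev(Lx)))  # T(x): person-years lived above age x
ex <- Tx / lx               # e(x): life expectancy at age x

life_table <- data.frame(age, n, mx, ax, qx, px, lx, dx, Lx, Tx, ex)
life_table
```

Keeping each column as a separate, commented line makes every formula visible, as the task requires.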

If you have any questions regarding the task, kindly email: [email protected]
