Data Analytics Solution - Assignment - 1

The document outlines an assignment for a data analytics course, focusing on using R for statistical calculations and data analysis. It includes tasks such as calculating probabilities, analyzing a dataset for quantitative and qualitative predictors, and creating graphical representations of relationships among variables. The assignment emphasizes the importance of data cleaning and visualization in understanding data patterns.


BU.510.650 Data Analytics, Assignment #1
Dr. Ruxian Wang, Johns Hopkins Carey Business School

Solution to Assignment #1

1. Get familiar with the software R.

2. Calculate the probability for each of the following events:

(a) A standard normally distributed variable is larger than 3.


Solution: 0.0013. Sample R code: 1-pnorm(3)
(b) A normally distributed variable with mean 35 and standard deviation 6 is larger than 42.
Solution: 0.1217. Sample R code: 1-pnorm(42,mean=35,sd=6)
(c) Getting 10 out of 10 successes in a binomial distribution with probability 0.8.
Solution: 0.1074. Sample R code: dbinom(10,size=10,prob=0.8)
(d) X < 0.9 when X has the standard uniform distribution.
Solution: 0.9. Sample R code: punif(0.9)
(e) X > 6.5 in a χ² distribution with 2 degrees of freedom.
Solution: 0.0388. Sample R code: 1-pchisq(6.5,df=2)
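
The five answers above can be checked in one short script. This is a minimal sketch in base R; the names p_a to p_e are illustrative and not part of the original solution. It uses lower.tail=FALSE for the upper-tail cases, which is equivalent to subtracting the cumulative probability from 1.

```r
# Upper-tail probabilities via lower.tail = FALSE (same as 1 - p*(...)).
p_a <- pnorm(3, lower.tail = FALSE)                      # (a) P(Z > 3)
p_b <- pnorm(42, mean = 35, sd = 6, lower.tail = FALSE)  # (b) P(X > 42)
p_c <- dbinom(10, size = 10, prob = 0.8)                 # (c) P(10 of 10 successes)
p_d <- punif(0.9)                                        # (d) P(X < 0.9)
p_e <- pchisq(6.5, df = 2, lower.tail = FALSE)           # (e) P(X > 6.5)

# Round to four decimals to compare with the stated solutions.
print(round(c(a = p_a, b = p_b, c = p_c, d = p_d, e = p_e), 4))
```

Note that (c) uses the density function dbinom() because it asks for the probability of exactly 10 successes, while the other parts use cumulative distribution functions.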

3. This exercise involves the Auto data set, which can be downloaded from Blackboard. Make
sure that the missing values have been removed from the data.

(a) Which of the predictors are quantitative, and which are qualitative?
Solution: Hint: use the class() function. First read the data into R:
> Auto=read.csv("Auto.csv",header=T,na.strings="?")
> Auto=na.omit(Auto)
> attach(Auto)

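If you have already read the data into R and found that missing values in
horsepower are stored as "?", you can use the following code, which marks them
as NA and then converts the column back to numeric:
> Auto$horsepower[Auto$horsepower=="?"]=NA
> Auto$horsepower=as.numeric(as.character(Auto$horsepower))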

Quantitative predictors: mpg, cylinders, displacement, weight, acceleration,
year, horsepower;
qualitative predictors: origin, name.
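
The class() hint can be tried without the real file. The following is a minimal sketch, assuming a made-up three-row extract in the same layout as Auto.csv; the rows, the names csv_text and toy, and the car names are all hypothetical. It shows how na.strings="?" keeps horsepower numeric, and why origin needs a judgment call: it is stored as a number but acts as a category label.

```r
# Hypothetical three-row extract in the layout of Auto.csv (values are
# illustrative only); "?" marks a missing horsepower entry.
csv_text <- "mpg,cylinders,horsepower,origin,name
18,8,130,1,car a
25,4,?,2,car b
31,4,65,3,car c"

# na.strings = "?" makes read.csv treat "?" as NA, so horsepower stays numeric.
toy <- read.csv(text = csv_text, header = TRUE, na.strings = "?")
toy <- na.omit(toy)   # drop rows with any missing value

# class() per column: numeric or integer columns are candidate quantitative
# predictors; character or factor columns are qualitative.
print(sapply(toy, class))

# origin is an integer code, so class() alone reports it as numeric;
# converting it to a factor records that it is a category label.
toy$origin <- factor(toy$origin)
print(sapply(toy, class))
```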
(b) What is the range of each quantitative predictor? You can answer this using the range()
function.
Solution: The ranges for the quantitative predictors are: [9.0, 46.6] for mpg; [3, 8] for
cylinders; [68, 455] for displacement; [1613, 5140] for weight; [8.0, 24.8] for acceleration;
[70, 82] for year; [1, 3] for origin.
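
Rather than calling range() once per variable, the ranges can be collected in one step. This is a sketch only: it uses mtcars, a data set that ships with base R, as a stand-in for Auto, and the names quant_cols and ranges are illustrative.

```r
# mtcars ships with base R and is used here only as a stand-in for Auto.
quant_cols <- c("mpg", "cyl", "disp", "wt", "qsec")

# sapply applies range() to each column; the result is a 2 x k matrix whose
# first row holds the minima and whose second row holds the maxima.
ranges <- sapply(mtcars[, quant_cols], range)
rownames(ranges) <- c("min", "max")
print(ranges)
```

With Auto loaded, the same pattern would apply to its own column names, for example sapply(Auto[, c("mpg", "weight")], range).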
(c) What is the mean and standard deviation of each quantitative predictor?
Solution: Hint: use the functions mean() and sd() to compute the mean and standard
deviation. Do not forget to remove missing values; otherwise, your results may be
slightly different.
The means and standard deviations are: 23.45 and 7.81 for mpg; 5.47 and 1.71 for
cylinders; 194.41 and 104.64 for displacement; 2977.58 and 849.40 for weight; 15.54
and 2.76 for acceleration; 75.98 and 3.68 for year; 1.58 and 0.81 for origin.
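
The warning about missing values can be illustrated with a short sketch. The vector x below is made up for illustration and is not taken from Auto; mtcars again stands in for the real data, and the name stats is an assumption.

```r
# Illustrative vector with one missing value (not taken from Auto).
x <- c(130, 165, NA, 150, 140)

print(mean(x))                 # NA: a single missing value propagates
print(mean(x, na.rm = TRUE))   # mean of the four observed values
print(sd(x, na.rm = TRUE))     # sample standard deviation (divisor n - 1)

# Column-wise means and standard deviations for a data frame; mtcars is a
# stand-in for Auto here.
stats <- sapply(mtcars[, c("mpg", "disp", "wt")],
                function(col) c(mean = mean(col, na.rm = TRUE),
                                sd = sd(col, na.rm = TRUE)))
print(round(stats, 2))
```

If na.omit() has already been applied to the whole data frame, na.rm=TRUE has no further effect, but it is a cheap safeguard.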
(d) Now remove the 10th through 85th observations. What is the range, mean, and standard
deviation of each predictor in the subset of the data that remains?
Solution: Use the code Auto2=Auto[-seq(10,85),] to remove the 10th through 85th
observations.
The ranges, means and standard deviations in the remaining data are the following:
• [11.0, 46.6], 24.44 and 7.91 for mpg;
• [3, 8], 5.37 and 1.65 for cylinders;
• [68, 455], 187.05 and 99.64 for displacement;
• [1649, 4997], 2933.96 and 810.64 for weight;
• [8.5, 24.8], 15.72 and 2.68 for acceleration;
• [70, 82], 77.15 and 3.11 for year;
• [1, 3], 1.60 and 0.82 for origin.
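
The negative-index idiom used above can be sketched on a smaller data set. Here mtcars (32 rows) stands in for Auto, rows 10 through 15 are dropped instead of 10 through 85, and summarise_col is a hypothetical helper introduced only for this example.

```r
# mtcars (32 rows) stands in for Auto; rows 10 through 15 are dropped here,
# where the assignment drops rows 10 through 85.
drop_rows <- seq(10, 15)
sub <- mtcars[-drop_rows, ]   # negative indices remove those rows

# Hypothetical helper: range, mean and sd of one column in a single vector.
summarise_col <- function(col) {
  c(min = min(col), max = max(col), mean = mean(col), sd = sd(col))
}
print(round(sapply(sub[, c("mpg", "disp", "wt")], summarise_col), 2))
```

The comma after the index matters: Auto[-seq(10,85),] drops rows, whereas omitting it would index columns.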
(e) Using the full data set, investigate the predictors graphically, using scatterplots or other
tools of your choice. Create some plots highlighting the relationships among the
predictors. Comment on your findings.
Solution: Answers may vary. Some relationships among the predictors are
shown below.
• Produce a plot of acceleration against horsepower (see Figure 1).
Observe that the acceleration variable, the time in seconds to go from 0 to 60 mph,
falls as horsepower rises: more powerful cars accelerate faster.
• Produce a plot of horsepower against displacement (see Figure 2).
Observe that the larger the displacement, the greater the horsepower.

Figure 1: acceleration w.r.t. horsepower (x-axis: horsepower; y-axis: acceleration)
Figure 2: horsepower w.r.t. displacement (x-axis: displacement; y-axis: horsepower)
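
Two-panel scatterplots of this kind can be produced with base graphics. This is a sketch under stated assumptions: mtcars stands in for Auto, with hp, disp and qsec (quarter-mile time in seconds) playing the roles of horsepower, displacement and acceleration. The names r_accel and r_disp are illustrative.

```r
# mtcars stands in for Auto: hp ~ horsepower, disp ~ displacement,
# qsec (quarter-mile time in seconds) ~ acceleration.
op <- par(mfrow = c(1, 2))            # two panels side by side
plot(mtcars$hp, mtcars$qsec,
     xlab = "horsepower", ylab = "acceleration time")
plot(mtcars$disp, mtcars$hp,
     xlab = "displacement", ylab = "horsepower")
par(op)                               # restore the previous layout

# Correlations give a numeric summary of the direction seen in each panel.
r_accel <- cor(mtcars$hp, mtcars$qsec)
r_disp  <- cor(mtcars$disp, mtcars$hp)
print(c(hp_vs_qsec = r_accel, disp_vs_hp = r_disp))
```

With Auto attached, the corresponding calls would be plot(horsepower, acceleration) and plot(displacement, horsepower). Calling pairs() on the quantitative columns draws all pairwise scatterplots at once.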
(f) Suppose that we wish to predict gas mileage (mpg) on the basis of the other variables.
Do your plots suggest that any of the other variables might be useful in predicting mpg?
Justify your answer.
Solution: The plots suggest that the variables displacement, horsepower, weight
and acceleration might be useful in predicting mpg: mpg falls as displacement,
horsepower and weight increase, and tends to rise with acceleration. See Figures 3 to 6.
Figure 3: displacement w.r.t. mpg (x-axis: displacement; y-axis: mpg)
Figure 4: horsepower w.r.t. mpg (x-axis: horsepower; y-axis: mpg)
Figure 5: weight w.r.t. mpg (x-axis: weight; y-axis: mpg)
Figure 6: acceleration w.r.t. mpg (x-axis: acceleration; y-axis: mpg)
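
A 2 x 2 grid of mpg against each candidate predictor can be drawn in a loop. This is a sketch, not the original solution code: mtcars stands in for Auto, with disp, hp, wt and qsec playing the roles of displacement, horsepower, weight and acceleration. The names predictors and r_mpg are illustrative.

```r
# mtcars stands in for Auto; its columns disp, hp, wt and qsec play the roles
# of displacement, horsepower, weight and acceleration.
predictors <- c("disp", "hp", "wt", "qsec")

op <- par(mfrow = c(2, 2))            # 2 x 2 grid, one panel per predictor
for (p in predictors) {
  plot(mtcars[[p]], mtcars$mpg, xlab = p, ylab = "mpg")
}
par(op)                               # restore the previous layout

# Correlation of each predictor with mpg, as a rough numeric check.
r_mpg <- sapply(mtcars[, predictors], function(col) cor(col, mtcars$mpg))
print(round(r_mpg, 2))
```

A correlation only measures linear association, so a curved pattern in a scatterplot can indicate a useful predictor even when the correlation is moderate.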
