Data Analytics Solution - Assignment - 1

BU.510.650 Data Analytics, Assignment #1
Dr. Ruxian Wang, Johns Hopkins Carey Business School

Solution to Assignment #1

1. Get familiar with the R software.

2. Calculate the probability for each of the following events:

(a) A standard normally distributed variable is larger than 3.

Solution: 0.0013. Sample R code: 1-pnorm(3)
(b) A normally distributed variable with mean 35 and standard deviation 6 is larger than 42.
Solution: 0.1217. Sample R code: 1-pnorm(42,mean=35,sd=6)
(c) Getting 10 out of 10 successes in a binomial distribution with probability 0.8.
Solution: 0.1074. Sample R code: dbinom(10,size=10,prob=0.8)
(d) X < 0.9 when X has the standard uniform distribution.
Solution: 0.9. Sample R code: punif(0.9)
(e) X > 6.5 in a χ² distribution with 2 degrees of freedom.
Solution: 0.0388. Sample R code: 1-pchisq(6.5,df=2)
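The five answers above can be reproduced in one short script. The sketch below is only an illustration: the variable names p_a to p_e are our own, and lower.tail=FALSE is used as an equivalent of the 1-p...() form in the sample code. Two of the answers also have closed forms that serve as a check: (c) equals 0.8^10, and (e) equals exp(-6.5/2), because a χ² distribution with 2 degrees of freedom is an exponential distribution with mean 2.

```r
# Upper-tail probabilities; lower.tail = FALSE returns P(X > x) directly.
p_a <- pnorm(3, lower.tail = FALSE)                      # (a) standard normal
p_b <- pnorm(42, mean = 35, sd = 6, lower.tail = FALSE)  # (b) N(35, 6^2)
p_c <- dbinom(10, size = 10, prob = 0.8)                 # (c) exactly 10 successes
p_d <- punif(0.9)                                        # (d) P(X < 0.9), uniform(0, 1)
p_e <- pchisq(6.5, df = 2, lower.tail = FALSE)           # (e) chi-square, 2 df

# Closed-form checks for (c) and (e).
stopifnot(isTRUE(all.equal(p_c, 0.8^10)))
stopifnot(isTRUE(all.equal(p_e, exp(-6.5 / 2))))

# Round to four decimals, as in the solutions.
probs <- round(c(a = p_a, b = p_b, c = p_c, d = p_d, e = p_e), 4)
print(probs)
```

The rounded values agree with the solutions stated in (a) through (e).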

3. This exercise involves the Auto data set, which can be downloaded from Blackboard. Make
sure that the missing values have been removed from the data.

(a) Which of the predictors are quantitative, and which are qualitative?
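Solution: Hint: use the class() function. First read the data into R, treating "?" as a
missing value, and drop the incomplete rows:
> Auto=read.csv("Auto.csv",header=TRUE,na.strings="?")
> Auto=na.omit(Auto)
> attach(Auto)

If you have already read the data into R and found that some horsepower values are
missing and stored as "?", you can use the following code to mark them as NA and convert
the column to numeric, then call na.omit() as above:
> Auto$horsepower[Auto$horsepower=="?"]=NA
> Auto$horsepower=as.numeric(as.character(Auto$horsepower))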

Quantitative predictors: mpg, cylinders, displacement, weight, acceleration, year, horsepower;
qualitative predictors: origin, name.
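The loading and classification steps in (a) can be wrapped in small helpers. This is a minimal sketch, not the course's own code: the function names load_auto, fix_horsepower and column_classes are hypothetical, and the file is assumed to have a header row and a horsepower column as described above.

```r
# Read the file, treating "?" as missing, then drop incomplete rows.
load_auto <- function(path = "Auto.csv") {
  auto <- read.csv(path, header = TRUE, na.strings = "?")
  na.omit(auto)
}

# Repair a data frame that was read without na.strings = "?":
# mark "?" as NA, convert horsepower to numeric, drop incomplete rows.
fix_horsepower <- function(df) {
  hp <- as.character(df$horsepower)
  hp[hp == "?"] <- NA
  df$horsepower <- as.numeric(hp)
  na.omit(df)
}

# Storage class of every column; numeric or integer columns are
# candidates for quantitative predictors.
column_classes <- function(df) sapply(df, class)

# Usage (assumes Auto.csv is in the working directory):
# Auto <- load_auto("Auto.csv")
# column_classes(Auto)
```

Note that class() only reports how a column is stored. A variable such as origin is stored as a number but is a category code, so it still has to be classified as qualitative by inspection.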
(b) What is the range of each quantitative predictor? You can answer this using the range()
function.
Solution: The ranges are: [9.0, 46.6] for mpg; [3, 8] for cylinders; [68, 455] for
displacement; [1613, 5140] for weight; [8.0, 24.8] for acceleration; [70, 82] for year;
[1, 3] for origin (a qualitative predictor that is stored as a numeric code).
(c) What is the mean and standard deviation of each quantitative predictor?
Solution: Hint: use the functions mean() and sd() to compute the mean and standard
deviation. Do not forget to remove missing values; otherwise, your results may be
slightly different.
The means and standard deviations are: 23.45 and 7.81 for mpg; 5.47 and 1.71 for
cylinders; 194.41 and 104.64 for displacement; 2977.58 and 849.40 for weight; 15.54
and 2.76 for acceleration; 75.98 and 3.68 for year; 1.58 and 0.81 for origin.
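Rather than calling range(), mean() and sd() once per column, the statistics in (b) and (c) can be collected in a single table. The helper below is a sketch under our own naming (describe is hypothetical); it assumes the selected columns are numeric and contain no missing values.

```r
# One row per column, with min, max, mean and standard deviation.
describe <- function(df, cols) {
  stats <- sapply(df[cols], function(x) {
    c(min = min(x), max = max(x), mean = mean(x), sd = sd(x))
  })
  t(stats)  # transpose so that columns of df become rows of the table
}

# Usage (assumes Auto has been loaded and cleaned as in part (a)):
# cols <- c("mpg", "cylinders", "displacement", "weight",
#           "acceleration", "year", "origin")
# round(describe(Auto, cols), 2)
```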
(d) Now remove the 10th through 85th observations. What is the range, mean, and standard
deviation of each predictor in the subset of the data that remains?
Solution: Use the code Auto2=Auto[-seq(10,85),] to remove the 10th through 85th observations.
The ranges, means and standard deviations in the remaining data are the following:
• [11.0, 46.6], 24.44 and 7.91 for mpg;
• [3, 8], 5.37 and 1.65 for cylinders;
• [68, 455], 187.05 and 99.64 for displacement;
• [1649, 4997], 2933.96 and 810.64 for weight;
• [8.5, 24.8], 15.72 and 2.68 for acceleration;
• [70, 82], 77.15 and 3.11 for year;
• [1, 3], 1.60 and 0.82 for origin.
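The row removal in (d) relies on negative indexing: a negative row index drops those rows and keeps the rest. A minimal sketch follows; the helper name drop_rows is hypothetical, and it assumes the data frame has at least as many rows as the last index.

```r
# Drop rows first..last (inclusive) and keep all columns.
drop_rows <- function(df, first, last) {
  df[-seq(first, last), , drop = FALSE]
}

# Usage (assumes Auto has been loaded and cleaned as in part (a)):
# Auto2 <- drop_rows(Auto, 10, 85)
# nrow(Auto) - nrow(Auto2)   # 85 - 10 + 1 = 76 rows removed
# sapply(Auto2[c("mpg", "weight")], range)
```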
(e) Using the full data set, investigate the predictors graphically, using scatterplots or other
tools of your choice. Create some plots highlighting the relationships among the predictors.
Comment on your findings.
Solution: Answers may vary. Some relationships among the predictors are shown below.
• Plot acceleration against horsepower (see Figure 1). Higher horsepower goes with a
lower acceleration value. In this data set acceleration is the time needed to reach
speed, so cars with more horsepower accelerate faster.
• Plot horsepower against displacement (see Figure 2). The larger the displacement,
the greater the horsepower.

Figure 1: acceleration w.r.t. horsepower (x-axis: horsepower, y-axis: acceleration)
Figure 2: horsepower w.r.t. displacement (x-axis: displacement, y-axis: horsepower)
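Figures 1 and 2 can be reproduced with base R graphics. The function below is a sketch, not the code used for the original figures: plot_relationships is a hypothetical name, and the column names horsepower, acceleration and displacement are taken from the data set described above. It also returns the two sample correlations, whose signs summarize the direction of each relationship.

```r
# Draw the two scatterplots side by side and return the correlations.
plot_relationships <- function(df) {
  op <- par(mfrow = c(1, 2))  # 1 row, 2 panels
  on.exit(par(op))            # restore the previous layout on exit
  plot(df$horsepower, df$acceleration,
       xlab = "horsepower", ylab = "acceleration")
  plot(df$displacement, df$horsepower,
       xlab = "displacement", ylab = "horsepower")
  c(acceleration_vs_horsepower = cor(df$horsepower, df$acceleration),
    horsepower_vs_displacement = cor(df$displacement, df$horsepower))
}

# Usage (assumes Auto has been loaded and cleaned as in part (a)):
# plot_relationships(Auto)
```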
(f) Suppose that we wish to predict gas mileage (mpg) on the basis of the other variables.
Do your plots suggest that any of the other variables might be useful in predicting mpg?
Justify your answer.
Solution: The plots suggest that the variables displacement, horsepower, weight
and acceleration might be useful in predicting mpg. See Figures 3 to 6.
Figure 3: displacement w.r.t. mpg (x-axis: displacement, y-axis: mpg)
Figure 4: horsepower w.r.t. mpg (x-axis: horsepower, y-axis: mpg)
Figure 5: weight w.r.t. mpg (x-axis: weight, y-axis: mpg)
Figure 6: acceleration w.r.t. mpg (x-axis: acceleration, y-axis: mpg)
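The four mpg scatterplots can be drawn in a 2-by-2 grid with a short loop. This is an illustrative sketch only: plot_mpg_predictors is a hypothetical name, and the default predictor list simply mirrors the four variables named in the solution to (f). The returned correlations give a rough numeric companion to the visual impression; a correlation far from zero suggests the variable may help predict mpg.

```r
# Plot mpg against each predictor and return the sample correlations.
plot_mpg_predictors <- function(df,
                                predictors = c("displacement", "horsepower",
                                               "weight", "acceleration")) {
  op <- par(mfrow = c(2, 2))  # 2 x 2 grid of panels
  on.exit(par(op))            # restore the previous layout on exit
  for (p in predictors) {
    plot(df[[p]], df$mpg, xlab = p, ylab = "mpg")
  }
  # Named vector: one correlation with mpg per predictor.
  sapply(predictors, function(p) cor(df[[p]], df$mpg))
}

# Usage (assumes Auto has been loaded and cleaned as in part (a)):
# round(plot_mpg_predictors(Auto), 2)
```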
