Stats Presentation

The document describes a presentation on analyzing music sales data of the best selling artists of all time. It includes an overview of topics to be covered, definitions of key terms like statistics and parameters, and the methodology which involves exploring and cleaning the dataset.


Presentation by Javin Buchanan & Imaani DaCosta

University of the West Indies, Mona Campus Probability & Statistics Project | 2023

Statistics Project

Overview
A breakdown of the topics we are going to cover in our presentation today!

Abstract
Definition of Terms
Introduction
Methodology
Results
Conclusion
Question & Answer Session
Thank you

Abstract
What is our dataset about?
Our dataset, titled "Best Selling Music Artists of All Time", covers the music sales of the 121 best-selling artists of all time across various genres, countries and career spans.
Who is responsible for tracking sales?
The Recording Industry Association of America (RIAA)
How can sales be categorized quantitatively?
Total Certified Units (TCU) and Claimed Sales
Definitions
What is the difference between a statistic and a parameter?
A statistic is a numerical summary (such as a mean) calculated from a sample of a population, while a parameter is the same kind of summary calculated from the entire population.
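A minimal sketch of this distinction in R, using made-up numbers rather than the project's dataset. The vector `population_units`, the sample size and the seed are assumptions chosen only for illustration: the mean of the whole vector plays the role of a parameter, and the mean of a random subset plays the role of a statistic.

```r
# Hypothetical "population" of 10 sales figures (illustrative values only)
population_units <- c(120, 95, 300, 80, 150, 60, 210, 75, 110, 100)

# Parameter: calculated on the entire population
population_mean <- mean(population_units)

# Statistic: calculated on a random sample of 4 values from the population
set.seed(1)  # arbitrary seed so the sample is reproducible
sample_units <- sample(population_units, 4)
sample_mean <- mean(sample_units)

print(population_mean)
print(sample_mean)
```

The sample mean will generally differ from the population mean, which is why the two quantities are given different names.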
Definitions
Total Certified Units vs Claimed Sales
"Claimed Sales" are sales figures reported by sources such as Billboard in their articles. For example, Billboard claims ARTPOP sold 2.5 million copies, but there are no RIAA statistics to back this up. "Certified Units" are sales that have been submitted to organisations such as the RIAA for a certification, such as platinum, and have therefore been verified as real sales.
Definitions
What is conditional probability?
The probability that an event occurs, such as music sales being greater than 500 million units, given that another event has occurred, such as the artist being from the Pop genre.
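As a small illustration of the definition, the sketch below estimates P(Sales > 500 | Genre is Pop) by counting rows. The data frame `toy` and all of its values are hypothetical and are not taken from the project's dataset.

```r
# Hypothetical toy data (not the project's dataset); Sales in millions of units
toy <- data.frame(
  Genre = c("Pop", "Pop", "Rock", "Pop", "Rock", "Pop"),
  Sales = c(600, 250, 520, 510, 100, 90)
)

# P(Sales > 500 | Pop) = (number of Pop artists with Sales > 500) / (number of Pop artists)
is_pop <- toy$Genre == "Pop"
p_high_given_pop <- sum(toy$Sales > 500 & is_pop) / sum(is_pop)
print(p_high_given_pop)
```

Restricting the denominator to the Pop rows is what makes the probability conditional: the Rock artist with 520 million units is ignored entirely.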
Definitions
What is normal distribution?
A normal distribution is a probability distribution in which most values in the dataset cluster around the mean, while progressively fewer values fall towards the extremes on either side.
Example of Normal Distribution
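To make the clustering around the mean concrete, the following sketch uses base R's `pnorm` to compute the share of a normal distribution that lies within one, two and three standard deviations of the mean. It is an illustration of the general property, not a calculation from the project's data.

```r
# Share of a normal distribution within 1, 2 and 3 standard deviations of the mean
within_1_sd <- pnorm(1) - pnorm(-1)
within_2_sd <- pnorm(2) - pnorm(-2)
within_3_sd <- pnorm(3) - pnorm(-3)

print(round(c(within_1_sd, within_2_sd, within_3_sd), 4))
```

Roughly 68%, 95% and 99.7% of values fall in these three bands, which is the pattern a histogram of normally distributed data should show.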
Hypothesis
The hypothesis for this dataset is that an artist located in the more mainstream areas of society, both geographically (the United States) and musically (Pop music), is more likely to be successful in the music industry than an artist outside of these areas.
Problem Statements

PS 1
Using the entire population as the sample, we can see that 65% of the artists in this dataset are from the United States. What is the probability that a randomly selected artist from the United States has sold more than 160 million certified units and has a period of activity lasting less than 15 years?

PS 2
The total certified units for 50 artists are normally distributed with a sample mean of ‘x’ and a sample standard deviation of ‘y’. The threshold to be considered a ‘highly successful artist’ is ‘P’. Estimate the number of artists who qualify for this title within the Rock genre. Do the same for the Pop genre and compare which genre is more likely to contain a highly successful artist.
Methodology
EXPLORING THE DATASET

Retrieving the dataset from online

data <- read.csv("https://raw.githubusercontent.com/JavinBuchanan/Best-Selling-Artists/main/Raw%20DataSet%20for%20Best%20Selling%20Artists.csv")

Finding the number of observations and variables

dim(data)

Viewing the Data Types of Variables in R

str(data)

Checking if there is missing data in R

# Count the NA values in each column
sapply(data, function(m) {
  sum(is.na(m))
})
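The same check can be tried on a tiny data frame to see what its output looks like. The data frame `toy` below is hypothetical; only the `sapply` pattern comes from the slide.

```r
# Hypothetical toy data with some missing entries
toy <- data.frame(
  Artist = c("A", "B", "C"),
  TCU = c(100, NA, 50),
  Genre = c("Pop", NA, NA)
)

# Count the NA values in each column (same pattern as on the slide)
na_counts <- sapply(toy, function(m) sum(is.na(m)))
print(na_counts)

# An equivalent base R alternative
print(colSums(is.na(toy)))
```

The result is a named vector with one count per column, so a column of zeros across the board indicates that no values are missing.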

Methodology CLEANING THE DATASET


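In an attempt to make the columns uniform, the irregularities and characters that aren't necessary were removed from certain columns.

# Remove the word "million" from TCU
data$TCU <- gsub("million", "", as.character(data$TCU))
# Replace "million" with "." in Sales, then drop everything from the first "." onwards
data$Sales <- gsub("million", ".", as.character(data$Sales))
data$Sales <- gsub("\\..*", "", as.character(data$Sales))
"million" removed from the TCU and Sales columns

# Keep only the text before the first "/" in Genre
data$Genre <- gsub("\\/.*", "", as.character(data$Genre))
# Remove the mis-encoded character sequence "Ã©" from Artist
data$Artist <- gsub("Ã©", "", as.character(data$Artist))
Secondary genres after "/" removed from the Genre column, and the mis-encoded "Ã©" removed from the Artist column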
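A short sketch of how this kind of `gsub` clean-up behaves, run on made-up strings rather than the real columns. The input values are assumptions, and `trimws` is an addition that the slides do not use; it removes the trailing space that is otherwise left behind.

```r
# Hypothetical raw values in the style of the TCU and Genre columns
tcu_raw <- c("290.8 million", "257.7 million")
genre_raw <- c("Rock / Pop", "Pop")

# Remove "million", trim the leftover space, then convert to numeric
tcu_clean <- as.numeric(trimws(gsub("million", "", tcu_raw)))

# Drop everything from the first "/" onwards, then trim the leftover space
genre_untrimmed <- gsub("/.*", "", genre_raw)
genre_clean <- trimws(genre_untrimmed)

print(tcu_clean)
print(genre_untrimmed)  # note the trailing space in "Rock "
print(genre_clean)
```

Without the trimming step a value such as "Rock " keeps its trailing space, which is one plausible reason a later filter would need to match both 'Rock' and 'Rock '.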
The original dataset provided multiple active periods for certain artists. This made the data hard to sort and removed the possibility of easy calculations.

data[9, "period_active"] <- "1965-2014"
data[15, "period_active"] <- "1971-present"
data[29, "period_active"] <- "1980-present"
data[31, "period_active"] <- "1972-present"
data[32, "period_active"] <- "1935-1995"
data[45, "period_active"] <- "1963-2012"
data[77, "period_active"] <- "1967-present"
data[117, "period_active"] <- "1977-2008"

For select rows, the multiple ranges were simply changed to a single range from the starting year to the ending year.
The dataset had to be edited further, seeing as "present" is not a quantitative entry. All rows that had "present" were changed to "2022", seeing as that is when the dataset was last updated.

data$period_active <- gsub("present", "2022", as.character(data$period_active))
library(dplyr)
library(forcats) # provides as_factor()
data[, "Genre", drop = FALSE] # print the Genre column to inspect it
data <-
  data %>%
  mutate(Genre = as_factor(Genre))

# Convert TCU to numeric
data$TCU <- as.numeric(data$TCU)
str(data)

Code that changes every "present" to 2022, converts the Genre column from character to a factor and converts TCU from character to numeric
The original dataset also provided multiple genre assignments to a significant number of artists, which made organisation difficult. Research was done to determine each artist's primary genre association, and then the code was made to show only the main genre.

data[9, "Genre"] <- "Rock"
data[2, "Genre"] <- "Rock"
data[7, "Genre"] <- "Rock"
data[17, "Genre"] <- "Rock"
data[25, "Genre"] <- "Rock"
data[28, "Genre"] <- "Rock"
data[39, "Genre"] <- "Rock"
data[41, "Genre"] <- "Rock"
data[43, "Genre"] <- "Rock"
data[47, "Genre"] <- "Rock"
data[48, "Genre"] <- "Rock"
data[53, "Genre"] <- "Pop"
data[54, "Genre"] <- "Rock"
data[64, "Genre"] <- "Rock"
data[68, "Genre"] <- "Rock"
data[76, "Genre"] <- "Rock"
data[77, "Genre"] <- "Rock"
data[78, "Genre"] <- "Rock"
data[92, "Genre"] <- "Rock"
data[96, "Genre"] <- "Pop"
data[99, "Genre"] <- "Rock"
data[104, "Genre"] <- "Rock"
data[110, "Genre"] <- "Rock"
data[113, "Genre"] <- "Rock"
data[118, "Genre"] <- "Rock"
data[120, "Genre"] <- "Rock"
For ease of documentation and calculation, the "period_active" (years active) column was split into "Start Year" and "End Year" and the original column removed.

library(tidyr) # provides separate()
# Split the period_active column into a start year and an end year column
data <- data %>% separate(period_active, c('Start Year', 'End Year'))
Making the new 'Start Year' and 'End Year' columns from the 'period_active' column

data$`Start Year` <- as.numeric(data$`Start Year`) # Convert Start Year to numeric
data$`End Year` <- as.numeric(data$`End Year`) # Convert End Year to numeric
Converting the values in the new columns to numbers

data$`Years Active` <- data$`End Year` - data$`Start Year`

# Changing position of column 'Years Active'
data <- data %>% relocate(`Years Active`, .before = Year)
Calculating 'Years Active' as the number of years the artist has been active and placing it after the 'Start Year' and 'End Year' columns
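The split-and-subtract step can also be sketched in base R, without tidyr. This is an alternative to the `separate` call on the slide, shown on two made-up period strings; the variable names are assumptions for illustration.

```r
# Hypothetical period strings in the cleaned "start-end" format
period_active <- c("1965-2014", "1971-2022")

# Split each string on "-" and pull out the first and second pieces
parts <- strsplit(period_active, "-", fixed = TRUE)
start_year <- as.numeric(sapply(parts, `[`, 1))
end_year <- as.numeric(sapply(parts, `[`, 2))

# Number of years active
years_active <- end_year - start_year
print(years_active)
```

This only works once every entry holds a single range, which is why the multiple ranges and the "present" entries had to be cleaned first.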
Methodology CENTRAL TENDENCIES

Methodology
Measures of Central Tendencies

Mean, Mode and Median

datamean <- mean(data$TCU)
datamedian <- median(data$TCU)
# Define the Mode function
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
# Find the mode of the column
datamode <- Mode(data$TCU)

Mean = 104.6 million
Median = 76.6 million
Mode = 99.8 million
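Since base R has no built-in function for the statistical mode, it may help to see how this helper behaves on small vectors. The input vectors below are made up; the function body is the one from the slide.

```r
# Mode helper as defined on the slide
Mode <- function(x) {
  ux <- unique(x)                        # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]  # value with the highest count
}

mode_repeated <- Mode(c(3, 7, 7, 2, 3, 7))  # 7 appears three times
mode_tied <- Mode(c(5, 9))                  # every value appears once

print(mode_repeated)
print(mode_tied)
```

When there is a tie, `which.max` picks the first of the tied values, so for a column in which most values are unique the reported mode depends heavily on row order and should be interpreted with care.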
Visualizing the Data
BOX AND WHISKER PLOT

boxplot(data$TCU,
        main = "Box and Whisker Plot showing Total Certified Units",
        xlab = "Total Certified Units",
        ylab = "",
        col = "orange",
        border = "brown",
        horizontal = TRUE,
        notch = TRUE
)
Problem Statement 1
CONDITIONAL PROBABILITY

Using the entire population as the sample, we can see that 65% of the artists in this
dataset are from the United States. What is the probability that a randomly selected
artist from the United States has sold more than 160 million certified units and has a
period of activity lasting less than 15 years?
This conditional probability has 3 events.

Event A = the event that the artist chosen is from the United States
Event B = the event that the artist chosen satisfies Event A and has over 160 million TCUs
Event C = the event that the artist chosen satisfies Event B and has been active for less than 15 years

# Event A: artists from the United States
usfilter <- filter(data, Country == 'United States')
View(usfilter)
# Event B: of those, artists with more than 160 million TCU
bothfilters <- filter(usfilter, TCU > 160)
View(bothfilters)
# Event C: of those, artists active for less than 15 years
threefilters <- filter(bothfilters, `Years Active` < 15)
View(threefilters)
Original Population: these results extend all the way down to 121 artists.
Event A: these results include 79 artists.
Event B: these results include 14 artists.
Event C: these results include 1 artist.

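R Code for Conditional Probability
P(B ∩ C | A) = P(B | A) * P(C | A ∩ B)

# aintersectb, probabilityus and probabilitycandaandb are calculated elsewhere in the script (not shown on this slide)
bgivena <- aintersectb / probabilityus  # P(B | A)
# P(C | A ∩ B), provided probabilitycandaandb holds P(B ∩ C | A)
probabilitycgivenaandb <- probabilitycandaandb / bgivena
answerforprob1 <- probabilitycgivenaandb * bgivena  # P(B ∩ C | A)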
P(B | A) = P(B ∩ A)/P(A)
P(B | A) = 14/79
Represents the probability that an artist has sold more than 160 million units given they are from the United States

Let A ∩ B = D
P(C | A ∩ B) = P(C | D)
P(C | D) = P(D ∩ C)/P(D)
P(C | D) = 1/14
Represents the probability that an artist has made music for less than 15 years given they have sold more than 160 million units and they are from the United States
Answer for Problem Statement 1
P(B ∩ C | A) = P(B | A) * P(C | A ∩ B)
P(B ∩ C | A) = 14/79 * 1/14

P(B ∩ C | A) = 1/79
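The arithmetic above can be checked directly from the four counts reported on the slides (121, 79, 14 and 1). The sketch below is self-contained and does not use the dataset; the variable names are assumptions for illustration.

```r
# Counts reported on the slides
n_total <- 121  # all artists
n_a <- 79       # Event A: from the United States
n_ab <- 14      # Event B: Event A and more than 160 million TCU
n_abc <- 1      # Event C: Event B and active for less than 15 years

p_a <- n_a / n_total          # P(A)
p_b_given_a <- n_ab / n_a     # P(B | A)
p_c_given_ab <- n_abc / n_ab  # P(C | A ∩ B)

# Multiplication rule: P(B ∩ C | A) = P(B | A) * P(C | A ∩ B)
p_bc_given_a <- p_b_given_a * p_c_given_ab

print(round(c(p_a, p_b_given_a, p_c_given_ab, p_bc_given_a), 4))
```

The product simplifies to 1/79 because the 14 cancels, which is the same as counting the single qualifying artist among the 79 artists from the United States.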
Problem Statement 2 Normal Distribution
The total certified units for 50 artists are normally distributed with a sample mean of ‘x’ and a sample standard deviation of ‘y’. The threshold to be considered a ‘highly successful artist’ is ‘P’. Estimate the number of artists who qualify for this title within the Rock genre. Do the same for the Pop genre and compare which genre is more likely to contain a highly successful artist.

rand_sample <- data[sample(nrow(data), 50), ]
View(rand_sample)

This code first generates a table filled with a random selection of 50 members of the original dataset. There are no initial criteria, just a total of 50.
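Because `sample` draws different rows on every run, results based on the random sample will change unless a seed is fixed. The sketch below shows the same row-sampling pattern on a stand-in data frame; `toy` and the seed value 2023 are assumptions, and the slides do not state whether a seed was used.

```r
# Stand-in for the dataset: 121 rows identified by an id column
toy <- data.frame(id = 1:121)

# Fix the random number generator so the same 50 rows are drawn on every run
set.seed(2023)  # arbitrary seed chosen for illustration
rand_sample <- toy[sample(nrow(toy), 50), , drop = FALSE]

print(nrow(rand_sample))
print(head(rand_sample$id))
```

By default `sample` draws without replacement, so no artist can appear twice in the sample of 50.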
This table is then filtered and split into 2 sub-tables: one for Pop and one for Rock.

# Filter code for pop
pop_filter <- filter(rand_sample, Genre == 'Pop' | Genre == 'Pop ')
View(pop_filter)

# Filter code for rock
rock_filter <- filter(rand_sample, Genre == 'Rock' | Genre == 'Rock ')
View(rock_filter)
The mean, median and standard deviation for both pop and rock were found.

# Mean
pop_mean <- mean(pop_filter$TCU)
print(pop_mean)

# Median
pop_median <- median(pop_filter$TCU)
print(pop_median)

# Standard Deviation
pop_standev <- sd(pop_filter$TCU)
print(pop_standev)
x_value <- 160
pop_z <- (x_value - pop_mean) / pop_standev
print(pop_z)

The z score was then found using the mean, the standard deviation and the threshold x value, which for this question is 160 (million).
# lower.tail = TRUE returns P(Z < pop_z), the probability of a TCU below the threshold
pop_prob <- pnorm(pop_z, mean = 0, sd = 1, lower.tail = TRUE)
print(pop_prob)

The calculated z score is then used to calculate the probability that the chosen artist has a TCU less than 160 (million).
morethan_popprob <- 1 - pop_prob

However, since we need the probability that the chosen artist has a TCU greater than 160 (million), we then subtract the probability found from one.
pop_count <- nrow(pop_filter)
print(pop_count)

pop_estim <- (morethan_popprob * pop_count)
print(pop_estim)

To scale this probability up to the Pop group in the sample, the probability found is then multiplied by the number of artists in that group (pop_count).
Histogram and distribution curve for Pop

Mean: 115.7583
Median: 89.75
Standard Deviation: 78.15496
Variance: 6108.19777
Histogram and distribution curve for Rock

Mean: 80.28824
Median: 62.1
Standard Deviation: 48.08145
Variance: 2311.8258
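Using only the means and standard deviations reported above, the z scores and the probabilities of exceeding the 160 (million) threshold can be recomputed as a check. This is a sketch under the normal-distribution assumption of Problem Statement 2; the variable names are illustrative, and the group sizes needed for the final estimates are not reproduced here.

```r
threshold <- 160  # TCU threshold in millions

# Summary statistics reported on the slides
pop_mean <- 115.7583
pop_sd <- 78.15496
rock_mean <- 80.28824
rock_sd <- 48.08145

# z scores of the threshold within each genre
pop_z <- (threshold - pop_mean) / pop_sd
rock_z <- (threshold - rock_mean) / rock_sd

# Upper-tail probabilities: P(TCU > 160) under a normal model
pop_upper <- pnorm(pop_z, lower.tail = FALSE)
rock_upper <- pnorm(rock_z, lower.tail = FALSE)

print(round(c(pop_z, rock_z), 3))
print(round(c(pop_upper, rock_upper), 4))
```

Multiplying each upper-tail probability by the number of artists of that genre in the sample gives the estimated count of highly successful artists for that genre.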
DISCUSSION
PROBLEM STATEMENT 1
We used conditional probability to closely investigate very specific groups within the dataset, such as those artists who have been making music in the United States.

As the results show, 65% of the best-selling artists are from the United States. As the majority of the dataset, they provided the ideal group to base our conditional probability around.

The term 'successful' has no set definition and as such, we had to quantify it, setting the threshold at 160 million units. Conditional probability allows us to delve further into this data, as we can now calculate what proportion of those artists have sold more than 160 million units given that they are from the United States.

This probability represents the intersection of events within the dataset, and can subsequently be used to calculate conditional probability!

The probability that an artist has sold more than 160 million Total Certified Units given that they are from the United States is 18% (14/79).

The data was then further conditioned on artists who have been making music for less than 15 years, and only about 1.3% (1/79) of the artists from the United States can claim this feat.
DISCUSSION
PROBLEM STATEMENT 2
Discussion in document
From the random sample, ‘Pop’ and ‘Rock’ were chosen as they were the two most frequent genres in the dataset. Even though the ‘Pop’ genre was less frequent in this sample than the ‘Rock’ genre, the results throughout were still in Pop’s favour. The final estimations for Pop and Rock were 3.428052 and 0.8274584 respectively.

This deduction was made with the calculations for the probability that the artist chosen had a TCU less than 160. However, the intention was to find the probability that the TCU was greater than 160.

Correct Discussion
The frequency of the Pop artists versus the Rock artists is proportional to their probabilities of having a TCU above 160. For this random sample Pop's probability was 0.5929348 and Rock's probability was 0.8377515.

This shows that the probability of having a 'more successful' career is not directly related to the genre the artist is in. Even though there are more Pop artists in the overall population, which means there may be a higher chance of having a TCU over 160, that was not the case for the sample.

Correct Discussion
The standard deviations for Pop and Rock were 78.15 and 48.08 respectively. This indicates that the total certified units for Pop are spread more widely around the mean than those for Rock and, with Pop also having the higher average, it shows overall promise of success in this genre.
Conclusion
By deploying the models of normal distribution and conditional
probability, along with the methods of central tendency, problem
statement 1 supported our hypothesis while problem statement 2 did not.
