Introduction to Biostatistics

Shamik Sen
Dept. of Biosciences & Bioengineering
IIT Bombay
Standard Deviation: Practical Significance

Chebyshev’s Theorem:

Given a number k (greater than 1) and a set of N measurements, at least 1 − 1/k² of the measurements will lie within k standard deviations of their mean.

Example: n = 26, Mean = 75, Variance = 100. Comment on the distribution.
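One way to approach the example is to evaluate the Chebyshev bound for a few values of k. The sketch below is in Python and uses only the numbers given on the slide; the function name `chebyshev_bound` and the choice of k = 2 and k = 3 are illustrative assumptions, not part of the lecture.

```python
import math

def chebyshev_bound(k):
    """Minimum fraction of measurements within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

# Values from the slide example
n, mean, variance = 26, 75, 100
sd = math.sqrt(variance)  # standard deviation = 10.0

for k in (2, 3):  # illustrative choices of k
    fraction = chebyshev_bound(k)
    low, high = mean - k * sd, mean + k * sd
    print(f"k={k}: at least {fraction:.1%} of the {n} measurements lie in [{low}, {high}]")
```

With k = 2 the bound is 75%, so at least 20 of the 26 measurements (75% of 26 is 19.5) must lie between 55 and 95, whatever the shape of the distribution.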
Standard Deviation of Normal Distributions

+/- 1 s.d.: 68%
+/- 2 s.d.: 95%
+/- 3 s.d.: 99.7%

[Figure: relative frequency plotted against height]
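The three percentages can be checked numerically. For a normal distribution, the fraction lying within k standard deviations of the mean equals erf(k/√2); the Python sketch below evaluates this with the standard library. The function name `fraction_within` is an illustrative assumption.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"+/- {k} s.d.: {fraction_within(k):.1%}")
```

The exact values are roughly 68.3%, 95.4%, and 99.7%, which the slide rounds to 68%, 95%, and 99.7%.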
Relative Standing: z-score

z-score = (x − Mean) / Standard Deviation

Example: Mean = 25, Standard Deviation = 4, x = 30 gives z-score = (30 − 25)/4 = 1.25

An observation whose z-score exceeds 3 in absolute value is treated as an outlier.

Example: 1, 1, 0, 15, 2, 3, 4, 0, 1, 3
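Both examples can be worked through in a few lines of Python. This is a minimal sketch, assuming the sample standard deviation (n − 1 denominator) is the one intended for the second example; the slide does not say which denominator to use.

```python
import statistics

def z_score(x, mean, sd):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# First example from the slide
print(z_score(30, 25, 4))

# Second example from the slide
data = [1, 1, 0, 15, 2, 3, 4, 0, 1, 3]
mean = statistics.mean(data)
sd = statistics.stdev(data)  # assumption: sample s.d. with n - 1 denominator

z_scores = [z_score(x, mean, sd) for x in data]
for x, z in zip(data, z_scores):
    print(f"x = {x:2d}  z = {z:+.2f}")
```

Under this assumption the value 15 has a z-score of about 2.71, so it stands well apart from the rest of the data yet still falls short of the threshold of 3. A single extreme value inflates the standard deviation it is measured against.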
Relative Standing: Percentiles
The p-th percentile is the value that is greater than p% of the measurements.

Q1: first quartile (at position 0.25*(n+1))

Q3: third quartile (at position 0.75*(n+1))

Inter-quartile Range = IQR = Q3 – Q1


Box-plot

Min, Q1, Median, Q3, Max

[Figure: box-plot; axis label: Area (sq. microns)]
Detecting Outliers

Lower Fence = Q1 – 1.5 IQR


Upper Fence = Q3 + 1.5 IQR
Plotting Box-Plot

340, 300, 520, 340, 320, 290, 260, 330
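The numbers needed to draw this box-plot can be computed with the position rule 0.25*(n+1) and 0.75*(n+1) together with the 1.5 IQR fences. The Python sketch below assumes linear interpolation between neighbouring values when a position is fractional, which the slides do not spell out; the helper name `value_at_position` is hypothetical.

```python
def value_at_position(sorted_values, position):
    """Value at a 1-based, possibly fractional position (assumed linear interpolation)."""
    lower = int(position)            # whole part of the position
    fraction = position - lower      # fractional part of the position
    if lower < 1:
        return sorted_values[0]
    if lower >= len(sorted_values):
        return sorted_values[-1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)

data = sorted([340, 300, 520, 340, 320, 290, 260, 330])
n = len(data)

q1 = value_at_position(data, 0.25 * (n + 1))
median = value_at_position(data, 0.50 * (n + 1))
q3 = value_at_position(data, 0.75 * (n + 1))
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Min, Q1, Median, Q3, Max:", data[0], q1, median, q3, data[-1])
print("IQR:", iqr)
print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
```

With these assumptions Q1 = 292.5, the median is 325, Q3 = 340, and the fences are 221.25 and 411.25, so 520 is the only value flagged as an outlier.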


Moments

Given a set of n observations y_i of a variable Y, the r-th sample moment about zero is defined as:

$m'_r = \frac{1}{n} \sum_{i=1}^{n} y_i^r$
Moments

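The r-th sample moment about the mean is defined as:

$m_r = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^r$

where $\bar{y}$ is the sample mean and n is the number of observations.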
Skewness
Kurtosis
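The formulas for skewness and kurtosis did not survive extraction from the slides. The Python sketch below uses the common moment-based definitions, skewness = m3 / m2^(3/2) and kurtosis = m4 / m2^2, where m_r is the average of the r-th powers of the deviations from the mean. Treat these definitions, the function names, and the choice of data set as assumptions; the lecture may use a variant such as excess kurtosis.

```python
def moment_about_zero(values, r):
    """r-th sample moment about zero: average of y**r."""
    return sum(y ** r for y in values) / len(values)

def moment_about_mean(values, r):
    """r-th sample moment about the mean: average of (y - mean)**r."""
    mean = moment_about_zero(values, 1)
    return sum((y - mean) ** r for y in values) / len(values)

def skewness(values):
    """Assumed definition: m3 / m2**1.5."""
    return moment_about_mean(values, 3) / moment_about_mean(values, 2) ** 1.5

def kurtosis(values):
    """Assumed definition: m4 / m2**2 (not excess kurtosis)."""
    return moment_about_mean(values, 4) / moment_about_mean(values, 2) ** 2

# Data reused from the z-score example for illustration
data = [1, 1, 0, 15, 2, 3, 4, 0, 1, 3]
print("mean     :", moment_about_zero(data, 1))
print("m2       :", moment_about_mean(data, 2))
print("skewness :", skewness(data))
print("kurtosis :", kurtosis(data))
```

The first moment about zero is the mean, and the second moment about the mean is the variance with an n denominator. The positive skewness for this data reflects the long right tail created by the value 15.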
Introduction to statistical analysis with R
What is R?
• R is a software environment for statistical computing and data analysis.

• R is a GNU package, and its source code is freely available.

• Pre-compiled binary versions are provided for various operating systems.

• R has a command-line interface, but many graphical user interfaces are available.

• R can produce publication-quality graphs with mathematical symbols.

R is an interpreted language
Applications of R
• Mainly used by statisticians and other practitioners requiring an environment for statistical computation and software development.

• R supports matrix arithmetic and can also operate as a general matrix calculation toolbox, with performance benchmarks comparable to GNU Octave or MATLAB.

• R can be used to perform high-performance statistical computation required for statistical analysis of Big Data.

• R is also being used in Business Analytics.


Getting R - 1
• R is an open-source programming language. Because of its popularity, pre-compiled R binaries are also available for different platforms.

• Binaries for Windows, Unix, or macOS can be downloaded from the R project website, https://www.r-project.org.

• These binaries can be used directly to install R on a computer.
Getting R - 2
• However, R is command-line based, so it may not be suitable for learners.
• For this reason, many graphical user interfaces (GUIs) are available for R.
• These GUI-based tools provide a user-friendly interface to write, correct, and run R code.
• RStudio is one such widely used GUI for R.
Getting R - 3
• RStudio

[Figure: RStudio screenshot with labelled panes: workspace, command windows, additional information]
Creating vectors in R
Custom vector

Sequence

Repeat

Repeat of range

Repeat of sequence

All variables are vectors. Variable names are case-sensitive.


Basic operations on vectors - 1

Scalar addition

Scalar subtraction

Scalar multiplication

Scalar division
Element-wise sum

Element-wise multiplication
Element-wise square

Element-wise exponential
Importing data to R - 1
CSV: comma separated values

.xlsx format

.csv format
Importing CSV data to R

Workspace
Calculating descriptive statistics in R - 1
Finding frequency in categorical data

Mean and median

Minimum and maximum


Calculating descriptive statistics in R - 2

Variance and standard deviation

Alternatively
Plotting in R - 1
Plotting in R - 2
Data

Plotting barplots
