# Statistics Fundamentals Course Notes
## Arithmetic Mean: Calculate and interpret the average of a set of numbers.
- **Introduction**: The arithmetic mean is a fundamental statistical measure used
to summarize a dataset with a single number, representing the central tendency.
- **Key Concepts**: The formula for the arithmetic mean is \( \bar{x} =
\frac{\sum_{i=1}^{n} x_i}{n} \).
- **Practical Example**: Dataset of engineering component lifespans: [2, 3, 5, 8,
10]. Mean lifespan = 5.6 years.
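
The calculation in the practical example can be checked with a few lines of Python. This is a minimal sketch using the lifespans listed above; the variable names are illustrative only.

```python
from statistics import mean

# Component lifespans in years, taken from the practical example.
lifespans = [2, 3, 5, 8, 10]

# Apply the formula directly: sum of the observations divided by their count.
mean_lifespan = sum(lifespans) / len(lifespans)

# The standard library function should agree with the manual calculation.
library_mean = mean(lifespans)

print(mean_lifespan)  # 5.6
```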
## Median: Determine the middle value in a data set and its significance.
- **Introduction**: The median is the middle value in an ordered dataset. It is
useful for skewed distributions because extreme values do not pull it the way they pull the mean.
- **Practical Example**: Dataset of ages: [22, 23, 29, 34, 34, 45, 56]. Median age
= 34.
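
One way to compute the median is sketched below: sort the values, then take the middle one, or the average of the two middle ones when the count is even. The helper name `middle_value` is an assumption made for this illustration; the standard library's `statistics.median` does the same job.

```python
from statistics import median


def middle_value(values):
    """Return the median: the middle of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle element exists.
        return ordered[mid]
    # Even count: average the two central elements.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Ages from the practical example.
ages = [22, 23, 29, 34, 34, 45, 56]
median_age = middle_value(ages)
print(median_age)  # 34
```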
## Mode: Identify the most frequently occurring value and its applications.
- **Introduction**: The mode is the most frequently occurring value in a dataset,
important for categorical data.
- **Practical Example**: Favorite programming languages: ["Python", "Java",
"Python", "C#", "Python"]. Mode = "Python".
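
For categorical data like the example above, the mode can be found by counting occurrences. The sketch below uses `collections.Counter` from the standard library; the variable names are illustrative.

```python
from collections import Counter

# Survey responses from the practical example.
languages = ["Python", "Java", "Python", "C#", "Python"]

# Count how often each value occurs, then take the most frequent one.
counts = Counter(languages)
mode_language, mode_count = counts.most_common(1)[0]

print(mode_language, mode_count)  # Python 3
```

When several values tie for the highest count, `most_common(1)` returns only one of them, so a dataset with more than one mode needs an explicit check of the counts.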
## Quantiles and IQR (Interquartile Range): Explore data spread through quartiles.
- **Introduction**: Quantiles divide a dataset into equal parts, with the IQR
measuring the spread between the 25th and 75th percentiles.
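- **Practical Example**: Test scores: [20, 35, 50, 60, 65, 75, 80, 90, 95]. With
linear interpolation between the ordered values, \(Q_1 = 50\), \(Q_3 = 80\), IQR = 30; other quartile conventions can give slightly different values.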
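
Quartiles depend on the convention used to interpolate between data points. The sketch below assumes Python 3.8 or later and uses `statistics.quantiles`. Its `"inclusive"` method reproduces the values quoted in the example, while the default `"exclusive"` method gives different quartiles for the same scores.

```python
from statistics import quantiles

# Test scores from the practical example.
scores = [20, 35, 50, 60, 65, 75, 80, 90, 95]

# "inclusive" treats the data as the whole population and interpolates
# linearly between the ordered values.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1  # Q1 = 50, Q3 = 80, so IQR = 30

# The default "exclusive" method assumes the data are a sample from a
# larger population and places the quartiles further out.
q1_ex, _, q3_ex = quantiles(scores, n=4)
iqr_ex = q3_ex - q1_ex  # Q1 = 42.5, Q3 = 85, so IQR = 42.5

print(q1, q3, iqr)
print(q1_ex, q3_ex, iqr_ex)
```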
## Variance and Standard Deviation: Measure data variability and dispersion.
- **Introduction**: Variance and standard deviation quantify the spread of data
points in a dataset.
- **Practical Example**: Test scores: [50, 60, 70, 80, 90]. Population variance = 200,
standard deviation ≈ 14.14 (dividing by \(n\); the sample variance, dividing by \(n - 1\), is 250).
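
The figures in the example correspond to the population formulas, which divide the sum of squared deviations by \(n\). The sketch below computes both the population and the sample versions with the standard library so the two can be compared.

```python
from statistics import pstdev, pvariance, stdev, variance

# Test scores from the practical example.
scores = [50, 60, 70, 80, 90]

# Population formulas: divide the sum of squared deviations by n.
population_variance = pvariance(scores)  # 200
population_sd = pstdev(scores)           # about 14.14

# Sample formulas: divide by n - 1 (Bessel's correction).
sample_variance = variance(scores)       # 250
sample_sd = stdev(scores)                # about 15.81

print(population_variance, round(population_sd, 2))
print(sample_variance, round(sample_sd, 2))
```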
## Regression: Analyze linear relationships between variables.
- **Introduction**: Linear regression models the relationship between a dependent
variable and one or more independent variables.
- **Practical Example**: Fuel efficiency based on engine size.
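
A simple linear regression can be fitted by ordinary least squares. The sketch below is illustrative only: the engine sizes and fuel-efficiency figures are made-up values, not data from these notes, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: engine size in litres and fuel efficiency in mpg.
engine_litres = [1.0, 1.5, 2.0, 2.5, 3.0]
mpg = [45, 40, 36, 30, 26]

slope, intercept = fit_line(engine_litres, mpg)
# For these made-up numbers the slope is about -9.6 mpg per litre
# and the intercept is about 54.6 mpg.
print(round(slope, 2), round(intercept, 2))
```

A negative slope here would be read as "larger engines are associated with lower fuel efficiency" for this illustrative dataset; it says nothing about causation.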
## Deceptive Graph Design: Recognize misleading visual representations.
- **Introduction**: Deceptive graph design uses visual choices, such as truncated
axes or inconsistent scales, to give viewers a misleading impression of the data.
- **Practical Example**: Manipulating the y-axis to exaggerate sales growth.
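
The effect of a truncated y-axis can be quantified by comparing bar heights as drawn. The sketch below uses hypothetical sales figures (100 and 105 units) and a hypothetical helper, `apparent_ratio`, that returns how many times taller the second bar looks for a given axis minimum.

```python
def apparent_ratio(before, after, axis_min):
    """Ratio of drawn bar heights when the y-axis starts at axis_min."""
    return (after - axis_min) / (before - axis_min)


# Hypothetical sales: 100 units last year, 105 units this year (5% growth).
honest = apparent_ratio(100, 105, axis_min=0)      # bars differ by 5%
truncated = apparent_ratio(100, 105, axis_min=95)  # second bar looks twice as tall

print(honest, truncated)  # 1.05 2.0
```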
## Cumulative Frequency Plots: Visualize data distribution.
- **Introduction**: Cumulative frequency plots show the accumulation of data points
up to certain values.
- **Practical Example**: Time taken for students to complete an exam.
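
The values plotted in a cumulative frequency graph are running totals of the bin counts. The sketch below uses made-up completion-time counts, labelled as hypothetical, to show how those running totals are obtained.

```python
from itertools import accumulate

# Hypothetical data: upper edge of each time bin (minutes) and the number
# of students who finished within that bin.
bin_upper_edges = [30, 40, 50, 60]
counts = [3, 7, 12, 8]

# Running total of students who had finished by each bin edge.
cumulative_counts = list(accumulate(counts))  # [3, 10, 22, 30]

# Cumulative relative frequency: share of all students finished by each edge.
total = cumulative_counts[-1]
cumulative_share = [c / total for c in cumulative_counts]

for edge, cum in zip(bin_upper_edges, cumulative_counts):
    print(f"finished within {edge} min: {cum} students")
```

Plotting `cumulative_counts` against `bin_upper_edges` gives the cumulative frequency curve.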
## p-values: Evaluate statistical significance.
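- **Introduction**: A p-value is the probability, assuming the null hypothesis is true,
of observing data at least as extreme as the data actually observed. A small p-value indicates that the data would be unlikely under that hypothesis.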
- **Practical Example**: Testing a new drug's effectiveness.
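
A p-value can be computed exactly for a simple yes/no outcome. The sketch below assumes a hypothetical trial in which 15 of 20 patients improve, and a null hypothesis that each patient has a 50% chance of improving regardless of the drug. These numbers are illustrative, and `binomial_upper_tail` is a hypothetical helper name.

```python
from math import comb


def binomial_upper_tail(successes, trials, p_null):
    """P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical trial: 15 of 20 patients improve; the null hypothesis
# assumes a 50% improvement rate.
p_value = binomial_upper_tail(15, 20, 0.5)
print(round(p_value, 4))  # 0.0207
```

This is a one-sided test: it asks how often a result at least this favourable would occur by chance alone if the null hypothesis were true.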
## Normal Curves: Understand the bell-shaped distribution and its properties.
- **Introduction**: Normal curves describe distributions where values cluster
around a mean, central to statistical inference.
- **Practical Example**: Heights of adults in a population.
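
Probabilities under a normal curve can be read from its cumulative distribution function. The sketch below assumes Python 3.8 or later for `statistics.NormalDist`, and uses a hypothetical population with mean height 170 cm and standard deviation 10 cm. These parameters are illustrative, not measured values.

```python
from statistics import NormalDist

# Hypothetical adult heights: mean 170 cm, standard deviation 10 cm.
heights = NormalDist(mu=170, sigma=10)

# Share of the population within one and two standard deviations of the mean.
within_one_sd = heights.cdf(180) - heights.cdf(160)  # about 0.683
within_two_sd = heights.cdf(190) - heights.cdf(150)  # about 0.954

# Share taller than 185 cm (1.5 standard deviations above the mean).
taller_than_185 = 1 - heights.cdf(185)               # about 0.067

print(round(within_one_sd, 3), round(within_two_sd, 3), round(taller_than_185, 3))
```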
## Experiment Design: Principles for planning and executing experiments.
- **Introduction**: Experiment design is crucial for obtaining valid, reliable
results and involves control groups, randomization, and blinding.
- **Practical Example**: Clinical trial for a new medication.
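
Randomization can be sketched as shuffling the participant list and splitting it in two. The participant IDs, the seed, and the helper name `randomize` below are all assumptions made for illustration. A real trial would typically use a more elaborate allocation scheme.

```python
import random


def randomize(participants, seed):
    """Randomly split participants into treatment and control groups."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical participant IDs.
participants = [f"P{i:02d}" for i in range(1, 11)]
treatment, control = randomize(participants, seed=42)

print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because assignment depends only on the shuffle, neither the researchers nor the participants' characteristics influence who receives the treatment.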
## Type I and Type II Errors: Consequences of statistical decisions.
- **Introduction**: Type I errors (false positives) occur when a true null hypothesis is rejected;
Type II errors (false negatives) occur when a false null hypothesis is not rejected.
- **Practical Example**: Clinical trial errors in evaluating a new drug.
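
Both error rates can be computed for a concrete decision rule. The sketch below assumes a hypothetical trial of 20 patients, a rule of "reject the null hypothesis if at least 15 improve", a null improvement rate of 50%, and a true improvement rate of 80% if the drug works. All of these numbers are illustrative.

```python
from math import comb


def rejection_probability(threshold, n, p):
    """P(X >= threshold) for X ~ Binomial(n, p): chance the rule rejects H0."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(threshold, n + 1)
    )


n, threshold = 20, 15

# Type I error rate: rejecting H0 although the drug does nothing (p = 0.5).
alpha = rejection_probability(threshold, n, 0.5)    # about 0.021

# Type II error rate: failing to reject H0 although the drug works (p = 0.8).
beta = 1 - rejection_probability(threshold, n, 0.8)  # about 0.196

print(round(alpha, 3), round(beta, 3))
```

Raising the threshold lowers the Type I error rate but raises the Type II error rate, which is the trade-off between the two kinds of error.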
## Simpson’s Paradox: Investigate aggregated data leading to counterintuitive results.
- **Introduction**: Simpson’s Paradox occurs when trends in aggregated data reverse
or disappear upon subgroup analysis.
- **Practical Example**: University admissions by gender across departments.
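
The reversal can be reproduced with a small table of counts. The admissions figures below are made up for illustration and do not come from any real university. They are chosen so that women have the higher admission rate in each department but the lower rate overall.

```python
# Hypothetical (applicants, admitted) counts per department and group.
admissions = {
    "Department A": {"men": (100, 80), "women": (20, 18)},
    "Department B": {"men": (20, 4), "women": (100, 30)},
}


def department_rate(dept, group):
    """Admission rate for one group within one department."""
    applicants, admitted = admissions[dept][group]
    return admitted / applicants


def overall_rate(group):
    """Admission rate for one group with all departments pooled."""
    applicants = sum(d[group][0] for d in admissions.values())
    admitted = sum(d[group][1] for d in admissions.values())
    return admitted / applicants


for dept in admissions:
    print(dept, department_rate(dept, "men"), department_rate(dept, "women"))
print("Overall", overall_rate("men"), overall_rate("women"))
```

In this illustration most women apply to the department with the lower admission rate, so pooling the departments hides the within-department pattern.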