Name: Eunice Sp. Alonzo
Date to be Submitted: June 25, 2024
Course, Year and Section: BSCPE 1-1
Subject Course: Engineering Data Analysis
I. Video analysis on the topic of statistics.
II. The topic will cover: [a.] Measures of Central Tendency, [b.] Measures
of Dispersion, [c.] Measures of Relative Position, [d.] Normal Distribution,
[e.] Linear Regression and [f.] Correlation.
III. Statistics.
As Karl Pearson stated, "Statistics is the grammar of science." This analogy
highlights the fundamental role statistics plays in the scientific method,
enabling researchers to organize, interpret, and draw meaningful conclusions
from data.
[a.] Measures of Central Tendency
Measures of central tendency are statistical metrics that summarize a
dataset by identifying the center point. The three primary measures are the
mean, median, and mode. The mean, or arithmetic average, is calculated by
dividing the sum of all values by the number of values. While it provides a
simple summary, it is highly sensitive to outliers. The median, the middle
value in a dataset when the numbers are arranged in ascending or
descending order, offers a more robust central point in skewed distributions,
as it is resistant to outliers. The mode, the most frequently occurring value in
a dataset, is particularly useful in categorical data analysis. These measures
are crucial in understanding the typical values in a dataset and are
foundational for further statistical analysis.
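The three measures can be illustrated with a short Python sketch using the standard library's statistics module (the sample data below is made up for illustration and is not from the video):

```python
import statistics

# Hypothetical sample (not from the source); 40 is a deliberate outlier.
data = [2, 3, 3, 5, 7, 40]

mean = statistics.mean(data)      # (2+3+3+5+7+40) / 6 = 10; pulled up by 40
median = statistics.median(data)  # middle of the sorted values: (3+5)/2 = 4.0
mode = statistics.mode(data)      # the most frequent value: 3

print(mean, median, mode)
```

Note how the single outlier (40) drags the mean well above most of the data, while the median and mode stay near the typical values, matching the robustness claims above.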
[b.] Measures of Dispersion
While measures of central tendency provide insight into the central
point of a dataset, measures of dispersion describe the spread or variability
of the data. Key measures of dispersion include the range, variance, and
standard deviation. The range, the difference between the highest and
lowest values in a dataset, offers a basic understanding of data spread but is
sensitive to outliers. Variance, the average of the squared differences from
the mean, provides a measure of how much the values in a dataset deviate
from the mean. Standard deviation, the square root of the variance, is in the
same units as the data, making it easier to interpret compared to variance.
Understanding dispersion helps in assessing the reliability of the mean and in
identifying the degree of variability within the data.
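These three measures of spread can be computed directly with the statistics module; the dataset below is hypothetical and chosen so the results come out cleanly:

```python
import statistics

# Hypothetical dataset chosen so the numbers come out cleanly.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)     # 9 - 2 = 7; crude and outlier-sensitive
variance = statistics.pvariance(data)  # mean squared deviation from the mean
std_dev = statistics.pstdev(data)      # sqrt(variance), in the data's own units

print(data_range, variance, std_dev)
```

Here the population variance is 4 but the standard deviation is 2.0, illustrating why the standard deviation, being in the same units as the data, is the easier of the two to interpret.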
[c.] Measures of Relative Position
Measures of relative position indicate where a particular value stands
in relation to the rest of the dataset. The most commonly used measures are
percentiles and z-scores. Percentiles are values below which a certain
percentage of the data falls; for instance, the 25th percentile (Q1) is the
value below which 25% of the data lies. Z-scores, standardized scores
indicating how many standard deviations a value is from the mean, allow for
comparison between different datasets by standardizing the values. These
measures are essential in comparative analysis and in identifying outliers.
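Both percentiles and z-scores can be sketched in Python; the exam-style scores below are invented for illustration, and the quartiles use the statistics module's default (exclusive) method:

```python
import statistics

# Hypothetical exam scores for illustration.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

mean = statistics.mean(scores)   # 77.5
sd = statistics.pstdev(scores)

# z-score: how many standard deviations the value 90 lies above the mean
z = (90 - mean) / sd

# n=4 cut points give the quartiles Q1, Q2 (the median), and Q3
q1, q2, q3 = statistics.quantiles(scores, n=4)

print(round(z, 2), q1, q2, q3)
```

A score of 90 is a little under one standard deviation above the mean (z ≈ 0.87), and Q2 equals the median of 77.5, as expected.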
[d.] Normal Distribution
The normal distribution, or Gaussian distribution, is a continuous
probability distribution characterized by its bell-shaped curve. Key properties
include symmetry, the distribution being symmetric about the mean, and the
mean, median, and mode being equal in a perfectly normal distribution. The
68-95-99.7 rule states that approximately 68% of data falls within one
standard deviation of the mean, 95% within two, and 99.7% within three.
The normal distribution is foundational in statistics due to its natural
occurrence in various datasets and its properties that simplify analysis and
inference.
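The 68-95-99.7 rule can be verified numerically with the statistics module's NormalDist class, using the standard normal distribution (mean 0, standard deviation 1):

```python
from statistics import NormalDist

nd = NormalDist(mu=0, sigma=1)  # the standard normal distribution

# Probability of falling within k standard deviations of the mean
within_1 = nd.cdf(1) - nd.cdf(-1)
within_2 = nd.cdf(2) - nd.cdf(-2)
within_3 = nd.cdf(3) - nd.cdf(-3)

print(round(within_1, 4), round(within_2, 4), round(within_3, 4))
```

The computed probabilities (about 0.6827, 0.9545, and 0.9973) match the 68-95-99.7 rule stated above.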
[e.] Linear Regression
Linear regression is a method used to model the relationship between
a dependent variable and one or more independent variables. The simplest
form, simple linear regression, involves one dependent and one independent
variable. The relationship is described by the equation y = mx + b, where y
is the dependent variable, x is the independent variable, m is the slope,
and b is the y-intercept. The slope m represents the change in y for a one-
unit change in x. Linear regression is a powerful tool for prediction and
understanding relationships between variables.
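Simple linear regression can be sketched by computing the least-squares slope and intercept by hand; the points below are hypothetical and lie exactly on y = 2x + 1 so the fit is easy to check:

```python
# Hypothetical points lying exactly on y = 2x + 1 for a clean illustration.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope: covariance of x and y over the variance of x
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
b = mean_y - m * mean_x  # the fitted line always passes through (mean_x, mean_y)

print(m, b)  # slope 2.0, intercept 1.0
```

The recovered slope of 2.0 means y increases by 2 for each one-unit change in x, exactly as the interpretation of m above describes.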
[f.] Correlation
Correlation measures the strength and direction of the linear
relationship between two variables. The correlation coefficient, denoted by r,
ranges from -1 to 1. A positive correlation (r > 0) indicates that as one
variable increases, the other also increases. A negative correlation (r < 0)
indicates that as one variable increases, the other decreases. No correlation
(r ≈ 0) means no linear relationship exists. Correlation analysis helps in
understanding the degree to which variables are related and is often a
precursor to regression analysis.
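Pearson's correlation coefficient r can be computed the same way, as the covariance divided by the product of the standard deviations; the paired observations below are invented for illustration:

```python
import math

# Hypothetical paired observations (not from the source).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Pearson's r: covariance over the product of the standard deviations
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
r = cov / (sx * sy)

print(round(r, 3))
```

Here r ≈ 0.77, a positive correlation: as x increases, y tends to increase, though not perfectly, which is why r falls between 0 and 1 rather than equaling 1.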
IV. Conclusion.
Statistics, as the grammar of science, provides the tools and methods
necessary to translate raw data into meaningful information. Measures of
central tendency and dispersion offer insights into the nature and spread of
data. Measures of relative position and the normal distribution allow for
deeper understanding and comparison. Linear regression and correlation
enable the modeling and analysis of relationships between variables.
Together, these statistical concepts form the foundation upon which
scientific inquiry is built, enabling researchers to draw valid and reliable
conclusions from data.