Notes on Data Analytics and Descriptive Statistics
1. Introduction to Data Analytics
Big Data and Data Science: Big Data refers to datasets too large or complex for
conventional tools, so they require specialized storage, management, and analysis
techniques. Data Science combines statistical methods, algorithms, and computational
tools to derive insights from both big and small data.
Small Data: Refers to datasets that are small enough to be managed and analyzed on a
personal computer or with simple statistical tools. Though small in size, small data is
often rich in context and detail.
A Short Taxonomy of Data Analytics:
- Descriptive Analytics: Summarizes past data to understand what has happened.
- Diagnostic Analytics: Explains why something happened.
- Predictive Analytics: Uses models and historical data to predict future outcomes.
- Prescriptive Analytics: Recommends actions to achieve desired outcomes.
Examples of Data Use: Applications include healthcare, finance, marketing,
manufacturing, and social sciences.
Breast Cancer in Wisconsin: A widely used classification dataset in which attributes of
cell nuclei are used to distinguish benign from malignant tumors.
Polish Company Insolvency Data: Used for predicting company insolvency based on
financial ratios and attributes.
A Little History on Methodologies for Data Analytics: Evolution from traditional
statistics to data mining, machine learning, and modern AI-driven analytics.
2. Descriptive Statistics
Scale Types:
- Nominal: Categories without order (e.g., gender, colors).
- Ordinal: Ordered categories (e.g., satisfaction ratings).
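- Interval: Numeric scale with meaningful differences but no true zero, so ratios are not
meaningful (e.g., temperature in Celsius).
- Ratio: Numeric scale with a true zero, so both differences and ratios are meaningful
(e.g., weight, height).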
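The difference between the scale types shows up in which summaries make sense. The sketch
below is only illustrative: the PERMITTED mapping and the helper names are assumptions
introduced here, following the common convention that each scale also allows the
summaries of the scales listed before it.

```python
# Illustrative mapping from scale type to summaries that are usually considered
# meaningful. The names PERMITTED and is_meaningful are hypothetical helpers.
PERMITTED = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval": ["mode", "median", "mean"],
    "ratio": ["mode", "median", "mean", "ratio"],
}


def is_meaningful(scale: str, statistic: str) -> bool:
    """Return True if the statistic is conventionally allowed for the scale."""
    return statistic in PERMITTED[scale]


def celsius_to_kelvin(celsius: float) -> float:
    """Kelvin has a true zero, so it is a ratio scale."""
    return celsius + 273.15


# 20 degrees Celsius is not "twice as warm" as 10 degrees Celsius:
ratio_celsius = 20.0 / 10.0  # arithmetically 2.0, but not meaningful
ratio_kelvin = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)  # about 1.035

print(is_meaningful("ordinal", "mean"))
print(round(ratio_kelvin, 3))
```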
Descriptive Univariate Analysis: Focuses on analyzing one attribute at a time.
Univariate Frequencies: Counting how often each value or category occurs.
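As a minimal sketch of a frequency count, assuming a small made-up list of ordinal
ratings, absolute and relative frequencies could be computed like this:

```python
from collections import Counter

# Made-up sample of satisfaction ratings (not taken from any real dataset).
ratings = ["good", "poor", "good", "excellent", "good", "poor"]

# Absolute frequency: how many times each category occurs.
absolute = Counter(ratings)

# Relative frequency: share of all observations in each category.
n = len(ratings)
relative = {value: count / n for value, count in absolute.items()}

for value, count in absolute.most_common():
    print(f"{value:10s} {count:3d} {relative[value]:.2f}")
```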
Contents of Univariate Analysis: Measures of central tendency (mean, median, mode),
measures of dispersion (range, variance, standard deviation), and distribution shape
(skewness, kurtosis).
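The measures listed above can be sketched with Python's statistics module. The sample is
made up, the population formulas (dividing by n) are an assumption chosen for simplicity,
and skewness and kurtosis are computed here as the third and fourth standardized moments;
other definitions with bias corrections exist.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

# Central tendency
mean_value = statistics.mean(data)
median_value = statistics.median(data)
mode_value = statistics.mode(data)

# Dispersion (population versions: divide by n, not n - 1)
value_range = max(data) - min(data)
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)


def standardized_moment(values, order):
    """Average of ((x - mean) / std) ** order over all values."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** order for v in values) / len(values)


# Shape: positive skewness indicates a longer right tail;
# a normal distribution has kurtosis 3 under this definition.
skewness = standardized_moment(data, 3)
kurtosis = standardized_moment(data, 4)

print(mean_value, median_value, mode_value)
print(value_range, variance, std_dev)
print(skewness, kurtosis)
```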
Univariate Statistics: Summary values computed from a single variable, such as its mean
or standard deviation.
Common Univariate Probability Distributions: Binomial and Poisson for discrete counts;
Normal and Exponential for continuous values; Uniform in either discrete or continuous
form.
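To make these distributions concrete, here is a small sketch of their probability
functions using only the standard library. The function names and the parameter values
are assumptions chosen for illustration, and the uniform case shown is the continuous one.

```python
import math
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(X = k) for k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, rate):
    """P(X = k) for a count with the given average rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def exponential_cdf(x, rate):
    """P(X <= x) for a waiting time with the given rate."""
    return 1 - math.exp(-rate * x) if x >= 0 else 0.0


def uniform_pdf(x, low, high):
    """Density of the continuous uniform distribution on [low, high]."""
    return 1 / (high - low) if low <= x <= high else 0.0


standard_normal = NormalDist(mu=0, sigma=1)
p_below_zero = standard_normal.cdf(0)  # half the mass lies below the mean

print(binomial_pmf(2, 4, 0.5))
print(poisson_pmf(0, 2.0))
print(exponential_cdf(1.0, 1.0))
print(uniform_pdf(0.3, 0.0, 2.0))
print(p_below_zero)
```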
3. Data Visualization
Data Visualization is the graphical representation of data to make trends, patterns, and
insights easier to understand. Common techniques include histograms, bar charts, scatter
plots, box plots, and heatmaps.
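A histogram is essentially a frequency count over equal-width bins. The following
dependency-free sketch prints one as text; in practice a plotting library would draw it.
The function name, the made-up sample, and the choice of four bins are assumptions.

```python
def histogram_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals.

    Assumes the values are not all identical, so the bin width is positive.
    """
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin instead of opening a new one.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
counts = histogram_counts(data, bins=4)

low = min(data)
width = (max(data) - low) / 4
for i, count in enumerate(counts):
    left = low + i * width
    print(f"{left:5.2f} to {left + width:5.2f} | {'#' * count}")
```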
4. Descriptive Bivariate Analysis
Two Quantitative Attributes: Scatter plots, correlation coefficients, regression analysis.
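For two quantitative attributes, the Pearson correlation coefficient and a least-squares
regression line can be sketched as follows. The data are made up and the function names
are hypothetical; the formulas are the standard textbook ones.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def least_squares_line(x, y):
    """Return (slope, intercept) of the line minimizing squared vertical errors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


x = [1, 2, 3, 4, 5]  # made-up data
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
slope, intercept = least_squares_line(x, y)
print(round(r, 4), round(slope, 2), round(intercept, 2))
```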
Two Qualitative Attributes: Cross-tabulation (contingency tables), chi-square test of
independence.
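A contingency table and the chi-square statistic can be sketched from raw category pairs
as shown here. The observations are made up, and the sketch stops at the statistic and
its degrees of freedom; turning it into a p-value requires the chi-square distribution,
which is left out.

```python
from collections import Counter

# Made-up (region, preferred drink) observations.
observations = (
    [("north", "tea")] * 30
    + [("north", "coffee")] * 20
    + [("south", "tea")] * 10
    + [("south", "coffee")] * 40
)

# Cross-tabulation: observed count for each (row, column) combination.
table = Counter(observations)
row_totals = Counter(row for row, _ in observations)
col_totals = Counter(col for _, col in observations)
n = len(observations)

# Under independence, expected count = row total * column total / n.
chi_square = 0.0
for row in row_totals:
    for col in col_totals:
        expected = row_totals[row] * col_totals[col] / n
        chi_square += (table[(row, col)] - expected) ** 2 / expected

degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)
print(round(chi_square, 4), degrees_of_freedom)
```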
At Least One Nominal Attribute: Bar charts and stacked bar charts; when the other
attribute is quantitative, ANOVA compares its mean across the groups.
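The group comparison behind one-way ANOVA can be sketched as the ratio of between-group
to within-group variability. The group labels and values are made up, and only the F
statistic is computed; a p-value would additionally require the F distribution.

```python
from statistics import mean

# Made-up quantitative measurements for three nominal groups.
groups = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [5, 6, 7]}


def one_way_anova_f(groups):
    """F statistic: mean square between groups over mean square within groups."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)

    # Variability of the group means around the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
    )
    # Variability of the observations around their own group mean.
    ss_within = sum(
        (v - mean(values)) ** 2 for values in groups.values() for v in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


f_statistic = one_way_anova_f(groups)
print(round(f_statistic, 4))
```

A large F value suggests that the group means differ more than the spread within the
groups would explain.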
Two Ordinal Attributes: Spearman’s rank correlation, Kendall’s tau.
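Both rank measures can be sketched without external libraries. This version computes
Spearman's coefficient as the Pearson correlation of average ranks and Kendall's tau in
its simplest form (tau-a, which has no tie correction). The data and function names are
assumptions for illustration.

```python
import math
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total pairs; ties count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4, 5]  # made-up ordinal scores
y = [3, 1, 2, 5, 4]

rho = spearman_rho(x, y)
tau = kendall_tau_a(x, y)
print(round(rho, 4), round(tau, 4))
```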