Fundamentals of Data Science
Chapter 1: Definition of Data Science
Data science is an interdisciplinary field that combines statistics, computer science, and domain
knowledge to extract insights from structured and unstructured data.
It draws on tools and techniques from mathematics, data engineering, machine learning, and
visualization to transform raw data into actionable intelligence.
Chapter 2: Basic Terminology
Basic terminology in data science includes:
- Dataset: A collection of related observations, often organized as rows (records) and columns (variables).
- Feature: A variable or attribute used in analysis.
- Label: The target variable in supervised learning.
- Algorithm: A step-by-step procedure for solving a problem, such as learning patterns from data.
- Model: The representation produced by training an algorithm on data.
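The terms above can be tied together in a small sketch. The toy dataset, the function names, and the choice of a nearest-centroid algorithm are all assumptions made for illustration; any other learning algorithm would show the same roles.

```python
# Hypothetical toy dataset: each row is one observation.
# Features: (hours studied, hours slept). Label: "pass" or "fail".
dataset = [
    ((8.0, 7.0), "pass"),
    ((7.0, 8.0), "pass"),
    ((6.5, 6.0), "pass"),
    ((2.0, 5.0), "fail"),
    ((1.0, 6.0), "fail"),
    ((3.0, 4.0), "fail"),
]


def train_nearest_centroid(rows):
    """Algorithm: average the feature vectors of each label.

    Returns the model, a dict mapping each label to its centroid.
    """
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(total / counts[label] for total in acc)
        for label, acc in sums.items()
    }


def predict(model, features):
    """Use the model: return the label whose centroid is closest."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


model = train_nearest_centroid(dataset)
print(model)                       # one centroid per label
print(predict(model, (7.5, 6.5)))  # label predicted for a new observation
```

Here `dataset` is the dataset, each tuple of numbers holds the features, the string is the label, `train_nearest_centroid` is the algorithm, and the dictionary it returns is the model.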
Chapter 3: Venn Diagram of Data Science
A common Venn diagram for data science illustrates the intersection of three fields:
1. Computer Science (Programming and Software Engineering)
2. Mathematics & Statistics (Inference and Data Analysis)
3. Domain Expertise (Subject Matter Knowledge)
The center of this intersection is data science.
Chapter 4: Types of Data
Structured vs Unstructured Data:
- Structured: Organized in rows and columns (e.g., tables in a SQL database).
- Unstructured: No predefined format (e.g., text, images, videos).
Quantitative vs Qualitative Data:
- Quantitative: Numerical, measurable data (e.g., height, weight).
- Qualitative: Descriptive data (e.g., gender, color, opinion).
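As a rough illustration of these distinctions, the sketch below parses a small table (structured data), holds a free-text review (unstructured data), and labels each column as quantitative or qualitative. The sample values and the `is_quantitative` helper are hypothetical, and the "parses as a number" rule is only a heuristic.

```python
import csv
import io

# Structured data: rows and named columns (CSV text stands in for a table).
csv_text = "name,height_cm,eye_color\nAna,165.5,brown\nBen,180.0,blue\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Unstructured data: free text with no predefined fields.
review = "Arrived late, but the color is lovely."


def is_quantitative(values):
    """Heuristic: treat a column as quantitative if every value parses as a number."""
    try:
        for value in values:
            float(value)
        return True
    except ValueError:
        return False


column_kinds = {
    column: "quantitative" if is_quantitative([row[column] for row in rows]) else "qualitative"
    for column in rows[0]
}
print(column_kinds)
```

The heuristic has limits: numeric codes such as postal codes parse as numbers but are qualitative, so in practice the kind of each column is usually decided from what it means, not only from how it is stored.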
Chapter 5: The Four Levels of Data
Data can be measured at four levels:
1. Nominal: Categorical without order (e.g., gender, color).
2. Ordinal: Categorical with order (e.g., ratings, education level).
3. Interval: Numerical with meaningful differences but no true zero (e.g., temperature in Celsius).
4. Ratio: Numerical with a true zero, so ratios are meaningful (e.g., height, weight).
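The level of a variable determines which summaries make sense. The sketch below uses made-up values to show one meaningful operation per level; the variable names are illustrative only.

```python
from statistics import mode

# Nominal: categories without order, so the most frequent value (mode) is meaningful.
colors = ["red", "blue", "red", "green"]
nominal_mode = mode(colors)  # "red"

# Ordinal: ordered categories, so the middle value (median) is meaningful
# once each category is mapped to its rank.
rating_order = ["poor", "fair", "good", "excellent"]
ratings = ["good", "poor", "excellent", "good", "fair"]
ranks = sorted(rating_order.index(r) for r in ratings)
ordinal_median = rating_order[ranks[len(ranks) // 2]]  # "good"

# Interval: differences are meaningful, ratios are not.
celsius = [10.0, 20.0]
difference = celsius[1] - celsius[0]  # a 10-degree difference
kelvin = [c + 273.15 for c in celsius]
kelvin_ratio = kelvin[1] / kelvin[0]  # about 1.035, so 20 C is not "twice as hot" as 10 C

# Ratio: a true zero makes ratios meaningful.
weights_kg = [40.0, 80.0]
weight_ratio = weights_kg[1] / weights_kg[0]  # 2.0, i.e. twice as heavy

print(nominal_mode, ordinal_median, difference, round(kelvin_ratio, 3), weight_ratio)
```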
Chapter 6: Five Steps of the Data Science Process
The data science process has five steps:
1. Data Collection: Gathering data from various sources.
2. Data Cleaning: Fixing or removing incorrect, incomplete, or duplicate data.
3. Data Exploration: Understanding patterns and distributions.
4. Modeling: Applying algorithms to build predictive models.
5. Deployment and Communication: Sharing results and deploying models.
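The five steps can be sketched end to end on a tiny example. Everything here is an assumption made for illustration: the records are invented, the model is a simple least-squares line, and "deployment" is reduced to wrapping the fitted model in a function named `predict_price`.

```python
from statistics import mean

# 1. Data collection: hypothetical raw records (area in square meters, price in thousands).
raw = [
    {"area": "50", "price": "150"},
    {"area": "80", "price": "240"},
    {"area": "80", "price": "240"},  # duplicate record
    {"area": "", "price": "300"},    # incomplete record
    {"area": "100", "price": "310"},
    {"area": "65", "price": "190"},
]

# 2. Data cleaning: drop incomplete rows and exact duplicates, convert text to numbers.
seen, clean = set(), []
for record in raw:
    key = (record["area"], record["price"])
    if "" in key or key in seen:
        continue
    seen.add(key)
    clean.append((float(record["area"]), float(record["price"])))

# 3. Data exploration: simple summary statistics.
areas = [area for area, _ in clean]
prices = [price for _, price in clean]
mean_area, mean_price = mean(areas), mean(prices)
print(f"{len(clean)} clean rows, mean area {mean_area}, mean price {mean_price}")

# 4. Modeling: least-squares fit of price = slope * area + intercept.
slope = sum((a - mean_area) * (p - mean_price) for a, p in clean) / sum(
    (a - mean_area) ** 2 for a in areas
)
intercept = mean_price - slope * mean_area


# 5. Deployment and communication: expose the model and report the result.
def predict_price(area):
    """Predict a price (in thousands) from an area in square meters."""
    return slope * area + intercept


print(f"Each extra square meter adds about {slope:.2f} thousand to the predicted price.")
```

In a real project each step is far larger, and the steps are usually revisited several times as exploration and modeling reveal new data problems.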
Chapter 7: Data Science Classification
Classification in data science is the task of assigning data points to predefined labels or classes.
It is typically solved with supervised learning techniques such as:
- Logistic Regression
- Decision Trees
- Random Forests
- Support Vector Machines (SVM)
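To make the idea concrete, here is a minimal sketch of the first technique, logistic regression, trained with plain gradient descent on one feature. The data, learning rate, and number of epochs are assumptions chosen for the example; a library implementation would normally be used in practice.

```python
import math

# Hypothetical training data: hours studied (feature) and exam result (label: 0 = fail, 1 = pass).
X = [0.5, 1.0, 1.5, 2.0, 3.5, 4.0, 4.5, 5.0]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train(features, labels, learning_rate=0.1, epochs=5000):
    """Fit weight w and bias b by gradient descent on the average log loss."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - t for x, t in zip(features, labels)]
        grad_w = sum(e * x for e, x in zip(errors, features)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def classify(x, w, b, threshold=0.5):
    """Assign class 1 if the predicted probability reaches the threshold, else class 0."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


w, b = train(X, y)
boundary = -b / w  # the feature value where the predicted probability is exactly 0.5
print(f"decision boundary near {boundary:.2f} hours")
print(classify(1.0, w, b), classify(4.5, w, b))
```

The model outputs a probability, and the threshold turns it into a class; moving the threshold trades one kind of error for another.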
Chapter 8: Data Science Algorithms
Common data science algorithms include:
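- Linear Regression: supervised, predicts a numeric value
- Logistic Regression: supervised, classification
- Decision Trees and Random Forests: supervised, classification and regression
- K-Nearest Neighbors (KNN): supervised, classification and regression
- Support Vector Machines (SVM): supervised, classification and regression
- Naive Bayes: supervised, classification
- K-Means Clustering: unsupervised, groups similar data points
- Principal Component Analysis (PCA): unsupervised, dimensionality reduction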
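Unlike the classifiers in the list, K-Means works without labels. The sketch below is a minimal version under simplifying assumptions: the points are invented, the initial centroids are just the first k points, and the loop runs a fixed number of iterations without checking for convergence.

```python
def k_means(points, k, iterations=10):
    """Group points into k clusters by alternating assignment and centroid update."""
    # Naive initialization (an assumption for this sketch): use the first k points.
    centroids = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical 2-D points forming two visible groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 1.5), (9.5, 8.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

The result depends on the starting centroids, which is why practical implementations use better initialization and several restarts.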
Chapter 9: Components of Data Science
The main components of data science are:
- Data Engineering
- Data Preparation
- Modeling
- Evaluation
- Visualization
- Communication
Chapter 10: Role of a Data Scientist
A data scientist's responsibilities typically include:
- Gather and preprocess data
- Analyze and interpret complex data
- Develop models and algorithms
- Communicate results to stakeholders
- Collaborate with domain experts and software engineers