Machine Learning or Data Science Cycle
Dr. Rahul Dubey
1 Planning
• Assessing the scope, success metric, and feasibility of the ML application. Cost-benefit
analysis.
• Furthermore, you need to define clear and measurable success metrics for business,
machine learning models (Accuracy, F1 score, AUC), and economic (key performance
indicators).
2 Data Preparation
Data collection and labeling
Data Cleaning
• We will clean the data by imputing missing values, analyzing wrong-labeled data,
removing outliers, and reducing the noise.
Data processing
• The data processing stage involves feature selection, dealing with imbalanced classes,
feature engineering, data augmentation, and normalizing and scaling the data.
3 Model Engineering
•Build effective model architecture by doing extensive research.
•Defining model metrics.
•Training and validating the model on the training and validation dataset.
•Tracking experiments, metadata, features, code changes, and machine learning pipelines.
•Performing model compression and ensembling.
•Interpreting the results by incorporating domain knowledge experts.
4 Model Deployment
5 Maintenance & Monitoring
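As an illustration of the Data Cleaning and Data processing steps in the Data Preparation stage, here is a minimal sketch that imputes missing values with the column mean and then applies min-max scaling. The helper names (`impute_mean`, `min_max_scale`) and the sample values are hypothetical, and mean imputation with min-max scaling is only one of several reasonable choices.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical numeric column with one missing value (None).
ages = [50, 25, None, 35, 10]

cleaned = impute_mean(ages)      # missing entry filled with the mean of the rest
scaled = min_max_scale(cleaned)  # smallest value maps to 0.0, largest to 1.0
print(cleaned)
print(scaled)
```

In practice the fill value and the scaling range should be computed on the training data only and then reused for validation and test data.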
Machine Learning Challenges
1. Data Challenges
•Data Quality: Poor-quality data, such as missing values, noise, or outliers, can negatively
impact model performance. Insufficient data can prevent models from generalizing effectively.
•Imbalanced Data: When certain classes or labels are underrepresented, it can lead to biased
models.
•Feature Engineering: Identifying and crafting relevant features is labor-intensive and requires
domain knowledge.
•Data Privacy and Security: Ensuring privacy while collecting and processing sensitive data is
a critical challenge.
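To make the imbalanced-data point concrete, the sketch below compares accuracy with the F1 score for a model that always predicts the majority class. The labels are hypothetical (95 negatives, 5 positives) and the function names are illustrative, not taken from any library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:  # no true positives: precision and recall are both zero
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical imbalanced labels: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
# A biased "model" that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(acc, f1)
```

The accuracy looks high even though the model never finds a single positive case, which is why F1 or AUC is usually reported alongside accuracy for imbalanced problems.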
2. Algorithmic Challenges
•Overfitting and Underfitting: Balancing model complexity to avoid these issues can be
difficult.
•Model Interpretability: Complex models, such as deep neural networks, are often hard to
interpret and explain.
•Algorithm Selection: Choosing the right algorithm for a specific task requires expertise and
experimentation.
3. Computational Challenges
•Resource Intensity: Training large models requires significant computational power and can be
time-consuming.
•Scalability: Handling large datasets or deploying models at scale can be technically challenging.
•Hyperparameter Tuning: Finding the optimal set of hyperparameters is often a trial-and-error
process.
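The trial-and-error nature of hyperparameter tuning can be sketched as a small grid search: try each candidate value, score it on a validation set, and keep the best. The example assumes a 1-D k-nearest-neighbour classifier written from scratch; the data, the candidate values of `k`, and all function names are hypothetical.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Predict the majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def validation_accuracy(train, val, k):
    """Fraction of validation points classified correctly for a given k."""
    hits = sum(knn_predict(train, x, k) == label for x, label in val)
    return hits / len(val)

# Hypothetical 1-D data; (2.1, "b") is a deliberately noisy training label.
train = [(1.0, "a"), (1.5, "a"), (2.0, "a"), (2.5, "a"), (2.1, "b"),
         (4.0, "b"), (4.5, "b"), (5.0, "b"), (5.5, "b")]
val = [(1.9, "a"), (2.2, "a"), (4.2, "b"), (5.2, "b")]

# Grid search: evaluate every candidate k and keep the best-scoring one.
results = {k: validation_accuracy(train, val, k) for k in (1, 3, 5)}
best_k = max(results, key=results.get)
print(results, best_k)
```

With this data, k = 1 is misled by the noisy point while k = 3 and k = 5 are not. Real searches cover several hyperparameters at once, which is where the computational cost comes from.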
4. Deployment and Maintenance
•Integration with Existing Systems
•Model Monitoring
•Updating Models
5. Ethical and Social Challenges
•Bias and Fairness
•Accountability
•Transparency
6. Domain-Specific Challenges
•Contextual Knowledge: Applying ML to specific industries often requires deep domain
expertise.
•Regulatory Compliance: Navigating legal and regulatory frameworks, especially in sensitive
fields like healthcare and finance.
7. Learning and Experimentation
•Reproducibility: Reproducing results across different environments and datasets can be
challenging.
•Experiment Management: Keeping track of various experiments, configurations, and results is
crucial but complex.
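One common way to approach reproducibility and experiment management is to fix the random seed and store the full configuration next to each result. The sketch below assumes a toy "experiment" that only averages random numbers; `run_experiment` and the configuration keys are hypothetical.

```python
import json
import random

def run_experiment(config):
    """Run a toy experiment whose randomness is fully determined by the seed."""
    rng = random.Random(config["seed"])  # local, seeded random generator
    samples = [rng.random() for _ in range(config["n_samples"])]
    return {"config": config, "mean_value": sum(samples) / len(samples)}

config = {"seed": 42, "n_samples": 100}

first = run_experiment(config)
second = run_experiment(config)  # same seed and settings give the same result

# One JSON line per run keeps configuration and result together.
log_line = json.dumps(first, sort_keys=True)
print(log_line)
```

Seeding alone does not guarantee identical results across library versions or hardware, so recording the environment is usually needed as well.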
8. Human Factors
•Skill Gap: A shortage of skilled ML practitioners can hinder the adoption of ML in
organizations.
•Collaboration: Effective communication between data scientists, engineers, and domain experts
How Do ML Algorithms Work?
Source: https://www.spaceotechnologies.com/machine-learning-app-development-complete-guide/
Types of Data in ML
➢ ML is simply a mapping from input data to output data.
➢ Numeric data
➢ Categorical data
➢ Text data
➢ Image data
➢ Video Data
➢ Audio Data
➢ Time Series Data
Example:
Name    Age  Height  Weight  M/F
Anil    50   5.6     70.2    M
Raju    25   5.4     75.8    M
Neetu   35   5.3     46      F
Meethi  8    3.4     24      F
Types of data
1) Numerical data
➢ It represents a quantity that can be counted or measured
(a) Discrete data (b) Continuous data
2) Categorical data
Nominal data
➢ A categorical variable (sometimes called a nominal variable) is one that has two or
more categories, but there is no intrinsic ordering to the categories.
➢ For example, gender is a categorical variable with two categories (male and female).
Ordinal data
➢ Mixture of numerical and categorical data.
➢ An ordinal variable is similar to a categorical variable. The difference between the
two is that the categories of an ordinal variable have a clear ordering.
➢ For example, economic status with three categories (low, medium, and high), or a
movie rating.
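The difference between nominal and ordinal data shows up in how the values are encoded for a model. A minimal sketch, assuming one-hot encoding for nominal values and an integer rank for ordinal values; the function name and category lists are hypothetical.

```python
def one_hot(value, categories):
    """Encode a nominal value as a 0/1 vector with no implied order."""
    return [1 if value == category else 0 for category in categories]

# Nominal: the categories have no intrinsic ordering.
gender_categories = ["F", "M"]
encoded_gender = one_hot("M", gender_categories)

# Ordinal: the categories are ranked, so integers that keep the order are enough.
status_rank = {"low": 0, "medium": 1, "high": 2}
encoded_status = [status_rank[s] for s in ["medium", "low", "high"]]

print(encoded_gender)
print(encoded_status)
print(status_rank["low"] < status_rank["high"])  # the ordering is meaningful
```

Using integer ranks for nominal data would suggest an order that does not exist, which is why the two cases are treated differently.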
Data Set
Datasets:
➢ A collection of instances
➢ A dataset consists of a feature matrix and a target vector
13 January 2025
Iris Dataset
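Taking Iris as the example, a dataset can be written as a feature matrix `X` (one row per instance, one column per feature) and a target vector `y` (one label per instance). The sketch below hard-codes a handful of rows as they commonly appear in the distributed Iris data; treat the exact values and the column names as illustrative.

```python
# Column names for the four measurements (in cm); the names are illustrative.
feature_names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Feature matrix: each row is one flower (instance).
X = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
]

# Target vector: the species label for each row of X.
y = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]

n_instances, n_features = len(X), len(X[0])
print(n_instances, n_features, sorted(set(y)))
```

The full dataset has 150 instances, 50 per species, with the same four features.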
Dataset → Training Dataset → To build model
Dataset → Test Dataset → To evaluate model
Training set:
➢ The training set is used to build a model.
➢ It is used to find relevant information on how to associate input data with
output decisions. The system is trained by applying the learning algorithm to the
dataset; the relevant information is extracted from the data and results are
obtained.
➢ Generally, 70% of the dataset is used for training.
Testing set:
➢ The testing set is used to test the model. It is the data used to verify
whether the system produces the correct output after being trained.
➢ Generally, 30% of the dataset is used for testing.
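The 70/30 split described above can be sketched in a few lines. This is a minimal version, assuming a random shuffle with a fixed seed; the function name `train_test_split` and its parameters are chosen here for illustration and are not a library API.

```python
import random

def train_test_split(instances, train_fraction=0.7, seed=0):
    """Shuffle a copy of the instances and cut it into training and test parts."""
    rng = random.Random(seed)   # fixed seed so the split is repeatable
    shuffled = list(instances)  # copy: the original order is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

data = list(range(10))          # ten hypothetical instances
train, test = train_test_split(data)
print(len(train), len(test))
```

Shuffling before cutting matters when the file is sorted by class, as a plain head/tail cut would then leave some classes out of the training set.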
Learning Algorithm
➢ Machine Learning gives a machine the ability to
automatically learn and improve from experience without being explicitly
programmed.
➢ The process of learning begins with observations in order to find patterns in
data and make better decisions in the future based on the examples that we
provide.
➢ The primary aim of a learning algorithm is to allow computers to learn
automatically without human intervention.
Machine Learning Algorithm → Supervised Learning Algorithm, Un-Supervised Learning Algorithm,
Reinforcement Learning Algorithm
Types of ML
Supervised Learning
➢ Learning in the presence of an instructor/supervisor/teacher
❖ Ex. Classroom teaching
➢ The machine is trained on a labelled dataset.
➢ A labelled dataset is one that has both input and output
parameters.
➢ It is task driven because the outcomes of a supervised learning
algorithm are controlled by the task.
Training Phase
Num-1  Num-2  Sum
5      5      10
8      2      10
10     3      13
15     6      21
20     4      24
30     40     70
Inputs 5 and 5 → Model Logic → output 10
Testing Phase
Inputs 30 and 40 → Trained Model → output 70
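The addition example can be reproduced as a tiny supervised learner. The sketch assumes the "Model Logic" is a linear model Sum ≈ w1 · Num-1 + w2 · Num-2 fitted by least squares; the slide does not say which model is used, so that choice and the function name are assumptions. The training rows are labelled pairs where Sum = Num-1 + Num-2, and the pair (30, 40) is held back for the testing phase.

```python
def fit_two_weights(rows):
    """Least-squares fit of Sum ~ w1 * Num-1 + w2 * Num-2 (no intercept).

    Solves the 2x2 normal equations with Cramer's rule.
    """
    s11 = sum(a * a for a, b, _ in rows)
    s22 = sum(b * b for a, b, _ in rows)
    s12 = sum(a * b for a, b, _ in rows)
    t1 = sum(a * y for a, b, y in rows)
    t2 = sum(b * y for a, b, y in rows)
    det = s11 * s22 - s12 * s12
    w1 = (t1 * s22 - s12 * t2) / det
    w2 = (s11 * t2 - s12 * t1) / det
    return w1, w2

# Training phase: labelled rows of (Num-1, Num-2, Sum).
training_rows = [(5, 5, 10), (8, 2, 10), (10, 3, 13), (15, 6, 21), (20, 4, 24)]
w1, w2 = fit_two_weights(training_rows)

# Testing phase: an input pair the model has not seen during training.
prediction = w1 * 30 + w2 * 40
print(w1, w2, prediction)
```

The learned weights are both 1, so the model has recovered the addition rule from examples alone and predicts 70 for the unseen pair.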
[Figure: training phase and testing phase of supervised learning]
Source: https://mc.ai/supervised-vs-unsupervised-learning/
Types of Supervised Learning
Machine Learning Algorithm → Supervised Learning Algorithm, Un-Supervised Learning Algorithm,
Reinforcement Learning Algorithm
Supervised Learning Algorithm → Regression, Classification
Supervised Learning
Regression vs. Classification
[Figure: Regression | Classification]
Linear Regression
How to Calculate Coefficient
➢Using correlation & standard deviation (shortcut method).
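The shortcut method presumably refers to the standard identity for simple linear regression y = a + b·x: the slope is b = r · (s_y / s_x), where r is the correlation coefficient and s_x, s_y are the standard deviations, and the intercept is a = mean(y) − b · mean(x). A minimal sketch under that assumption, with hypothetical data and function names:

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def fit_line(xs, ys):
    """Slope and intercept from correlation and standard deviations."""
    r = pearson_r(xs, ys)
    slope = r * stdev(ys) / stdev(xs)        # b = r * (s_y / s_x)
    intercept = mean(ys) - slope * mean(xs)  # a = mean(y) - b * mean(x)
    return slope, intercept

# Hypothetical data points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
print(round(slope, 3), round(intercept, 3))
```

For this data the fitted line is approximately y = 2.2 + 0.6·x, the same result the least-squares formulas give directly.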