Notes Unit 1-3 Part-II

The document outlines the machine learning cycle, including planning, data preparation, model engineering, deployment, and maintenance. It discusses various challenges in machine learning, such as data quality, algorithm selection, and ethical considerations. Additionally, it describes types of data used in machine learning and the differences between supervised, unsupervised, and reinforcement learning algorithms.

Uploaded by

Mayank Purohit


Machine Learning or Data Science Cycle

Dr. Rahul Dubey

1 Planning
• Assess the scope, success metric, and feasibility of the ML application, including a cost-benefit analysis.
• Define clear and measurable success metrics for business, machine learning models (Accuracy, F1 score, AUC), and economics (key performance indicators).

2 Data Preparation
Data collection and labeling
Data cleaning
• Clean the data by imputing missing values, analyzing wrongly labeled data, removing outliers, and reducing the noise.
Data processing
• The data processing stage involves feature selection, dealing with imbalanced classes, feature engineering, data augmentation, and normalizing and scaling the data.

3 Model Engineering
• Build an effective model architecture by doing extensive research.
• Define model metrics.
• Train and validate the model on the training and validation datasets.
• Track experiments, metadata, features, code changes, and machine learning pipelines.
• Perform model compression and ensembling.
• Interpret the results with the help of domain knowledge experts.

4 Model Deployment

5 Maintenance & Monitoring
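As a rough illustration of the data cleaning and data processing steps above, the following Python sketch imputes missing values with the column mean and then applies min-max scaling. The `ages` column and the helper names `impute_mean` and `min_max_scale` are hypothetical, chosen only for this example; real pipelines may use other imputation or scaling strategies.

```python
from statistics import mean

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical column with one missing value (assumption for illustration).
ages = [20, None, 40, 60]
cleaned = impute_mean(ages)      # the missing entry becomes the mean of 20, 40, 60
scaled = min_max_scale(cleaned)  # smallest value maps to 0.0, largest to 1.0
print(cleaned)
print(scaled)
```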
Machine Learning Challenges
1. Data Challenges
•Data Quality: Poor-quality data, such as missing values, noise, or outliers, can negatively
impact model performance. Insufficient data can prevent models from generalizing effectively.
•Imbalanced Data: When certain classes or labels are underrepresented, it can lead to biased
models.
•Feature Engineering: Identifying and crafting relevant features is labor-intensive and requires
domain knowledge.
•Data Privacy and Security: Ensuring privacy while collecting and processing sensitive data is
a critical challenge.
2. Algorithmic Challenges
•Overfitting and Underfitting: Balancing model complexity to avoid these issues can be
difficult.
•Model Interpretability: Complex models, such as deep neural networks, are often hard to
interpret and explain.
•Algorithm Selection: Choosing the right algorithm for a specific task requires expertise and
experimentation.
3. Computational Challenges
•Resource Intensity: Training large models requires significant computational power and can be
time-consuming.
•Scalability: Handling large datasets or deploying models at scale can be technically challenging.
•Hyperparameter Tuning: Finding the optimal set of hyperparameters is often a trial-and-error
process.
4. Deployment and Maintenance
•Integration with Existing Systems
•Model Monitoring
•Updating Models
5. Ethical and Social Challenges
•Bias and Fairness
•Accountability
•Transparency
6. Domain-Specific Challenges
•Contextual Knowledge: Applying ML to specific industries often requires deep domain
expertise.
•Regulatory Compliance: Navigating legal and regulatory frameworks, especially in sensitive
fields like healthcare and finance.
7. Learning and Experimentation
•Reproducibility: Reproducing results across different environments and datasets can be
challenging.
•Experiment Management: Keeping track of various experiments, configurations, and results is
crucial but complex.
8. Human Factors
•Skill Gap: A shortage of skilled ML practitioners can hinder the adoption of ML in
organizations.
•Collaboration: Effective communication between data scientists, engineers, and domain experts
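
The imbalanced-data challenge listed above can be made concrete with a small Python sketch. It assumes a hypothetical dataset of 95 negative and 5 positive labels and a model that always predicts the majority class; the helper names `accuracy` and `f1_score` are written here for illustration and are not taken from any library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels (assumption): 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5
# A biased model that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(acc, f1)
```

Accuracy looks high here even though the model never finds a positive example, which is why F1 score and AUC are listed alongside accuracy as model metrics.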
How ML Algorithms Work

Source: https://www.spaceotechnologies.com/machine-learning-app-development-complete-guide/
Types of Data in ML
➢ ML is simply a mapping from input data to output data.
➢ Numeric data
➢ Categorical data
➢ Text data
➢ Image data
➢ Video data
➢ Audio data
➢ Time series data

Example:

Name    Age  Height  Weight  M/F
Anil    50   5.6     70.2    M
Raju    25   5.4     75.8    M
Neetu   35   5.3     46      F
Meethi  8    3.4     24      F

Types of data
1) Numerical data
➢ It represents a quantifiable thing that you can measure.
(a) Discrete data (b) Continuous data

2) Categorical data
Nominal data
➢ A categorical variable (sometimes called a nominal variable) is one that has two or
more categories, but there is no intrinsic ordering to the categories.
➢ For example, gender is a categorical variable having two categories (male and female).

Ordinal data
➢ Mixture of numerical and categorical data.
➢ An ordinal variable is similar to a categorical variable. The difference between the
two is that there is a clear ordering of the categories.
➢ For example, economic status with three categories (low, medium and high), or a
movie rating.
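
One way to see the practical difference between nominal and ordinal data is in how they might be turned into numbers. The Python sketch below is an illustration under assumptions: the rank mapping `STATUS_ORDER` and the helper names are hypothetical. Ordinal categories are mapped to ranks that preserve their order, while nominal categories are one-hot encoded so that no order is implied.

```python
# Ordinal data: the order low < medium < high is meaningful, so ranks keep it.
STATUS_ORDER = {"low": 0, "medium": 1, "high": 2}  # hypothetical mapping

def encode_ordinal(values, order):
    """Map each ordinal category to its rank."""
    return [order[v] for v in values]

def encode_one_hot(values):
    """Map each nominal category to a 0/1 indicator vector (no order implied)."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded

status = ["low", "high", "medium"]  # ordinal example from the notes
gender = ["M", "F", "F"]            # nominal example from the notes

ranks = encode_ordinal(status, STATUS_ORDER)
cats, one_hot = encode_one_hot(gender)
print(ranks)
print(cats, one_hot)
```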
Data Set
➢ A dataset is a collection of instances.
➢ A dataset consists of a feature matrix and a target vector.

13 January 2025
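
To make the feature matrix and target vector concrete, here is a minimal Python sketch that splits the small Name/Age/Height/Weight/M-F table from the "Types of Data in ML" slide into the two parts. Treating the M/F column as the target is an assumption made only for illustration; the notes do not say which column is the target.

```python
# Rows copied from the example table in the notes: (Name, Age, Height, Weight, M/F).
rows = [
    ("Anil", 50, 5.6, 70.2, "M"),
    ("Raju", 25, 5.4, 75.8, "M"),
    ("Neetu", 35, 5.3, 46, "F"),
    ("Meethi", 8, 3.4, 24, "F"),
]

feature_names = ["Age", "Height", "Weight"]

# Feature matrix X: one row per instance, one column per feature.
X = [list(row[1:4]) for row in rows]

# Target vector y: one label per instance (assumed here to be the M/F column).
y = [row[4] for row in rows]

print(len(X), "instances,", len(X[0]), "features")
print(y)
```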
Iris Dataset

Dataset
➢ Training dataset: to build the model
➢ Test dataset: to evaluate the model
Training set:
➢ The training set is used to build a model.
➢ It is used to find relevant information on how to associate input data with the
output decision. The system is trained by applying the learning algorithm to the
dataset; all the relevant information is extracted from the data and results are
obtained.
➢ Generally, 70% of the dataset is taken as training data.
Testing set:
➢ Testing data is used to test the model. It is the set of data used to verify
whether the system produces the correct output after being trained.
Generally, 30% of the dataset is used for testing.
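
The 70% / 30% split described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `train_test_split`, the fixed `seed`, and the toy `data` list are assumptions for the example, not part of the notes.

```python
import random

def train_test_split(instances, train_fraction=0.7, seed=0):
    """Shuffle the instances and split them into training and testing sets."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical dataset of 10 instances (assumption for illustration).
data = list(range(10))
train, test = train_test_split(data)

print(len(train), "training instances")
print(len(test), "testing instances")
```

Shuffling before splitting avoids a split that depends on how the rows happen to be ordered in the file.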
Learning Algorithm
➢ Machine learning gives a machine the ability to learn and improve automatically
from experience without being explicitly programmed.
➢ The process of learning begins with observations, in order to find patterns in
data and make better decisions in the future based on the examples that we
provide.
➢ The primary aim of a learning algorithm is to allow computers to learn
automatically without human intervention.

Machine Learning Algorithm
➢ Supervised Learning Algorithm
➢ Un-Supervised Learning Algorithm
➢ Reinforcement Learning Algorithm
Types of ML

Supervised Learning
➢ Learning in the presence of an instructor/supervisor/teacher
❖ Ex. Classroom teaching
➢ The machine is trained on a labelled dataset.
➢ A labelled dataset is one which has both input and output parameters.
➢ It is task driven because the outcomes of a supervised learning algorithm are
controlled by the task.

Num-1  Num-2  Sum
5      5      10
8      2      10
10     3      13
15     6      21
20     4      24
30     40     70

Training Phase: inputs (5, 5) and the output 10 are given to the model logic.
Testing Phase: inputs (30, 40) are given to the trained model, which outputs 70.
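
The training and testing phases of the sum example can be sketched in Python. This is an illustration under assumptions: it uses a few labelled rows from the table, assumes the model is a linear function `y = w1*x1 + w2*x2` fitted by least squares, and the helper name `fit_two_feature_linear` is hypothetical. The notes do not specify which model is used.

```python
def fit_two_feature_linear(samples):
    """Least-squares fit of y = w1*x1 + w2*x2 (no intercept) via the normal equations."""
    s11 = sum(x1 * x1 for x1, _, _ in samples)
    s22 = sum(x2 * x2 for _, x2, _ in samples)
    s12 = sum(x1 * x2 for x1, x2, _ in samples)
    b1 = sum(x1 * y for x1, _, y in samples)
    b2 = sum(x2 * y for _, x2, y in samples)
    det = s11 * s22 - s12 * s12
    # Cramer's rule for the 2x2 system.
    w1 = (b1 * s22 - s12 * b2) / det
    w2 = (s11 * b2 - s12 * b1) / det
    return w1, w2

# Training phase: labelled rows (Num-1, Num-2, Sum) taken from the table.
training_rows = [(5, 5, 10), (8, 2, 10), (10, 3, 13), (15, 6, 21)]
w1, w2 = fit_two_feature_linear(training_rows)

# Testing phase: the trained model sees only the inputs.
prediction = w1 * 30 + w2 * 40
print(w1, w2, prediction)
```

The model is never told that the task is addition; it recovers weights close to 1 and 1 from the labelled examples alone.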
[Figure: training phase and testing phase of supervised learning]
Source: https://mc.ai/supervised-vs-unsupervised-learning/

Types of Supervised Learning
Machine Learning Algorithm
➢ Supervised Learning Algorithm
❖ Regression
❖ Classification
➢ Un-Supervised Learning Algorithm
➢ Reinforcement Learning Algorithm

Supervised Learning
Regression vs. Classification

Linear Regression
How to Calculate Coefficient
➢ Using correlation & standard deviation (shortcut method).