Session 4 Machine Learning Process

This document outlines the machine learning process, detailing the steps involved in developing a machine learning model, including problem definition, data gathering, preparation, analysis, feature engineering, model training, evaluation, and deployment. It emphasizes the importance of systematic development and best practices to enhance model performance. Additionally, it includes an assignment to describe various machine learning processes and compare them with data mining processes.

Learning Outcomes
By the end of this lecture, you will be able to:
• Understand the process of developing a machine learning model.
• Identify and explain each step in the machine learning life cycle.
• Apply the machine learning life cycle to real-world examples.
• Recognize common challenges and best practices in each phase of the cycle.
Machine learning overview
• Machine learning is a subset of artificial intelligence (AI).
• It trains computers to mimic human thinking by learning from examples.
• It uses real-world data for training.
• It follows predefined steps to train a computer.
• This sequence of steps is known as the machine learning life cycle.
Steps in the Machine Learning Process
• The life cycle guides the development and deployment of machine learning models.
• It is a structured process with several steps.
• Understanding the life cycle:
• ensures systematic development and deployment,
• improves efficiency, and
• enhances model performance.
Problem Definition
• Before starting the process, clearly define the problem you aim to solve.
• Example: Predicting customer churn for a telecom company.
• Key Considerations: Business objectives, success metrics, feasibility.
Step 1: Gathering Data
• Identify Data Sources
• Recognize where data can be collected from.
• Examples: Files, databases, internet, mobile devices.
• Collect Data
• Gather data from identified sources.
• Ensure data is relevant and comprehensive.
• Integrate Data
• Combine data from different sources.
• Create a coherent and unified dataset.
• Outcome
• Ready-to-use dataset for further processing.
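The integration step above can be sketched as follows. This is only an illustration, assuming two hypothetical sources (a CSV export and a list of records such as a database query might return) that share a `customer_id` key. All field names and values are invented for the example, and only the Python standard library is used.

```python
import csv
import io

# Hypothetical source 1: a CSV export (inlined so the sketch runs anywhere).
csv_text = """customer_id,monthly_charge
1,29.99
2,54.50
3,19.00
"""

# Hypothetical source 2: records as they might come from a database query.
db_rows = [
    {"customer_id": 1, "support_calls": 0},
    {"customer_id": 2, "support_calls": 4},
]

def integrate(csv_source, records):
    """Join both sources on customer_id into one unified list of rows."""
    calls_by_id = {r["customer_id"]: r["support_calls"] for r in records}
    unified = []
    for row in csv.DictReader(io.StringIO(csv_source)):
        cid = int(row["customer_id"])
        unified.append({
            "customer_id": cid,
            "monthly_charge": float(row["monthly_charge"]),
            # None marks a customer that is absent from the second source.
            "support_calls": calls_by_id.get(cid),
        })
    return unified

dataset = integrate(csv_text, db_rows)
for row in dataset:
    print(row)
```

Customers missing from one source are kept with `None` rather than dropped, so the gap stays visible for the data preparation step.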
Step 2: Data Preparation
• Raw data is often messy and unstructured.
• Data cleaning involves addressing issues such as missing
values, outliers, and inconsistencies that could compromise the
accuracy and reliability of the machine learning model.
Objective
• Refine raw data for meaningful analysis.
• Lay the foundation for robust model development.

• The basic features of data cleaning and preprocessing are discussed next:
Data Cleaning
• Address missing values.
• Handle outliers.
• Resolve inconsistencies.
Data Preprocessing
• Standardize formats.
• Scale values.
• Encode categorical variables.
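The three preprocessing operations listed above might look like this in plain Python. It is a minimal sketch: the column values are made up, min-max scaling and integer label encoding are just one possible choice for each operation, and the helper names are hypothetical.

```python
def min_max_scale(values):
    """Rescale numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # avoid division by zero when all values are equal
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def label_encode(categories):
    """Map each distinct category to an integer code (sorted for stability)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(categories)))}
    return [codes[c] for c in categories], codes

# Hypothetical raw columns with inconsistent formatting.
raw_plans = [" Prepaid", "postpaid", "PREPAID ", "Postpaid"]
monthly_charges = [10.0, 30.0, 20.0, 50.0]

plans = [p.strip().lower() for p in raw_plans]   # standardize formats
scaled = min_max_scale(monthly_charges)          # scale values
encoded, mapping = label_encode(plans)           # encode categorical variables

print(plans)
print(scaled)
print(encoded, mapping)
```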
Data Quality
• Ensure well-organized data.
• Prepare for meaningful analysis.
Data Integrity
• Maintain dataset integrity.
• Apply effective cleaning and preprocessing.
Step 3: Data Wrangling
• The process of cleaning and converting raw data into a usable format.
• It involves cleaning the data, selecting the variables to use, and transforming the data into a proper format to make it more suitable for analysis in the next step.
• Cleaning of data is required to address the quality issues.
• In real-world applications, collected data may have
various issues, including:
Missing Values
Duplicate data
Invalid data
Noise (irrelevant or meaningless data)
• So, we use various filtering techniques to clean the data.
• It is mandatory to detect and remove the above issues because they can negatively affect the quality of the outcome.
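A simple filter for the issues listed above could be sketched as follows. The records, the `age` field, and the valid range of 0 to 120 are all assumptions chosen for illustration. Real projects may prefer to repair or impute values rather than drop them.

```python
# Hypothetical raw records; None marks a missing value.
raw = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},   # missing value
    {"id": 1, "age": 34},     # duplicate of the first record
    {"id": 3, "age": -5},     # invalid: age cannot be negative
    {"id": 4, "age": 51},
]

def clean(records):
    """Drop records that are missing, invalid, or duplicated."""
    seen = set()
    cleaned = []
    for rec in records:
        if rec["age"] is None:
            continue                      # missing value
        if not 0 <= rec["age"] <= 120:    # assumed valid range
            continue                      # invalid data
        key = (rec["id"], rec["age"])
        if key in seen:
            continue                      # duplicate data
        seen.add(key)
        cleaned.append(rec)
    return cleaned

cleaned = clean(raw)
print(cleaned)
```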
Step 4: Analyze Data
• Also called “Exploratory Data Analysis (EDA)”.
• Involves understanding the underlying patterns and characteristics of the collected data.
• Leverages statistical and visual tools to gain insights into the dataset’s structure.
• Visualizations, summary statistics, and correlation analyses play a crucial role.
• Example of data visualization (e.g., histogram, scatter
plot).
• Exploration: Use statistical and visual tools to explore the
structure and patterns in the data.
• Patterns and Trends: Identify underlying patterns, trends,
and potential challenges within the dataset.
• Insights: Gain valuable insights to inform decisions in later
stages of the machine learning process.
• Decision Making: Use exploratory data analysis to make
informed decisions about feature engineering and model
selection.
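As a rough illustration of the summary statistics and correlation analysis mentioned above, the sketch below uses a small made-up sample of house sizes and prices. The `pearson` helper is hypothetical and written out by hand so that the computation is visible.

```python
import statistics

# Hypothetical sample: house sizes (sq ft) and prices (in thousands).
sizes = [850, 1200, 1500, 1800, 2400]
prices = [120, 165, 210, 240, 330]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print("mean size:", statistics.mean(sizes))
print("median price:", statistics.median(prices))
print("price std dev:", round(statistics.stdev(prices), 2))
print("size/price correlation:", round(pearson(sizes, prices), 3))
```

A correlation close to 1 suggests that size is a strong candidate feature for predicting price, which is the kind of insight EDA feeds into feature engineering and model selection.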
Step 5: Feature Engineering and
Selection
• Feature Selection: Identify the subset of features that most
significantly impact the model’s performance.
• Feature Engineering: Create new features or transform
existing ones to better capture patterns and relationships.
• Requires domain expertise and a deep understanding of the problem.
• The aim is to engineer features that contribute meaningfully to predictive power.
• Optimization: Balance feature set for predictive accuracy
while minimizing computational complexity.
Step 5: Feature Engineering and
Selection - Example using Python
Problem: Predict the `price` of houses using the available features.
Dataset: Assume we have a dataset `house_data.csv` with the following columns:
• house_id
• size_in_sqft
• num_bedrooms
• num_bathrooms
• location
• year_built
• price
Loading the Data:
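The slide's original code is not available here, so the following is only a sketch of how loading might look with the standard `csv` module. The sample rows are invented and stand in for `house_data.csv`. With the real file, `io.StringIO(SAMPLE)` would be replaced by `open("house_data.csv", newline="")`.

```python
import csv
import io

# Stand-in for house_data.csv: made-up rows so the sketch is self-contained.
SAMPLE = """house_id,size_in_sqft,num_bedrooms,num_bathrooms,location,year_built,price
1,1500,3,2,suburb,1995,210000
2,850,2,1,city,2010,120000
3,2400,4,3,rural,1980,330000
"""

def load_houses(source):
    """Read rows from a CSV file object into a list of dictionaries."""
    return list(csv.DictReader(source))

houses = load_houses(io.StringIO(SAMPLE))
print(len(houses), "rows loaded")
print(houses[0])
```

Note that `csv.DictReader` returns every value as a string, so numeric columns still need converting before analysis.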
Exploring the Data:
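A first look at the data might summarize numeric columns and count categories, as in this sketch. The rows are made up, and `describe` is a hypothetical helper rather than the code from the original slide.

```python
import statistics
from collections import Counter

# Made-up rows using some of the column names listed earlier.
houses = [
    {"size_in_sqft": 1500, "num_bedrooms": 3, "location": "suburb", "price": 210000},
    {"size_in_sqft": 850, "num_bedrooms": 2, "location": "city", "price": 120000},
    {"size_in_sqft": 2400, "num_bedrooms": 4, "location": "city", "price": 330000},
]

def describe(rows, column):
    """Return count, min, max, and mean for a numeric column."""
    values = [r[column] for r in rows]
    return {"count": len(values), "min": min(values),
            "max": max(values), "mean": statistics.mean(values)}

size_summary = describe(houses, "size_in_sqft")
location_counts = Counter(r["location"] for r in houses)
print(size_summary)
print(location_counts)
```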
Handling Missing Values:
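One common way to handle missing numeric values is to fill them with the column median. The sketch below assumes that strategy. The source does not say which method the slide used, and the rows and helper name are invented.

```python
import statistics

# Made-up rows; None marks a missing value.
houses = [
    {"house_id": 1, "size_in_sqft": 1500, "num_bathrooms": 2},
    {"house_id": 2, "size_in_sqft": None, "num_bathrooms": 1},
    {"house_id": 3, "size_in_sqft": 2400, "num_bathrooms": None},
    {"house_id": 4, "size_in_sqft": 900, "num_bathrooms": 1},
]

def fill_missing_with_median(rows, column):
    """Replace None in a numeric column with the median of the known values."""
    known = [r[column] for r in rows if r[column] is not None]
    median = statistics.median(known)
    for r in rows:
        if r[column] is None:
            r[column] = median
    return median

for column in ("size_in_sqft", "num_bathrooms"):
    fill_missing_with_median(houses, column)

print(houses)
```

The median is less sensitive to outliers than the mean, which is why it is often preferred for skewed columns such as house size.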
Feature Creation
• Total Rooms: Create a new feature by adding the number of bedrooms and bathrooms:
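A minimal sketch of this feature, using made-up rows. The new column name `total_rooms` is an assumption.

```python
# Made-up rows with the two source columns.
houses = [
    {"house_id": 1, "num_bedrooms": 3, "num_bathrooms": 2},
    {"house_id": 2, "num_bedrooms": 2, "num_bathrooms": 1},
]

# Derive the new feature from the existing columns.
for house in houses:
    house["total_rooms"] = house["num_bedrooms"] + house["num_bathrooms"]

print(houses)
```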
• Age of House: Create a new feature representing the age of the house:
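The age can be derived by subtracting `year_built` from a reference year, as sketched below. The column name `house_age` is an assumption, and the reference year is fixed at 2025 here only so the output is reproducible. By default the helper uses the current year.

```python
from datetime import date

def add_house_age(rows, reference_year=None):
    """Add a house_age column: reference year minus year_built."""
    if reference_year is None:
        reference_year = date.today().year
    for r in rows:
        r["house_age"] = reference_year - r["year_built"]
    return rows

# Made-up rows.
houses = [
    {"house_id": 1, "year_built": 1995},
    {"house_id": 2, "year_built": 2010},
]

add_house_age(houses, reference_year=2025)  # fixed year: an assumption
print(houses)
```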
• Location Encoding: Convert categorical data into numerical data:
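One-hot encoding is a common way to do this conversion. The sketch below assumes that approach, since the source does not specify which encoding the slide used. The rows and the `location_<value>` column naming are invented for illustration.

```python
# Made-up rows with a categorical location column.
houses = [
    {"house_id": 1, "location": "suburb"},
    {"house_id": 2, "location": "city"},
    {"house_id": 3, "location": "rural"},
]

def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if r[column] == cat else 0
        encoded.append(new_row)
    return encoded

encoded_houses = one_hot_encode(houses, "location")
for row in encoded_houses:
    print(row)
```

One-hot columns avoid implying an order among locations, which a single integer code per location would do.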
Feature Selection
• Drop less relevant or redundant features:
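Which columns to drop depends on the problem. As an illustration only, the sketch below assumes `house_id` is a pure identifier with no predictive value and that `year_built` is redundant once a derived age column (here called `house_age`, an assumed name) exists.

```python
def drop_columns(rows, columns):
    """Return copies of the rows without the named columns."""
    drop = set(columns)
    return [{k: v for k, v in r.items() if k not in drop} for r in rows]

# Made-up row that includes assumed engineered columns.
houses = [
    {"house_id": 1, "size_in_sqft": 1500, "total_rooms": 5,
     "year_built": 1995, "house_age": 30, "price": 210000},
]

# Assumed choice: identifier and redundant column are removed.
selected = drop_columns(houses, ["house_id", "year_built"])
print(selected)
```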
Step 6: Train Model
• Split the dataset into training and testing sets.
Training Set: Used to train the model.
Testing Set: Used to evaluate the model.
• Select an appropriate machine learning algorithm.
Regression: Linear Regression, Ridge, Lasso, etc.
Classification: Logistic Regression, Decision Trees, Random Forest, SVM, etc.
Clustering: K-Means, Hierarchical Clustering, etc.
• Train the model on the training set.
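The three steps above can be sketched end to end with a single-feature linear regression. The data points are made up, and the split and fitting helpers are written by hand with the standard library to show the mechanics. In practice a library such as scikit-learn would normally be used.

```python
import random

# Made-up (size_in_sqft, price) pairs lying near a straight line.
data = [(800, 118000), (1000, 141000), (1200, 165000), (1400, 186000),
        (1600, 212000), (1800, 236000), (2000, 258000), (2200, 283000),
        (2400, 305000), (2600, 331000)]

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into training and testing sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def fit_linear_regression(rows):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

train, test = train_test_split(data)
slope, intercept = fit_linear_regression(train)

# Predictions on the held-out testing set.
for x, y in test:
    print(f"size={x}: predicted={slope * x + intercept:.0f}, actual={y}")
```

The testing rows are never seen during fitting, so comparing their predictions with the actual prices gives an honest first impression of the model.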
Step 7: Model Evaluation
• Test the trained model to measure how well it performs on data it has not seen.
• This involves rigorous testing against held-out validation or test datasets.
• Evaluation metrics such as accuracy, precision, recall, and
F1 score are computed to gauge its effectiveness.
• Provides insights into the model’s strengths and
weaknesses.
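The four metrics named above can be computed directly from the counts of true positives, false positives, and false negatives. The sketch below uses made-up churn labels, and the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up churn labels: 1 = churned, 0 = stayed.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision answers "of the customers flagged as churners, how many really churned?", while recall answers "of the customers who churned, how many were flagged?". F1 is the harmonic mean of the two.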
Step 8: Model Deployment
• We deploy the model in a real-world system.
• The deployment phase is similar to making the final report for a project.
Next Steps
1. Install a Python-compatible IDE (Integrated Development Environment).
2. Install the Weka machine learning environment.
Assignment:
1. Describe the following machine learning processes:
a. CRISP-DM
b. SEMMA
c. KDD
(6 marks)
2. Identify the key differences and similarities among the data mining (KDD) and machine learning (CRISP-DM, SEMMA) processes. (4 marks)
Submit by: 19/05/2025 (hard copy)
