Notes Unit 1-3 Part-II

The document outlines the machine learning cycle, including planning, data preparation, model engineering, deployment, and maintenance. It discusses various challenges in machine learning, such as data quality, algorithm selection, and ethical considerations. Additionally, it describes types of data used in machine learning and the differences between supervised, unsupervised, and reinforcement learning algorithms.

Uploaded by

Mayank Purohit

Machine Learning or Data Science Cycle

Dr. Rahul Dubey

1 Planning
• Assess the scope, success metric, and feasibility of the ML application (cost-benefit analysis).
• Furthermore, you need to define clear and measurable success metrics for business, machine learning models (Accuracy, F1 score, AUC), and economic (key performance indicators).

2 Data Preparation
• Data collection and labeling
• Data cleaning: clean the data by imputing missing values, analyzing wrongly labeled data, removing outliers, and reducing noise.
• Data processing: this stage involves feature selection, dealing with imbalanced classes, feature engineering, data augmentation, and normalizing and scaling the data.

3 Model Engineering
• Build an effective model architecture by doing extensive research.
• Define model metrics.
• Train and validate the model on the training and validation datasets.
• Track experiments, metadata, features, code changes, and machine learning pipelines.
• Perform model compression and ensembling.
• Interpret the results with the help of domain experts.

4 Model Deployment

5 Maintenance & Monitoring
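
Two of the data preparation steps named in stage 2, imputing missing values and scaling, can be sketched in a few lines. This is only an illustration under stated assumptions: mean imputation and min-max scaling are one common choice each (the notes do not prescribe a method), and the `ages` column and the function names are hypothetical.

```python
from statistics import mean

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical numeric column with one missing value.
ages = [50, 25, None, 8]
cleaned = impute_mean(ages)      # the None is replaced by the mean of 50, 25, 8
scaled = min_max_scale(cleaned)  # every value now lies in [0, 1]
print(cleaned)
print(scaled)
```

Scaling is done after imputation so that the filled-in value is rescaled together with the observed ones.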
Machine Learning Challenges
1. Data Challenges
•Data Quality: Poor-quality data, such as missing values, noise, or outliers, can negatively
impact model performance. Insufficient data can prevent models from generalizing effectively.
•Imbalanced Data: When certain classes or labels are underrepresented, it can lead to biased
models.
•Feature Engineering: Identifying and crafting relevant features is labor-intensive and requires
domain knowledge.
•Data Privacy and Security: Ensuring privacy while collecting and processing sensitive data is
a critical challenge.
2. Algorithmic Challenges
•Overfitting and Underfitting: Balancing model complexity to avoid these issues can be
difficult.
•Model Interpretability: Complex models, such as deep neural networks, are often hard to
interpret and explain.
•Algorithm Selection: Choosing the right algorithm for a specific task requires expertise and
experimentation.
3. Computational Challenges
•Resource Intensity: Training large models requires significant computational power and can be
time-consuming.
•Scalability: Handling large datasets or deploying models at scale can be technically challenging.
•Hyperparameter Tuning: Finding the optimal set of hyperparameters is often a trial-and-error
process.
4. Deployment and Maintenance
•Integration with Existing Systems
•Model Monitoring
•Updating Models
5. Ethical and Social Challenges
•Bias and Fairness
•Accountability
•Transparency
6. Domain-Specific Challenges
•Contextual Knowledge: Applying ML to specific industries often requires deep domain
expertise.
•Regulatory Compliance: Navigating legal and regulatory frameworks, especially in sensitive
fields like healthcare and finance.
7. Learning and Experimentation
•Reproducibility: Reproducing results across different environments and datasets can be
challenging.
•Experiment Management: Keeping track of various experiments, configurations, and results is
crucial but complex.
8. Human Factors
•Skill Gap: A shortage of skilled ML practitioners can hinder the adoption of ML in
organizations.
•Collaboration: Effective communication between data scientists, engineers, and domain experts
is essential.
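
The imbalanced-data challenge listed above can be made concrete with a small sketch. It compares accuracy with the F1 score for a hypothetical classifier that always predicts the majority class. The 95/5 class split and the function names are assumptions chosen for illustration, not data from the notes.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical imbalanced labels: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A "model" that ignores its input and always predicts the majority class.
always_negative = [0] * 100

acc = accuracy(y_true, always_negative)
f1 = f1_score(y_true, always_negative)
print(acc, f1)
```

The accuracy looks high even though the minority class is never detected, while the F1 score exposes the problem. This is one reason accuracy is usually not the only success metric used on imbalanced data.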
How Do ML Algorithms Work?

Source: https://www.spaceotechnologies.com/machine-learning-app-development-complete-guide/
Types of Data in ML
➢ ML is simply a mapping from input data to output data.

➢ Numeric data

➢ Categorical data

➢ Text data

➢ Image data

➢ Video data

➢ Audio data

➢ Time series data

Example with numeric and categorical columns:

Name     Age   Height   Weight   M/F
Anil     50    5.6      70.2     M
Raju     25    5.4      75.8     M
Neetu    35    5.3      46       F
Meethi   8     3.4      24       F
Types of data
1) Numerical data
➢ It represents a quantifiable thing that you can measure.
(a) Discrete data (b) Continuous data

2) Categorical data
Nominal data
➢ A categorical variable (sometimes called a nominal variable) is one that has two or
more categories, but there is no intrinsic ordering to the categories.
➢ For example, gender is a categorical variable having two categories (male and female).

Ordinal data
➢ Mixture of numerical and categorical data.
➢ An ordinal variable is similar to a categorical variable. The difference between the
two is that there is a clear ordering of the categories.
➢ For example, economic status with three categories (low, medium and high), or a
movie rating.
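
The difference between nominal and ordinal data shows up when the categories are turned into numbers. The sketch below is an illustration under assumptions not stated in the notes: ordinal values are mapped to integers that preserve their order, while nominal values are one-hot encoded so that no artificial order is introduced. The function names are hypothetical.

```python
# Ordinal categories, listed from lowest to highest.
STATUS_ORDER = ["low", "medium", "high"]

def encode_ordinal(value, order=STATUS_ORDER):
    """Map an ordinal category to its rank, so that low < medium < high."""
    return order.index(value)

def encode_one_hot(value, categories):
    """Map a nominal category to a 0/1 vector with a single 1."""
    return [1 if value == c else 0 for c in categories]

# Ordinal: the integer codes keep the ordering of the categories.
status_codes = [encode_ordinal(s) for s in ["medium", "low", "high"]]

# Nominal: each category gets its own position; no category is "larger".
gender_vectors = [encode_one_hot(g, ["male", "female"]) for g in ["female", "male"]]

print(status_codes)
print(gender_vectors)
```

Using integer codes for a nominal variable would suggest an ordering that does not exist, which is why the two kinds of data are treated differently here.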
Data Set
Datasets:
➢ A dataset is a collection of instances.
➢ A dataset consists of a feature matrix and a target vector.
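
As a minimal sketch of the feature matrix and target vector, the rows below reuse the small example table from the "Types of Data in ML" slide. Treating the M/F column as the target and Age, Height, Weight as features is an assumption made only for illustration; the names `X` and `y` are a common convention, not something the notes define.

```python
# Each instance: (name, age, height, weight, label).
rows = [
    ("Anil", 50, 5.6, 70.2, "M"),
    ("Raju", 25, 5.4, 75.8, "M"),
    ("Neetu", 35, 5.3, 46.0, "F"),
    ("Meethi", 8, 3.4, 24.0, "F"),
]

# Feature matrix: one row per instance, one column per feature.
X = [[age, height, weight] for _, age, height, weight, _ in rows]

# Target vector: one label per instance, in the same order as X.
y = [label for *_, label in rows]

print(X)
print(y)
```

The name column is left out of `X` because it identifies an instance rather than describing it.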

13 January 2025
Iris Dataset

Dataset
• Training dataset: to build the model
• Test dataset: to evaluate the model
Training set:
➢ The training set is used to build a model.
➢ It is used to find relevant information on how to associate input data with the
output decision. The system is trained by applying the learning algorithm to this
data, so that the relevant information is extracted from it.
➢ Generally, about 70% of the dataset is used for training.
Testing set:
➢ The testing set is used to test the model. It is the data used to verify
whether the system produces the correct output after being trained.
➢ Generally, about 30% of the dataset is used for testing.
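
The 70/30 split described above can be sketched as follows. This is a minimal illustration, assuming a random shuffle before splitting; the function name `train_test_split`, the `seed` parameter, and the toy data are hypothetical and not taken from the notes.

```python
import random

def train_test_split(instances, train_fraction=0.7, seed=0):
    """Shuffle the instances and split them into a training and a testing set."""
    shuffled = list(instances)             # copy, so the input is left unchanged
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy dataset of 10 instances, identified here only by an index.
data = list(range(10))
train, test = train_test_split(data)
print(len(train), len(test))
```

Shuffling first avoids a split that depends on the order in which the data happened to be stored. Every instance ends up in exactly one of the two sets.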
Learning Algorithm
➢ Machine learning gives a machine the ability to learn automatically and improve
from experience without being explicitly programmed.
➢ The process of learning begins with observations, in order to find patterns in
data and make better decisions in the future based on the examples that we
provide.
➢ The primary aim of a learning algorithm is to allow computers to learn
automatically, without human intervention.

Machine Learning Algorithm
• Supervised Learning Algorithm
• Un-Supervised Learning Algorithm
• Reinforcement Learning Algorithm
Types of ML

Supervised Learning
➢ Learning in the presence of an instructor/supervisor/teacher
❖ Ex. Classroom teaching
➢ The machine is trained on a labelled dataset.
➢ A labelled dataset is one that has both input and output parameters.
➢ It is task driven because the outcomes of a supervised learning algorithm are
controlled by the task.
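Training Phase: the labelled examples below are used to learn the model logic. For
instance, the inputs 5 and 5 are presented together with the output 10.

Num-1   Num-2   Sum
5       5       10
8       2       10
10      3       13
15      6       21
20      4       24
30      40      70

Testing Phase: the inputs 30 and 40 are given to the trained model, which outputs 70.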
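
The addition example can be turned into a small runnable sketch. The notes do not say which model is used, so this assumes a linear model Sum ≈ w1·Num-1 + w2·Num-2 fitted by least squares. The training rows follow the slide's table, with 24 taken as the sum of 20 and 4, and the pair (30, 40) is held out for the testing phase. The function name is hypothetical.

```python
def fit_two_weights(samples):
    """Least-squares fit of s ≈ w1*a + w2*b (no intercept).

    Solves the 2x2 normal equations with Cramer's rule.
    """
    saa = sum(a * a for a, b, s in samples)
    sbb = sum(b * b for a, b, s in samples)
    sab = sum(a * b for a, b, s in samples)
    sas = sum(a * s for a, b, s in samples)
    sbs = sum(b * s for a, b, s in samples)
    det = saa * sbb - sab * sab
    w1 = (sas * sbb - sab * sbs) / det
    w2 = (saa * sbs - sab * sas) / det
    return w1, w2

# Training phase: labelled examples (Num-1, Num-2, Sum).
training = [(5, 5, 10), (8, 2, 10), (10, 3, 13), (15, 6, 21), (20, 4, 24)]
w1, w2 = fit_two_weights(training)

# Testing phase: apply the learned weights to unseen inputs.
prediction = w1 * 30 + w2 * 40
print(w1, w2, prediction)
```

The model is never told that the task is addition. It recovers weights close to 1 and 1 from the labelled examples alone, which is the sense in which the "model logic" is learned.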
[Figure: training phase and testing phase of supervised learning]
Source: https://mc.ai/supervised-vs-unsupervised-learning/
Types of Supervised Learning
Machine Learning Algorithm
• Supervised Learning Algorithm
   ❖ Regression
   ❖ Classification
• Un-Supervised Learning Algorithm
• Reinforcement Learning Algorithm
Supervised Learning
Regression vs. Classification
[Figure: Regression (left panel) and Classification (right panel)]

Linear Regression

How to Calculate Coefficient
➢Using correlation & standard deviation (shortcut method).
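
A minimal sketch of this calculation, assuming the "shortcut method" refers to the standard identity for simple linear regression: slope b = r · (s_y / s_x) and intercept a = ȳ − b·x̄, where r is the Pearson correlation and s_x, s_y are the standard deviations of x and y. The data points and function names below are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def fit_line(xs, ys):
    """Slope and intercept of y = a + b*x from correlation and standard deviations."""
    r = pearson_r(xs, ys)
    slope = r * stdev(ys) / stdev(xs)           # b = r * s_y / s_x
    intercept = mean(ys) - slope * mean(xs)     # a = mean(y) - b * mean(x)
    return slope, intercept

# Hypothetical data points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

The result agrees with the usual least-squares slope Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², because r · s_y / s_x simplifies to exactly that ratio.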
