Assignment: 1
Subject: Machine Learning
Submitted by: Bidya Sagar Lekhi (Roll: 10)
Submitted to: Er. Pradip Sharma
1. Define Machine Learning. How does it differ from traditional
programming approaches?
Ans: - Machine Learning is a subset of artificial intelligence (AI) that
focuses on developing systems that can learn from data, identify
patterns, and make decisions with minimal human intervention.
Machine Learning is the study of algorithms and statistical models that
enable a system to improve its performance on a task through
experience (i.e., data), without being explicitly programmed for that
task.
Aspect | Traditional Programming | Machine Learning
Approach | Programmer defines rules/logic explicitly | System learns patterns from data
Input/Output | Data + Program (rules) → Output | Data + Output → Program (model)
Example | Writing code to detect spam manually by keyword matching | Training a model to detect spam based on labeled email examples
Flexibility | Rigid – hard to adapt to new scenarios | Adaptive – learns and improves over time
Human Role | Developer writes all logic | Developer provides data and chooses algorithms; the system finds the logic
Error Handling | Errors are fixed through code changes | Errors are reduced through retraining or more data
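To make the contrast concrete, here is a minimal sketch (assuming scikit-learn is available; the emails and labels are invented toy data) that places a hand-written keyword rule next to a model trained from labeled examples:

```python
# Contrast: hand-written keyword rule vs. a model trained on labeled examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win a free prize now", "meeting at 10 am", "free lottery win", "project report attached"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam (hypothetical toy data)

# Traditional programming: the developer writes the rule explicitly.
def rule_based_spam(text):
    return int(any(word in text for word in ["free", "win", "prize"]))

# Machine learning: the system learns the rule from labeled data.
vec = CountVectorizer()
model = MultinomialNB().fit(vec.fit_transform(emails), labels)

new_email = "claim your free prize"
print(rule_based_spam(new_email))                       # rule decides
print(model.predict(vec.transform([new_email]))[0])     # learned model decides
```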
2. Discuss the evolution of Machine Learning. Mention key
historical developments and technologies that influenced it.
Ans: - Evolution of Machine Learning: -
Machine Learning (ML) has evolved over several decades, influenced
by advancements in mathematics, statistics, computer science, and
artificial intelligence. Below is a timeline highlighting its key historical
developments and technologies:
1. 1940s–1950s: Foundations and Early Ideas
• Alan Turing (1950): Proposed the idea of a "learning machine"
and introduced the Turing Test, laying conceptual groundwork
for AI.
• Hebbian Learning (1949): A learning theory by Donald Hebb –
"Cells that fire together wire together." Influential in neural
networks.
2. 1950s–1960s: Birth of AI and ML Concepts
• Perceptron (1957): Frank Rosenblatt created the first neural
network model, capable of simple pattern recognition.
• Limitations (1969): Marvin Minsky and Seymour Papert showed
that single-layer perceptrons couldn't solve non-linearly separable
problems (e.g., XOR), stalling neural network research.
3. 1970s–1980s: Rule-Based Systems & Statistical Learning
• Expert Systems: Focus shifted to manually coded if-then rules
(e.g., MYCIN for medical diagnosis).
• Bayesian Methods: Interest grew in probabilistic models like the
Naive Bayes classifier.
• Nearest Neighbor & Decision Trees: Simple but effective
algorithms like k-NN, ID3, and CART were introduced.
4. 1986–1990s: Revival of Neural Networks
• Backpropagation Algorithm (1986): Geoffrey Hinton and others
improved training of multi-layer neural networks (MLPs),
reigniting neural net research.
• Support Vector Machines (1995): Introduced by Vapnik; robust
classifier using margin maximization.
• Ensemble Methods: Techniques like Boosting (AdaBoost) and
Bagging (Random Forests) became popular.
5. 2000s: Big Data & Real-World Applications
• Rise of the Internet: Explosion of data led to the need for scalable
ML algorithms.
• Recommender Systems: Used in e-commerce (e.g., Amazon,
Netflix) using collaborative filtering and matrix factorization.
• Open-Source Libraries: Scikit-learn, Weka, and others made ML
widely accessible.
6. 2010s: Deep Learning & Modern AI Boom
• Deep Learning Renaissance:
o AlexNet (2012): A deep convolutional neural network that
won the ImageNet competition, marking a breakthrough in
computer vision.
o RNNs/LSTMs: Excelled in sequence tasks like speech and
language.
• Hardware Boost: GPUs massively accelerated ML training.
• Frameworks: TensorFlow, PyTorch, and Keras enabled easy
deep learning development.
7. 2020s–Present: Foundation Models & Generative AI
• Transformers (2017): Revolutionized NLP (e.g., BERT, GPT,
T5).
• Large Language Models (LLMs): GPT-3, GPT-4, and others
showcased capabilities in text, code, and multimodal tasks.
• Generative AI: Tools like ChatGPT, DALL·E, and Sora
introduced text-to-image, text-to-video, and general AI
assistance.
• AutoML and Federated Learning: Focus on automating ML
workflows and ensuring privacy-preserving training.
3. Explain with examples how Machine Learning has transformed
various industries.
Ans: - Machine Learning (ML) has significantly reshaped many
industries by automating tasks, improving decision-making, enhancing
user experiences, and creating new services. Below are key industries
with real-world examples:
1. Healthcare
Transformations:
• Disease Prediction & Diagnosis: ML models can analyze
symptoms, medical images, and patient records to detect diseases
early.
Examples:
• Google DeepMind: Detects over 50 eye diseases from retinal
scans.
• IBM Watson: Assists in cancer diagnosis and treatment
recommendations.
• Wearable Devices: Fitbit and Apple Watch use ML to monitor
heart rate, sleep patterns, and detect abnormalities.
2. Retail & E-commerce
Transformations:
• Personalized shopping experiences, dynamic pricing, demand
forecasting, and inventory optimization.
Examples:
• Amazon & Flipkart: Use ML to recommend products based on
browsing and purchase history.
• Walmart: Uses ML for stock management and sales forecasting.
• Chatbots: AI-driven assistants help customers with orders and
support (e.g., H&M’s shopping assistant).
3. Finance & Banking
Transformations:
• Fraud detection, risk assessment, algorithmic trading, and
personalized financial services.
Examples:
• PayPal & Mastercard: Use ML to detect suspicious transactions
in real time.
• Robo-Advisors (e.g., Betterment): Automatically investing
money based on financial goals.
• Credit Scoring: ML models evaluate alternative data (e.g.,
spending behavior) to assess creditworthiness.
4. Transportation & Autonomous Vehicles
Transformations:
• Traffic prediction, route optimization, autonomous driving.
Examples:
• Google Maps & Waze: Predict traffic patterns and suggest fastest
routes.
• Tesla, Waymo: Use ML-powered sensors and vision systems to
enable self-driving cars.
• Uber: Uses ML for dynamic pricing, ETA estimation, and
demand forecasting.
5. Entertainment & Media
Transformations:
• Personalized content, recommendation systems, content creation.
Examples:
• Netflix & YouTube: Recommend shows/videos based on
viewing behavior.
• Spotify: Curates custom playlists (e.g., Discover Weekly) using
user preferences and ML.
• AI-Generated Content: Tools like Sora (OpenAI) and DALL·E
generate video and visual content.
6. Manufacturing & Industry 4.0
Transformations:
• Predictive maintenance, quality control, automation of
inspection.
Examples:
• Siemens & GE: Use ML to monitor equipment and predict
failures before they occur.
• Smart Factories: Use sensors and ML to optimize production
lines.
7. Education
Transformations:
• Adaptive learning, plagiarism detection, student performance
prediction.
Examples:
• Khan Academy & Coursera: Use ML to personalize learning
paths.
• Turnitin: Detects plagiarism using NLP and ML algorithms.
• AI Tutors: Apps like Duolingo adapt language lessons based on
student progress.
8. Government & Security
Transformations:
• Crime prediction, surveillance, smart cities, citizen services
automation.
Examples:
• Predictive Policing: Analyzes crime data to allocate resources
more effectively.
• Smart Cities: Use ML for traffic control, waste management, and
public safety (e.g., surveillance analytics).
• AI Chatbots: Help citizens access government services
efficiently.
9. Travel & Hospitality
Transformations:
• Dynamic pricing, personalized travel plans, virtual assistants.
Examples:
• Airlines (Delta, Emirates): Use ML to adjust ticket prices and
manage operations.
• Booking.com, Expedia: Recommend destinations and hotels
based on user behavior.
• AI Concierges: Hotel bots assist with check-ins, FAQs, and
services.
4. What are the main types of Machine Learning? Describe each
type with suitable examples.
Ans: - Machine Learning (ML) is broadly categorized into three main
types based on how the model learns from data:
1. Supervised Learning:
Definition:
In supervised learning, the model is trained on a labeled dataset,
meaning each input has a corresponding correct output. The goal is to
learn a mapping from inputs to outputs.
Used For:
• Classification (predicting categories)
• Regression (predicting continuous values)
Examples:
• Spam Detection: Email is labeled as "spam" or "not spam".
• House Price Prediction: Features like size, location, etc., are used
to predict price.
• Image Classification: Identifying objects (e.g., cat vs. dog) in
labeled images.
Example Algorithms:
• Linear Regression
• Logistic Regression
• Decision Trees
• Support Vector Machines (SVM)
• k-Nearest Neighbors (k-NN)
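As a small illustrative sketch of the house price example above (assuming scikit-learn; the sizes and prices are made-up numbers):

```python
# Supervised learning sketch: labeled inputs (size, bedrooms) with known prices.
from sklearn.linear_model import LinearRegression

X = [[650, 1], [800, 2], [1200, 3], [1500, 3]]   # [size in sq ft, bedrooms]
y = [70000, 95000, 150000, 180000]               # known prices (the labels)

model = LinearRegression().fit(X, y)             # learn the input-to-output mapping
print(model.predict([[1000, 2]]))                # predict the price of an unseen house
```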
2. Unsupervised Learning:
Definition:
In unsupervised learning, the model is trained on unlabeled data. It
identifies hidden patterns or structures in the data without predefined
outputs.
Used For:
• Clustering
• Dimensionality Reduction
• Anomaly Detection
Examples:
• Customer Segmentation: Grouping customers by buying
behavior without predefined labels.
• Market Basket Analysis: Finding associations between products
(e.g., "people who buy bread also buy butter").
• Anomaly Detection: Detecting unusual transactions in banking.
Example Algorithms:
• k-Means Clustering
• Hierarchical Clustering
• Principal Component Analysis (PCA)
• DBSCAN
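A brief sketch of the customer segmentation example with k-Means (assuming scikit-learn and NumPy; the spending figures are invented):

```python
# Unsupervised learning sketch: no labels, k-Means groups similar customers.
import numpy as np
from sklearn.cluster import KMeans

# [annual spend, visits per month] -- hypothetical values
customers = np.array([[200, 2], [220, 3], [800, 10], [850, 12], [400, 5]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)           # cluster assignment discovered from the data alone
print(kmeans.cluster_centers_)  # centre of each discovered segment
```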
3. Reinforcement Learning:
Definition:
Reinforcement learning (RL) involves an agent that learns by
interacting with an environment. The agent receives rewards or
penalties and learns to make decisions that maximize cumulative reward
over time.
Used For:
• Decision-making in dynamic environments
Examples:
• Game Playing: AlphaGo learning to play Go by trial and error.
• Robotics: Robots learning to walk or grasp objects.
• Self-driving Cars: Learning to navigate roads by rewards (e.g.,
staying on track, avoiding collisions).
Key Concepts:
• Agent, Environment, Actions, Rewards
• Policy, Value Function
Popular Algorithms:
• Q-Learning
• Deep Q Networks (DQN)
• Proximal Policy Optimization (PPO)
5. Compare and contrast Supervised and Unsupervised Learning
in terms of data, algorithms, and applications.
Ans: - Supervised and Unsupervised Learning are two foundational
types of machine learning. Here's a detailed comparison in terms of data,
algorithms, and applications:
1. Data Requirements
Feature | Supervised Learning | Unsupervised Learning
Data Type | Labeled data (input + output) | Unlabeled data (only input)
Label Availability | Requires a labeled output for every input | No labels provided
Example Data | {Image: Dog, Label: Dog} | {Image: ?}; the system finds patterns on its own
2. Algorithms Used
Feature | Supervised Learning | Unsupervised Learning
Typical Algorithms | Linear Regression, Logistic Regression, Decision Trees, Support Vector Machines, k-NN | k-Means Clustering, Hierarchical Clustering, PCA, DBSCAN
Training Process | Learns a mapping function from input to output | Learns hidden patterns or structure in the data
Goal | Predict output labels or values | Discover structure, groupings, or features
3. Applications
Feature | Supervised Learning | Unsupervised Learning
Use Cases | Spam detection, email classification, fraud detection, disease prediction, price forecasting | Customer segmentation, market basket analysis, anomaly detection, image compression
Output Type | Predictive (classification/regression) | Descriptive (clustering, dimensionality reduction)
4. Learning Behavior
Feature | Supervised Learning | Unsupervised Learning
Dependency | Learns based on provided correct answers | Learns by discovering patterns on its own
Evaluation | Accuracy, Precision, Recall (based on known labels) | Harder to evaluate; uses metrics like silhouette score, cohesion
6. What is Reinforcement Learning? Explain its working with a
real-world scenario.
Ans: - Reinforcement Learning (RL) is a type of machine learning
where an agent learns to make decisions by performing actions in an
environment, receiving rewards or penalties as feedback. The goal is to
learn an optimal policy (strategy) that maximizes cumulative reward
over time.
How Reinforcement Learning Works:
The learning process in RL is based on trial and error. The agent:
1. Observes the current state of the environment.
2. Takes an action based on a policy.
3. Receives a reward (positive or negative) from the environment.
4. Moves to a new state.
5. Updates its strategy based on the reward and new state.
Key Components of RL:
Component | Description
Agent | The learner/decision-maker
Environment | The world with which the agent interacts
State (S) | A specific situation in the environment
Action (A) | A set of choices available to the agent
Reward (R) | Feedback signal for an action taken
Policy (π) | The strategy that defines the agent's actions in each state
Value Function | Predicts the long-term return of a state or state-action pair
Objective:
To maximize the cumulative reward (also called the return) over time.
Real-World Scenario: Self-Driving Car
Let’s apply RL to a self-driving car navigating through a city:
RL Component | Example for Self-Driving Car
Agent | The car's AI system
Environment | The road, traffic, pedestrians, and traffic lights
State (S) | Current location, speed, nearby cars, traffic signals
Action (A) | Accelerate, brake, turn left/right, stop
Reward (R) | +1 for staying in the lane, -10 for a collision, +5 for reaching the destination safely
Policy (π) | Strategy that decides the car's actions at every state
The car tries different actions, learns from outcomes (rewards or
penalties), and improves its driving strategy over time.
Popular Algorithms in RL:
• Q-Learning
• Deep Q-Networks (DQN)
• SARSA
• Policy Gradient Methods
• Proximal Policy Optimization (PPO)
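To make the trial-and-error loop concrete, below is a minimal Q-Learning sketch on a toy 5-cell corridor rather than a real car (the environment, rewards, and hyperparameters are invented for illustration):

```python
# Q-Learning sketch: an agent learns to walk right along a 5-cell corridor to a goal.
import random

n_states, actions = 5, [0, 1]          # actions: 0 = move left, 1 = move right
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount factor, exploration rate

for episode in range(500):
    state = 0
    while state != n_states - 1:                      # the goal is the last cell
        # Explore occasionally, otherwise pick the action with the highest Q-value.
        a = random.choice(actions) if random.random() < epsilon else Q[state].index(max(Q[state]))
        next_state = max(0, state - 1) if a == 0 else min(n_states - 1, state + 1)
        reward = 10 if next_state == n_states - 1 else -1
        # Core update: move Q(s, a) toward reward + discounted best future value.
        Q[state][a] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][a])
        state = next_state

print(Q)  # after training, "right" should score higher than "left" in every state
```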
Where is RL Used?
• Game AI: AlphaGo, Dota 2 bots, Chess engines.
• Robotics: Teaching robots to walk, pick objects.
• Autonomous Vehicles: Navigation, path planning.
• Finance: Dynamic portfolio management.
• Industrial Automation: Smart energy systems, manufacturing
control.
7. Define Active Learning. How does it improve the performance of
a learning system compared to traditional methods?
Ans: - Active Learning is a machine learning approach where the model
actively selects the most informative data points from an unlabeled
dataset to be labeled by an oracle (usually a human expert).
Instead of passively learning from a large amount of labeled data, the
model asks for labels only for the most valuable examples, aiming to
achieve higher performance with fewer labeled instances.
Why Active Learning?
• In many real-world applications (like medical imaging or legal
documents), labeling data is expensive or time-consuming.
• Active learning minimizes labeling effort while still building an
accurate model.
How Active Learning Works (Steps):
1. Start with a small, labeled dataset and a large pool of unlabeled
data.
2. Train a model on the initial labeled data.
3. Select informative samples from the unlabeled pool (those about
which the model is most uncertain).
4. Query a human (oracle) to label these selected examples.
5. Add the new labels to the training set.
6. Retrain the model and repeat the process.
Query Strategies (How to Choose Data):
• Uncertainty Sampling: Choose data where the model is least
confident (e.g., probabilities near 0.5 in binary classification).
• Query by Committee: Use multiple models and pick samples they
disagree on.
• Expected Model Change: Choose samples expected to cause the
largest change in the model if labeled.
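A hedged sketch of the uncertainty-sampling loop described above (assuming scikit-learn; the data is synthetic and the human oracle is simulated by revealing labels that were held back):

```python
# Active learning sketch: repeatedly label only the samples the model is least sure about.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
# Small initial labeled set containing both classes; the rest form the unlabeled pool.
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
unlabeled = [i for i in range(500) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for _ in range(5):
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[unlabeled])[:, 1]
    uncertainty = np.abs(proba - 0.5)                       # near 0.5 => model is unsure
    query = [unlabeled[i] for i in np.argsort(uncertainty)[:10]]  # 10 most uncertain samples
    labeled += query                                        # the "oracle" reveals their labels
    unlabeled = [i for i in unlabeled if i not in query]

print(model.score(X, y))  # accuracy after labeling only a fraction of the pool
```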
How Active Learning Improves Performance:
Feature | Traditional Learning | Active Learning
Labeling Effort | Labels a large dataset upfront | Labels only selected, informative samples
Efficiency | May need thousands of labeled examples | Can reach similar accuracy with fewer labels
Cost | High, especially when experts are needed | Reduced, since fewer labels are required
Learning Speed | Slower, more data needed | Faster improvement in model performance
Real-World Example: Medical Image Classification
• Problem: Labeling MRI scans for tumor detection requires expert
radiologists.
• Traditional Approach: Label thousands of images, even if many
are easy or redundant.
• Active Learning Approach: The model asks the radiologist to
label only uncertain or edge-case images, leading to faster model
improvement with fewer expert hours.
Applications of Active Learning:
• Medical diagnostics
• Text classification (e.g., sentiment analysis)
• Fraud detection
• Speech recognition
• Image recognition (especially rare classes)
8. Explain the steps involved in a typical Machine Learning
workflow. Illustrate with a flow diagram.
Ans: - A Machine Learning (ML) workflow is a structured process used
to build, train, evaluate, and deploy ML models. Following a systematic
workflow ensures better model performance and reproducibility.
Typical ML Workflow Steps
1. Problem Definition
• Understand the business or research problem.
• Decide whether it’s a classification, regression, clustering, etc.
2. Data Collection
• Gather data from databases, APIs, sensors, or user inputs.
• Ensure it's relevant, sufficient, and representative.
3. Data Preprocessing
• Clean missing or inconsistent values.
• Encode categorical data.
• Normalize/Scale numerical data.
• Split into training and testing sets.
4. Exploratory Data Analysis (EDA)
• Use statistics and visualization to understand patterns,
distributions, and relationships.
• Identify outliers and correlations.
5. Feature Engineering
• Select or create new features that improve model performance.
• Reduce dimensionality if needed (e.g., PCA).
6. Model Selection
• Choose a suitable algorithm based on the problem type and data
characteristics (e.g., decision tree, SVM, neural network).
7. Model Training
• Train the model using the training dataset.
• Optimize the model parameters.
8. Model Evaluation
• Test the model using the test data.
• Use metrics like accuracy, precision, recall, F1-score, or RMSE
depending on the task.
9. Model Tuning (Hyperparameter Optimization)
• Use techniques like Grid Search or Random Search to find the
best hyperparameters.
10. Model Deployment
• Integrate the trained model into a production environment or
application.
• Ensure it’s scalable and monitored.
11. Monitoring & Maintenance
• Continuously track model performance over time.
• Retrain the model as new data becomes available (due to concept
drift).
Machine Learning Workflow Diagram (Textual Representation):
Problem Definition → Data Collection → Data Preprocessing → EDA → Feature Engineering → Model Selection → Model Training → Model Evaluation → Model Tuning → Deployment → Monitoring & Maintenance (retrain as needed)
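The core of this workflow (preprocessing → training → evaluation) can be sketched with a scikit-learn pipeline on a built-in dataset; this is an illustrative sketch, not a full production workflow:

```python
# Workflow sketch: split data, preprocess, train, and evaluate in one pipeline.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)                       # data collection
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipe = Pipeline([
    ("scale", StandardScaler()),                                  # data preprocessing
    ("model", LogisticRegression(max_iter=1000)),                 # model selection
])
pipe.fit(X_train, y_train)                                        # model training
print(accuracy_score(y_test, pipe.predict(X_test)))               # model evaluation
```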
9. Describe the importance of problem definition in a Machine
Learning project.
Ans: - The problem definition is the first and most critical step in any
Machine Learning (ML) project. A poorly defined problem can lead to
wasted time, irrelevant models, and incorrect conclusions — even if
the technical implementation is perfect.
Why is Problem Definition So Important?
1. Guides the Entire ML Process
• It determines what kind of data to collect, what model to choose,
and how to evaluate success.
• Example: Is the goal to predict a number (regression) or classify
into categories (classification)?
2. Ensures Alignment with Business Goals
• ML should solve a real problem, not just be a technical
experiment.
• Well-defined problems translate business needs into ML tasks.
• Example: Instead of saying "Use ML in healthcare," say "Predict
the risk of heart disease using patient records."
3. Helps Choose the Right ML Approach
• Supervised vs. unsupervised vs. reinforcement learning depends
entirely on how the problem is framed.
• A misdefined problem might apply the wrong learning type,
leading to poor results.
4. Defines Success Metrics Clearly
• A proper definition helps determine what "success" looks like
(e.g., accuracy, precision, RMSE).
• Without clear goals, it’s hard to know if the model is performing
well.
5. Improves Communication with Stakeholders
• Clearly defined problems make it easier to explain goals and
results to non-technical stakeholders.
• Prevents misalignment between what developers build and what
users need.
10. Discuss the role of data collection and preprocessing in ensuring
the success of a Machine Learning model.
Ans: - The success of a Machine Learning (ML) model heavily depends
not just on the algorithm, but more crucially on the quality of the data
it's trained on. Proper data collection and preprocessing are foundational
steps that ensure the model can learn accurately and generalize well.
1. Role of Data Collection
Why it’s Important:
• Machine learning models learn from data. Poor or insufficient
data = poor model performance.
• Good data captures the real-world patterns the model is expected
to learn.
Key Considerations:
Aspect | Importance
Quantity of Data | More data can improve model generalization and reduce overfitting.
Quality of Data | Clean, relevant, and accurate data improves reliability of predictions.
Diversity | Data must cover all possible cases to avoid bias and ensure fairness.
Label Accuracy | In supervised learning, incorrect labels will misguide the model.
Timeliness | Data should be up-to-date, especially for dynamic environments like stock markets.
2. Role of Data Preprocessing
Preprocessing prepares raw data into a clean, structured format suitable
for modeling. It helps eliminate noise, handle missing values, and
transform data for better learning.
Key Preprocessing Steps:
Step | Description
Data Cleaning | Fix or remove incorrect, missing, or duplicate data.
Handling Missing Values | Options include deletion, mean/median imputation, or model-based estimators.
Data Transformation | Normalize or scale features, especially for algorithms sensitive to feature magnitude.
Encoding Categorical Data | Convert categories into numerical form using techniques like one-hot or label encoding.
Outlier Detection | Identify and handle anomalies that can skew model training.
Feature Engineering | Create new useful features from existing ones (e.g., combining age and income).
Data Splitting | Divide data into training, validation, and test sets to evaluate performance fairly.
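A small sketch of several of these preprocessing steps (assuming pandas and scikit-learn; the tiny DataFrame is invented):

```python
# Preprocessing sketch: impute missing values, encode a category, scale numbers.
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"age": [25, None, 40],
                   "city": ["KTM", "PKR", "KTM"],
                   "income": [30000, 45000, None]})

df[["age", "income"]] = SimpleImputer(strategy="median").fit_transform(df[["age", "income"]])  # handle missing values
df = pd.get_dummies(df, columns=["city"])                                                      # encode categorical data
df[["age", "income"]] = StandardScaler().fit_transform(df[["age", "income"]])                  # scale numerical data
print(df)
```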
11. How do you select an appropriate model for a given ML
problem? What factors influence model selection?
Ans: - Choosing the right machine learning model is crucial for
achieving good performance. The selection depends on the problem
type, data characteristics, and project constraints.
Steps to Select an Appropriate ML Model:
1. Understand the Problem Type
Problem Type | Model Categories | Example
Classification | Logistic Regression, Decision Tree, SVM, Random Forest, Neural Networks | Email spam detection
Regression | Linear Regression, Ridge, SVR, XGBoost, Neural Networks | Predicting house prices
Clustering | k-Means, DBSCAN, Hierarchical Clustering | Customer segmentation
Anomaly Detection | Isolation Forest, One-Class SVM, Autoencoders | Fraud detection
Recommendation | Matrix Factorization, Collaborative Filtering, Deep Learning | Movie recommendation
Time Series Forecasting | ARIMA, LSTM, Prophet | Stock price prediction
2. Analyze the Data
Data Feature | Influence on Model Selection
Data Size | Small: simpler models (e.g., linear regression); Large: complex models (e.g., neural networks)
Data Dimensionality | High-dimensional data favors models like SVM or regularized regression
Missing Values | Some models (e.g., XGBoost) handle missing values better
Feature Types | Categorical: Decision Trees, Naive Bayes; Numerical: Linear Models, SVM
Linearity | If data shows linear patterns → linear models; otherwise → non-linear models
3. Evaluate Model Complexity vs. Interpretability
Consideration | Simple Models | Complex Models
Examples | Linear Regression, Decision Trees | Neural Networks, Ensembles (XGBoost)
Speed | Fast to train and interpret | Slower to train but more flexible
Interpretability | Easy to explain | Often a "black box"
Use Case | Medical, finance, law (explainable AI needed) | Image, speech, and NLP tasks
4. Check for Overfitting Risk
• Use cross-validation to evaluate how the model generalizes.
• Prefer regularized models or ensemble methods (like Random
Forest, XGBoost) if overfitting is a concern.
5. Consider Computational Resources
Resource Availability | Suitable Models
Limited hardware | Logistic Regression, Naive Bayes
GPU or large RAM | Deep learning models, large ensembles
Real-time predictions needed | Fast models like decision trees or lightweight neural nets
6. Use Automated Tools for Assistance (Optional)
• Tools like AutoML, TPOT, or H2O.ai can suggest the best
models automatically based on the dataset.
• Great for rapid prototyping or non-expert users.
12. Explain different techniques used for model evaluation and
validation. Why is cross-validation important?
Ans: - Evaluating a machine learning model is essential to ensure it
performs well on unseen data and not just on the training dataset.
Several techniques are used to measure a model’s accuracy,
generalizability, and robustness.
A. Common Model Evaluation Techniques
1. Train/Test Split
• Split the dataset into two parts:
o Training Set (e.g., 70–80%): Used to train the model.
o Test Set (e.g., 20–30%): Used to evaluate the model.
• Limitation: May lead to biased results if the test set isn’t
representative.
2. K-Fold Cross-Validation
• Data is split into K equal parts (folds).
• The model is trained on K-1 folds and tested on the remaining
fold.
• Repeat this process K times, each time using a different fold as
the test set.
• Final performance = average of all K runs.
Term | Meaning
K = 5 or 10 | Common choices for balanced evaluation
Stratified K-Fold | Ensures class balance in each fold (for classification tasks)
Why Important?
• Reduces variance due to randomness in data splitting.
• Provides a more reliable performance estimate.
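A short sketch of 5-fold cross-validation with scikit-learn on a built-in dataset:

```python
# K-Fold cross-validation sketch: the score is averaged over 5 different test folds.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores)          # one accuracy score per fold
print(scores.mean())   # a more reliable estimate than a single train/test split
```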
3. Leave-One-Out Cross Validation (LOOCV)
• Special case of K-Fold where K = number of data points.
• Train on all data except one sample, test on that one sample —
repeat for all data points.
• Very accurate, but computationally expensive for large datasets.
4. Hold-Out Validation (Three-Way Split)
• Split the dataset into:
o Training Set
o Validation Set (for model tuning)
o Test Set (final unbiased evaluation)
• Ensures that hyperparameter tuning doesn't leak information
into the test set.
B. Evaluation Metrics
1. For Classification Models
Metric | Description | Use When?
Accuracy | % of correct predictions | Balanced class distribution
Precision | TP / (TP + FP) – how many predicted positives were correct | When false positives are costly
Recall | TP / (TP + FN) – how many actual positives were identified | When false negatives are costly
F1-Score | Harmonic mean of precision and recall | Imbalanced classes
Confusion Matrix | Shows TP, FP, FN, TN | Detailed error analysis
ROC-AUC | Area under the ROC curve (true positive vs. false positive rates) | Binary classifiers
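A brief sketch of computing several of these metrics with scikit-learn (the true labels and predictions below are made-up):

```python
# Metric sketch on hypothetical true labels vs. model predictions.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))  # rows: actual [0, 1]; columns: predicted [0, 1]
```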
2. For Regression Models
Metric | Description
Mean Squared Error (MSE) | Penalizes large errors more
Root Mean Squared Error (RMSE) | Square root of MSE, for interpretability
Mean Absolute Error (MAE) | Measures the average absolute error
R² Score (Coefficient of Determination) | Proportion of variance explained by the model
Why is Cross-Validation Important?
Benefit | Explanation
Better Generalization Estimate | Tests model performance on multiple subsets of data.
Reduces Overfitting Risk | Prevents tuning to a specific split of data.
Model Selection Aid | Helps compare multiple models/hyperparameters more fairly.
Efficient Data Use | Makes full use of the data, especially important when datasets are small.
13. What is model deployment? Discuss the challenges faced
during the deployment of ML models in real-time systems.
Ans: -Model deployment is the process of integrating a trained
machine learning model into a real-world production environment,
where it can make predictions on live (unseen) data and provide value
to users or systems.
In simple terms: it's taking the ML model out of the lab (e.g., a Jupyter
Notebook) and putting it into action (e.g., websites, apps, APIs,
dashboards).
Model Deployment Pipeline (Overview):
1. Model Training – Train and validate model offline.
2. Serialization – Save the model using formats like .pkl, .joblib,
or .onnx.
3. API Development – Wrap the model into a REST API using Flask,
FastAPI, etc.
4. Integration – Embed API into a web, mobile, or backend
system.
5. Monitoring – Track model accuracy, latency, and user feedback.
6. Maintenance – Retrain or update the model as needed.
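A minimal sketch of steps 2–3 (serialization plus a Flask prediction endpoint); the file name model.pkl and the feature layout are hypothetical:

```python
# Deployment sketch: load a serialized model and expose it behind a REST endpoint.
import pickle
from flask import Flask, request, jsonify

app = Flask(__name__)
with open("model.pkl", "rb") as f:        # model saved earlier with pickle.dump(...)
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]      # e.g. {"features": [5.1, 3.5, 1.4, 0.2]}
    prediction = model.predict([features])[0]
    return jsonify({"prediction": str(prediction)})

if __name__ == "__main__":
    app.run(port=5000)                    # in production, run behind a proper WSGI server
```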
Examples of Model Deployment:
Use Case | Deployment Target
Spam filter in Gmail | Backend server (real-time)
Product recommendation on Amazon | Website backend (batch or real-time)
Face recognition on smartphones | On-device (edge deployment)
Stock price prediction system | Financial dashboards/API
Challenges in Deploying ML Models
Despite successful training, real-time deployment introduces several
technical, operational, and business-level challenges:
1. Data Drift & Concept Drift
• Data Drift: Input data changes over time (e.g., user behavior
shifts).
• Concept Drift: The relationship between inputs and outputs
changes.
• Leads to degraded model performance over time.
• Solution: Continuous monitoring and retraining.
2. Model Performance in Production
• Models might perform well in development but poorly in
production due to:
o Noisy/incomplete live data
o Latency requirements
o Distribution mismatch
3. Integration with Existing Systems
• Challenge: Integrating ML APIs with legacy systems (written in
different languages or frameworks).
• Solution: Use platform-agnostic APIs, containers (e.g., Docker),
or ML pipelines.
4. Infrastructure & Scalability
• Issues: High traffic, concurrent users, limited
memory/CPU/GPU resources.
• Solution: Use cloud platforms (AWS SageMaker, GCP AI
Platform, Azure ML), load balancing, and serverless
deployment.
5. Security & Privacy
• Data sent to the model (especially personal data) must be
encrypted and handled securely.
• Challenge: Deploying models without exposing sensitive
business logic or user data.
6. Versioning & Model Management
• Which model is running? Can it be rolled back?
• Need for tools like MLflow, DVC, Kubeflow, or Model
Registry.
7. Monitoring & Logging
• Essential to track:
o Response time
o Accuracy drift
o System errors
• Without monitoring, it's hard to know when to retrain or update.
8. Collaboration & Ownership
• Developers, data scientists, and DevOps teams must work
together.
• Miscommunication or unclear ownership often delays
deployment.
14. Discuss various data quality issues in Machine Learning. How
do they affect model performance?
Ans: - In Machine Learning, the quality of data directly determines the
quality of the model. The phrase "garbage in, garbage out" applies
strongly: poor data quality leads to poor model predictions, no matter
how advanced the algorithm.
Common Data Quality Issues & Their Impacts
1. Missing Values
Cause: Incomplete data collection, data corruption, human error.
Impact:
• Models might fail to train or predict.
• Introduces bias if missingness is systematic.
Handling:
• Imputation (mean, median, mode)
• Deletion (if missing values are minimal)
• Predictive filling using ML models
2. Noisy Data
Cause: Measurement errors, random variations, external disturbances.
Impact:
• Leads to poor model generalization.
• Increases overfitting risk.
Handling:
• Smoothing techniques
• Outlier detection and removal
• Data filtering
3. Inconsistent Data
Cause: Different formats, duplicate entries, or conflicting values.
Impact:
• Confuses the model, reduces accuracy.
• Errors during preprocessing and feature engineering.
Handling:
• Standardize units, formats, and naming conventions.
• Deduplicate records using rules or fuzzy matching.
4. Duplicate or Redundant Data
Cause: Merged datasets, repeated entries, backup files.
Impact:
• Biased model training (same data multiple times).
• Inflated importance of repeated patterns.
Handling:
• Remove duplicates using hash checks or attribute comparisons.
5. Imbalanced Data
Cause: Uneven distribution of classes (e.g., 95% no fraud, 5% fraud).
Impact:
• Model becomes biased toward majority class.
• Poor recall/precision for the minority class.
Handling:
• Resampling techniques (SMOTE, oversampling, undersampling)
• Cost-sensitive learning
• Use appropriate metrics (F1-score, AUC)
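A hedged sketch of SMOTE oversampling (assuming the imbalanced-learn package; the skewed dataset is synthetic):

```python
# Imbalanced-data sketch: oversample the minority class with SMOTE before training.
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE   # assumes imbalanced-learn is installed

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)
print(Counter(y))                          # heavily skewed class counts

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y_res))                      # classes balanced with synthetic minority samples
```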
6. Irrelevant or Redundant Features
Cause: Including features not useful for prediction (e.g., user ID).
Impact:
• Increases noise, reduces model accuracy.
• Slows down training unnecessarily.
Handling:
• Feature selection techniques
• Dimensionality reduction (e.g., PCA)
7. Outliers
Cause: Data entry errors, fraud, rare events.
Impact:
• Skews learning, especially for linear models.
• May mislead clustering or regression algorithms.
Handling:
• Use boxplots or Z-scores to detect and manage outliers.
• Decide whether to remove, cap, or analyze separately.
8. Data Leakage
Cause: Including information in the training data that won’t be available
during prediction.
Impact:
• Unrealistically high accuracy in training/testing.
• Poor performance in real-world deployment.
Handling:
• Carefully review feature sources.
• Ensure strict separation of training and test data.
Impact on Model Performance:
Data Issue | Consequence on Model
Missing Data | Model may ignore features or make inaccurate predictions
Noise | Overfitting, instability
Inconsistency | Misclassification or poor learning of patterns
Imbalanced Classes | High overall accuracy but poor real-world usability
Irrelevant Features | Increases complexity without benefit, lowers accuracy
Outliers | Model biases or fails to generalize
Data Leakage | False confidence during validation, fails in production
15. Explain computational complexity in the context of ML
algorithms. Why is it an important consideration?
Ans: - Computational complexity refers to the amount of resources (like
time and memory) required by a machine learning algorithm to learn
from data or make predictions.
There are two main types:
• Time Complexity: How long an algorithm takes to run as the
input size grows.
• Space Complexity: How much memory an algorithm uses.
It is often expressed using Big O notation (e.g., O(n), O(n²), O(log n)),
where n represents the input size (like number of samples or features).
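As a rough illustration of how time complexity shows up in practice, the sketch below (assuming scikit-learn) times kernel SVM training as n grows; exact timings depend on the machine, and only the growth trend matters:

```python
# Complexity sketch: measure training time as the number of samples n grows.
import time
from sklearn.datasets import make_classification
from sklearn.svm import SVC

for n in [500, 1000, 2000, 4000]:
    X, y = make_classification(n_samples=n, n_features=20, random_state=0)
    start = time.perf_counter()
    SVC(kernel="rbf").fit(X, y)   # kernel SVM training scales roughly between O(n^2) and O(n^3)
    print(n, round(time.perf_counter() - start, 3), "seconds")
```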
Why is Computational Complexity Important in ML?
Factor | Importance
Scalability | Determines how well the algorithm handles large datasets.
Resource Usage | Affects hardware costs (CPU, GPU, RAM) and training time.
Real-Time Processing | Critical in time-sensitive applications (e.g., fraud detection, robotics).
Model Selection | Helps choose the right algorithm for the available data and infrastructure.
Feasibility of Hyperparameter Tuning | More complex algorithms are slower to optimize and validate.
16. What is the importance of interpretability and explainability in
ML models? Give examples where these are critical.
Ans: - Definition:
• Interpretability: The extent to which a human can understand the
internal mechanics of a machine learning model.
• Explainability: The ability to describe why a model made a
specific decision or prediction in human-understandable terms.
Both are essential for building trustworthy, transparent, and accountable
ML systems — especially in high-stakes domains.
Why Interpretability & Explainability Matter:
1. Trust & Transparency
• Stakeholders (users, regulators, executives) need to trust the
model’s decisions.
• Example: A bank customer denied a loan should be able to
understand why.
2. Debugging & Model Improvement
• Helps data scientists identify model biases, feature importance,
or overfitting.
• Example: If the model heavily relies on a non-relevant feature, it
can be corrected.
3. Compliance with Regulations
• Laws like GDPR, HIPAA, and the proposed EU AI Act require
explainable AI, especially in finance, health, and legal domains.
4. Fairness and Bias Detection
• Explainability allows stakeholders to detect discrimination or
bias.
• Example: If a hiring algorithm favors one gender or race,
explainable methods can reveal it.
5. User Acceptance
• Users are more likely to adopt AI systems when they understand
and agree with decisions.
• Example: Doctors need to trust and understand medical diagnosis
predictions from an AI tool.
Critical Use Case Examples:
Domain | Why Explainability is Critical
Healthcare | Doctors need to understand AI predictions for diagnoses/treatment.
Finance | Loan approvals and credit scoring must be transparent and justifiable.
Legal | AI used in sentencing, parole, or legal advice must be auditable.
Hiring | Fairness in recruitment; avoiding gender or ethnic bias.
Autonomous Driving | Understanding why a vehicle made a decision in case of accidents.
17. List and explain some ethical issues in Machine Learning. How
can these be addressed in practice?
Ans: - Machine Learning (ML) offers powerful tools to solve real-world
problems, but it also raises ethical concerns when misused or
poorly designed. Ethics in ML ensures that models are fair,
accountable, transparent, and respect human rights.
Key Ethical Issues in Machine Learning
1. Bias and Discrimination
• Cause: Biased or unbalanced training data.
• Impact: Models may unfairly favor or discriminate against
certain groups (e.g., race, gender, age).
• Example: A hiring algorithm rejecting female candidates more
frequently.
Solution:
• Use fairness-aware algorithms.
• Audit datasets for bias and ensure diversity.
• Apply reweighing or resampling techniques.
• Use tools like IBM’s AI Fairness 360 or Google’s What-If Tool.
2. Lack of Transparency (Black Box Models)
• Complex models like deep learning are hard to interpret.
• Impact: Users and regulators can’t understand or trust decisions.
Solution:
• Use explainable AI (XAI) techniques like LIME, SHAP, or
transparent models where possible.
• Provide clear documentation and model decision rationale.
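A hedged sketch using the SHAP library (assuming the shap package is installed) to explain a tree model trained on a built-in disease-progression dataset:

```python
# Explainability sketch: SHAP values show how much each feature pushed a prediction.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

data = load_diabetes()
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(data.data, data.target)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(data.data[:50])   # per-feature contributions for 50 samples
shap.summary_plot(shap_values, data.data[:50], feature_names=data.feature_names)
```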
3. Privacy Violations
• Cause: Collecting or using personal data without consent.
• Impact: Data misuse, identity theft, and loss of trust.
Solution:
• Comply with privacy regulations like GDPR, HIPAA.
• Apply data anonymization or differential privacy.
• Use federated learning to train models without moving personal
data.
4. Surveillance and Misuse of ML
• ML used for mass surveillance, facial recognition, or social
scoring can infringe on civil liberties.
• Example: Governments use ML to monitor or profile citizens.
Solution:
• Establish ethical boundaries and legal limits on use cases.
• Implement policy review boards or AI ethics committees.
5. Job Displacement & Automation
• ML and AI are replacing human jobs in many sectors.
• Impact: Economic inequality and social unrest.
Solution:
• Promote retraining and upskilling programs.
• Design AI to augment human labor, not just replace it.
• Encourage inclusive innovation policies.
6. Data Ownership and Consent
• Users often don’t know how their data is being used or
monetized.
Solution:
• Ensure informed consent for data use.
• Adopt transparent data policies.
• Allow users to opt-out or delete their data.
7. Safety and Accountability
• Who is responsible if an ML system causes harm (e.g., self-
driving car accidents)?
Solution:
• Clearly define accountability frameworks.
• Maintain logs and audit trails for model decisions.
• Perform robust testing before deployment.
Best Practices to Address Ethical Issues
Practice | Description
Ethics by Design | Embed ethical considerations from the design phase, not as an afterthought.
Interdisciplinary Teams | Include ethicists, sociologists, and legal experts in AI development.
Transparency & Documentation | Maintain clear records of data, decisions, and assumptions.
Bias Testing & Fairness Audits | Use tools to evaluate and mitigate bias in models.
Ongoing Monitoring | Continuously check models for unintended behavior after deployment.
Thank You!