FAM QUESTION BANK CT-2

2 MARKS QUESTIONS
1. Define
i) Data mining:
• It is the process of discovering patterns, relationships and useful information
from large datasets.
• It involves clustering, classification, association rule mining and anomaly
detection.
• Data mining is a form of data analysis that focuses on finding valuable insights
within data.
ii) Data analytics:
• It is the process of examining, cleaning, transforming and interpreting data to
extract meaningful information.
• It combines statistical analysis, data mining, and visualization to inform
decision-making.
• Data analytics is the broader practice of working with data to answer questions
or make informed decisions.

2. Define
i) Training dataset:
• The next step is to train the model; in this step we train the model to
improve its performance for a better outcome on the problem.
• We use datasets to train the model using various machine learning algorithms.
Training a model is required so that it can understand the various patterns,
rules, and features.
ii) Test dataset:
• After training the machine learning model with a specific dataset, the next step
is model testing.
• During this phase, the model's accuracy is assessed by evaluating its
performance on a separate test dataset.
• The test results provide a percentage accuracy measurement tailored to the
project or problem's criteria.
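
A minimal sketch of preparing training and test datasets, assuming scikit-learn and
NumPy are installed; the synthetic data and the 80/20 split ratio are illustrative
choices, not part of the original answer.

# Minimal train/test split sketch (assumes scikit-learn and NumPy).
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data: 100 samples with 3 features and a binary label.
X = np.random.rand(100, 3)
y = np.random.randint(0, 2, size=100)

# Hold out 20% of the data for testing; the rest is used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print("training samples:", X_train.shape[0])
print("test samples:", X_test.shape[0])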
3. State different unsupervised algorithms
• K-Means Clustering: Partitions data into K clusters based on feature similarity.
• Hierarchical Clustering: Builds a tree-like structure of clusters (can be
divisive or agglomerative).
• Principal Component Analysis (PCA): Reduces the dimensionality of data by
transforming it into a set of orthogonal components that capture the most
variance.
• Independent Component Analysis (ICA): Separates a multivariate signal into
additive, independent components.
• Apriori algorithm: Identifies frequent itemsets and generates association rules,
often used for market basket analysis.
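
A brief sketch of two of the algorithms listed above using scikit-learn (assumed
installed); the synthetic data, K=3 and the choice of 2 components are illustrative
only.

# Sketch of K-Means and PCA on synthetic, unlabelled data (assumes scikit-learn, NumPy).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X = np.random.rand(200, 5)  # 200 unlabelled samples with 5 features

# K-Means: partition the data into K=3 clusters based on feature similarity.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# PCA: project the data onto the 2 orthogonal components capturing most variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print("cluster sizes:", np.bincount(labels))
print("explained variance ratio:", pca.explained_variance_ratio_)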

4. State any four important supervised ML algorithms


1. Linear Regression: Used for regression tasks, it models the relationship
between a dependent variable and one or more independent variables by fitting
a linear equation.
2. Logistic Regression: Primarily used for binary classification, it models the
probability that a given instance belongs to a particular class.
3. Decision Trees: Tree-like structures used for classification and regression tasks.
They partition the dataset into smaller subsets based on features.
4. Random Forest: An ensemble method that combines multiple decision trees to
improve accuracy and reduce overfitting
5. Support Vector Machines (SVM): Used for both classification and regression.
SVM tries to find a hyperplane that best separates data points into different
classes.
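
A hedged sketch of training two of the supervised algorithms above with scikit-learn
(assumed installed); the synthetic dataset and the hyperparameters are illustrative
only.

# Sketch: train and evaluate two supervised classifiers (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Labelled data: features X with known class labels y.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

for model in (LogisticRegression(max_iter=1000), RandomForestClassifier(n_estimators=100)):
    model.fit(X_train, y_train)                        # learn from labelled examples
    acc = accuracy_score(y_test, model.predict(X_test))
    print(type(model).__name__, "accuracy:", round(acc, 3))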

5. Define mean absolute error

Mean Absolute Error (MAE) is a metric that calculates the average absolute
difference between predicted and actual values in a dataset. It measures the
accuracy of predictions by showing the average error magnitude, regardless of
direction, and is expressed as:
MAE = (1/n) Σ |yᵢ − ŷᵢ|
where yᵢ is the actual value, ŷᵢ is the predicted value, and n is the number of
observations.
6. Define precision and recall

7. Define the terms MSE and RMSE


MSE (Mean Squared Error)
• It measures the amount of error in statistical models.
• It assesses the average squared difference between the observed and predicted
values. When a model has no error, the MSE equals zero; as model error
increases, its value increases. The mean squared error is also known as the
mean squared deviation (MSD).
• MSE = (1/n) Σ (yᵢ − ŷᵢ)²
RMSE (Root Mean Squared Error)
• The root mean squared error measures the average difference between a
statistical model's predicted values and the actual values.
• Mathematically, it is the standard deviation of the residuals; residuals represent
the distance between the regression line and the data points.
• RMSE = √MSE
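
A small sketch computing MAE, MSE and RMSE for a handful of example predictions,
assuming scikit-learn and NumPy are available; the values are made up purely for
illustration.

# Sketch: compute MAE, MSE and RMSE for example predictions (assumes scikit-learn, NumPy).
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # actual values (illustrative)
y_pred = np.array([2.5, 5.0, 3.0, 8.0])   # predicted values (illustrative)

mae = mean_absolute_error(y_true, y_pred)   # average absolute error
mse = mean_squared_error(y_true, y_pred)    # average squared error
rmse = np.sqrt(mse)                         # square root of MSE

print("MAE:", mae, "MSE:", mse, "RMSE:", rmse)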
8. Define binary classification and multiclass classification
Binary classification is the simplest form of classification, where the target variable
has only two possible classes or outcomes. For instance, it can be used for tasks like
spam detection (spam or not spam), disease diagnosis (diseased or not diseased), or
customer churn prediction (churn or not churn).
Multiclass classification, also known as multinomial classification, deals with
problems where there are more than two classes or categories to predict. Examples
include image recognition with multiple object classes, text classification with
multiple topics, or sentiment analysis with multiple sentiment labels.
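
A short sketch contrasting the two settings with scikit-learn (assumed installed): the
same estimator is fitted once on a two-class target and once on a three-class target;
the synthetic datasets are illustrative only.

# Sketch: binary vs multiclass classification (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Binary: the target has exactly two classes (e.g. spam / not spam).
X_bin, y_bin = make_classification(n_samples=200, n_classes=2, random_state=0)
print("binary target classes:", sorted(set(y_bin)))

# Multiclass: the target has more than two classes (e.g. several topics).
X_multi, y_multi = make_classification(n_samples=200, n_classes=3, n_informative=4,
                                       random_state=0)
print("multiclass target classes:", sorted(set(y_multi)))

# The same estimator handles both; it adapts to the number of classes in y.
clf = LogisticRegression(max_iter=1000).fit(X_multi, y_multi)
print("learned classes:", clf.classes_)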

4 MARKS QUESTIONS
1. Explain any two types of learning
Supervised learning
• Supervised learning is a machine learning technique that uses labelled datasets
to train algorithms to classify data or predict outcomes.
• The algorithms are trained on input data that has been labelled for a particular
output. The goal is to build an intelligent system that can learn from
input-output training samples.
• The training dataset is processed to build a function that maps new data to
expected output values; the model can measure its accuracy and learn over time.
Unsupervised learning
• Unsupervised learning is a technique that uses algorithms to analyse unlabelled
datasets.
• The algorithms discover hidden patterns or data groupings without the need for
human intervention.
• The goal of unsupervised learning is to discover hidden and interesting
patterns in unlabelled data.
• In unsupervised learning, the model works on its own to discover patterns and
information that were previously undetected.
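
A compact sketch of the distinction, assuming scikit-learn and NumPy are available:
the supervised estimator is fitted on features together with labels, while the
unsupervised one sees only the features; the data is synthetic and purely
illustrative.

# Sketch: supervised vs unsupervised fitting (assumes scikit-learn and NumPy).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.random.rand(150, 4)               # input features
y = np.random.randint(0, 2, size=150)    # labels (used only by the supervised model)

# Supervised: learns a mapping from inputs to the provided labels.
supervised = LogisticRegression(max_iter=1000).fit(X, y)

# Unsupervised: discovers structure (clusters) without any labels.
unsupervised = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print("supervised predictions:", supervised.predict(X[:3]))
print("unsupervised cluster assignments:", unsupervised.labels_[:3])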
2. Explain the machine learning life cycle
1. Gathering data
• It is the first step of the ML life cycle: identify different data sources, as data
can be collected from sources such as files, databases, the internet or mobile
devices.
• The quantity and quality of the collected data will determine the efficiency of
the output.
• Step 1: Identify various data sources
• Step 2: Collect the data
• Step 3: Integrate the data
2. Data preparation
• In data preparation we put the data into a suitable place and prepare it for use
in machine learning.
• Step 1: Data exploration
• Step 2: Data pre-processing
3. Data wrangling
• It is the process of cleaning and converting raw data into a usable format. It
consists of cleaning the data, selecting the variables to use, and transforming
the data into a proper format for analysis. Cleaning the data addresses quality
issues.
4. Analyse data
• The cleaned and prepared data is passed to the analysis phase.
• Step 1: Select the analytical technique
• Step 2: Build the model
• Step 3: Review the results

5. Train the model

• The next step is to train the model on the training dataset; in this step we train
the model to improve its performance for a better outcome on the problem.
• We use datasets to train the model using various machine learning algorithms.
Training a model is required so that it can understand the various patterns,
rules, and features.

6. Test the model

• After training the machine learning model with a specific dataset, the next step
is model testing.
• During this phase, the model's accuracy is assessed by evaluating its
performance on a separate test dataset.
• The test results provide a percentage accuracy measurement tailored to the
project or problem's criteria.

7. Deployment
In this phase we deploy the model into a real-world application. If the prepared
model produces accurate results as per our requirements, with acceptable speed, the
model is deployed and used in the real application.
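
A compressed sketch of the life cycle on a tiny synthetic dataset, assuming pandas,
NumPy and scikit-learn are installed: gather and prepare data, split it, train, test,
and then reuse the trained model as a stand-in for deployment. Every column name and
value here is invented for illustration.

# Condensed ML life cycle sketch (assumes pandas, NumPy and scikit-learn).
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# 1-3. Gather, prepare and wrangle data (here: a small synthetic table with a gap).
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 62, 23, 44, np.nan, 36, 29],
    "income": [30, 42, 80, 75, 90, 28, 60, 55, 48, 33],
    "bought": [0, 0, 1, 1, 1, 0, 1, 1, 0, 0],
})
df["age"] = df["age"].fillna(df["age"].median())   # clean the missing value

# 4-5. Analyse and train: split the data and fit a model on the training set.
X = df[["age", "income"]]
y = df["bought"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# 6. Test: evaluate accuracy on the held-out test set.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 7. Deployment (stand-in): score a new, unseen record with the trained model.
new_customer = pd.DataFrame({"age": [40], "income": [65]})
print("prediction for new customer:", model.predict(new_customer)[0])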
3. Describe different metrics for classification

4. Explain any one unsupervised algorithm


K-means clustering is an algorithm that partitions data into K clusters by assigning
each point to the nearest cluster center, then updating the centers iteratively to
minimize the distance between points and their cluster centroids.
The steps of the K-means clustering algorithm are:
1. Choose the number of clusters (K): Define how many clusters you want the data to
be grouped into.
2. Initialize centroids: Randomly select K points from the dataset as the initial cluster
centroids
3. Assign clusters: Assign each data point to the nearest centroid based on the
Euclidean distance (or other distance metrics).
4. Update centroids: Recalculate the centroids as the mean of all data points assigned
to each cluster.
5. Repeat: Iterate steps 3 and 4 until the centroids no longer change significantly
(convergence).
6. Final clusters: Once convergence is reached, the data points are grouped into their
final clusters.
Advantages of K-means clustering:
1. Simple and efficient: K-means is easy to implement and computationally efficient,
making it suitable for large datasets.
2. Scalable: It can handle large amounts of data and works well when clusters are
spherical and evenly sized.
Disadvantage of K-means clustering:
1. Sensitive to initial centroids: K-means can converge to a suboptimal solution if the
initial centroids are poorly chosen, and it may not perform well with non-spherical or
overlapping clusters.
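
A minimal NumPy sketch of the steps above (choose K, initialize centroids, assign
points, update centroids, repeat until convergence); it is a bare-bones illustration
rather than a production implementation, and the two-blob data and K=2 are made up.

# Bare-bones K-means following the steps above (assumes NumPy only).
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])   # two blobs

K = 2                                                  # step 1: choose number of clusters
centroids = X[rng.choice(len(X), K, replace=False)]    # step 2: random initial centroids

for _ in range(100):                                   # step 5: repeat until convergence
    # step 3: assign each point to the nearest centroid (Euclidean distance)
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # step 4: recompute each centroid as the mean of its assigned points
    new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    if np.allclose(new_centroids, centroids):          # step 6: centroids have settled
        break
    centroids = new_centroids

print("final centroids:")
print(centroids)
print("cluster sizes:", np.bincount(labels))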

5. Explain the need for data pre-processing


• After gaining insights through data exploration, the next step is data
pre-processing.
• In this phase, the data is cleaned, transformed, and prepared for analysis.
• Pre-processing tasks may include handling missing values, normalizing or
scaling features, encoding categorical variables, and splitting the dataset into
training and testing sets.
• Data pre-processing ensures that the data is in a suitable format for machine
learning algorithms. Effective data preparation is critical for the success of a
machine learning project, as it sets the foundation for model training
and evaluation.
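
A short sketch of the listed pre-processing tasks on a tiny made-up table, assuming
pandas, NumPy and scikit-learn are installed: filling a missing value, scaling a
numeric feature, encoding a categorical column, and splitting into training and
testing sets.

# Pre-processing sketch: missing values, scaling, encoding, splitting (pandas/scikit-learn).
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "age": [25, np.nan, 47, 51, 29, 36],
    "city": ["Pune", "Mumbai", "Pune", "Delhi", "Mumbai", "Delhi"],
    "target": [0, 1, 1, 0, 1, 0],
})

# Handle missing values: fill the gap in "age" with the column median.
df["age"] = df["age"].fillna(df["age"].median())

# Scale the numeric feature to zero mean and unit variance.
df["age_scaled"] = StandardScaler().fit_transform(df[["age"]]).ravel()

# Encode the categorical variable as one-hot (dummy) columns.
df = pd.get_dummies(df, columns=["city"])

# Split into training and testing sets.
X = df.drop(columns=["target", "age"])
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
print("train shape:", X_train.shape, "test shape:", X_test.shape)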
6. Elaborate the simple linear regression algorithm

Simple Linear Regression is a type of linear regression algorithm used when there is
only one independent variable (predictor) that is used to predict the value of a
numerical dependent variable. It models the relationship between the independent
variable and the dependent variable as a linear equation, typically represented as y =
mx + b, where "y" is the dependent variable, "x" is the independent variable, "m" is
the slope, and "b" is the intercept.
Finding the best fit line:
• In linear regression, our primary objective involves discovering the optimal fit
line, where the aim is to minimize the error between predicted and actual
values.
• This optimal line is characterized by the smallest error. Different weight values
or coefficients (a₀, a₁) produce distinct regression lines, necessitating the
determination of the optimal a₀ and a₁ values. To achieve this, we employ a cost
function.
Cost Function:
• Different weight values or coefficients (a₀, a₁) result in distinct regression
lines, while the cost function serves the purpose of estimating the coefficients
for the optimal fit line.
• This function is instrumental in optimizing the regression coefficients or
weights and serves as a gauge of the performance of a linear regression model.
• Utilizing the cost function allows us to evaluate the accuracy of the mapping
function, often referred to as the Hypothesis function, which maps input
variables to output variables.
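
A small NumPy sketch of fitting the line y = mx + b by ordinary least squares and
reporting the mean squared error as the cost; the data points are invented for
illustration.

# Simple linear regression by least squares (assumes NumPy only).
import numpy as np

# Illustrative data: one independent variable x and one dependent variable y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.1, 6.2, 8.0, 9.9])

# Closed-form least-squares estimates of the slope m and intercept b.
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - m * x.mean()

# Cost: mean squared error between predicted and actual values.
y_pred = m * x + b
cost = np.mean((y - y_pred) ** 2)

print(f"best fit line: y = {m:.3f} * x + {b:.3f}, MSE cost = {cost:.4f}")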

7. Elaborate the multiple linear regression algorithm


Multiple Linear Regression, on the other hand, is a linear regression technique used
when there are two or more independent variables (predictors) that are used
collectively to predict the value of a numerical dependent variable. It extends the
concept of Simple Linear Regression to include multiple predictors. The relationship
between the dependent variable and multiple independent variables is modeled as a
linear equation of the form y = b₀ + b₁x₁ + b₂x₂ + … + bₙxₙ, where "y" is the
dependent variable, "x₁", "x₂", ..., "xₙ" are the independent variables, and "b₀",
"b₁", "b₂", ..., "bₙ" are the coefficients to be determined through the regression
analysis.
Key points:
• In MLR the dependent (target) variable (y) is typically expected to be continuous
or real-valued, whereas the predictor or independent variables can take on
continuous or categorical forms.
• Each feature variable is expected to exhibit a linear relationship with the
dependent variable.
• MLR's goal is to establish a regression line (hyperplane) that fits through a
multidimensional space of data points, considering the various predictor
variables to predict the dependent variable accurately.
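
A brief scikit-learn sketch of fitting y = b₀ + b₁x₁ + b₂x₂ with two predictors
(scikit-learn and NumPy assumed installed); the data is generated from made-up
coefficients purely for illustration.

# Multiple linear regression with two predictors (assumes scikit-learn and NumPy).
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data generated from y = 1 + 2*x1 + 3*x2 plus a little noise.
rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = 1 + 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(0, 0.05, 100)

model = LinearRegression().fit(X, y)
print("intercept b0:", round(model.intercept_, 3))
print("coefficients b1, b2:", np.round(model.coef_, 3))
print("prediction for x1=0.5, x2=0.5:", round(model.predict([[0.5, 0.5]])[0], 3))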
8. Explain different techniques of data cleaning
9. Explain the confusion matrix with respect to accuracy, precision, recall and
F1-score
10. Explain data cleaning with respect to missing values and outliers
11. What is Logistic Regression?
Logistic regression is the appropriate regression analysis to conduct when the
dependent variable is dichotomous (binary). Like all regression analyses, logistic
regression is a predictive analysis. Logistic regression is used to describe data and to
explain the relationship between one dependent binary variable and one or more
nominal, ordinal, interval or ratio-level independent variables.
It is used when the dependent variable is dichotomous or binary, meaning a variable
that has only two possible outputs; for example, whether a person will survive an
accident, or whether a student will pass an exam. The outcome can either be yes or
no (two outputs). This regression technique is similar to linear regression and can
be used to predict probabilities for classification problems.
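
A short sketch of logistic regression on a synthetic binary target with scikit-learn
(assumed installed), showing both the predicted class and the predicted probability;
the dataset and split are illustrative.

# Logistic regression sketch for a binary outcome (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Synthetic binary data (e.g. pass/fail, churn/no churn).
X, y = make_classification(n_samples=300, n_features=4, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# predict() gives the class label; predict_proba() gives the class probabilities.
print("predicted class:", clf.predict(X_test[:1])[0])
print("class probabilities:", clf.predict_proba(X_test[:1])[0].round(3))
print("test accuracy:", round(clf.score(X_test, y_test), 3))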
