FAM QUESTION BANK CT-2

2 MARKS QUESTIONS
1. Define
i) Data mining:
• It is the process of discovering patterns, relationships and useful information
from large datasets.
• It involves clustering, classification, association rule mining and anomaly
detection.
• Data mining is a form of data analysis that focuses on finding valuable insights
within data.
ii) Data analytics:
• It is the process of examining, cleaning, transforming and interpreting data to
extract meaningful information.
• It combines statistical analysis, data mining, and visualization to inform
decision-making.
• Data analytics is the broader practice of working with data to answer questions
or make informed decisions.

2. Define
i) Training dataset:
• The next step is to train the model; in this step we train the model to
improve its performance for a better outcome on the problem.
• We use datasets to train the model using various machine learning algorithms.
Training a model is required so that it can understand the various patterns,
rules, and features.
ii) Test dataset:
• After training the machine learning model with a specific dataset, the next step
is model testing.
• During this phase, the model's accuracy is assessed by evaluating its
performance on a separate test dataset.
• The test results provide a percentage accuracy measurement tailored to the
project or problem's criteria.
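
A minimal sketch of preparing training and test datasets, assuming scikit-learn and
NumPy are installed; the synthetic data and the 80/20 split ratio are illustrative
choices, not part of the original answer.

# Minimal train/test split sketch (assumes scikit-learn and NumPy).
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data: 100 samples with 3 features and a binary label.
X = np.random.rand(100, 3)
y = np.random.randint(0, 2, size=100)

# Hold out 20% of the data for testing; the rest is used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print("training samples:", X_train.shape[0])
print("test samples:", X_test.shape[0])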
3. State different unsupervised algorithms
• K-Means Clustering: Partitions data into K clusters based on feature similarity.
• Hierarchical Clustering: Builds a tree-like structure of clusters (can be
divisive or agglomerative).
• Principal Component Analysis (PCA): Reduces the dimensionality of data by
transforming it into a set of orthogonal components that capture the most
variance.
• Independent Component Analysis (ICA): Separates a multivariate signal into
additive, independent components.
• Apriori algorithm: Identifies frequent itemsets and generates association rules,
often used for market basket analysis.
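
A brief sketch of two of the algorithms listed above using scikit-learn (assumed
installed); the synthetic data, K=3 and the choice of 2 components are illustrative
only.

# Sketch of K-Means and PCA on synthetic, unlabelled data (assumes scikit-learn, NumPy).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X = np.random.rand(200, 5)  # 200 unlabelled samples with 5 features

# K-Means: partition the data into K=3 clusters based on feature similarity.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# PCA: project the data onto the 2 orthogonal components capturing most variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print("cluster sizes:", np.bincount(labels))
print("explained variance ratio:", pca.explained_variance_ratio_)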

4. State any four important supervised ML algorithms


1. Linear Regression: Used for regression tasks, it models the relationship
between a dependent variable and one or more independent variables by fitting
a linear equation.
2. Logistic Regression: Primarily used for binary classification, it models the
probability that a given instance belongs to a particular class.
3. Decision Trees: Tree-like structures used for classification and regression tasks.
They partition the dataset into smaller subsets based on features.
4. Random Forest: An ensemble method that combines multiple decision trees to
improve accuracy and reduce overfitting
5. Support Vector Machines (SVM): Used for both classification and regression.
SVM tries to find a hyperplane that best separates data points into different
classes.
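
A hedged sketch of training two of the supervised algorithms above with scikit-learn
(assumed installed); the synthetic dataset and the hyperparameters are illustrative
only.

# Sketch: train and evaluate two supervised classifiers (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Labelled data: features X with known class labels y.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

for model in (LogisticRegression(max_iter=1000), RandomForestClassifier(n_estimators=100)):
    model.fit(X_train, y_train)                        # learn from labelled examples
    acc = accuracy_score(y_test, model.predict(X_test))
    print(type(model).__name__, "accuracy:", round(acc, 3))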

5. Define mean absolute error

Mean Absolute Error (MAE) is a metric that calculates the average absolute
difference between predicted and actual values in a dataset. It measures the
accuracy of predictions by showing the average error magnitude, regardless of
direction, and is expressed as:
MAE = (1/n) Σ |yᵢ − ŷᵢ|
where yᵢ is the actual value, ŷᵢ is the predicted value, and n is the number of
observations.
6. Define precision and recall

7. Define the terms MSE and RMSE


MSE (Mean Squared Error)
• It measures the amount of error in statistical models.
• It assesses the average squared difference between the observed and predicted
values. When a model has no error, the MSE equals zero; as model error
increases, its value increases. The mean squared error is also known as the
mean squared deviation (MSD).
• MSE = (1/n) Σ (yᵢ − ŷᵢ)²
RMSE (Root Mean Squared Error)
• The root mean squared error measures the average difference between a
statistical model's predicted values and the actual values.
• Mathematically, it is the standard deviation of the residuals; residuals represent
the distance between the regression line and the data points.
• RMSE = √MSE
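
A small sketch computing MAE, MSE and RMSE for a handful of example predictions,
assuming scikit-learn and NumPy are available; the values are made up purely for
illustration.

# Sketch: compute MAE, MSE and RMSE for example predictions (assumes scikit-learn, NumPy).
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # actual values (illustrative)
y_pred = np.array([2.5, 5.0, 3.0, 8.0])   # predicted values (illustrative)

mae = mean_absolute_error(y_true, y_pred)   # average absolute error
mse = mean_squared_error(y_true, y_pred)    # average squared error
rmse = np.sqrt(mse)                         # square root of MSE

print("MAE:", mae, "MSE:", mse, "RMSE:", rmse)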
8. Define binary classification and multiclass classification
Binary classification is the simplest form of classification, where the target variable
has only two possible classes or outcomes. For instance, it can be used for tasks like
spam detection (spam or not spam), disease diagnosis (diseased or not diseased), or
customer churn prediction (churn or not churn).
Multiclass classification, also known as multinomial classification, deals with
problems where there are more than two classes or categories to predict. Examples
include image recognition with multiple object classes, text classification with
multiple topics, or sentiment analysis with multiple sentiment labels.
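
A short sketch contrasting the two settings with scikit-learn (assumed installed): the
same estimator is fitted once on a two-class target and once on a three-class target;
the synthetic datasets are illustrative only.

# Sketch: binary vs multiclass classification (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Binary: the target has exactly two classes (e.g. spam / not spam).
X_bin, y_bin = make_classification(n_samples=200, n_classes=2, random_state=0)
print("binary target classes:", sorted(set(y_bin)))

# Multiclass: the target has more than two classes (e.g. several topics).
X_multi, y_multi = make_classification(n_samples=200, n_classes=3, n_informative=4,
                                       random_state=0)
print("multiclass target classes:", sorted(set(y_multi)))

# The same estimator handles both; it adapts to the number of classes in y.
clf = LogisticRegression(max_iter=1000).fit(X_multi, y_multi)
print("learned classes:", clf.classes_)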

4 MARKS QUESTIONS
1. Explain any two types of learning
Supervised learning
• Supervised learning is a machine learning technique that uses labelled datasets
to train algorithms to classify data or predict outcomes.
• The algorithms are trained on input data that has been labelled for a particular
output. The goal is to build an intelligent system that can learn from
input-output training samples.
• The training dataset is processed to build a function that maps new data to
expected output values; the model can measure its accuracy and learn over time.
Unsupervised learning
• Unsupervised learning is a technique that uses algorithms to analyse unlabelled
datasets.
• The algorithms discover hidden patterns or data groupings without the need for
human intervention.
• The goal of unsupervised learning is to discover hidden and interesting
patterns in unlabelled data.
• In unsupervised learning, the model works on its own to discover patterns and
information that were previously undetected.
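
A compact sketch of the distinction, assuming scikit-learn and NumPy are available:
the supervised estimator is fitted on features together with labels, while the
unsupervised one sees only the features; the data is synthetic and purely
illustrative.

# Sketch: supervised vs unsupervised fitting (assumes scikit-learn and NumPy).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.random.rand(150, 4)               # input features
y = np.random.randint(0, 2, size=150)    # labels (used only by the supervised model)

# Supervised: learns a mapping from inputs to the provided labels.
supervised = LogisticRegression(max_iter=1000).fit(X, y)

# Unsupervised: discovers structure (clusters) without any labels.
unsupervised = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print("supervised predictions:", supervised.predict(X[:3]))
print("unsupervised cluster assignments:", unsupervised.labels_[:3])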
2. Explain the machine learning life cycle
1. Gathering data
• It is the first step of the ML life cycle: identify different data sources, as data
can be collected from sources such as files, databases, the internet or mobile
devices.
• The quantity and quality of the collected data will determine the efficiency of
the output.
• Step 1: Identify various data sources
• Step 2: Collect the data
• Step 3: Integrate the data
2. Data preparation
• In data preparation we put the data into a suitable place and prepare it for use
in machine learning.
• Step 1: Data exploration
• Step 2: Data pre-processing
3. Data wrangling
• It is the process of cleaning and converting raw data into a usable format. It
consists of cleaning the data, selecting the variables to use, and transforming
the data into a proper format for analysis. Cleaning the data addresses quality
issues.
4. Analyse data
• The cleaned and prepared data is passed to the analysis phase.
• Step 1: Select the analytical technique
• Step 2: Build the model
• Step 3: Review the results

5. Train the model

• The next step is to train the model on the training dataset; in this step we train
the model to improve its performance for a better outcome on the problem.
• We use datasets to train the model using various machine learning algorithms.
Training a model is required so that it can understand the various patterns,
rules, and features.

6. Test the model

• After training the machine learning model with a specific dataset, the next step
is model testing.
• During this phase, the model's accuracy is assessed by evaluating its
performance on a separate test dataset.
• The test results provide a percentage accuracy measurement tailored to the
project or problem's criteria.

7. Deployment
In this phase we deploy the model into a real-world application. If the prepared
model produces accurate results as per our requirements, with acceptable speed, the
model is deployed and used in the real application.
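
A compressed sketch of the life cycle on a tiny synthetic dataset, assuming pandas,
NumPy and scikit-learn are installed: gather and prepare data, split it, train, test,
and then reuse the trained model as a stand-in for deployment. Every column name and
value here is invented for illustration.

# Condensed ML life cycle sketch (assumes pandas, NumPy and scikit-learn).
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# 1-3. Gather, prepare and wrangle data (here: a small synthetic table with a gap).
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 62, 23, 44, np.nan, 36, 29],
    "income": [30, 42, 80, 75, 90, 28, 60, 55, 48, 33],
    "bought": [0, 0, 1, 1, 1, 0, 1, 1, 0, 0],
})
df["age"] = df["age"].fillna(df["age"].median())   # clean the missing value

# 4-5. Analyse and train: split the data and fit a model on the training set.
X = df[["age", "income"]]
y = df["bought"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# 6. Test: evaluate accuracy on the held-out test set.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 7. Deployment (stand-in): score a new, unseen record with the trained model.
new_customer = pd.DataFrame({"age": [40], "income": [65]})
print("prediction for new customer:", model.predict(new_customer)[0])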
3. Describe different metrics for classification

4. Explain any one unsupervised algorithm


K-means clustering is an algorithm that partitions data into K clusters by assigning
each point to the nearest cluster center, then updating the centers iteratively to
minimize the distance between points and their cluster centroids.
The steps of the K-means clustering algorithm are:
1. Choose the number of clusters (K): Define how many clusters you want the data to
be grouped into.
2. Initialize centroids: Randomly select K points from the dataset as the initial cluster
centroids
3. Assign clusters: Assign each data point to the nearest centroid based on the
Euclidean distance (or other distance metrics).
4. Update centroids: Recalculate the centroids as the mean of all data points assigned
to each cluster.
5. Repeat: Iterate steps 3 and 4 until the centroids no longer change significantly
(convergence).
6. Final clusters: Once convergence is reached, the data points are grouped into their
final clusters.
Advantages of K-means clustering:
1. Simple and efficient: K-means is easy to implement and computationally efficient,
making it suitable for large datasets.
2. Scalable: It can handle large amounts of data and works well when clusters are
spherical and evenly sized.
Disadvantage of K-means clustering:
1. Sensitive to initial centroids: K-means can converge to a suboptimal solution if the
initial centroids are poorly chosen, and it may not perform well with non-spherical or
overlapping clusters.
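
A minimal NumPy sketch of the steps above (choose K, initialize centroids, assign
points, update centroids, repeat until convergence); it is a bare-bones illustration
rather than a production implementation, and the two-blob data and K=2 are made up.

# Bare-bones K-means following the steps above (assumes NumPy only).
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])   # two blobs

K = 2                                                  # step 1: choose number of clusters
centroids = X[rng.choice(len(X), K, replace=False)]    # step 2: random initial centroids

for _ in range(100):                                   # step 5: repeat until convergence
    # step 3: assign each point to the nearest centroid (Euclidean distance)
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # step 4: recompute each centroid as the mean of its assigned points
    new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    if np.allclose(new_centroids, centroids):          # step 6: centroids have settled
        break
    centroids = new_centroids

print("final centroids:")
print(centroids)
print("cluster sizes:", np.bincount(labels))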

5. Explain the need for data pre-processing


• After gaining insights through data exploration, the next step is data
pre-processing.
• In this phase, the data is cleaned, transformed, and prepared for analysis.
• Pre-processing tasks may include handling missing values, normalizing or
scaling features, encoding categorical variables, and splitting the dataset into
training and testing sets.
• Data pre-processing ensures that the data is in a suitable format for machine
learning algorithms. Effective data preparation is critical for the success of a
machine learning project, as it sets the foundation for model training
and evaluation.
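
A short sketch of the listed pre-processing tasks on a tiny made-up table, assuming
pandas, NumPy and scikit-learn are installed: filling a missing value, scaling a
numeric feature, encoding a categorical column, and splitting into training and
testing sets.

# Pre-processing sketch: missing values, scaling, encoding, splitting (pandas/scikit-learn).
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "age": [25, np.nan, 47, 51, 29, 36],
    "city": ["Pune", "Mumbai", "Pune", "Delhi", "Mumbai", "Delhi"],
    "target": [0, 1, 1, 0, 1, 0],
})

# Handle missing values: fill the gap in "age" with the column median.
df["age"] = df["age"].fillna(df["age"].median())

# Scale the numeric feature to zero mean and unit variance.
df["age_scaled"] = StandardScaler().fit_transform(df[["age"]]).ravel()

# Encode the categorical variable as one-hot (dummy) columns.
df = pd.get_dummies(df, columns=["city"])

# Split into training and testing sets.
X = df.drop(columns=["target", "age"])
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
print("train shape:", X_train.shape, "test shape:", X_test.shape)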
6. Elaborate the simple linear regression algorithm

Simple Linear Regression is a type of linear regression algorithm used when there is
only one independent variable (predictor) that is used to predict the value of a
numerical dependent variable. It models the relationship between the independent
variable and the dependent variable as a linear equation, typically represented as y =
mx + b, where "y" is the dependent variable, "x" is the independent variable, "m" is
the slope, and "b" is the intercept.
Finding the best fit line:
• In linear regression, our primary objective involves discovering the optimal fit
line, where the aim is to minimize the error between predicted and actual
values.
• This optimal line is characterized by the smallest error. Different weight values
or coefficients (a₀, a₁) produce distinct regression lines, necessitating the
determination of the optimal a₀ and a₁ values. To achieve this, we employ a cost
function.
Cost Function:
• Different weight values or coefficients (a₀, a₁) result in distinct regression
lines, while the cost function serves the purpose of estimating the coefficients
for the optimal fit line.
• This function is instrumental in optimizing the regression coefficients or
weights and serves as a gauge of the performance of a linear regression model.
• Utilizing the cost function allows us to evaluate the accuracy of the mapping
function, often referred to as the Hypothesis function, which maps input
variables to output variables.
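
A small NumPy sketch of fitting the line y = mx + b by ordinary least squares and
reporting the mean squared error as the cost; the data points are invented for
illustration.

# Simple linear regression by least squares (assumes NumPy only).
import numpy as np

# Illustrative data: one independent variable x and one dependent variable y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.1, 6.2, 8.0, 9.9])

# Closed-form least-squares estimates of the slope m and intercept b.
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - m * x.mean()

# Cost: mean squared error between predicted and actual values.
y_pred = m * x + b
cost = np.mean((y - y_pred) ** 2)

print(f"best fit line: y = {m:.3f} * x + {b:.3f}, MSE cost = {cost:.4f}")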

7. Elaborate the multiple linear regression algorithm


Multiple Linear Regression, on the other hand, is a linear regression technique used
when there are two or more independent variables (predictors) that are used
collectively to predict the value of a numerical dependent variable. It extends the
concept of Simple Linear Regression to include multiple predictors. The relationship
between the dependent variable and multiple independent variables is modeled as a
linear equation of the form y = b₀ + b₁x₁ + b₂x₂ + … + bₙxₙ, where "y" is the
dependent variable, "x₁", "x₂", ..., "xₙ" are the independent variables, and "b₀",
"b₁", "b₂", ..., "bₙ" are the coefficients to be determined through the regression
analysis.
Key points:
• In MLR the dependent (target) variable (y) is typically expected to be continuous
or real-valued, whereas the predictor or independent variables can take on
continuous or categorical forms.
• Each feature variable is expected to exhibit a linear relationship with the
dependent variable.
• MLR's goal is to establish a regression line (hyperplane) that fits through a
multidimensional space of data points, considering the various predictor
variables to predict the dependent variable accurately.
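
A brief scikit-learn sketch of fitting y = b₀ + b₁x₁ + b₂x₂ with two predictors
(scikit-learn and NumPy assumed installed); the data is generated from made-up
coefficients purely for illustration.

# Multiple linear regression with two predictors (assumes scikit-learn and NumPy).
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data generated from y = 1 + 2*x1 + 3*x2 plus a little noise.
rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = 1 + 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(0, 0.05, 100)

model = LinearRegression().fit(X, y)
print("intercept b0:", round(model.intercept_, 3))
print("coefficients b1, b2:", np.round(model.coef_, 3))
print("prediction for x1=0.5, x2=0.5:", round(model.predict([[0.5, 0.5]])[0], 3))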
8. Explain different techniques of data cleaning
9. Explain the confusion matrix with respect to accuracy, precision, recall and
F1-score
10. Explain data cleaning with respect to missing values and outliers
11. What is Logistic Regression?
Logistic regression is the appropriate regression analysis to conduct when the
dependent variable is dichotomous (binary). Like all regression analyses, logistic
regression is a predictive analysis. Logistic regression is used to describe data and to
explain the relationship between one dependent binary variable and one or more
nominal, ordinal, interval or ratio-level independent variables.
It is used when the dependent variable is dichotomous or binary, meaning a variable
that has only two possible outputs; for example, whether a person will survive an
accident, or whether a student will pass an exam. The outcome can either be yes or
no (two outputs). This regression technique is similar to linear regression and can
be used to predict probabilities for classification problems.
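
A short sketch of logistic regression on a synthetic binary target with scikit-learn
(assumed installed), showing both the predicted class and the predicted probability;
the dataset and split are illustrative.

# Logistic regression sketch for a binary outcome (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Synthetic binary data (e.g. pass/fail, churn/no churn).
X, y = make_classification(n_samples=300, n_features=4, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# predict() gives the class label; predict_proba() gives the class probabilities.
print("predicted class:", clf.predict(X_test[:1])[0])
print("class probabilities:", clf.predict_proba(X_test[:1])[0].round(3))
print("test accuracy:", round(clf.score(X_test, y_test), 3))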
