
Q1. A) Explain supervised and unsupervised machine learning with suitable examples.

Supervised and Unsupervised Machine Learning


1. Supervised Learning
 Definition:
Supervised learning is a type of machine learning where the model is trained using a labeled
dataset. This means the input data is provided along with the correct output, and the algorithm
learns the mapping between them.
 Goal:
To predict outcomes for new, unseen data based on the patterns learned from labeled data.
 Examples:
o Classification: Identifying whether an email is spam or not spam.
o Regression: Predicting house prices based on features like size, location, and number of
rooms.
2. Unsupervised Learning
 Definition:
Unsupervised learning is a type of machine learning where the model is trained on an unlabeled
dataset. Here, the system tries to find hidden patterns, groupings, or structures in the data without
predefined output labels.
 Goal:
To discover the underlying structure of the data and group similar items together.
 Examples:
o Clustering: Customer segmentation in marketing (grouping customers based on purchasing
behavior).
o Dimensionality Reduction: Reducing high-dimensional data for visualization (e.g., PCA).
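To make the contrast concrete, here is a minimal sketch using scikit-learn (an assumed dependency; the toy data is hypothetical): the classifier is given labels, while the clustering algorithm is not.

# Supervised vs. unsupervised sketch, assuming scikit-learn is installed.
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Supervised: features X come WITH labels y (e.g., spam = 1, not spam = 0).
X = [[1, 0], [0, 1], [1, 1], [0, 0]]   # toy feature vectors
y = [1, 0, 1, 0]                        # known correct outputs
clf = LogisticRegression().fit(X, y)
print(clf.predict([[1, 0]]))            # predict a label for unseen data

# Unsupervised: only X is given; the algorithm finds groupings itself.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)                       # cluster assignment per sample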
B) Explain the Support Vector Machine (SVM) algorithm with a suitable example.
Support Vector Machine (SVM)

Support Vector Machine (SVM) is one of the most popular supervised learning algorithms, used for both classification and regression problems. However, it is primarily used for classification problems in machine learning. The goal of the SVM algorithm is to find the best line or decision boundary that can segregate n-dimensional space into classes, so that new data points can easily be placed in the correct category in the future. This best decision boundary is called a hyperplane. SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine. Consider the diagram below, in which two different categories are classified using a decision boundary or hyperplane:
[Figure: SVM classification – two classes separated by a hyperplane]

SVM can be of two types:


Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.

Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.
Working of SVM:

1. SVM finds the best decision boundary (hyperplane) that separates data points of different classes.
2. It tries to maximize the margin between the classes.
3. The data points that are closest to the hyperplane and influence its position are called support vectors.
4. In the case of non-linearly separable data, SVM uses kernel functions to map the data into higher dimensions where it becomes linearly separable.
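As a sketch of these steps, the snippet below uses scikit-learn (an assumed dependency; the toy data is hypothetical) to fit a linear SVM and inspect its support vectors; switching to kernel='rbf' corresponds to step 4.

# Minimal SVM sketch, assuming scikit-learn; the data is a hypothetical toy set.
from sklearn import svm

X = [[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]]  # two well-separated clusters
y = [0, 0, 0, 1, 1, 1]                                 # class labels

clf = svm.SVC(kernel='linear')   # finds the hyperplane that maximizes the margin
clf.fit(X, y)

print(clf.support_vectors_)      # the points closest to the hyperplane
print(clf.predict([[4, 4]]))     # classify a new point

# For non-linearly separable data, a kernel maps inputs to a higher
# dimension where a linear separator exists (step 4):
clf_rbf = svm.SVC(kernel='rbf').fit(X, y)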
Important Terms:

 Hyperplane: A line or surface that separates the classes.


 Margin: Distance between the hyperplane and the nearest support vectors.
 Support Vectors: Data points that are closest to the hyperplane and help define it.
 Kernel Trick: A method to transform non-linear data into higher dimensions.
Types of SVM:

1. Linear SVM – Used when data is linearly separable.


2. Non-linear SVM – Used with kernel functions (like Polynomial, RBF) for non-linear data.
Advantages:

 Effective in high-dimensional spaces.


 Memory efficient (uses only support vectors).
 Works well when there is a clear margin of separation.
Disadvantages:

 Not suitable for large datasets.


 Doesn’t perform well with overlapping classes (noisy data).
 Choosing the right kernel and parameters is complex.
Applications:

 Email spam detection


 Image classification
 Handwriting recognition
 Face detection

C) Explain the working principle of Logistic Regression.


Logistic regression is a probabilistic model that classifies the instances in terms of probabilities. Because the classification is probabilistic, a natural method for optimizing the parameters is to ensure that the predicted probability of the observed class for each training instance is as large as possible. This goal is achieved by using the notion of maximum likelihood estimation to learn the parameters of the model. The likelihood of the training data is defined as the product of the probabilities of the observed labels of the individual training instances. Clearly, larger values of this objective function are better. By taking the negative logarithm of this value, one obtains a loss function in minimization form. Therefore, the output node uses the negative log-likelihood as its loss function. This loss function replaces the squared error used in the Widrow-Hoff method. The output layer can be formulated with the sigmoid activation function, which is very common in neural network design.

 Logistic regression is another supervised learning algorithm which is used to solve classification problems. In classification problems, we have dependent variables in a binary or discrete format such as 0 or 1.
 The logistic regression algorithm works with categorical variables such as 0 or 1, Yes or No, True or False, Spam or Not Spam, etc.

 It is a predictive analysis algorithm which works on the concept of probability.

 Logistic regression is a type of regression, but it differs from the linear regression algorithm in terms of how it is used.

 Logistic regression uses the sigmoid function (also called the logistic function) to map predictions to probabilities. This sigmoid function is used to model the data in logistic regression. The function can be represented as:

f(x) = 1 / (1 + e^(-x))

Where,

f(x) = output value between 0 and 1

x = input to the function

e = base of the natural logarithm

When we provide the input values (data) to the function, it produces an S-shaped curve. Logistic regression uses the concept of threshold levels: values above the threshold are rounded up to 1, and values below the threshold are rounded down to 0.

Working of Logistic Regression:

1. Input Features:
Start with input features x1, x2, ..., xn, just like in linear regression.

The model calculates a weighted sum:

z = w0 + w1·x1 + w2·x2 + ... + wn·xn

2. Apply Sigmoid Function:

Logistic regression uses the sigmoid function to map the output between 0 and 1:

p = σ(z) = 1 / (1 + e^(-z))

The output is a probability value.


3. Threshold Function:
If probability ≥ 0.5 → Predict Class = 1

If probability < 0.5 → Predict Class = 0

4. Loss Function:
Uses Log Loss / Binary Cross Entropy to calculate the error:

Loss = −[ y·log(p) + (1 − y)·log(1 − p) ]

5. Optimization:
The model uses Gradient Descent to update weights and minimize the loss.
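These five steps map directly onto a few lines of NumPy (a minimal sketch; the weights and data below are hypothetical toy values):

import numpy as np

def sigmoid(z):
    # Step 2: squash the weighted sum into a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: weighted sum z = w0 + w1*x1 + ... (hypothetical toy weights)
w = np.array([0.5, -0.25])   # w1, w2
b = 0.1                      # w0 (bias)
x = np.array([2.0, 1.0])     # one input sample

z = np.dot(w, x) + b
p = sigmoid(z)               # probability of class 1

# Step 3: threshold at 0.5
prediction = 1 if p >= 0.5 else 0

# Step 4: log loss against a true label (Step 5 would adjust w and b
# by gradient descent to reduce this value)
y_true = 1
loss = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
print(p, prediction, loss)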

Q2. A) What is classification? Explain the Decision Tree algorithm with a suitable example.

Classification and Decision Tree Algorithm

1. What is Classification?

 Classification is a supervised machine learning technique where the goal is to assign input data into
predefined categories or classes.

 Example: Classifying emails as Spam or Not Spam, predicting whether a patient has a disease or no disease.

2. Decision Tree Algorithm

 Definition:
A Decision Tree is a supervised learning algorithm used for classification and regression tasks.
It splits the dataset into branches using feature-based decisions, and each branch ends with a leaf node
representing the final output.

3. How it Works

1. The process starts from a Root Node (best feature).

2. At each node, the data is split based on a condition or test.

3. Branches represent the outcomes of the decision.

4. Finally, the tree ends at Leaf Nodes with a prediction or class label.

4. Key Terms

Term – Description

Root Node – First node that starts the decision tree

Internal Node – A decision-making node

Leaf Node – Final output or class label

Branch – Outcome of a decision/test

Splitting – Dividing data based on features

Pruning – Removing unnecessary nodes to avoid overfitting

5. Feature Selection Methods

1. Information Gain – Used in ID3 algorithm


2. Gain Ratio – Used in C4.5 algorithm

3. Gini Index – Used in CART algorithm

6. Advantages

 Simple to understand and visualize.

 Handles both numerical and categorical data.

 No need for feature scaling (normalization/standardization).

7. Disadvantages

 Can overfit the data if the tree is too large.

 Unstable – small changes in data may lead to a different tree.

 Biased towards features with many categories.

8. Applications

 Email Spam Detection

 Medical Diagnosis

 Loan Approval (predicting eligibility)

 Customer Segmentation in marketing

9. Example (Diagram to Draw in Exam)

 Draw a simple tree like this:

[Weather?] ← Root Node

/ \

Sunny Rainy

/ \

[Humidity?] [Wind?]

/ \ / \

High Low Strong Weak

(Play=No) (Play=Yes) (Play=No) (Play=Yes)
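A sketch of this weather example in code, using scikit-learn (an assumed dependency; the numeric encoding of the toy data is hypothetical):

# Decision tree sketch for the Play/Weather example, assuming scikit-learn.
from sklearn.tree import DecisionTreeClassifier, export_text

# Encode: Weather (0=Sunny, 1=Rainy), Humidity (0=Low, 1=High),
#         Wind (0=Weak, 1=Strong); label Play (0=No, 1=Yes)
X = [[0, 1, 0], [0, 0, 0], [1, 0, 1], [1, 0, 0]]
y = [0, 1, 0, 1]

tree = DecisionTreeClassifier(criterion='gini').fit(X, y)  # CART uses the Gini Index
print(export_text(tree, feature_names=['Weather', 'Humidity', 'Wind']))
print(tree.predict([[0, 0, 0]]))  # Sunny, low humidity, weak wind -> Play?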

B) Explain Naïve Bayes Classifier?

The Naive Bayes algorithm is a classification algorithm based on Bayes' theorem. The algorithm assumes that the features are independent of each other, which is why it is called "naive." It calculates the probability of a sample belonging to a particular class based on the probabilities of its features. For example, a phone may be considered smart if it has a touch screen, internet facility, a good camera, etc. Even if these features depend on each other, each feature independently contributes to the probability that the phone is a smartphone.

In Bayesian classification, the main interest is to find the posterior probabilities, i.e. the probability of a label given some observed features, P(L | features). With the help of Bayes' theorem, we can express this in quantitative form as follows:

P(L | features) = P(features | L) · P(L) / P(features)

Where:

P(L | features) = posterior probability of the label given the features
P(features | L) = likelihood of the features given the label
P(L) = prior probability of the label
P(features) = evidence (overall probability of the features)

In the Naive Bayes algorithm, we use Bayes' theorem to calculate the probability of a sample belonging to a
particular class. We calculate the probability of each feature of the sample given the class and multiply them
to get the likelihood of the sample belonging to the class. We then multiply the likelihood with the prior
probability of the class to get the posterior probability of the sample belonging to the class. We repeat this
process for each class and choose the class with the highest probability as the class of the sample.

Working of Naive Bayes

1. Convert data into a frequency table or likelihood table.


2. Calculate Prior Probability for each class.
3. Calculate Likelihood for each feature given the class.
4. Apply the Naive Bayes formula to compute Posterior Probability for each class.
5. Pick the class with the highest posterior probability.
Types of Naive Bayes

Type – Use Case Example – Data Type

Gaussian – Predicting continuous values (e.g., height, weight) – Continuous

Multinomial – Text classification (e.g., spam detection) – Discrete (counts)

Bernoulli – Sentiment analysis (0/1 features) – Binary
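A minimal sketch of the Gaussian variant using scikit-learn (an assumed dependency; the toy height/weight data is hypothetical):

# Gaussian Naive Bayes sketch, assuming scikit-learn.
from sklearn.naive_bayes import GaussianNB

X = [[180, 80], [175, 75], [160, 50], [155, 48]]  # height (cm), weight (kg)
y = ['male', 'male', 'female', 'female']

model = GaussianNB().fit(X, y)           # learns priors and per-class likelihoods
print(model.predict([[170, 70]]))        # the class with the highest posterior wins
print(model.predict_proba([[170, 70]]))  # the posterior probabilities themselves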

C) Explain the working principle of Linear Regression.

Linear regression is a supervised machine learning algorithm where the predicted output is continuous and has a constant slope. It is used to predict values within a continuous range (e.g., sales, price) rather than classifying them into categories (e.g., cat, dog). It is used whenever we want to predict the value of one variable based on the value of another variable.

The variable we want to predict is called the dependent variable (or sometimes, the outcome variable).

In linear regression, we measure the linear relationship between two or more variables. Based on this relationship, we perform predictions that follow this linear pattern.

Equation of Linear Regression (Simple Linear):

Y = mX + c

Where:

 Y = Predicted value (dependent variable)
 X = Input feature (independent variable)
 m = Slope of the line (coefficient)
 c = Intercept (constant)
Working of Linear Regression:
1. Input data: Collect input-output data pairs (X, Y).
2. Fit a line: Model tries to fit the best straight line through the data points.
3. Find m and c: Using a method like Least Squares, it finds slope (m) and intercept (c) to minimize error.
4. Predict: For a new value of X, calculate Y using the formula Y = mX + c.
Error Measurement:

 Cost Function (MSE):


MSE = (1/n) · Σ (Y_actual − Y_predicted)²

 Goal: Minimize this error
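A minimal NumPy sketch of steps 1–4 and the MSE above (the toy data is hypothetical):

import numpy as np

# Toy data (hypothetical): Y roughly follows Y = 2X + 1
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

# Least-squares closed form for slope m and intercept c
m = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
c = Y.mean() - m * X.mean()

Y_pred = m * X + c
mse = np.mean((Y - Y_pred) ** 2)   # the cost function above
print(f"m={m:.2f}, c={c:.2f}, MSE={mse:.4f}")
print("Prediction for X=6:", m * 6 + c)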

Advantages:

 Easy to understand and implement


 Fast and efficient
 Good for linear data
Disadvantages:

 Works only for linearly related data


 Sensitive to outliers
 Can’t model complex relationships
Applications:

 Predicting house prices


 Sales forecasting
 Stock price prediction
 Weather forecasting

Q3. A) What is an Artificial Neural Network? Explain its layers with a diagram.

Artificial Neural Network (ANN)

1. Definition

An Artificial Neural Network (ANN) is a computational model inspired by the structure and functioning of the human
brain. It consists of layers of interconnected nodes (neurons) that can process and transmit information.

 ANNs are used for pattern recognition, classification, regression, image processing, speech recognition, and
many other machine learning tasks.

 They are also called Neural Networks or Neural Nets.

2. Structure of ANN
An ANN is made up of artificial neurons (units) arranged in different layers. The complexity of the network depends
on the dataset and the number of hidden layers.

A typical ANN consists of three types of layers:

(a) Input Layer

 The first layer of the network.

 Receives raw input data (features) from the external world.

 Each neuron in this layer represents one feature of the input dataset.

 Example: In digit recognition, pixels of an image are given as input neurons.

(b) Hidden Layer(s)

 The middle layers between input and output.

 Perform most of the computation in the network.

 Each neuron receives inputs from the previous layer, computes a weighted sum, applies an activation
function, and passes the output to the next layer.

 Activation functions (Sigmoid, Tanh, ReLU) introduce non-linearity, allowing the ANN to learn complex
patterns.

 There can be one hidden layer (shallow ANN) or multiple hidden layers (Deep Neural Network, DNN).

(c) Output Layer

 The final layer of the network.

 Produces the prediction or classification result based on the task.

 Example:

o For binary classification → single output neuron (Yes/No).

o For multi-class classification → multiple output neurons (each representing a class).

3. How ANN Learns

1. Input data passes through the network from input → hidden → output.

2. Each connection has a weight that decides the strength of the signal.

3. Neurons also have a bias term to improve flexibility.

4. During training:

o The output is compared with the target (actual answer).

o Error is calculated using a loss function.

o Backpropagation + Gradient Descent are used to adjust weights and reduce error.

4. Key Points

 Weights: Control the importance of input features.

 Bias: Helps shift the activation function.

 Activation Function: Introduces non-linearity (e.g., Sigmoid, Tanh, ReLU).

 Shallow Network: Only one hidden layer.


 Deep Network: Multiple hidden layers for hierarchical learning.

5. Diagram (for Exam)

Draw a simple diagram like this:

Input Layer → Hidden Layer(s) → Output Layer

(x1,x2,x3...) (neurons) (prediction)

Show arrows (connections with weights), hidden nodes, and activation functions.
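As a sketch, the three layer types map onto code as follows (TensorFlow/Keras assumed installed; the layer sizes are hypothetical):

# Minimal three-layer ANN sketch, assuming TensorFlow/Keras is installed.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(3,)),               # input layer: x1, x2, x3
    tf.keras.layers.Dense(8, activation='relu'),     # hidden layer (weighted sum + ReLU)
    tf.keras.layers.Dense(1, activation='sigmoid'),  # output layer: binary prediction
])

# The loss function and optimizer drive backpropagation/gradient descent in fit()
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()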

B) What is the need of an activation function? Explain the ReLU type of activation function.

Activation Function and ReLU

1. Need of Activation Function

 In an Artificial Neural Network (ANN), each neuron computes a weighted sum of inputs.

 Without an activation function, the output would be only a linear combination of inputs.

 This means the entire neural network would behave like a linear model, no matter how many layers it has.

 Real-world problems are non-linear (e.g., image recognition, speech processing).

 Activation functions introduce non-linearity into the network, allowing it to learn complex patterns and
relationships.

2. Types of Activation Functions

 Sigmoid

 Tanh

 ReLU (Rectified Linear Unit)

 Leaky ReLU

 Softmax

3. ReLU (Rectified Linear Unit) Activation Function

 Definition:
The ReLU activation function is one of the most widely used in deep learning.
It is defined as:

f(x) = max(0, x)

 How it works:

o If the input is positive, it outputs the same value.

o If the input is negative, it outputs 0.

4. Graph of ReLU

 On the x-axis: input values

 On the y-axis: output values

 For x < 0 → output = 0

 For x ≥ 0 → output = x
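This behavior takes one line with NumPy (a minimal sketch):

import numpy as np

def relu(x):
    # Element-wise max(0, x): negatives become 0, positives pass through
    return np.maximum(0, x)

x = np.array([-3.0, -1.0, 0.0, 2.0, 5.0])
print(relu(x))   # [0. 0. 0. 2. 5.]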
5. Advantages of ReLU

 Simple and computationally efficient.

 Helps networks train faster.

 Avoids the vanishing gradient problem (common in sigmoid/tanh).

6. Disadvantage

 Can suffer from the “dying ReLU problem” – some neurons may get stuck outputting only 0.

7. Example Usage

 In image classification networks like CNNs, ReLU is widely used in hidden layers to speed up training and
improve performance.

C) Write TensorFlow code for subtraction of constants.


import tensorflow as tf

# Define a constant tensor

x = tf.constant([10, 20, 30, 40], dtype=tf.int32)

# Subtract a constant (e.g., 5) from the tensor

y = tf.subtract(x, 5)

# Print the result

print("Input:", x.numpy())

print("Result after subtraction:", y.numpy())

Q4. A) Explain the architecture of perceptron with respect to biological neurons.

The Perceptron

1. Introduction and Historical Background

 The Perceptron is one of the earliest and most fundamental models of neural networks.

 It is a linear binary classifier, designed to separate data into two classes.

 Invented by Frank Rosenblatt in 1957 at the Cornell Aeronautical Laboratory.

 Inspired by earlier work of McCulloch and Pitts (1943), who introduced the Threshold Logic Unit (TLU) for
modeling logical functions like AND/OR.

 Early hype (e.g., New York Times articles) exaggerated its potential, claiming it could “walk, talk, and think.”
While unrealistic, this shows the excitement around early AI research.

 Initially designed as hardware (Mark I Perceptron), later implemented in software (IBM 704).

2. Definition and Functioning

The perceptron simulates a biological neuron:

 Inputs (x₁, x₂, …, xn) are multiplied by weights (w₁, w₂, …, wn).

 A bias (b) is added to shift the decision boundary.

 A weighted sum is computed:


z = Σ (wi · xi) + b

 The result is passed through the Heaviside Step Function (activation):

o f(z) = 1 if z ≥ 0

o f(z) = 0 if z < 0

 Output is binary (0 or 1).

Main Steps of Perceptron:

1. Takes input.

2. Weights inputs and sums them.

3. Passes sum through activation function.

4. Produces output (0 or 1).

3. Single-Layer Perceptron (SLP)

 Consists of one input layer and one output layer.

 Can only solve linearly separable problems (e.g., AND, OR).

 Limitation: Cannot solve non-linear problems like XOR.

4. Multi-Layer Perceptron (MLP)

To overcome SLP’s limitations, Multi-Layer Perceptrons were introduced:

 Consist of:

o Input Layer – receives features.

o Hidden Layers – process data through nonlinear transformations.

o Output Layer – produces final prediction.

 Use non-linear activation functions (sigmoid, tanh, ReLU).

 Trained using Backpropagation Algorithm with Gradient Descent.

 Can solve non-linear problems and learn complex relationships.

 Basis of modern deep learning (image recognition, NLP, speech).

5. Biological Inspiration

 Biological Neuron:

o Dendrites = Inputs

o Synapses = Weights

o Soma = Summation + Activation

o Axon = Output

 Perceptron mimics this process in an artificial form.

6. Perceptron Learning Algorithm

The perceptron uses a supervised learning algorithm:

1. Initialize weights (small random values).


2. For each training example:

o Compute output.

o Compare with actual label.

o If correct → no change.

o If incorrect → update weights:

w_new = w_old + η · (y_true − y_pred) · x

where η = learning rate.

3. Repeat until all samples are classified correctly (only if data is linearly separable).

⚠️ If the data is not linearly separable, the algorithm never converges (e.g., the XOR problem).
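The update rule above can be sketched in a few lines of Python (NumPy assumed; the AND gate is a standard linearly separable toy example):

import numpy as np

# Perceptron learning rule sketch on the AND gate (linearly separable).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])           # AND labels

w = np.zeros(2)                      # weights (zero/small init)
b = 0.0                              # bias
eta = 0.1                            # learning rate

for epoch in range(20):              # repeat until converged (bounded here)
    for xi, target in zip(X, y):
        z = np.dot(w, xi) + b
        pred = 1 if z >= 0 else 0    # Heaviside step activation
        # Update only on mistakes: w_new = w_old + eta*(y_true - y_pred)*x
        w += eta * (target - pred) * xi
        b += eta * (target - pred)

print(w, b)                          # learned parameters
print([1 if np.dot(w, xi) + b >= 0 else 0 for xi in X])  # [0, 0, 0, 1]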

7. Applications of Perceptron / MLP

 Pattern recognition

 Image and speech recognition

 Natural Language Processing

 Spam filtering

 Medical diagnosis

 Stock market prediction

8. Advantages and Limitations

Advantages:

 Simple and easy to implement.

 Works well for linearly separable problems.

 Foundation of modern neural networks.

Limitations:

 Cannot handle non-linear problems (e.g., XOR).

 Convergence not guaranteed for complex datasets.

 Learning is slow for large datasets.

Diagram (Exam Representation)

x1 ---- w1 \

x2 ---- w2 >--- Σ + b --> Activation --> Output (y)

x3 ---- w3 /

(Biological analogy: dendrites → soma → axon)

B) List the types of deep learning frameworks. Explain PyTorch in detail.

Types of Deep Learning Frameworks

1. TensorFlow (by Google Brain)


2. PyTorch (by Facebook AI Research – FAIR)

3. Keras (high-level API, runs on TensorFlow)

4. Caffe (developed by Berkeley AI Research – image classification)

5. Theano (developed by Université de Montréal – first major DL library, now discontinued)

6. MXNet (Apache, scalable and used by Amazon)

7. CNTK (Microsoft Cognitive Toolkit – speech recognition)

8. Deeplearning4j (Java-based, enterprise use)

9. Chainer (flexible, define-by-run computation, inspiration for PyTorch)

10. JAX (Google, designed for high-performance ML and scientific computing)

PyTorch (Explained in Detail)

1. Introduction & History

 PyTorch is an open-source deep learning framework developed by Facebook AI Research (FAIR) in 2016.

 Inspired by Torch (Lua-based library), but implemented in Python for simplicity.

 Became popular because of its dynamic computation graph and ease of debugging compared to TensorFlow
(earlier versions).

 Widely used in research and production (e.g., OpenAI, Tesla Autopilot, Hugging Face Transformers).

2. Architecture of PyTorch

PyTorch provides:

 Tensors → Core data structure (similar to NumPy arrays, but with GPU support).

 Autograd Module → Automatic differentiation for backpropagation.

 NN Module → Building neural networks (layers, loss functions).

 Optim Module → Optimization algorithms (SGD, Adam, RMSProp).

 TorchVision, TorchText, TorchAudio → Pretrained models & datasets for vision, NLP, and audio tasks.

3. Key Features of PyTorch

 Dynamic Computational Graph (Define-by-Run):

o Graphs are created at runtime, making it easy to modify networks on the fly.

o More intuitive than TensorFlow’s (old) static graphs.

 GPU Acceleration:

o Built-in CUDA support allows computations on GPU for high performance.

 Pythonic Nature:

o Feels like writing regular Python code, integrates with NumPy, SciPy, Pandas.

 Autograd (Automatic Differentiation):

o Tracks operations on tensors and computes gradients automatically.

 TorchScript:
o Converts Python models into a deployable, optimized format for production.

 Strong Ecosystem:

o torchvision (images), torchaudio (audio), torchtext (NLP).

o Hugging Face Transformers integrates deeply with PyTorch.

4. Advantages of PyTorch

 Simple, flexible, and easy to debug.

 Dynamic graph helps in research prototyping.

 Rich ecosystem with pre-trained models.

 Strong GPU/TPU support.

 Large community support.

 Industry + Academia adoption (Google Colab, Kaggle, OpenAI, Tesla).

5. Applications of PyTorch

 Computer Vision: Image classification, object detection, GANs (e.g., StyleGAN, YOLO).

 Natural Language Processing (NLP): Machine translation, chatbots, transformers (BERT, GPT).

 Reinforcement Learning: Game AI, robotics.

 Medical Imaging: Cancer detection, brain MRI analysis.

 Autonomous Vehicles: Self-driving car perception modules.

Diagram: PyTorch Workflow

Data → Tensors → Model (nn.Module) → Forward Pass → Loss Function (criterion) → Optimizer (update weights)
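A minimal sketch of this workflow (PyTorch assumed installed; the toy data and sizes are hypothetical):

import torch
import torch.nn as nn

# Data -> Tensors (hypothetical toy regression data)
X = torch.randn(16, 3)               # 16 samples, 3 features
y = torch.randn(16, 1)

model = nn.Linear(3, 1)              # Model (nn.Module)
criterion = nn.MSELoss()             # Loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # Optimizer

for epoch in range(100):
    pred = model(X)                  # Forward pass
    loss = criterion(pred, y)        # Compute loss
    optimizer.zero_grad()
    loss.backward()                  # Autograd computes gradients
    optimizer.step()                 # Update weights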

C) Write a short note on the Keras framework.

Keras vs TensorFlow vs PyTorch

Deep learning frameworks play a key role in implementing machine learning models: they are used for constructing and deploying the models, and they are the components that enable artificial intelligence systems to learn, store, and apply what they have learned. Three prominent deep learning frameworks are TensorFlow, PyTorch, and Keras, developed by Google, Facebook (Meta), and François Chollet (Keras is now part of the TensorFlow project) respectively; all are widely used among researchers and practitioners. These frameworks are designed very differently and have different strengths and weaknesses, which makes each of them a powerful tool for particular kinds of machine learning projects.

Keras

Keras is a high-level deep learning API designed to be very user-friendly and to make code portable across different systems. It is now part of the TensorFlow project but can also work in conjunction with other backends such as the Microsoft Cognitive Toolkit (CNTK). It was built for fast feedback and iteration, so that new approaches to solving deep learning problems can be tried quickly as they emerge. On the one hand, Keras is simple and easy to use for everyone, including non-professionals; on the other hand, it also serves experienced practitioners by providing sufficient tools for them. Its building-block architecture is much like Lego: it is not only customizable but also extensible.

Key Features:

 Keras provides a consistent interface for creating neural networks, so both beginners and experienced developers can conveniently start a project.

 It is based on the principle of modularity, which makes adding and updating features much simpler and more convenient.

 Its modular code makes Keras an easy-to-use neural network library on top of TensorFlow, especially for prototyping.

Pros:

 User-friendly and ergonomic; even an inexperienced user can handle it.

 Extensible: advanced users can build on top of the provided building blocks.

 Integrates with different backends such as TensorFlow and other frameworks without difficulty.

Cons:

 It may not offer the level of low-level functionality found in TensorFlow and PyTorch, which are much more advanced.

 As a high-level framework, it is more restricted than the lower-level ones.
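A minimal sketch of the building-block workflow described above (TensorFlow/Keras assumed installed; the data and layer sizes are hypothetical):

# Minimal Keras sketch: define, compile, and train a small classifier.
import numpy as np
import tensorflow as tf

X = np.random.rand(100, 4)                    # hypothetical toy data
y = np.random.randint(0, 2, size=(100, 1))    # binary labels

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=5, verbose=0)          # fast iteration: one line to train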
