1. Types of Machine Learning (ML)
Data = Training data + Testing data
A typical ML workflow: 1) Gathering Data, 2) Preparing the Data, 3) Choosing a Model, 4) Training the Model, 5) Evaluating the Model, 6) Hyperparameter Tuning, and 7) Making Predictions.
Supervised Learning: The model is trained on labeled data.
Classification: Where the output is a categorical variable (e.g., spam vs. non-spam emails, yes vs.
no).
Regression: Where the output is a continuous variable (e.g., predicting house prices, stock prices).
Common algorithms include
Linear Regression (Regression)
Types of Linear Regression: Linear regression can be further divided into two types:
Simple Linear Regression: If a single independent variable is used to predict the value of a numerical dependent variable, the algorithm is called Simple Linear Regression.
Multiple Linear Regression: If more than one independent variable is used to predict the value of a numerical dependent variable, the algorithm is called Multiple Linear Regression:
y = m₁x₁ + m₂x₂ + ... + mₙxₙ + b
(A minimal code sketch of linear regression follows the algorithm list below.)
Logistic Regression (Classification)
Support Vector Machines (Both Classification & Regression)
Decision Trees (Both Classification & Regression)
Random Forest (Both Classification & Regression)
KNN (Both Classification & Regression)
It is used in applications like email spam detection and loan default prediction.
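To make the linear regression described above concrete, here is a minimal sketch using scikit-learn's LinearRegression on a small synthetic dataset; the data and the coefficient values are made up purely for illustration, not a prescribed implementation.
```python
# Minimal multiple linear regression sketch with scikit-learn (illustrative only).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: y = 3*x1 + 2*x2 + 5 + noise (values invented for illustration)
rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = 3 * X[:, 0] + 2 * X[:, 1] + 5 + rng.normal(0, 0.1, 100)

# Split into training and testing data, as described above
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)            # learns m1, m2 (coef_) and b (intercept_)

print("Coefficients (m1, m2):", model.coef_)
print("Intercept (b):", model.intercept_)
print("Sample predictions:", model.predict(X_test[:3]))
```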
Unsupervised Learning: The model identifies patterns and structures in unlabeled data.
Common techniques are
Clustering (K-means)
Association Rule Mining
Dimensionality reduction (PCA).
Applications include customer segmentation and anomaly detection.
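As an illustration of the clustering technique listed above, here is a minimal K-means sketch with scikit-learn on made-up 2-D points; the cluster count and parameter values are arbitrary choices for the example.
```python
# Minimal K-means clustering sketch (illustrative only).
import numpy as np
from sklearn.cluster import KMeans

# Synthetic, unlabeled 2-D points around two made-up centers
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)          # cluster index assigned to each point

print("Cluster centers:\n", kmeans.cluster_centers_)
print("First 10 labels:", labels[:10])
```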
Semi-Supervised Learning: A hybrid approach that uses a small amount of labeled data and a large
amount of unlabeled data. It's useful in cases where labeling data is expensive or time-consuming, such as
medical imaging.
Reinforcement Learning: In this type, agents learn by interacting with their environment and
receiving feedback in the form of rewards or penalties. Used in robotics, gaming, and autonomous
vehicles.
Preprocessing Libraries
Data preprocessing is the process of transforming raw data into a clean, usable format to improve the
performance and accuracy of machine learning models. Real-world data is often incomplete, inconsistent,
noisy, or unstructured, and preprocessing is a crucial step before model building.
Importance of Preprocessing:
1. Improves Model Accuracy: Clean data ensures that the model learns accurate patterns.
2. Handles Missing or Corrupted Data: Prevents model errors or biases.
3. Speeds Up Training: Reduces unnecessary computations.
4. Ensures Consistency: Brings data into a standard format.
5. Enables Better Feature Engineering: Easier to derive meaningful features from well-structured
data.
Some widely used preprocessing libraries in Python include:
1. NumPy (Numerical Python)
Purpose: NumPy is a fundamental library that provides support for large, multi-dimensional arrays and
mathematical operations.
Role in Preprocessing:
Used at the initial stages when dealing with raw numerical data.
Essential for mathematical computation, especially in scientific and statistical data.
Acts as the base library for other tools like Pandas and Scikit-learn.
Typical Operations:
Creating arrays and matrices
Applying vectorized operations
Handling missing values using masks
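A minimal sketch of the NumPy operations listed above (array creation, vectorized math, and mask-based handling of missing values); the numbers are made up.
```python
# Minimal NumPy preprocessing sketch (illustrative only).
import numpy as np

# Create an array with a missing value encoded as NaN
data = np.array([1.0, 2.0, np.nan, 4.0, 5.0])

# Vectorized operation: applied element-wise without an explicit loop
scaled = data * 10

# Handle missing values using a boolean mask
mask = np.isnan(data)
data[mask] = np.nanmean(data)           # replace NaN with the mean of the non-missing values

print("Mask of missing values:", mask)
print("Cleaned data:", data)
print("Scaled data:", scaled)
```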
2. Pandas
Purpose: Pandas is a powerful tool for data manipulation and analysis, especially for structured
(tabular) data.
Role in Preprocessing:
Used for loading, cleaning, and transforming data from external sources like CSV, Excel, or
databases.
Offers tools to handle missing values, data types, column transformations, and feature
engineering.
Typical Operations:
Dropping or imputing null values
Merging, filtering, and grouping data
Creating dummy variables for categorical data
Ex: .fillna(), .dropna(), pd.get_dummies()
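A minimal Pandas sketch of the operations above, using a small made-up DataFrame (the column names and values are invented for illustration).
```python
# Minimal Pandas cleaning sketch (illustrative only).
import pandas as pd

# Small made-up dataset with a missing value and a categorical column
df = pd.DataFrame({
    "age": [25, 30, None, 40],
    "city": ["Delhi", "Mumbai", "Delhi", "Pune"],
})

df["age"] = df["age"].fillna(df["age"].mean())    # impute missing ages with the mean
df = df.dropna()                                  # drop any rows that are still incomplete
df = pd.get_dummies(df, columns=["city"])         # dummy variables for the categorical column

print(df)
```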
3. Scikit-learn (sklearn.preprocessing)
Purpose: Scikit-learn is a machine learning library that includes tools for data preprocessing, model
training, and evaluation.
Role in Preprocessing:
Used after initial cleaning with Pandas to apply standard preprocessing techniques.
Especially important for preparing features before feeding data to ML models.
Typical Operations:
Standardization and Normalization (e.g., StandardScaler, MinMaxScaler)
Encoding categorical variables (LabelEncoder, OneHotEncoder)
Imputation of missing values (SimpleImputer)
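A minimal sketch of the scikit-learn preprocessing tools named above (SimpleImputer, StandardScaler, OneHotEncoder); the toy arrays are made up.
```python
# Minimal scikit-learn preprocessing sketch (illustrative only).
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Numeric feature with a missing value
num = np.array([[1.0], [2.0], [np.nan], [4.0]])
num = SimpleImputer(strategy="mean").fit_transform(num)    # imputation
num = StandardScaler().fit_transform(num)                  # standardization (zero mean, unit variance)

# Categorical feature
cat = np.array([["red"], ["blue"], ["red"], ["green"]])
cat = OneHotEncoder().fit_transform(cat).toarray()         # one-hot encoding

print("Scaled numeric feature:\n", num)
print("One-hot encoded categories:\n", cat)
```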
🔹 4. NLTK / SpaCy (for Text Data)
Purpose: NLTK and SpaCy are libraries for Natural Language Processing (NLP).
Role in Preprocessing:
Applied when working with unstructured text data such as documents, social media posts, or chat
messages.
These libraries help transform raw text into numerical or symbolic formats understandable by ML
models.
Typical Operations:
Tokenization, stopword removal, stemming, and lemmatization
Named Entity Recognition (NER) and POS tagging
📌 SpaCy is generally preferred for production use due to its speed and modern architecture.
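A minimal NLTK-based sketch of the text operations above (tokenization, stopword removal, stemming, lemmatization); the resource downloads are one-time steps and their exact names can vary slightly by NLTK version.
```python
# Minimal NLTK text preprocessing sketch (illustrative only).
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time resource downloads (names may vary slightly by NLTK version)
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

text = "The runners were running quickly through the streets"
tokens = nltk.word_tokenize(text.lower())                       # tokenization

stop_words = set(stopwords.words("english"))
tokens = [t for t in tokens if t not in stop_words]             # stopword removal

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print("Stemmed:   ", [stemmer.stem(t) for t in tokens])         # e.g. "running" -> "run"
print("Lemmatized:", [lemmatizer.lemmatize(t, pos="v") for t in tokens])
```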
🔹 5. OpenCV (for Image Data)
Purpose: OpenCV is widely used in computer vision for image and video processing.
Role in Preprocessing:
Used to clean and standardize image data before feeding it into models like Convolutional Neural
Networks (CNNs).
Useful for handling noise, scaling, color correction, and geometric transformations.
Typical Operations:
Resizing images
Converting images to grayscale
Applying filters (Gaussian blur, sharpening)
Image thresholding and edge detection
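A minimal OpenCV sketch of the image operations listed above; the file path is a placeholder, not a real dataset.
```python
# Minimal OpenCV image preprocessing sketch (illustrative only).
import cv2

# Placeholder path: replace with an actual image file
img = cv2.imread("sample.jpg")
if img is None:
    raise FileNotFoundError("sample.jpg not found")

img = cv2.resize(img, (224, 224))                              # resize to a fixed input size
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)                   # convert to grayscale
blur = cv2.GaussianBlur(gray, (5, 5), 0)                       # reduce noise with a Gaussian filter
_, thresh = cv2.threshold(blur, 127, 255, cv2.THRESH_BINARY)   # thresholding
edges = cv2.Canny(blur, 100, 200)                              # edge detection

print("Shapes:", img.shape, gray.shape, edges.shape)
```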
🔁 Ideal Workflow Summary
Step | Library | Role in Preprocessing
1 | NumPy | Raw numerical operations
2 | Pandas | Data loading, cleaning, transformation
3 | Scikit-learn | Feature scaling, encoding, imputation
4 | NLTK / SpaCy | Text preprocessing (if applicable)
5 | OpenCV | Image preprocessing (if applicable)
✅ Conclusion
Using preprocessing libraries in the correct order ensures:
Clean and well-structured data
Accurate and efficient machine learning results
Better model performance and interpretability
Choosing the right library at each stage—depending on whether the data is numerical, textual, or visual—
helps streamline the machine learning workflow and enhances the model's predictive capabilities.
Descriptive Statistics (describe the basic characteristics of the data in a study; they do not make predictions or test hypotheses, but instead provide simple summaries of the sample and its measures)
Descriptive statistics is a fundamental concept in data science that involves summarizing and describing
the important characteristics of a dataset. It helps in understanding the structure and distribution of data
before applying complex machine learning algorithms. Descriptive statistics is often the first step in
exploratory data analysis (EDA).
Types of Descriptive Statistics
1. Measures of Central Tendency
These measures represent the center point or typical value of a dataset.
Mean: The average of all values. It is sensitive to outliers.
Median: The middle value when data is sorted. It is robust to outliers.
Mode: The most frequently occurring value(s) in the dataset.
2. Measures of Dispersion (Spread)
These measures show how spread out the data values are.
Range: Difference between the maximum and minimum values.
Variance: Measures the average squared deviation from the mean.
Standard Deviation: Square root of variance, indicates how much values deviate from the
mean.
Interquartile Range (IQR): Difference between the third quartile (Q3) and the first
quartile (Q1), used to detect outliers.
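A minimal Pandas sketch computing the measures above on a small made-up sample.
```python
# Minimal descriptive statistics sketch (illustrative only).
import pandas as pd

data = pd.Series([12, 15, 14, 10, 18, 20, 15, 14, 100])   # made-up sample with one outlier

print("Mean:  ", data.mean())
print("Median:", data.median())
print("Mode:  ", data.mode().tolist())
print("Range: ", data.max() - data.min())
print("Variance:", data.var())
print("Std dev: ", data.std())

q1, q3 = data.quantile(0.25), data.quantile(0.75)
print("IQR:", q3 - q1)                                     # used to flag outliers such as 100
```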
Common charts for descriptive statistics: pie chart, bar chart, histogram, box plot.
Conclusion
Descriptive statistics provide a snapshot of the dataset. Measures like mean, median, and
mode show where data is centered, while range, variance, and standard deviation reveal how
spread out it is. These metrics form the foundation for deeper statistical analysis, guiding
decisions and interpretations in research, business, healthcare, and more.
Inferential Statistics
Inferential statistics involves drawing conclusions or making predictions about a population
based on a sample of data. It goes beyond just summarizing the data (as in descriptive
statistics) — it allows for generalizations, estimations, and decision-making.
🔹 Difference Between Descriptive and Inferential Statistics
Feature | Descriptive Statistics | Inferential Statistics
Purpose | Summarizes data | Makes predictions or inferences about a population
Scope | Works with the entire dataset | Works with a sample to infer about the population
Techniques Used | Mean, median, mode, SD, range | Hypothesis testing, confidence intervals, regression, etc.
Output | Charts, graphs, summary numbers | Probabilities, p-values, confidence estimates
Key Concepts in Inferential Statistics
1. Population vs Sample
Population: The complete set of all possible observations.
Sample: A subset of the population, used to represent the whole.
2. Estimation Inferential statistics involves estimating population parameters (like mean or
proportion) using sample data.
Point Estimation: A single value estimate (e.g., sample mean).
Interval Estimation: A range of values (confidence interval) likely to contain the
population parameter.
3. Hypothesis Testing It is used to test assumptions about a population parameter.
Null Hypothesis (H₀): A default assumption (e.g., no difference).
Alternative Hypothesis (H₁): Opposes the null (e.g., there is a difference).
p-value: Probability of obtaining results at least as extreme as observed, assuming H₀ is
true.
Significance Level (α): Threshold (commonly 0.05) for rejecting the null hypothesis.
Common tests include:
Z-test, T-test (for comparing means)
Chi-Square Test (for categorical data)
ANOVA (for comparing more than two means)
4. Confidence Intervals
A confidence interval gives a range of values within which the true population parameter is
likely to fall. For example, a 95% confidence interval implies that we are 95% confident
the parameter lies within the interval.
5. Regression Analysis Used to infer relationships between variables. Helps in prediction and
understanding variable impacts.
Applications in Data Science
Drawing conclusions from data when full data is not available.
Predicting trends and outcomes.
Testing the effectiveness of models or changes (A/B testing).
Supporting decision-making with statistically valid evidence.
Tools in Python
SciPy: Functions for hypothesis tests like ttest_ind(), chisquare(), etc.
Statsmodels: More advanced statistical modeling including hypothesis testing and
regression.
Pandas & NumPy: Basic support for summary statistics and data preparation.
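A minimal hypothesis-testing sketch with SciPy's ttest_ind on two made-up samples; the 0.05 significance level is the conventional choice mentioned above.
```python
# Minimal two-sample t-test sketch with SciPy (illustrative only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=50, scale=5, size=30)   # made-up sample A
group_b = rng.normal(loc=53, scale=5, size=30)   # made-up sample B

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print("t-statistic:", t_stat)
print("p-value:   ", p_value)

alpha = 0.05
if p_value < alpha:
    print("Reject H0: the group means differ significantly")
else:
    print("Fail to reject H0")
```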
Conclusion
Inferential statistics allows data scientists to move from the known to the unknown. By
analyzing sample data, it helps in making reliable decisions and predictions about larger
populations. It forms the backbone of statistical analysis in machine learning, research, and
data-driven strategies.
Evaluation Metrics
Metrics are quantitative measures used to evaluate the performance of machine learning
models. The choice of metric depends on the type of problem (classification or regression)
and the specific goals of the task.
1. For Classification Problems
Accuracy: Proportion of correctly predicted instances.
Precision: Proportion of true positive predictions out of all positive predictions.
Recall (Sensitivity): Proportion of actual positives that are correctly identified.
F1-Score: Harmonic mean of precision and recall.
ROC-AUC Score: Measures model's ability to distinguish between classes.
2. For Regression Problems
Mean Absolute Error (MAE): Average of absolute differences between predicted and
actual values.
Mean Squared Error (MSE): Average of squared differences.
Root Mean Squared Error (RMSE): Square root of MSE, penalizes larger errors more.
R² Score (Coefficient of Determination): Indicates how well predictions match actual
data.
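A minimal scikit-learn sketch of the metrics above on tiny hand-made label and prediction arrays; the values are invented purely to show the function calls.
```python
# Minimal evaluation-metrics sketch with scikit-learn (illustrative only).
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score,
                             mean_absolute_error, mean_squared_error, r2_score)

# Classification: made-up true labels, predicted labels, and predicted probabilities
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.1, 0.7]

print("Accuracy: ", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))
print("ROC-AUC:  ", roc_auc_score(y_true, y_prob))

# Regression: made-up actual vs. predicted values
actual = [3.0, 5.0, 2.5, 7.0]
pred   = [2.8, 5.4, 2.9, 6.5]
mse = mean_squared_error(actual, pred)
print("MAE: ", mean_absolute_error(actual, pred))
print("MSE: ", mse)
print("RMSE:", mse ** 0.5)
print("R²:  ", r2_score(actual, pred))
```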
Conclusion
Metrics provide objective ways to compare and select models. Choosing the right metric is
essential to evaluate model performance accurately and align with business or project
goals.
Significance of Classification Metrics
✅ Accuracy
Significance: Gives a quick sense of how often the model is correct.
When useful: Good for balanced datasets with equal class distribution.
Limitation: Misleading with imbalanced data (e.g., predicting all zeros in a cancer dataset gives
high accuracy if most cases are negative).
✅ Precision
Significance: Measures exactness—how many predicted positives are actually positive.
When useful: When false positives are costly or dangerous, e.g. spam filters, fraud detection.
High precision means less noise in positive predictions.
✅ Recall
Significance: Measures completeness—how many actual positives were correctly predicted.
When useful: In high-risk domains like disease diagnosis, where false negatives must be
minimized.
High recall ensures safety nets for positive cases.
✅ F1 Score
Significance: Balances precision and recall—especially when there is class imbalance.
When useful: In scenarios where both false positives and false negatives matter (e.g., hiring
systems, recommendation engines).
Ideal when we want a harmonious trade-off.
✅ ROC-AUC
Significance: Shows model's ability to rank predictions correctly across different thresholds.
When useful: Useful for comparing multiple classifiers regardless of thresholds.
High AUC indicates a strong ability to differentiate classes.
🔹 Significance of Regression Metrics
✅ Mean Squared Error (MSE)
Significance: Penalizes large errors more, helping to minimize big mistakes.
When useful: When large errors are more problematic than small ones.
Supports fine-grained tuning of predictions.
✅ Root Mean Squared Error (RMSE)
Significance: Easier to interpret as it’s in the same unit as the target variable.
When useful: Helps in understanding average prediction error in real-world terms (e.g., "$10K
error in housing prices").
✅ R-Squared (R²)
Significance: Explains how much of the variation in output is explained by the model.
When useful: Helps in assessing fit quality.
High R² indicates the model captures the underlying pattern well.
🎯 Overall Importance
Metric | Helps Evaluate | Ideal When...
Accuracy | Overall correctness | Classes are balanced
Precision | Relevance of positive predictions | False positives are costly
Recall | Completeness of detecting positives | False negatives are dangerous
F1 Score | Balance of relevance and completeness | Both errors matter; class imbalance
ROC-AUC | Class ranking ability | Comparing classifiers; imbalanced data
MSE/RMSE | Average error magnitude | Large errors must be penalized
R² | Model's explanatory power | Understanding strength of model fit
Natural Language Processing (NLP)
Natural Language Processing (NLP) is a field at the intersection of computer science,
artificial intelligence, and linguistics. It focuses on enabling machines to understand,
interpret, generate, and respond to human languages in a meaningful way.
NLP allows computers to process and analyze large volumes of natural language data,
making it possible to perform tasks such as translation, sentiment detection, speech
recognition, and more.
🔍 Core NLP Techniques
1. Tokenization
Definition: The process of breaking text into smaller units called tokens — usually words or
sentences.
Purpose: It's the first step in text processing, used for cleaning and structuring raw text.
Example:
o Input: "I love NLP!"
o Output: ["I", "love", "NLP", "!"]
2. Stopword Removal
Definition: Removing common words that do not add significant meaning to a sentence.
Examples of Stopwords: "the", "is", "in", "and", "a", etc.
Purpose: To reduce noise in text and focus on important terms.
3. Stemming
Definition: Reducing words to their root or base form, often by chopping off prefixes or suffixes.
Example:
o "Running", "runs", "ran" → "run"
Tools: Porter Stemmer, Snowball Stemmer
Note: Can sometimes result in non-words or grammatically incorrect forms.
4. Lemmatization
Definition: Converts a word to its base dictionary form (lemma) while considering the context.
Difference from Stemming: Lemmatization produces meaningful words.
Example:
o "Better" → "Good" (based on context)
o "Running" → "Run"
Tool: WordNet Lemmatizer
5. Word Embeddings
These techniques convert text into numerical vectors that represent semantic meaning.
a. TF-IDF (Term Frequency-Inverse Document Frequency)
Definition: Reflects how important a word is in a document relative to a collection of documents.
Formula:
o TF = (number of times the term appears in the document) / (total number of terms in the document)
o IDF = log(Total documents / Documents containing the term)
o TF-IDF = TF × IDF
Use: Common in text classification, information retrieval, and document similarity.
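A minimal TF-IDF sketch using scikit-learn's TfidfVectorizer on three made-up sentences; note that scikit-learn applies smoothing and normalization on top of the basic formula above by default.
```python
# Minimal TF-IDF sketch with scikit-learn (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "I love NLP",
    "I love machine learning",
    "NLP makes machines understand language",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)        # sparse document-term matrix

print("Vocabulary:", vectorizer.get_feature_names_out())
print("TF-IDF matrix:\n", tfidf.toarray().round(2))
```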
b. Word2Vec
Definition: A neural network-based model that converts words into dense vectors where words
with similar meanings have similar vectors.
Two models:
o CBOW (Continuous Bag of Words): Predicts a word from context
o Skip-Gram: Predicts context from a word
Use: Helps in semantic understanding of words.
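A minimal Word2Vec sketch using the Gensim library (an assumption, since no specific tool is named above; parameter names follow Gensim 4.x, where the vector dimension is vector_size). The toy corpus is far too small to learn meaningful vectors and is shown only to illustrate the API.
```python
# Minimal Word2Vec sketch with Gensim (illustrative only; tiny toy corpus).
from gensim.models import Word2Vec

sentences = [
    ["i", "love", "nlp"],
    ["i", "love", "machine", "learning"],
    ["nlp", "helps", "machines", "understand", "language"],
]

# sg=0 -> CBOW (predict a word from its context); sg=1 -> Skip-Gram
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

print("Vector for 'nlp':", model.wv["nlp"][:5])          # first 5 dimensions
print("Most similar to 'nlp':", model.wv.most_similar("nlp", topn=2))
```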
💡 Real-World Applications of NLP
1. Sentiment Analysis
Goal: Identify the emotional tone behind a body of text (positive, negative, neutral).
Example: Analyzing product reviews or tweets to gauge customer sentiment.
Use Case: Businesses use it to track customer satisfaction and brand reputation.
2. Chatbots and Virtual Assistants
Goal: Understand and respond to human input using NLP.
Examples:
o Siri, Alexa, Google Assistant
o Customer service bots on websites
Techniques Used:
o Intent detection
o Entity recognition
o Dialogue management
3. Machine Translation
Automatically translating text from one language to another.
Tools: Google Translate, DeepL
4. Speech Recognition
Converts spoken language into text using NLP and voice-processing techniques.
Use Case: Voice-activated assistants and transcription software.
5. Text Summarization
Producing concise summaries of large documents while retaining important information.
🔚 Conclusion
NLP plays a vital role in making machines understand human language. Its techniques
like tokenization, stemming, lemmatization, and vectorization through word embeddings
enable numerous applications — from sentiment analysis to virtual assistants —
transforming how humans interact with technology.