Artificial Intelligence Book 10 Part B
Enhancing the model with better feature extraction and multi-scale detection can improve
accuracy. Continuous testing and fine-tuning would ensure reliable performance on the road.
Unit 6: Natural Language Processing
A. Short answer type questions.
1. Define NLP.
Ans: Natural Language Processing is a field of artificial intelligence that enables computers to
understand and interpret human (natural) language. NLP takes verbal or written input, processes and analyses it, and an appropriate action can then be taken based on this analysis.
2. How do companies use NLP to get feedback from customers regarding their products and
services?
Ans: Companies use Natural Language Processing applications, such as sentiment analysis, to
identify the emotions in the text and to categorise opinion about their products and services as
‘good’, ‘bad’ or ‘neutral’. This process can identify emotions in text even when they are not clearly expressed, and it enables companies to understand what customers think about their brand and image. It helps not only to understand what people like or dislike, but also what affects a customer’s choice when deciding what to buy.
3. Name some popular virtual assistants that use NLP to help us in our daily lives.
Ans: Some popular virtual assistants are Google Assistant, Copilot and Siri.
4. List the common applications of script bots.
Ans: Script bots are used for simple functions like answering frequently asked questions, setting appointments and giving predefined responses on messaging apps.
5. Give an example of sentences using a word with the same spelling but different meanings.
Ans: “The bat is hanging upside down on the tree.”
“Anju bought a new bat for the cricket match finale.”
Here, the word ‘bat’ refers to the animal in the first sentence and to the cricket equipment in the second.
6. Give the stem and lemma of the word 'studies'.
Ans: Stem: studi. Lemma: study.
7. What does the word "bag" in the "Bag of Words" algorithm symbolise?
Ans: The name “bag” symbolises that the algorithm is not concerned with where the words occur in
the corpus i.e. the sequence of tokens, but aims at getting unique words from the corpus and the
frequency of their occurrence.
8. List the steps involved in the "BoW" algorithm.
Ans:
Step 1: Text Normalisation - Collect data and pre-process it.
Step 2: Create Dictionary - Make a list of all the unique words occurring in the corpus. (Vocabulary).
Step 3: Create document vectors for each document - Find out how many times the unique words
from the document have occurred.
Step 4: Create document vectors for all the documents.
B. Long answer type questions.
1. How does the human brain process sound?
Ans: Our brain keeps processing the sounds that it hears and tries to make sense out of them.
Sound travels through air, enters the ear and reaches the eardrum through the ear canal. The
sound striking the eardrum is converted into a neuron impulse and gets transported to the brain.
This signal is then processed by the brain to derive its meaning and helps us give the required
response.
2. How does automatic summarization help us make sense of a large amount of textual data?
Ans: Automatic summarization helps us make sense of large amounts of textual data by condensing
key information into a shorter, coherent version while retaining the most important points. It
allows us to:
• Save Time – Instead of reading lengthy documents, reports, or articles, a summary provides a
quick understanding of the main ideas.
• Improve Comprehension – Summarization highlights essential concepts, making complex
information easier to grasp.
• Enhance Decision-Making – Professionals can make informed decisions based on concise
insights extracted from large datasets.
• Enable Efficient Searching – Summaries help users quickly determine whether a document is
relevant to their needs.
• Support Information Overload Management – With the vast amount of digital text available,
summarization tools help filter and prioritize important content.
3. What is meant by "perfect syntax, no meaning" in the context of a language? Illustrate with
an example?
Ans: Sometimes, a sentence can have a correct syntax but it does not mean anything. For example,
“Purple elephants dance gracefully on my ceiling.”
This statement is correct grammatically but does not make any sense.
4. How does text normalisation help in processing text?
Ans: Text normalization helps process text by standardising it, removing inconsistencies like case
variations, punctuation, and extra spaces. It ensures uniformity in spelling, numbers, and date
formats, making text easier for machines to analyse. By eliminating noise and handling variations, it
improves the accuracy of tasks like sentiment analysis and machine learning.
5. Describe the following steps involved in text normalisation.
a. Sentence Segmentation b. Tokenization
Ans:
a. Sentence Segmentation: In sentence segmentation, the entire corpus is divided into sentences, using punctuation marks such as the full stop, question mark and exclamation mark as sentence boundaries.
b. Tokenization: After segmenting the sentences, each sentence is further divided into tokens.
Tokenization is the process of separating a piece of text into smaller units called tokens. Token
is a term used for any word or number or special character occurring in a sentence. Under
tokenisation, every word, number and special character is considered as a separate unit or
token.
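To make these two steps concrete, here is a minimal sketch using the NLTK library (an assumed choice; any tokeniser would do), with a made-up two-sentence corpus:

```python
# Sentence segmentation and tokenization with NLTK (assumed library).
import nltk

nltk.download("punkt")   # one-time download of the tokenizer models

corpus = "Hema is learning about AI. Hema asked the smart robot KiBo about AI!"

# Sentence segmentation: split the corpus into sentences at punctuation marks.
sentences = nltk.sent_tokenize(corpus)

# Tokenization: split every sentence into word, number and punctuation tokens.
tokens = [nltk.word_tokenize(sentence) for sentence in sentences]

print(sentences)
print(tokens)
```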
6. State the difference between stemming and lemmatization. Give examples to illustrate your
answer.
Ans: In Stemming, the words left in the corpus are reduced to their root words. Stemming is the
process in which the affixes of words are removed and the words are converted to their base form
or “stem”. Stemming does not take into account if the stemmed word is meaningful or not. It just
removes the affixes; hence it is faster. For example, the words – ‘programmer, programming and
programs’ are reduced to ‘program’ which is meaningful, but ‘universal’ and ‘beautiful’ are reduced
to ‘univers’ and ‘beauti’ respectively after removal of the affix and are not meaningful.
Lemmatisation too has a similar function, removal of affixes. But the difference is that in
lemmatization, the word we get after affix removal, known as lemma, is a meaningful one.
Lemmatization understands the context in which the word is used and makes sure that lemma is a
word with meaning. Hence it takes a longer time to execute than stemming. For example:
‘universal’ and ‘beautiful’ are reduced to ‘universe’ and ‘beauty’ respectively after removal of the
affix and are meaningful.
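As a small illustration, the sketch below runs NLTK's PorterStemmer and WordNetLemmatizer on a few words (an assumed library choice; exact outputs vary between stemmers and lemmatizers and depend on the part of speech supplied):

```python
# Stemming vs. lemmatization with NLTK (assumed library).
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet")   # lexical database used by the lemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["studies", "programming", "beautiful"]:
    print(word,
          "| stem:", stemmer.stem(word),           # crude affix stripping, may not be a real word
          "| lemma:", lemmatizer.lemmatize(word))  # dictionary look-up (treats words as nouns by default)
```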
7. Explain how the BoW algorithm creates a document vector using an example.
Ans: Let us understand the steps involved in implementing a BoW by taking an example of three
documents with one sentence each.
Document 1: Hema is learning about AI
Document 2: Hema asked the smart robot KiBo about AI
Document 3: KiBo explained the basic concepts
Step 1: Text Normalisation - Collecting data and pre-processing it.
Document 1: [hema, is, learning, about, ai]
Document 2: [hema, asked, the, smart, robot, kibo, about, ai]
Document 3: [kibo, explained, the, basic, concepts]
No tokens have been removed in the stopwords removal step because we have very little data and
since the frequency of all the words is almost the same, no word can be said to have lesser value
than the other.
Step 2: Create Dictionary - Make a list of all the unique words occurring in the corpus. (Vocabulary)
Listing the unique words from all three documents:
[hema, is, learning, about, ai, asked, the, smart, robot, kibo, explained, basic, concepts]
Step 3: Create document vector
In this step, a table with the frequency of each unique word in every document is created. The vocabulary, i.e. the unique words, is written in the top row of the table. For each document, if a word exists, the number of times it occurs is written in that document's row; if the word does not occur in that document, a 0 is put under it.
For example, for the first document:
hema | is | learning | about | ai | asked | the | smart | robot | kibo | explained | basic | concepts
1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Step 4: Create document vectors for all documents
hema | is | learning | about | ai | asked | the | smart | robot | kibo | explained | basic | concepts
1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0
0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1
In this table, the header row contains the vocabulary of the corpus and the three rows below it correspond to the three documents.
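The same four steps can also be written as a short plain-Python sketch (the helper function and variable names are illustrative):

```python
# Bag of Words for the three example documents, following the four steps above.
import string

documents = [
    "Hema is learning about AI",
    "Hema asked the smart robot KiBo about AI",
    "KiBo explained the basic concepts",
]

# Step 1: Text normalisation - lowercase, strip punctuation and tokenize.
def normalise(text):
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return text.split()

tokenized = [normalise(doc) for doc in documents]

# Step 2: Create the dictionary (vocabulary) of unique words, in order of appearance.
vocabulary = []
for doc in tokenized:
    for word in doc:
        if word not in vocabulary:
            vocabulary.append(word)

# Steps 3 and 4: Create a document vector (word frequencies) for every document.
vectors = [[doc.count(word) for word in vocabulary] for doc in tokenized]

print(vocabulary)
for vector in vectors:
    print(vector)   # one row per document, matching the table above
```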
8. “In text processing, we pay special attention to the frequency of words occurring in the text."
Elaborate.
Ans: In text processing we pay special attention to the frequency of words occurring in the text,
since it gives us valuable insights into the content of the document. If we plot the frequency of the words occurring in the corpus on a graph, we can see three categories of words. The words that have the
highest occurrence across all the documents of the corpus are considered to have negligible value.
These words, termed as stop words, do not add much meaning to the text and are usually removed
at the pre-processing stage. The words that have moderate occurrence in the corpus are called
frequent words. These words are valuable since they relate to the subject or topic of the documents
and occur in sufficient number throughout the documents. The less common words are termed as
rare words. These words appear the least frequently but contribute greatly to the corpus’ meaning.
When processing text, we only take frequent and rare words into consideration.
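A quick way to see these frequency bands in practice is to count word occurrences with Python's collections.Counter, as in this small sketch (the corpus is the three-document example used earlier):

```python
# Count how often each word occurs across a small corpus.
from collections import Counter

corpus = [
    "hema is learning about ai",
    "hema asked the smart robot kibo about ai",
    "kibo explained the basic concepts",
]
frequencies = Counter(word for doc in corpus for word in doc.split())

# Words sorted from most to least frequent. In a large corpus the top of this
# list is dominated by stopwords, the middle by frequent (topic) words and the
# tail by rare words.
print(frequencies.most_common())
```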
9. Samiksha, a student of class X was exploring the Natural Language Processing domain. She got
stuck while performing the text normalisation. Help her to normalise the text on the segmented
sentences given below: [CBSE Exam]
Document 1: Akash and Ajay are best friends.
Document 2: Akash likes to play football but Ajay prefers to play online games.
Ans:
Normalization Steps Applied:
1. Lowercasing – All text is converted to lowercase to maintain consistency.
2. Removing Punctuation – Periods (.) are removed to ensure uniform tokenization.
3. Tokenization (if needed) – The text can be split into words for further processing.
4. Lemmatization/Stemming (if needed) – Since no word variations exist here, this step is
optional.
Normalised Text:
• Document 1: akash and ajay are best friends
• Document 2: akash likes to play football but ajay prefers to play online games
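A short Python sketch that performs this normalisation (lowercasing plus punctuation removal) on the two documents:

```python
# Normalise the two segmented sentences: lowercase and remove punctuation.
import string

documents = [
    "Akash and Ajay are best friends.",
    "Akash likes to play football but Ajay prefers to play online games.",
]
normalised = [
    doc.lower().translate(str.maketrans("", "", string.punctuation))
    for doc in documents
]
print(normalised)
# ['akash and ajay are best friends',
#  'akash likes to play football but ajay prefers to play online games']
```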
10. Through a step-by-step process, calculate TF-IDF for the given corpus: [CBSE Exam]
Document 1: Johny Johny Yes Papa
Document 2: Eating sugar? No Papa
Document 3: Telling lies? No Papa
Document 4: Open your mouth, Ha! Ha! Ha!
Ans: Step 1: Tokenization (Removing Punctuation & Lowercasing)
We preprocess the text by removing punctuation and converting all words to lowercase.
Processed Documents:
1. Document 1: johny johny yes papa
2. Document 2: eating sugar no papa
3. Document 3: telling lies no papa
4. Document 4: open your mouth ha ha ha
Step 2: Compute Term Frequency (TF). TF is calculated for each term in a document as:
TF(w, d) = (number of times w occurs in d) / (total number of terms in d)
Documents 1, 2 and 3 each contain 4 terms, while Document 4 contains 6 terms.
Term | Doc 1 (TF) | Doc 2 (TF) | Doc 3 (TF) | Doc 4 (TF)
johny | 2/4 = 0.5 | 0 | 0 | 0
yes | 1/4 = 0.25 | 0 | 0 | 0
papa | 1/4 = 0.25 | 1/4 = 0.25 | 1/4 = 0.25 | 0
eating | 0 | 1/4 = 0.25 | 0 | 0
sugar | 0 | 1/4 = 0.25 | 0 | 0
no | 0 | 1/4 = 0.25 | 1/4 = 0.25 | 0
telling | 0 | 0 | 1/4 = 0.25 | 0
lies | 0 | 0 | 1/4 = 0.25 | 0
open | 0 | 0 | 0 | 1/6 = 0.17
your | 0 | 0 | 0 | 1/6 = 0.17
mouth | 0 | 0 | 0 | 1/6 = 0.17
ha | 0 | 0 | 0 | 3/6 = 0.5
Step 3: Compute Inverse Document Frequency (IDF). IDF is calculated as:
IDF(w) = log(N / df(w))
Where:
• N = 4 (total number of documents)
• df(w) is the number of documents containing the term w.
(The natural logarithm is used here.)
Term | df(w) | IDF(w) = log(4/df(w))
johny | 1 | log(4/1) = 1.39
yes | 1 | log(4/1) = 1.39
papa | 3 | log(4/3) = 0.29
eating | 1 | log(4/1) = 1.39
sugar | 1 | log(4/1) = 1.39
no | 2 | log(4/2) = 0.69
telling | 1 | log(4/1) = 1.39
lies | 1 | log(4/1) = 1.39
open | 1 | log(4/1) = 1.39
your | 1 | log(4/1) = 1.39
mouth | 1 | log(4/1) = 1.39
ha | 1 | log(4/1) = 1.39
Step 4: Compute TF-IDF. Multiplying the TF values by their corresponding IDF values:
Term | Doc 1 (TF-IDF) | Doc 2 (TF-IDF) | Doc 3 (TF-IDF) | Doc 4 (TF-IDF)
johny | 0.5 × 1.39 = 0.70 | 0 | 0 | 0
yes | 0.25 × 1.39 = 0.35 | 0 | 0 | 0
papa | 0.25 × 0.29 = 0.07 | 0.25 × 0.29 = 0.07 | 0.25 × 0.29 = 0.07 | 0
eating | 0 | 0.25 × 1.39 = 0.35 | 0 | 0
sugar | 0 | 0.25 × 1.39 = 0.35 | 0 | 0
no | 0 | 0.25 × 0.69 = 0.17 | 0.25 × 0.69 = 0.17 | 0
telling | 0 | 0 | 0.25 × 1.39 = 0.35 | 0
lies | 0 | 0 | 0.25 × 1.39 = 0.35 | 0
open | 0 | 0 | 0 | 0.17 × 1.39 = 0.23
your | 0 | 0 | 0 | 0.17 × 1.39 = 0.23
mouth | 0 | 0 | 0 | 0.17 × 1.39 = 0.23
ha | 0 | 0 | 0 | 0.5 × 1.39 = 0.70
Conclusion
• "Johny" has the highest importance in Document 1.
• "Eating" and "sugar" are most important in Document 2.
• "Telling" and "lies" are most significant in Document 3.
• "Ha" has the highest TF-IDF in Document 4 because it appears three times.
11. With reference to data processing, expand the term ‘TF-IDF’. Also, give any two applications of
TF-IDF. [CBSE Exam]
Ans: TF-IDF stands for Term Frequency-Inverse Document Frequency. It is a statistical measure
used in data processing and Natural Language Processing (NLP) to evaluate how important a word is
in a document relative to a collection (corpus).
Any two applications of TF-IDF are as follows:
Topic modelling: It helps in predicting the topic for a corpus.
Text summarization and keyword extraction: This can be used to help summarise articles more
efficiently or to even determine keywords for a document.
12. Create a document vector table from the following documents by implementing all the four steps
of Bag of words model. Also, depict the outcome of each. [CBSE Exam]
Document 1: Neha and Soniya are classmates.
Document 2: Neha likes dancing but Soniya loves to study mathematics.
Ans: Step 1: Text Normalization
Document 1: [neha, and, soniya, are, classmates]
Document 2: [neha, likes, dancing, but, soniya, loves, to, study, mathematics]
Step 2: Create Dictionary (Vocabulary) Unique words from all documents:
[neha, and, soniya, are, classmates, likes, dancing, but, loves, to, study, mathematics]
Step 3: Create Document Vector for Document 1
neha | and | soniya | are | classmates | likes | dancing | but | loves | to | study | mathematics
1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Step 4: Create document vectors for all documents (1 and 2)
neha | and | soniya | are | classmates | likes | dancing | but | loves | to | study | mathematics
1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1
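For comparison, the same document-term matrix can be produced with scikit-learn's CountVectorizer (an assumed dependency; note that it lowercases the text and orders the vocabulary alphabetically, so the columns appear in a different order from the hand-built table above):

```python
# Building the document vectors with scikit-learn's CountVectorizer.
from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "Neha and Soniya are classmates.",
    "Neha likes dancing but Soniya loves to study mathematics.",
]

vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(documents)

print(vectorizer.get_feature_names_out())   # the vocabulary (alphabetical order)
print(matrix.toarray())                     # one row of word counts per document
```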
13. What are stopwords? Why are they removed during text pre-processing?
Ans: Stopwords are common words that appear frequently in a language but do not carry
significant meaning in text analysis. Examples include "is," "the," "and," "in," "on," "a," "to,"
"with," etc. These words are generally not useful for tasks like text classification or sentiment
analysis.
Removal of stopwords during text pre-processing
• Reduces Noise in Text Data – Stopwords do not contribute meaningful information and can
clutter the analysis. Removing them helps focus on important words.
• Improves Computational Efficiency – Processing fewer words reduces memory and
computation time, making NLP models faster.
• Enhances Text Mining Accuracy – By eliminating redundant words, algorithms like TF-IDF and
Bag of Words (BoW) produce more meaningful results.
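As a small sketch, stopwords can be removed with NLTK's built-in English stopword list (an assumed library choice; the token list is taken from an earlier example):

```python
# Removing English stopwords with NLTK's stopword list (assumed library).
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords")   # one-time download of the stopword lists

stop_words = set(stopwords.words("english"))
tokens = ["akash", "and", "ajay", "are", "best", "friends"]

filtered = [word for word in tokens if word not in stop_words]
print(filtered)   # common words such as "and" and "are" are dropped
```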
14. How does text classification help us get information easily and efficiently?
Ans: Text classification in NLP can be used to automatically classify or predict a category to which a
text belongs without human intervention. Text classification groups documents into predefined
categories based on the content and organises it in a way that you find easy to get the information
you need. For example, email services use text classification for spam filtering by identifying the
contents of each email automatically.
15. Define chatbots. What are its types?
Ans: A chatbot is one of the most popular NLP applications. Chatbots, sometimes known as 'Chat
Robots', are user-friendly agents that can converse with humans in natural language, while also
carrying out tasks like scheduling appointments, sending reminders, and responding to questions
on websites and messaging applications. Chatbots first identify the meaning of the question asked
by the user, collect all the information needed to respond to it, and then provide the proper
response. As you interact with chatbots, you realise that some of them are traditional chatbots or
scripted bots while others are AI-powered and have more capabilities. Based on this, chatbots are
broadly divided into two categories, namely script bots and smart bots.
16. What is the outcome provided by the Bag of Words (BoW) algorithm?
Ans: The Bag of Words (BoW) algorithm converts a collection of text documents into a numerical
representation by creating a document-term matrix (DTM).
Key Outcomes:
• Document-Term Matrix (DTM):
o Each row represents a document.
o Each column represents a unique word (feature).
o The values indicate the frequency of words in each document.
• Text Representation as Vectors:
o Each document is transformed into a vector of word counts, making it suitable for
machine learning and NLP tasks.
• Foundation for Further NLP Analysis:
o Used in text classification, clustering, sentiment analysis, and topic modelling by
providing structured data for algorithms.
Case-based Questions
1. Imagine you are developing an application to diagnose depression in people based on their social
media posts. Which application of NLP can you use to achieve this? Justify.
Ans: For diagnosing depression based on social media posts, Sentiment Analysis (also known as
Opinion Mining) is the key NLP application used. Sentiment Analysis helps in determining the
emotional tone of text by analysing words, phrases, and context. It can classify posts as positive,
negative, or neutral, and advanced models can detect emotions like sadness, hopelessness, or
anxiety—which are indicators of depression. By leveraging machine learning and deep learning, the
system can track patterns over time and provide insights into a person’s mental health. This
application is valuable for early detection, allowing timely intervention and support.
2. Think of a situation where you have been asked to create an application that summarises news
on climate change from various blogs by your company. Can NLP help you build this application?
If yes, which feature of NLP will enable you to accomplish this task? Explain.
Ans: Yes, NLP can help build an application that summarizes climate change news from various
blogs. The key NLP feature used for this task is Text Summarization.
Text Summarization helps in automatically generating concise summaries while retaining essential
information. It works in two ways: Extractive Summarization, which selects key sentences directly
from the text, and Abstractive Summarization, which generates a new summary using natural
language understanding. This feature enhances readability, saves time, and ensures users receive
key insights without reading long articles.
3. Consider that you are building a chatbot to answer FAQs (frequently asked questions) on a
messaging app for a company that provides mobile connectivity services. Which type of chatbot
will you use? What are the advantages that this chatbot will provide?
Ans: For answering FAQs about mobile connectivity services, a script bot, i.e. a rule-based (retrieval-based) chatbot, is ideal. It provides predefined responses based on keyword detection or intent
matching, ensuring quick and accurate replies. This chatbot offers several advantages, including
instant responses, consistent information, 24/7 availability, cost-effectiveness, and scalability. By
automating customer support, it improves user experience while reducing the workload on human
agents.
4. You have been assigned a project where you have to categorise e-books according to their genre
and type, like fiction, non-fiction, autobiographies, etc. Name the feature of NLP that will help
you with your task. How does it work? [CBSE]
Ans: The NLP feature that helps in categorizing e-books by genre and type is Text Classification. It
works by analysing the content of books and assigning them to relevant categories such as fiction,
non-fiction, and autobiographies. The process begins with data preprocessing, where the text is
tokenized, stopwords are removed, and words are stemmed or lemmatized. Next, feature
extraction methods like Bag of Words (BoW), TF-IDF, or Word Embeddings convert the text into
numerical form. A machine learning model (such as Naïve Bayes or SVM) or a deep learning model
(like LSTMs or Transformers) is then trained using labelled e-book data. Finally, when a new book is
processed, the trained model predicts its genre based on textual patterns. This automation helps
in efficiently organising large e-book collections.
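A minimal sketch of such a pipeline with scikit-learn (an assumed choice of library; the training texts and genre labels are invented purely for illustration):

```python
# Tiny text-classification sketch: TF-IDF feature extraction + Naive Bayes.
# The training texts and genre labels below are made up for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

train_texts = [
    "a dragon guarded the enchanted castle in a distant kingdom",
    "the wizard cast a spell to save the lost prince",
    "this study analyses the economic impact of renewable energy policy",
    "a detailed history of the industrial revolution and its causes",
]
train_labels = ["fiction", "fiction", "non-fiction", "non-fiction"]

model = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),   # feature extraction
    ("clf", MultinomialNB()),                            # simple classifier
])
model.fit(train_texts, train_labels)

# With real e-book data the model would be trained on many labelled samples;
# here the overlapping vocabulary should push the prediction towards "fiction".
print(model.predict(["the wizard and the dragon in the enchanted castle"]))
```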
Unit 7: Advanced Python
A. Short answer type questions.
1. Write a short note on Anaconda distribution.
Ans: The Anaconda distribution is a powerful and widely used open-source distribution of the Python language for scientific computing, machine learning and data science tasks. It is an essential tool for data scientists, researchers and developers as it comes with many commonly used libraries pre-installed. It also simplifies the process of managing software packages and dependencies.
2. How to execute commands in Jupyter notebook?
Ans: Once you have launched Jupyter Notebook within your virtual environment, you can execute
commands by creating and running Python code cells within a notebook.
• Create a New Notebook or Open an Existing One.
• Once you have a notebook open, you'll see an empty code cell where you can enter Python
code. Click on the cell to select it, and then type or paste your Python code into the cell.