Artificial Intelligence Book 10 Part B
Enhancing the model with better feature extraction and multi-scale detection can improve
accuracy. Continuous testing and fine-tuning would ensure reliable performance on the road.
Unit 6: Natural Language Processing
A. Short answer type questions.
1. Define NLP.
Ans: Natural Language Processing is a field of artificial intelligence that enables computers to
understand and interpret human (natural) language. NLP takes verbal or written input, processes and analyses it, and an appropriate action can then be taken based on this analysis.
2. How do companies use NLP to get feedback from customers regarding their products and
services?
Ans: Companies use Natural Language Processing applications, such as sentiment analysis, to
identify the emotions in the text and to categorise opinion about their products and services as
‘good’, ‘bad’ or ‘neutral’. This process can identify emotions in text even when they are not clearly expressed, and it enables companies to understand what customers think about their brand and image. It helps not only to understand what people like or dislike, but also what affects a customer’s choice when deciding what to buy.
3. Name some popular virtual assistants that use NLP to help us in our daily lives.
Ans: Some popular virtual assistants are Google Assistant, Copilot and Siri.
4. List the common applications of script bots.
Ans: Script bots are used for simple functions like answering frequently asked questions, setting appointments and giving predefined responses on messaging apps.
5. Give an example of sentences using a word with the same spelling but different meanings.
Ans: “The bat is hanging upside down on the tree.”
“Anju bought a new bat for the cricket match finale.”
Here, the word ‘bat’ refers to the animal in the first sentence and to the cricket equipment in the second.
6. Give the stem and lemma of the word 'studies'.
Ans: Stem: studi. Lemma: study.
7. What does the word "bag" in the "Bag of Words" algorithm symbolise?
Ans: The name “bag” symbolises that the algorithm is not concerned with where the words occur in
the corpus i.e. the sequence of tokens, but aims at getting unique words from the corpus and the
frequency of their occurrence.
8. List the steps involved in the "BoW" algorithm.
Ans:
Step 1: Text Normalisation - Collect data and pre-process it.
Step 2: Create Dictionary - Make a list of all the unique words occurring in the corpus. (Vocabulary).
Step 3: Create document vectors for each document - Find out how many times the unique words
from the document have occurred.
Step 4: Create document vectors for all the documents.
B. Long answer type questions.
1. How does the human brain process sound?
Ans: Our brain keeps processing the sounds that it hears and tries to make sense out of them.
Sound travels through air, enters the ear and reaches the eardrum through the ear canal. The
sound striking the eardrum is converted into a neuron impulse and gets transported to the brain.
This signal is then processed by the brain to derive its meaning and helps us give the required
response.
2. How does automatic summarization help us make sense of a large amount of textual data?
Ans: Automatic summarization helps us make sense of large amounts of textual data by condensing
key information into a shorter, coherent version while retaining the most important points. It
allows us to:
• Save Time – Instead of reading lengthy documents, reports, or articles, a summary provides a
quick understanding of the main ideas.
• Improve Comprehension – Summarization highlights essential concepts, making complex
information easier to grasp.
• Enhance Decision-Making – Professionals can make informed decisions based on concise
insights extracted from large datasets.
• Enable Efficient Searching – Summaries help users quickly determine whether a document is
relevant to their needs.
• Support Information Overload Management – With the vast amount of digital text available,
summarization tools help filter and prioritize important content.
3. What is meant by "perfect syntax, no meaning" in the context of a language? Illustrate with
an example?
Ans: Sometimes, a sentence can have a correct syntax but it does not mean anything. For example,
“Purple elephants dance gracefully on my ceiling.”
This statement is correct grammatically but does not make any sense.
4. How does text normalisation help in processing text?
Ans: Text normalization helps process text by standardising it, removing inconsistencies like case
variations, punctuation, and extra spaces. It ensures uniformity in spelling, numbers, and date
formats, making text easier for machines to analyse. By eliminating noise and handling variations, it
improves the accuracy of tasks like sentiment analysis and machine learning.
5. Describe the following steps involved in text normalisation.
a. Sentence Segmentation b. Tokenization
Ans:
a. Sentence Segmentation: In sentence segmentation, the entire corpus is divided into sentences, using punctuation marks such as the full stop, question mark and exclamation mark as sentence boundaries.
b. Tokenization: After segmenting the sentences, each sentence is further divided into tokens.
Tokenization is the process of separating a piece of text into smaller units called tokens. Token
is a term used for any word or number or special character occurring in a sentence. Under
tokenisation, every word, number and special character is considered as a separate unit or
token.
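To make these two steps concrete, here is a minimal sketch using the NLTK library (an assumed choice; any tokeniser would do), with a made-up two-sentence corpus:

```python
# Sentence segmentation and tokenization with NLTK (assumed library).
import nltk

nltk.download("punkt")   # one-time download of the tokenizer models

corpus = "Hema is learning about AI. Hema asked the smart robot KiBo about AI!"

# Sentence segmentation: split the corpus into sentences at punctuation marks.
sentences = nltk.sent_tokenize(corpus)

# Tokenization: split every sentence into word, number and punctuation tokens.
tokens = [nltk.word_tokenize(sentence) for sentence in sentences]

print(sentences)
print(tokens)
```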
6. State the difference between stemming and lemmatization. Give examples to illustrate your
answer.
Ans: In Stemming, the words left in the corpus are reduced to their root words. Stemming is the
process in which the affixes of words are removed and the words are converted to their base form
or “stem”. Stemming does not take into account if the stemmed word is meaningful or not. It just
removes the affixes; hence it is faster. For example, the words – ‘programmer, programming and
programs’ are reduced to ‘program’ which is meaningful, but ‘universal’ and ‘beautiful’ are reduced
to ‘univers’ and ‘beauti’ respectively after removal of the affix and are not meaningful.
Lemmatisation too has a similar function, removal of affixes. But the difference is that in
lemmatization, the word we get after affix removal, known as lemma, is a meaningful one.
Lemmatization understands the context in which the word is used and makes sure that lemma is a
word with meaning. Hence it takes a longer time to execute than stemming. For example:
‘universal’ and ‘beautiful’ are reduced to ‘universe’ and ‘beauty’ respectively after removal of the
affix and are meaningful.
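As a small illustration, the sketch below runs NLTK's PorterStemmer and WordNetLemmatizer on a few words (an assumed library choice; exact outputs vary between stemmers and lemmatizers and depend on the part of speech supplied):

```python
# Stemming vs. lemmatization with NLTK (assumed library).
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet")   # lexical database used by the lemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["studies", "programming", "beautiful"]:
    print(word,
          "| stem:", stemmer.stem(word),           # crude affix stripping, may not be a real word
          "| lemma:", lemmatizer.lemmatize(word))  # dictionary look-up (treats words as nouns by default)
```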
7. Explain how the BoW algorithm creates a document vector using an example.
Ans: Let us understand the steps involved in implementing a BoW by taking an example of three
documents with one sentence each.
Document 1: Hema is learning about AI
Document 2: Hema asked the smart robot KiBo about AI
Document 3: KiBo explained the basic concepts
Step 1: Text Normalisation - Collecting data and pre-processing it.
Document 1: [hema, is, learning, about, ai]
Document 2: [hema, asked, the, smart, robot, kibo, about, ai]
Document 3: [kibo, explained, the, basic, concepts]
No tokens have been removed in the stopwords removal step because we have very little data and
since the frequency of all the words is almost the same, no word can be said to have lesser value
than the other.
Step 2: Create Dictionary - Make a list of all the unique words occurring in the corpus. (Vocabulary)
Listing the unique words from all three documents:
[hema, is, learning, about, ai, asked, the, smart, robot, kibo, explained, basic, concepts]
Step 3: Create document vector
In this step, a table with the frequency of each unique word in every document is created. The vocabulary, i.e. the unique words, is written in the top row of the table. For each document, if a word exists, the number of times it occurs is written in that document's row; if the word does not occur in that document, a 0 is put under it.
For example, for the first document:
hema | is | learning | about | ai | asked | the | smart | robot | kibo | explained | basic | concepts
1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Step 4: Create document vectors for all documents
hema | is | learning | about | ai | asked | the | smart | robot | kibo | explained | basic | concepts
1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0
0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1
In this table, the header row contains the vocabulary of the corpus and the three rows below it correspond to the three documents.
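The same four steps can also be written as a short plain-Python sketch (the helper function and variable names are illustrative):

```python
# Bag of Words for the three example documents, following the four steps above.
import string

documents = [
    "Hema is learning about AI",
    "Hema asked the smart robot KiBo about AI",
    "KiBo explained the basic concepts",
]

# Step 1: Text normalisation - lowercase, strip punctuation and tokenize.
def normalise(text):
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return text.split()

tokenized = [normalise(doc) for doc in documents]

# Step 2: Create the dictionary (vocabulary) of unique words, in order of appearance.
vocabulary = []
for doc in tokenized:
    for word in doc:
        if word not in vocabulary:
            vocabulary.append(word)

# Steps 3 and 4: Create a document vector (word frequencies) for every document.
vectors = [[doc.count(word) for word in vocabulary] for doc in tokenized]

print(vocabulary)
for vector in vectors:
    print(vector)   # one row per document, matching the table above
```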
8. “In text processing, we pay special attention to the frequency of words occurring in the text."
Elaborate.
Ans: In text processing we pay special attention to the frequency of words occurring in the text,
since it gives us valuable insights into the content of the document. If we plot the frequency of the words occurring in the corpus on a graph, we can see three categories of words. The words that have the
highest occurrence across all the documents of the corpus are considered to have negligible value.
These words, termed as stop words, do not add much meaning to the text and are usually removed
at the pre-processing stage. The words that have moderate occurrence in the corpus are called
frequent words. These words are valuable since they relate to the subject or topic of the documents
and occur in sufficient number throughout the documents. The less common words are termed as
rare words. These words appear the least frequently but contribute greatly to the corpus’ meaning.
When processing text, we only take frequent and rare words into consideration.
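A quick way to see these frequency bands in practice is to count word occurrences with Python's collections.Counter, as in this small sketch (the corpus is the three-document example used earlier):

```python
# Count how often each word occurs across a small corpus.
from collections import Counter

corpus = [
    "hema is learning about ai",
    "hema asked the smart robot kibo about ai",
    "kibo explained the basic concepts",
]
frequencies = Counter(word for doc in corpus for word in doc.split())

# Words sorted from most to least frequent. In a large corpus the top of this
# list is dominated by stopwords, the middle by frequent (topic) words and the
# tail by rare words.
print(frequencies.most_common())
```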
9. Samiksha, a student of class X was exploring the Natural Language Processing domain. She got
stuck while performing the text normalisation. Help her to normalise the text on the segmented
sentences given below: [CBSE Exam]
Document 1: Akash and Ajay are best friends.
Document 2: Akash likes to play football but Ajay prefers to play online games.
Ans:
Normalization Steps Applied:
1. Lowercasing – All text is converted to lowercase to maintain consistency.
2. Removing Punctuation – Periods (.) are removed to ensure uniform tokenization.
3. Tokenization (if needed) – The text can be split into words for further processing.
4. Lemmatization/Stemming (if needed) – Since no word variations exist here, this step is
optional.
Normalised Text:
• Document 1: akash and ajay are best friends
• Document 2: akash likes to play football but ajay prefers to play online games
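A short Python sketch that performs this normalisation (lowercasing plus punctuation removal) on the two documents:

```python
# Normalise the two segmented sentences: lowercase and remove punctuation.
import string

documents = [
    "Akash and Ajay are best friends.",
    "Akash likes to play football but Ajay prefers to play online games.",
]
normalised = [
    doc.lower().translate(str.maketrans("", "", string.punctuation))
    for doc in documents
]
print(normalised)
# ['akash and ajay are best friends',
#  'akash likes to play football but ajay prefers to play online games']
```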
10. Through a step-by-step process, calculate TF-IDF for the given corpus: [CBSE Exam]
Document 1: Johny Johny Yes Papa
Document 2: Eating sugar? No Papa
Document 3: Telling lies? No Papa
Document 4: Open your mouth, Ha! Ha! Ha!
Ans: Step 1: Tokenization (Removing Punctuation & Lowercasing)
We preprocess the text by removing punctuation and converting all words to lowercase.
Processed Documents:
1. Document 1: johny johny yes papa
2. Document 2: eating sugar no papa
3. Document 3: telling lies no papa
4. Document 4: open your mouth ha ha ha
Step 2: Compute Term Frequency (TF). TF is calculated for each term in a document as:
TF(w, d) = (number of times w occurs in d) / (total number of terms in d)
Documents 1, 2 and 3 each contain 4 terms, while Document 4 contains 6 terms.
Term | Doc 1 (TF) | Doc 2 (TF) | Doc 3 (TF) | Doc 4 (TF)
johny | 2/4 = 0.5 | 0 | 0 | 0
yes | 1/4 = 0.25 | 0 | 0 | 0
papa | 1/4 = 0.25 | 1/4 = 0.25 | 1/4 = 0.25 | 0
eating | 0 | 1/4 = 0.25 | 0 | 0
sugar | 0 | 1/4 = 0.25 | 0 | 0
no | 0 | 1/4 = 0.25 | 1/4 = 0.25 | 0
telling | 0 | 0 | 1/4 = 0.25 | 0
lies | 0 | 0 | 1/4 = 0.25 | 0
open | 0 | 0 | 0 | 1/6 = 0.17
your | 0 | 0 | 0 | 1/6 = 0.17
mouth | 0 | 0 | 0 | 1/6 = 0.17
ha | 0 | 0 | 0 | 3/6 = 0.5
Step 3: Compute Inverse Document Frequency (IDF). IDF is calculated as:
IDF(w) = log(N / df(w))
Where:
• N = 4 (total number of documents)
• df(w) is the number of documents containing the term w.
(The natural logarithm is used here.)
Term | df(w) | IDF(w) = log(4/df(w))
johny | 1 | log(4/1) = 1.39
yes | 1 | log(4/1) = 1.39
papa | 3 | log(4/3) = 0.29
eating | 1 | log(4/1) = 1.39
sugar | 1 | log(4/1) = 1.39
no | 2 | log(4/2) = 0.69
telling | 1 | log(4/1) = 1.39
lies | 1 | log(4/1) = 1.39
open | 1 | log(4/1) = 1.39
your | 1 | log(4/1) = 1.39
mouth | 1 | log(4/1) = 1.39
ha | 1 | log(4/1) = 1.39
Step 4: Compute TF-IDF. Multiplying the TF values by their corresponding IDF values:
Term | Doc 1 (TF-IDF) | Doc 2 (TF-IDF) | Doc 3 (TF-IDF) | Doc 4 (TF-IDF)
johny | 0.5 × 1.39 = 0.70 | 0 | 0 | 0
yes | 0.25 × 1.39 = 0.35 | 0 | 0 | 0
papa | 0.25 × 0.29 = 0.07 | 0.25 × 0.29 = 0.07 | 0.25 × 0.29 = 0.07 | 0
eating | 0 | 0.25 × 1.39 = 0.35 | 0 | 0
sugar | 0 | 0.25 × 1.39 = 0.35 | 0 | 0
no | 0 | 0.25 × 0.69 = 0.17 | 0.25 × 0.69 = 0.17 | 0
telling | 0 | 0 | 0.25 × 1.39 = 0.35 | 0
lies | 0 | 0 | 0.25 × 1.39 = 0.35 | 0
open | 0 | 0 | 0 | 0.17 × 1.39 = 0.23
your | 0 | 0 | 0 | 0.17 × 1.39 = 0.23
mouth | 0 | 0 | 0 | 0.17 × 1.39 = 0.23
ha | 0 | 0 | 0 | 0.5 × 1.39 = 0.70
Conclusion
• "Johny" has the highest importance in Document 1.
• "Eating" and "sugar" are most important in Document 2.
• "Telling" and "lies" are most significant in Document 3.
• "Ha" has the highest TF-IDF in Document 4 because it appears three times.
11. With reference to data processing, expand the term ‘TF-IDF’. Also, give any two applications of
TF-IDF. [CBSE Exam]
Ans: TF-IDF stands for Term Frequency-Inverse Document Frequency. It is a statistical measure
used in data processing and Natural Language Processing (NLP) to evaluate how important a word is
in a document relative to a collection (corpus).
Any two applications of TF-IDF are as follows:
Topic modelling: It helps in predicting the topic for a corpus.
Text summarization and keyword extraction: This can be used to help summarise articles more
efficiently or to even determine keywords for a document.
12. Create a document vector table from the following documents by implementing all the four steps
of Bag of words model. Also, depict the outcome of each. [CBSE Exam]
Document 1: Neha and Soniya are classmates.
Document 2: Neha likes dancing but Soniya loves to study mathematics.
Ans: Step 1: Text Normalization
Document 1: [neha, and, soniya, are, classmates]
Document 2: [neha, likes, dancing, but, soniya, loves, to, study, mathematics]
Step 2: Create Dictionary (Vocabulary) Unique words from all documents:
[neha, and, soniya, are, classmates, likes, dancing, but, loves, to, study, mathematics]
Step 3: Create Document Vector for Document 1
neha | and | soniya | are | classmates | likes | dancing | but | loves | to | study | mathematics
1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Step 4: Create document vectors for all documents (1 and 2)
neha | and | soniya | are | classmates | likes | dancing | but | loves | to | study | mathematics
1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1
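For comparison, the same document-term matrix can be produced with scikit-learn's CountVectorizer (an assumed dependency; note that it lowercases the text and orders the vocabulary alphabetically, so the columns appear in a different order from the hand-built table above):

```python
# Building the document vectors with scikit-learn's CountVectorizer.
from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "Neha and Soniya are classmates.",
    "Neha likes dancing but Soniya loves to study mathematics.",
]

vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(documents)

print(vectorizer.get_feature_names_out())   # the vocabulary (alphabetical order)
print(matrix.toarray())                     # one row of word counts per document
```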
13. What are stopwords? Why are they removed during text pre-processing?
Ans: Stopwords are common words that appear frequently in a language but do not carry
significant meaning in text analysis. Examples include "is," "the," "and," "in," "on," "a," "to,"
"with," etc. These words are generally not useful for tasks like text classification or sentiment
analysis.
Removal of stopwords during text pre-processing
• Reduces Noise in Text Data – Stopwords do not contribute meaningful information and can
clutter the analysis. Removing them helps focus on important words.
• Improves Computational Efficiency – Processing fewer words reduces memory and
computation time, making NLP models faster.
• Enhances Text Mining Accuracy – By eliminating redundant words, algorithms like TF-IDF and
Bag of Words (BoW) produce more meaningful results.
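As a small sketch, stopwords can be removed with NLTK's built-in English stopword list (an assumed library choice; the token list is taken from an earlier example):

```python
# Removing English stopwords with NLTK's stopword list (assumed library).
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords")   # one-time download of the stopword lists

stop_words = set(stopwords.words("english"))
tokens = ["akash", "and", "ajay", "are", "best", "friends"]

filtered = [word for word in tokens if word not in stop_words]
print(filtered)   # common words such as "and" and "are" are dropped
```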
14. How does text classification help us get information easily and efficiently?
Ans: Text classification in NLP can be used to automatically classify or predict a category to which a
text belongs without human intervention. Text classification groups documents into predefined
categories based on the content and organises it in a way that you find easy to get the information
you need. For example, email services use text classification for spam filtering by identifying the
contents of each email automatically.
15. Define chatbots. What are its types?
Ans: A chatbot is one of the most popular NLP applications. Chatbots, sometimes known as 'Chat
Robots', are user-friendly agents that can converse with humans in natural language, while also
carrying out tasks like scheduling appointments, sending reminders, and responding to questions
on websites and messaging applications. Chatbots first identify the meaning of the question asked
by the user, collect all the information needed to respond to it, and then provide the proper
response. As you interact with chatbots, you realise that some of them are traditional chatbots or
scripted bots while others are AI-powered and have more capabilities. Based on this, chatbots are
broadly divided into two categories, namely script bots and smart bots.
16. What is the outcome provided by the Bag of Words (BoW) algorithm?
Ans: The Bag of Words (BoW) algorithm converts a collection of text documents into a numerical
representation by creating a document-term matrix (DTM).
Key Outcomes:
• Document-Term Matrix (DTM):
o Each row represents a document.
o Each column represents a unique word (feature).
o The values indicate the frequency of words in each document.
• Text Representation as Vectors:
o Each document is transformed into a vector of word counts, making it suitable for
machine learning and NLP tasks.
• Foundation for Further NLP Analysis:
o Used in text classification, clustering, sentiment analysis, and topic modelling by
providing structured data for algorithms.
Case-based Questions
1. Imagine you are developing an application to diagnose depression in people based on their social
media posts. Which application of NLP can you use to achieve this? Justify.
Ans: For diagnosing depression based on social media posts, Sentiment Analysis (also known as
Opinion Mining) is the key NLP application used. Sentiment Analysis helps in determining the
emotional tone of text by analysing words, phrases, and context. It can classify posts as positive,
negative, or neutral, and advanced models can detect emotions like sadness, hopelessness, or
anxiety—which are indicators of depression. By leveraging machine learning and deep learning, the
system can track patterns over time and provide insights into a person’s mental health. This
application is valuable for early detection, allowing timely intervention and support.
2. Think of a situation where you have been asked to create an application that summarises news
on climate change from various blogs by your company. Can NLP help you build this application?
If yes, which feature of NLP will enable you to accomplish this task? Explain.
Ans: Yes, NLP can help build an application that summarizes climate change news from various
blogs. The key NLP feature used for this task is Text Summarization.
Text Summarization helps in automatically generating concise summaries while retaining essential
information. It works in two ways: Extractive Summarization, which selects key sentences directly
from the text, and Abstractive Summarization, which generates a new summary using natural
language understanding. This feature enhances readability, saves time, and ensures users receive
key insights without reading long articles.
3. Consider that you are building a chatbot to answer FAQs (frequently asked questions) on a
messaging app for a company that provides mobile connectivity services. Which type of chatbot
will you use? What are the advantages that this chatbot will provide?
Ans: For answering FAQs about mobile connectivity services, a script bot, i.e. a rule-based (retrieval-based) chatbot, is ideal. It provides predefined responses based on keyword detection or intent
matching, ensuring quick and accurate replies. This chatbot offers several advantages, including
instant responses, consistent information, 24/7 availability, cost-effectiveness, and scalability. By
automating customer support, it improves user experience while reducing the workload on human
agents.
4. You have been assigned a project where you have to categorise e-books according to their genre
and type, like fiction, non-fiction, autobiographies, etc. Name the feature of NLP that will help
you with your task. How does it work? [CBSE]
Ans: The NLP feature that helps in categorizing e-books by genre and type is Text Classification. It
works by analysing the content of books and assigning them to relevant categories such as fiction,
non-fiction, and autobiographies. The process begins with data preprocessing, where the text is
tokenized, stopwords are removed, and words are stemmed or lemmatized. Next, feature
extraction methods like Bag of Words (BoW), TF-IDF, or Word Embeddings convert the text into
numerical form. A machine learning model (such as Naïve Bayes or SVM) or a deep learning model
(like LSTMs or Transformers) is then trained using labelled e-book data. Finally, when a new book is
processed, the trained model predicts its genre based on textual patterns. This automation helps
in efficiently organising large e-book collections.
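A minimal sketch of such a pipeline with scikit-learn (an assumed choice of library; the training texts and genre labels are invented purely for illustration):

```python
# Tiny text-classification sketch: TF-IDF feature extraction + Naive Bayes.
# The training texts and genre labels below are made up for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

train_texts = [
    "a dragon guarded the enchanted castle in a distant kingdom",
    "the wizard cast a spell to save the lost prince",
    "this study analyses the economic impact of renewable energy policy",
    "a detailed history of the industrial revolution and its causes",
]
train_labels = ["fiction", "fiction", "non-fiction", "non-fiction"]

model = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),   # feature extraction
    ("clf", MultinomialNB()),                            # simple classifier
])
model.fit(train_texts, train_labels)

# With real e-book data the model would be trained on many labelled samples;
# here the overlapping vocabulary should push the prediction towards "fiction".
print(model.predict(["the wizard and the dragon in the enchanted castle"]))
```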
Unit 7: Advanced Python
A. Short answer type questions.
1. Write a short note on Anaconda distribution.
Ans: The Anaconda distribution is a powerful and widely used open-source distribution of the Python language for scientific computing, machine learning and data science tasks. It is an essential tool for data scientists, researchers and developers as it comes with many commonly used libraries pre-installed. It also simplifies the process of managing software packages and dependencies.
2. How to execute commands in Jupyter notebook?
Ans: Once you have launched Jupyter Notebook within your virtual environment, you can execute
commands by creating and running Python code cells within a notebook.
• Create a New Notebook or Open an Existing One.
• Once you have a notebook open, you'll see an empty code cell where you can enter Python
code. Click on the cell to select it, and then type or paste your Python code into the cell.